add task to delete zimcheck results from s3#355
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## detect-zimcheck-errors #355 +/- ##
==========================================================
+ Coverage 80.99% 81.24% +0.25%
==========================================================
Files 61 62 +1
Lines 3309 3375 +66
Branches 354 366 +12
==========================================================
+ Hits 2680 2742 +62
+ Misses 525 520 -5
- Partials 104 113 +9 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
| ) | ||
|
|
||
| local_warehouse_paths: ClassVar[dict[UUID, Path]] = _parse_local_warehouse_paths() | ||
| zimcheck_results_upload_uri: str = os.getenv( |
There was a problem hiding this comment.
zimcheck_results_upload_uri => zimcheck_results_s3_bucket_uri
| .where( | ||
| Book.zimcheck_result_url.is_not(None), | ||
| Book.zimcheck_s3_deleted.is_(False), | ||
| Book.location_kind.not_in(["quarantine"]), |
There was a problem hiding this comment.
We want to remove file from S3 only if book is in production or deleted. We wanna keep it (for further manual analysis if needed) when book is in quarantine or staging.
| default_factory=dict, server_default="{}" | ||
| ) | ||
|
|
||
| zimcheck_s3_deleted: Mapped[bool] = mapped_column( |
There was a problem hiding this comment.
We probably need a partial index on zimcheck_s3_deleted = False to efficiently retrieve these few books which still have zimcheck results on S3 (there will be very few once this PR is merged)
|
|
||
| s3 = get_kiwix_storage_client(upload_uri) | ||
|
|
||
| logger.info("Deleting zimcheck resultss from S3") |
|
|
||
| for book in books: | ||
| # Determine whether we need to fetch zimcheck results or not | ||
| missing_summary_keys = get_missing_keys( |
There was a problem hiding this comment.
Why should we care about that? I'm concerned we will end with many zimcheck results not been deleted only because we failed to compute zimcheck summary. Once in production, book should have a summary and anyway we don't care much about quality anymore.
Changes
This closes #113