🎨Dask sidecar: use reproducible zipfile library #6571
nice. thx!
This PR introduces a new library, repro-zipfile, which ensures that the same zip file is generated when the same content is zipped, guaranteeing reproducibility.
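To illustrate what "the same zip file for the same content" means in practice, here is a minimal sketch that uses only Python's standard `zipfile` module. It is not how repro-zipfile is implemented; the helper name `deterministic_zip_bytes` and the fixed timestamp and permission values are assumptions chosen for the example. The idea is to pin every piece of metadata that would otherwise vary between runs (entry order, modification time, file mode).

```python
import hashlib
import io
import zipfile

# Assumption for this sketch: pin every entry to the earliest timestamp
# the ZIP format can represent, so wall-clock time never leaks in.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def deterministic_zip_bytes(files: dict[str, bytes]) -> bytes:
    """Zip in-memory files so that equal content yields equal bytes.

    `files` maps archive names to their content (hypothetical helper).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        # Sort names so the insertion order of the dict does not matter.
        for name in sorted(files):
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            # Fixed permissions (rw-r--r--) instead of whatever is on disk.
            info.external_attr = 0o644 << 16
            zf.writestr(info, files[name])
    return buffer.getvalue()


first = deterministic_zip_bytes({"b.txt": b"beta", "a.txt": b"alpha"})
second = deterministic_zip_bytes({"a.txt": b"alpha", "b.txt": b"beta"})

# Same content, different insertion order: the digests are identical.
assert hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest()
```

Comparing digests rather than raw bytes mirrors how an uploaded archive would typically be checked against a previously stored one.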
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #6571 +/- ##
=========================================
- Coverage 84.5% 65.3% -19.3%
=========================================
Files 10 620 +610
Lines 214 31498 +31284
Branches 25 265 +240
=========================================
+ Hits 181 20592 +20411
- Misses 23 10845 +10822
- Partials 10 61 +51
841f626 to 5addf47
fd8bc97 to d80e71f
Quality Gate passed
Thanks!
What do these changes do?
As explained in #6244, this PR makes sure we use deterministic zipping when uploading files from the dask-sidecar.
Driving test
test_push_file_to_remote_creates_reproducible_zip_archive
Important note:
The test shows that the current implementation already creates deterministic zip files (e.g. the hashes of two zips containing the same files, created at different points in time, are identical). Nevertheless, since the computational backend currently allows passing only one file, and the services are responsible for creating their own zip files, this change is probably of little use at the moment.
Once the dask-sidecar is allowed to compress folders, this might prove useful.
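For the folder case mentioned above, the following is a rough sketch of the property one would want to hold: zipping the same folder twice gives the same hash even if file modification times changed in between. It uses only the standard library rather than the dask-sidecar code or repro-zipfile, and the helper name `zip_folder_deterministically`, the file names, and the fixed metadata values are all assumptions made for illustration.

```python
import hashlib
import io
import os
import tempfile
import zipfile
from pathlib import Path


def zip_folder_deterministically(folder: Path) -> bytes:
    """Zip every file under `folder` with pinned metadata (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        # Sorted traversal: the archive order must not depend on the filesystem.
        for path in sorted(p for p in folder.rglob("*") if p.is_file()):
            arcname = path.relative_to(folder).as_posix()
            # Fixed timestamp and mode instead of the values stored on disk.
            info = zipfile.ZipInfo(arcname, date_time=(1980, 1, 1, 0, 0, 0))
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16
            zf.writestr(info, path.read_bytes())
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "sub" / "data.txt").write_text("payload")
    (root / "log.txt").write_text("done")

    before = hashlib.sha256(zip_folder_deterministically(root)).hexdigest()

    # Simulate the files being touched at a different point in time.
    os.utime(root / "log.txt", (1_000_000_000, 1_000_000_000))

    after = hashlib.sha256(zip_folder_deterministically(root)).hexdigest()
    hashes_match = before == after

assert hashes_match
```

The check is the same one the driving test name suggests: two archives of identical content, created at different times, must hash to the same value.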
Related issue/s
How to test
Dev-ops checklist