Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎨Dask sidecar: use reproducible zipfile library #6571

Conversation

sanderegg
Copy link
Member

@sanderegg sanderegg commented Oct 21, 2024

What do these changes do?

as explained in #6244 this PR makes sure we use deterministic Zipping when uploading files from the dask-sidecar.

Driving test

test_push_file_to_remote_creates_reproducible_zip_archive

Important note:

The test shows that the current implementation already created deterministic zip files (e.g. hash of 2 zips containing the same files created at different time points are the same). Nevertheless, since the computational backend currently allows to pass only 1 file and the services are actually responsible for creating their own zip files, this is probably mostly useless at the moment.

When we allow to have folders to be compressed by the dask-sidecar this might prove useful.

Related issue/s

How to test

Dev-ops checklist

@sanderegg sanderegg added the a:dask-service Any of the dask services: dask-scheduler/sidecar or worker label Oct 21, 2024
@sanderegg sanderegg added this to the MartinKippenberger milestone Oct 21, 2024
@sanderegg sanderegg self-assigned this Oct 21, 2024
Copy link
Member

@pcrespov pcrespov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. thx!

This PR introduces a new library, repro-zipfile, which ensures that the same zip file is generated when the same content is zipped, guaranteeing reproducibility.

Copy link

codecov bot commented Oct 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.3%. Comparing base (cafbf96) to head (d80e71f).
Report is 659 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #6571      +/-   ##
=========================================
- Coverage    84.5%   65.3%   -19.3%     
=========================================
  Files          10     620     +610     
  Lines         214   31498   +31284     
  Branches       25     265     +240     
=========================================
+ Hits          181   20592   +20411     
- Misses         23   10845   +10822     
- Partials       10      61      +51     
Flag Coverage Δ
integrationtests 64.7% <ø> (?)
unittests 91.3% <100.0%> (+6.7%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...car/src/simcore_service_dask_sidecar/file_utils.py 95.7% <100.0%> (ø)

... and 629 files with indirect coverage changes

@sanderegg sanderegg force-pushed the dask-sidecar/use-reproducible-zipfile branch from 841f626 to 5addf47 Compare October 22, 2024 06:00
@sanderegg sanderegg force-pushed the dask-sidecar/use-reproducible-zipfile branch from fd8bc97 to d80e71f Compare October 22, 2024 11:37
Copy link

@sanderegg sanderegg merged commit 4c5aa41 into ITISFoundation:master Oct 22, 2024
57 checks passed
@sanderegg sanderegg deleted the dask-sidecar/use-reproducible-zipfile branch October 22, 2024 12:29
@matusdrobuliak66
Copy link
Contributor

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:dask-service Any of the dask services: dask-scheduler/sidecar or worker
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants