Add use of zstd compression on compute services #336

Merged
dotsdl merged 23 commits into main from feature/220-zstd-compression-compute-services on Jan 24, 2025

ianmkenney
Member

This PR closes #220. It changes set_task_result in the compute client and API to use a compressed keyed chain representation of each ProtocolDAGResult, rather than plain JSON serialization, as the intermediate format.

- Update env files to include zstandard

- Update set_task_result in the compute API and client to handle base64
encoded data. Instead of JSON serializing the ProtocolDAGResult (PDR)
and using that as the intermediate format:

1) create a keyed chain representation of the PDR

2) JSON serialize this representation

3) compress the utf-8 encoded bytes with zstandard

4) encode with base64

- Use the above base64 encoded data as the intermediate format and
reverse the operations above to recover the PDR.
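
The steps above, and their reversal, can be sketched as follows. This is a minimal illustration, not the code from this PR: `encode_keyed_chain` and `decode_keyed_chain` are hypothetical names, a plain list of key/dict pairs stands in for the keyed chain representation of a PDR (step 1), and the `zlib` fallback is there only so the sketch runs where the `zstandard` package is not installed.

```python
import base64
import json
import zlib

try:
    import zstandard  # third-party package this PR adds to the env files

    def _compress(raw: bytes) -> bytes:
        return zstandard.ZstdCompressor().compress(raw)

    def _decompress(blob: bytes) -> bytes:
        return zstandard.ZstdDecompressor().decompress(blob)

except ImportError:
    # Stand-in so the sketch still runs without zstandard; not what the PR uses.
    _compress, _decompress = zlib.compress, zlib.decompress


def encode_keyed_chain(keyed_chain: list) -> str:
    """Steps 2-4: JSON serialize, compress the UTF-8 bytes, base64 encode."""
    raw = json.dumps(keyed_chain).encode("utf-8")
    return base64.b64encode(_compress(raw)).decode("ascii")


def decode_keyed_chain(payload: str) -> list:
    """Reverse the steps: base64 decode, decompress, parse the JSON."""
    raw = _decompress(base64.b64decode(payload))
    return json.loads(raw.decode("utf-8"))


# Hypothetical stand-in for step 1, the keyed chain representation of a PDR.
keyed_chain = [
    ["ProtocolUnitResult-abc", {"outputs": {"value": 1.5}}],
    ["ProtocolDAGResult-def", {"units": ["ProtocolUnitResult-abc"]}],
]

payload = encode_keyed_chain(keyed_chain)
assert decode_keyed_chain(payload) == keyed_chain
```

Base64 keeps the compressed bytes safe to embed in a JSON request body, at the cost of roughly a one-third size increase over the raw compressed bytes.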

codecov bot commented Nov 25, 2024

Codecov Report

Attention: Patch coverage is 80.70175% with 11 lines in your changes missing coverage. Please review.

Project coverage is 80.49%. Comparing base (ebec59c) to head (321f5d7).
Report is 24 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| alchemiscale/compute/api.py | 33.33% | 8 Missing ⚠️ |
| alchemiscale/interface/api.py | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #336      +/-   ##
==========================================
+ Coverage   80.35%   80.49%   +0.14%     
==========================================
  Files          26       27       +1     
  Lines        3711     3743      +32     
==========================================
+ Hits         2982     3013      +31     
- Misses        729      730       +1     


Use more bytes

Move compression and decompression functions to new module

Use latin-1 decoded bytes
If a decompression error is raised, assume that the original data was
never compressed.
@ianmkenney ianmkenney force-pushed the feature/220-zstd-compression-compute-services branch from 4541b72 to cce6e8d Compare December 30, 2024 06:21
Test getting extends ProtocolDAGResults as if they were stored through
the old pdr.to_dict() -> json -> utf-8 encoded format. The new test
can be removed in the next major release that drops the old format.
@ianmkenney ianmkenney force-pushed the feature/220-zstd-compression-compute-services branch from cce6e8d to b32f62d Compare December 30, 2024 06:22
To allow for better and clearer testing of result pushing and pulling,
executing a task and pushing its results were separated into distinct steps.
Code coverage was artificially low due to test run order. A reset
and reinitialization of the s3os_server shows the correct results.
It's more robust to parameterize the old tests to use the legacy kwarg
for pushing results rather than writing a new test that covers less of
the codebase.
@ianmkenney ianmkenney changed the title [WIP] Add use of zstd compression on compute services Add use of zstd compression on compute services Jan 2, 2025
@ianmkenney ianmkenney marked this pull request as ready for review January 2, 2025 18:46
@ianmkenney ianmkenney requested a review from dotsdl January 2, 2025 19:27
Member

@dotsdl dotsdl left a comment

Looks fantastic @ianmkenney! Thanks for adding elegant compression-at-rest for ProtocolDAGResult objects!

Member

dotsdl commented Jan 24, 2025

@ianmkenney I made some changes as we discussed. If you agree with these changes and tests are passing, feel free to merge!

@dotsdl dotsdl enabled auto-merge January 24, 2025 18:11
@dotsdl dotsdl merged commit 5b5cbd6 into main Jan 24, 2025
4 of 5 checks passed
@dotsdl dotsdl deleted the feature/220-zstd-compression-compute-services branch January 24, 2025 18:28
2 participants