Tools for working with WACZs #3670

Merged
merged 14 commits into from
Dec 4, 2024

@rebeccacremona rebeccacremona commented Dec 3, 2024

See LIL-2863.

We recently started saving WACZs from Scoop instead of WARCs. We have long let users download Perma Links' WARC files, via both the API and the GUI. This PR adds the same functionality for WACZ files:

  • wacz_size and wacz_download_url fields have been added to the Link and AuthenticatedLink serializers, so that any API route that includes information about Link objects will now return that info

  • the warc_download_url field was updated: it will now be populated if a WARC or a WACZ is available for a given Perma Link, since the WARC can be extracted from the WACZ and served.

  • a "Download WACZ" button has been added to the tray next to the "Download WARC" button, when a WACZ is available:

image

For compatibility, that "button" is a link pointing to a URL of the form https://perma.test:8000/B925-DU9S?type=wacz_download, following the pattern of the WARC "button", which is a URL of the form https://perma.test:8000/B925-DU9S?type=warc_download. (This pattern predates the existence of the /download API route.)
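A minimal sketch of how the two tray URLs relate to the serialized fields, for illustration only. The helper names (`tray_download_url`, `tray_buttons`) and the presence of a `guid` key in the payload are assumptions, not code from this PR; only the URL pattern and the `warc_download_url` / `wacz_download_url` fields come from the description above.

```python
from urllib.parse import urlencode


def tray_download_url(base_url: str, guid: str, file_format: str) -> str:
    """Build a GUI download link of the form <base>/<GUID>?type=<format>_download.

    Hypothetical helper: it only mirrors the URL pattern described above.
    """
    if file_format not in ("warc", "wacz"):
        raise ValueError(f"unsupported file format: {file_format!r}")
    query = urlencode({"type": f"{file_format}_download"})
    return f"{base_url.rstrip('/')}/{guid}?{query}"


def tray_buttons(base_url: str, link: dict) -> dict:
    """Return one URL per download button that should be shown for a link.

    `link` stands in for a serialized Link; the `guid` key is an assumption.
    """
    buttons = {}
    # warc_download_url is populated when either a WARC or a WACZ exists.
    if link.get("warc_download_url"):
        buttons["warc"] = tray_download_url(base_url, link["guid"], "warc")
    # wacz_download_url is null when the link has no WACZ.
    if link.get("wacz_download_url"):
        buttons["wacz"] = tray_download_url(base_url, link["guid"], "wacz")
    return buttons
```

With a payload whose `wacz_download_url` is `None`, `tray_buttons` returns only the `warc` entry, which matches the "Download WACZ" button being hidden in that case.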

The only change in existing functionality: now, if a Perma Link redirects to another Perma Link (as with https://perma.cc/AAAA-AAAA), the file format will always be included in the query string of the redirected API request to download it (e.g. https://api.perma.cc/v1/archives/69AE-PWJB/download?file_format=warc).
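To make the redirect behavior concrete, here is a rough sketch of building the redirected API URL so that `file_format` is always present in the query string. `archive_download_url` is a hypothetical helper, and defaulting to `warc` is an assumption; the route shape and parameter name are taken from the example URL above.

```python
from urllib.parse import urlencode

# API root as it appears in the example URL above.
API_ROOT = "https://api.perma.cc/v1"


def archive_download_url(guid: str, file_format: str = "warc") -> str:
    """Build the API download URL for a link, always naming the file format.

    Hypothetical helper; the "warc" default is an assumption for illustration.
    """
    if file_format not in ("warc", "wacz"):
        raise ValueError(f"unsupported file format: {file_format!r}")
    query = urlencode({"file_format": file_format})
    return f"{API_ROOT}/archives/{guid}/download?{query}"
```

The point of the explicit parameter is that the redirected request names the format itself instead of relying on the download route's default.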

@rebeccacremona rebeccacremona requested a review from a team as a code owner December 3, 2024 22:16
@rebeccacremona rebeccacremona requested review from teovin and christiansmith and removed request for a team and teovin December 3, 2024 22:16

codecov bot commented Dec 3, 2024

Codecov Report

Attention: Patch coverage is 85.50725% with 10 lines in your changes missing coverage. Please review.

Project coverage is 69.45%. Comparing base (07d00eb) to head (432782b).
Report is 22 commits behind head on develop.

Files with missing lines             Patch %   Lines
perma_web/perma/models.py            40.00%    3 Missing ⚠️
perma_web/perma/utils.py             85.71%    3 Missing ⚠️
perma_web/api/utils.py               90.47%    2 Missing ⚠️
perma_web/perma/views/playback.py    50.00%    2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3670      +/-   ##
===========================================
+ Coverage    69.01%   69.45%   +0.43%     
===========================================
  Files           54       54              
  Lines         7478     7542      +64     
===========================================
+ Hits          5161     5238      +77     
+ Misses        2317     2304      -13     

@rebeccacremona

Edge case: I should double check (and add to the test suite) what happens when you manually request to download the WACZ of a Perma Link for which there is only a WARC. The button won't appear in the tray, and the API will report wacz_download_url: null, but the request shouldn't result in a RuntimeError...

rebeccacremona commented Dec 3, 2024

> the request shouldn't result in a RuntimeError...

Ah, yep: if you manually construct URLs like https://perma.test:8000/Y3YJ-4B9G?type=wacz_download or https://perma.test:8000/api/v1/archives/Y3YJ-4B9G/download?file_format=wacz for links without WACZs (say, old ones, or user uploads), that's currently a 500 w/ RuntimeError.

Fixing! Fixed.
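The fix itself isn't shown in this thread. As a rough illustration of the intended behavior, the sketch below checks availability before serving and returns an ordinary error status instead of raising. `LinkFiles`, `resolve_download`, and the 400/404 status codes are all assumptions for illustration, not the code that was merged.

```python
from dataclasses import dataclass


@dataclass
class LinkFiles:
    """Hypothetical stand-in for a Perma Link and the files stored for it."""
    guid: str
    has_warc: bool
    has_wacz: bool


def resolve_download(link: LinkFiles, file_format: str):
    """Return (status_code, message) for a download request.

    Assumed behavior: a missing file is a 404, never an unhandled exception.
    """
    if file_format not in ("warc", "wacz"):
        return 400, f"unknown file_format: {file_format}"
    # A WACZ can only be served if one was actually saved for this link.
    if file_format == "wacz" and not link.has_wacz:
        return 404, f"no WACZ available for {link.guid}"
    # A WARC can be served directly, or extracted from the WACZ.
    if file_format == "warc" and not (link.has_warc or link.has_wacz):
        return 404, f"no WARC available for {link.guid}"
    return 200, f"serving {file_format} for {link.guid}"
```

For an old link such as `LinkFiles("Y3YJ-4B9G", has_warc=True, has_wacz=False)`, a `wacz` request yields a 404 while a `warc` request still succeeds.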

@christiansmith christiansmith left a comment

Looks good to me! Nice work @rebeccacremona!

@rebeccacremona rebeccacremona merged commit c02c37a into harvard-lil:develop Dec 4, 2024
2 checks passed
@rebeccacremona rebeccacremona deleted the wacz-download branch December 4, 2024 22:00