xmlquery: use compressed length when available #633

Merged (1 commit) on Apr 27, 2021

Conversation

@ato (Contributor) commented Apr 13, 2021

Description

Use the compressed length when it's available in XmlQueryIndexSource.

The field is confusingly misnamed compressedendoffset in the XML, but OpenWayback, and consequently OutbackCDX, actually uses it for the "S" CDX field (compressed length).
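As a rough illustration of the mapping described above (not the actual patch), the sketch below reads one XML result and copies compressedendoffset into a `length` key only when it is present. The function name `result_to_cdx`, the sample document, and the other element names (`compressedoffset`, `file`) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample result; only compressedendoffset is named in this PR,
# the other element names are assumed for illustration.
SAMPLE = """<result>
  <compressedoffset>1024</compressedoffset>
  <compressedendoffset>2048</compressedendoffset>
  <file>example.warc.gz</file>
</result>"""


def result_to_cdx(result_xml):
    """Convert one XML result into a CDX-like dict (illustrative only)."""
    elem = ET.fromstring(result_xml)
    cdx = {
        "offset": elem.findtext("compressedoffset"),
        "filename": elem.findtext("file"),
    }
    # Despite its name, this element carries the compressed record length
    # (the "S" CDX field), so it maps to "length" rather than an end offset.
    length = elem.findtext("compressedendoffset")
    if length:
        cdx["length"] = length
    return cdx


cdx = result_to_cdx(SAMPLE)
print(cdx)
```

When the element is missing, the dict simply has no `length` key, which mirrors the "when available" wording of the PR title.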

Motivation and Context

Without this field, when WARC files are accessed over HTTP, pywb makes open-ended byte range requests, which results in far more data being read from disk than necessary.
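To make the difference concrete, here is a minimal sketch of how an HTTP `Range` header value changes once the record length is known. The helper `range_header` is hypothetical and is not pywb's loader code; it only follows the standard `bytes=first-last` syntax.

```python
def range_header(offset, length=None):
    """Build an HTTP Range header value for a WARC record (illustrative)."""
    if length is None:
        # Open-ended range: the server may stream everything from offset
        # to the end of the file.
        return "bytes={}-".format(offset)
    # Closed range: the last byte position is inclusive, hence the -1.
    return "bytes={}-{}".format(offset, offset + length - 1)


print(range_header(1024))        # bytes=1024-
print(range_header(1024, 2048))  # bytes=1024-3071
```

With the closed form, the server only needs to read the bytes of the requested record rather than the remainder of the WARC file.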

CC @anjackson

Screenshots (if appropriate):

N/A

Types of changes

  • Replay fix (fixes a replay specific issue)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added or updated tests to cover my changes.
  • All new and existing tests passed. (test_force_https failures seem unrelated. I get the same failures with the master branch prior to this change.)

The field is unfortunately misnamed compressedendoffset in XML but OWB
actually uses this for the compressed length 'S' CDX field.

Without this field when WARC files are accessed over HTTP pywb will make
open byte range requests which results in a lot more data being read
from disk than necessary.
ato added a commit to nla/nla-pywb that referenced this pull request Apr 13, 2021
@ikreymer (Member) commented

Makes sense, I didn't know this field existed in XmlQueryIndexSource. This is fine as long as it's safe to assume that compressedendoffset is always the length, and not the end offset.

@ikreymer ikreymer merged commit c5c4a54 into webrecorder:master Apr 27, 2021
@ato ato deleted the xmlquery-length-field branch April 12, 2022 05:07