Storage: google-resumable-media==0.4.0 Breaks Gzipped Downloads #9188
Comments
@william-silversmith Can you achieve what you need by reloading the blob's metadata? E.g.: blob.reload() At that point, the content_encoding property will be populated from the server. |
Unfortunately, this strategy would significantly reduce IO performance.
Google support worked with us a few months ago to figure out how to reduce
the number of requests (hundreds of millions of files). Originally we were
using get_blob which generated an additional request.
We're handling a petabyte of 3D image data with a random access
requirement. We do this by chunking the image into a regular grid of files.
In order to make this more affordable, some lossless compression is
desirable.
The reason it would be desirable to control when decompression occurs is
that our method for transferring datasets currently requires decompressing
and recompressing which seems a waste.
I do think it's worth being a bit more explicit though:
1. The updated resumable media is breaking existing functionality for all
users that use gzip (probably a lot!) as raw bytes are now returned.
2. It was broken in the spirit of letting users decide what to do with the
data (a new feature), but blob is stripping that info away.
I would be happy with the old functionality being restored, but if there's
some option to decide when decompression occurs without additional network
overhead, that would be even better.
…On Mon, Sep 9, 2019, 12:30 PM Tres Seaver ***@***.***> wrote:
@william-silversmith <https://github.com/william-silversmith> Can you
achieve what you need to reloading the blob's metadata? E.g.:
blob.reload()
At that point, the content_encoding property will be populated from the
server.
|
@william-silversmith Thanks for clarifying. One issue here is that we would want to have the header-driven |
@crwilcox reverted the changes made in googleapis/google-resumable-media-python#103 and is releasing a new version to unblock this issue: googleapis/google-resumable-media-python#104 Reassigning to him. |
Hi @william-silversmith, I have released v0.4.1, which backs out this change. Because of the way this package is pinned within google-cloud-storage, we are rethinking how to make this change without disrupting people who use the existing libraries. |
Thank you very much! |
Environment details
Google Storage blob.py
Ubuntu 14.04
python --version
Python 3.6.8
pip show google-<service>
or pip freeze
google-cloud-storage==1.19.0
Steps to reproduce
Per the latest release of google-resumable-media, no decompression of content-encoding gzip is performed and raw bytes are returned.
See https://github.com/googleapis/google-resumable-media-python/releases
blob.download_as_string() formerly returned decompressed bytes, and now returns compressed bytes. We are using .blob instead of .get_blob for an HPC application and thus have no way of knowing what the content encoding is as the information is erased.
We actually LIKE the new functionality as we can now decide when to decompress, but we need to know the content encoding to avoid various kinds of problems that would be introduced by speculative decompression.
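To illustrate the difference between decompressing on a known content encoding and decompressing speculatively, here is a minimal sketch using only the standard-library gzip module. It assumes the caller has obtained the content encoding by some means; `decode_payload` and `looks_like_gzip` are hypothetical helpers for illustration and are not part of google-cloud-storage or google-resumable-media.

```python
import gzip
from typing import Optional


def decode_payload(data: bytes, content_encoding: Optional[str]) -> bytes:
    """Hypothetical helper: decompress only when the encoding is known to be gzip."""
    if content_encoding is None:
        # No encoding recorded: treat the bytes as already decoded.
        return data
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        return gzip.decompress(data)
    if encoding in ("", "identity"):
        return data
    raise ValueError("unsupported content encoding: %r" % content_encoding)


def looks_like_gzip(data: bytes) -> bool:
    """Hypothetical speculative check: does the payload start with the gzip magic bytes?"""
    return data[:2] == b"\x1f\x8b"


raw = b"chunk of 3D image data"
compressed = gzip.compress(raw)

# With a known encoding the decision is unambiguous.
restored = decode_payload(compressed, "gzip")
untouched = decode_payload(raw, None)

# The speculative check can misfire: these bytes are not a gzip stream,
# but they happen to begin with the same two magic bytes.
false_positive = looks_like_gzip(b"\x1f\x8b not actually gzip")
```

The magic-byte check is the kind of guess that a known content encoding makes unnecessary: uncompressed binary data can legitimately begin with the same two bytes.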
Code example
Here is our desired functionality.
Adding this patch to google.cloud.storage.blob.py would solve this problem for us: