Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for python3 http client when accept-encoding is gzip #5

Merged
merged 2 commits into from
Apr 8, 2024

Conversation

Fr0stM0urne
Copy link
Contributor

Overview
Added handling of gzip encoded response for python3 http client.

Problem
Current framework generates python3 code without checking the encoding of the response and handles the network response by this line of code:
print(data.decode("utf-8"))
This always results in UnicodeDecodeError when running the generated script.

Solution
This can be fixed by decompressing the gzip data:
print(gzip.decompress(data).decode("utf-8"))

Copy link
Member

@pimterry pimterry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is a good catch, it would be nice to tidy this up. I've made a couple of comments here but if you're happy to fix those up I'm definitely interested in including this.

package.json Outdated Show resolved Hide resolved


// Decode response
if (headers['accept-encoding'] == 'gzip') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is close, but it's not quite right I'm afraid.

In this code you're checking the request headers, but that doesn't mean that the response is actually using gzip. accept-encoding just means that client told the server it could use gzip - there's lots of cases where the server won't actually use gzip in the response despite this.

To do this correctly, we'll need to check the content-encoding value in the response headers, at runtime, so outputting this if into the Python code itself. Does that make sense? I think we do only need to do that if accept-encoding is set (and that'll help to keep the snippets simpler in the cases where this isn't required.

Do you want to take a look at that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it makes sense to check the content-encoding in the response. i will change to check for encoding in response headers. But in the case there is no response and accept-encoding is gzip, we will have to decide whether to decompress the response or not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no, I don't mean the response in the HAR! I mean the actual response the Python code receives when you run the snippet. There will always be a response, because the Python code has to wait for the response before it does the check & decodes the body.

Basically, if there is an accept-encoding header set, then we need to output python code like:

if res.headers['content-encoding'] == 'gzip':
  print(gzip.decompress(data).decode("utf-8"))
else:
  print(data.decode("utf-8"))

Copy link
Contributor Author

@Fr0stM0urne Fr0stM0urne Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see. this makes more sense. i will add the check. I think the response data should still be kept while parsing the HAR file. There might be use of it in the future.

doing runtime check also means import gzip will be added to every script. might need to change test cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pimterry content-encoding check should be good now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. Yes - the check looks good now! I think that approach will work nicely.

But the new change does mean that we change the output for every case, and it's a bit of extra noise & confusion that most requests won't need...

I think we should add this only if there is an accept-encoding request header that contains gzip. Does that make sense? That would keep all the other test cases the same, but add this only when we might get a gzip result. Can you add that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added and updated test case

@Fr0stM0urne
Copy link
Contributor Author

nice, maybeGziped approach with runtime check seems to be the best solution

@pimterry pimterry merged commit 74cbad4 into httptoolkit:main Apr 8, 2024
7 checks passed
@pimterry
Copy link
Member

pimterry commented Apr 8, 2024

Yep! I've simplified that a little, updated all the other tests, added support for a few other languages in here too, and squashed all these commits down a bit. Now merged! Thanks for this 👍. I'll pull it through into HTTP Toolkit itself shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants