Fix for python3 http client when accept-encoding is gzip #5

Fr0stM0urne · 2024-03-27T08:52:54Z

Overview
Added handling of gzip-encoded responses for the python3 http client.

Problem
The current framework generates python3 code that never checks the encoding of the response, and handles the network response with this line of code:
print(data.decode("utf-8"))
Whenever the server actually returns a gzip-compressed body, this raises a UnicodeDecodeError when the generated script runs.

Solution
This can be fixed by decompressing the gzip data before decoding it (which also needs import gzip in the generated script):
print(gzip.decompress(data).decode("utf-8"))
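A minimal local reproduction of the problem and the fix, assuming a hand-made gzip body in place of a real server response (the sample string is made up for illustration). Decoding raw gzip bytes as UTF-8 fails because every gzip stream starts with the magic bytes 1f 8b, and 0x8b is not a valid UTF-8 start byte; decompressing first gives back the original text.

```python
import gzip

# Hypothetical stand-in for a response body that the server chose to gzip.
data = gzip.compress("héllo, wörld".encode("utf-8"))

failed = False
try:
    data.decode("utf-8")  # pre-fix behaviour: decode the raw (compressed) bytes
except UnicodeDecodeError:
    # gzip streams begin with 1f 8b, and 0x8b is not a valid UTF-8 start byte
    failed = True

text = gzip.decompress(data).decode("utf-8")  # the fix: decompress, then decode
print(failed, text)
```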

pimterry

Thanks! This is a good catch, it would be nice to tidy this up. I've made a couple of comments here but if you're happy to fix those up I'm definitely interested in including this.

package.json

pimterry · 2024-03-27T20:25:15Z

src/targets/python/python3.js

-
+
+  // Decode response
+  if (headers['accept-encoding'] == 'gzip') {


This is close, but it's not quite right I'm afraid.

In this code you're checking the request headers, but that doesn't mean that the response is actually using gzip. accept-encoding just means that the client told the server it could use gzip; there are lots of cases where the server won't actually use gzip in the response despite this.

To do this correctly, we'll need to check the content-encoding value in the response headers at runtime, so this if needs to be output into the Python code itself. Does that make sense? I think we only need to do that if accept-encoding is set (and that'll help to keep the snippets simpler in the cases where this isn't required).

Do you want to take a look at that?
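A self-contained sketch of that point, assuming a throwaway local test server (the Handler class, the /plain and /gzip paths and the fetch helper are hypothetical, made up for illustration). The client advertises Accept-Encoding: gzip on both requests, but the server only compresses one of them, so the generated code has to decide from the response's Content-Encoding header at runtime rather than from the request header.

```python
import gzip
import http.client
import http.server
import threading

BODY = "héllo, wörld".encode("utf-8")  # non-ASCII, so a wrong decode is visible


class Handler(http.server.BaseHTTPRequestHandler):
    # Hypothetical server: it only gzips /gzip, even though every request
    # says Accept-Encoding: gzip.
    def do_GET(self):
        body = BODY
        self.send_response(200)
        if self.path == "/gzip":
            body = gzip.compress(BODY)
            self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the output quiet


def fetch(port, path):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    # Like a snippet generated from a request with accept-encoding: gzip.
    conn.request("GET", path, headers={"Accept-Encoding": "gzip"})
    res = conn.getresponse()
    data = res.read()
    encoding = res.headers["content-encoding"]  # None when the header is absent
    conn.close()
    # Decide at runtime from the *response* header, not the request header.
    if encoding == "gzip":
        return encoding, gzip.decompress(data).decode("utf-8")
    return encoding, data.decode("utf-8")


server = http.server.HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
try:
    plain_encoding, plain = fetch(port, "/plain")
    gzip_encoding, compressed = fetch(port, "/gzip")
finally:
    server.shutdown()
    server.server_close()

print(plain_encoding, plain)
print(gzip_encoding, compressed)
```

Both requests asked for gzip, yet only one response used it; checking the request header alone would have tried to decompress the plain body.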

Yes, it makes sense to check the content-encoding in the response. I will change it to check for the encoding in the response headers. But in the case where there is no response and accept-encoding is gzip, how do we decide whether to decompress the response or not?

Ah, no, I don't mean the response in the HAR! I mean the actual response the Python code receives when you run the snippet. There will always be a response, because the Python code has to wait for the response before it does the check & decodes the body.

Basically, if there is an accept-encoding header set, then we need to output python code like:

if res.headers['content-encoding'] == 'gzip':
    print(gzip.decompress(data).decode("utf-8"))
else:
    print(data.decode("utf-8"))

I see, this makes more sense. I will add the check. I think the response data should still be kept while parsing the HAR file, since it might be useful in the future.

Doing the runtime check also means import gzip will be added to every script, so the test cases might need to change.

@pimterry the content-encoding check should be good now?

Sorry for the delay. Yes - the check looks good now! I think that approach will work nicely.

But the new change does mean that we change the output for every case, and it's a bit of extra noise & confusion that most requests won't need...

I think we should add this only if there is an accept-encoding request header that contains gzip. Does that make sense? That would keep all the other test cases the same, but add this only when we might get a gzip result. Can you add that?
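A rough sketch of that rule, written in Python for illustration only: the real generator is the JavaScript in src/targets/python/python3.js, and generate_python_snippet and wants_gzip are hypothetical names, not the project's code. The gzip import and the runtime content-encoding check are emitted only when the request's accept-encoding header mentions gzip, so every other snippet stays exactly as before.

```python
def generate_python_snippet(host, path, headers):
    """Hypothetical, simplified code generator (not the project's implementation)."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    wants_gzip = "gzip" in lowered.get("accept-encoding", "")

    lines = []
    if wants_gzip:
        lines.append("import gzip")
    lines.append("import http.client")
    lines.append("")
    lines.append(f"conn = http.client.HTTPSConnection({host!r})")
    lines.append(f"headers = {headers!r}")
    lines.append(f"conn.request('GET', {path!r}, headers=headers)")
    lines.append("res = conn.getresponse()")
    lines.append("data = res.read()")
    lines.append("")
    if wants_gzip:
        # Only snippets that might receive gzip pay for the runtime check.
        lines.append("if res.headers['content-encoding'] == 'gzip':")
        lines.append("    print(gzip.decompress(data).decode('utf-8'))")
        lines.append("else:")
        lines.append("    print(data.decode('utf-8'))")
    else:
        lines.append("print(data.decode('utf-8'))")
    return "\n".join(lines) + "\n"


plain = generate_python_snippet("example.com", "/", {"accept": "application/json"})
gzipped = generate_python_snippet("example.com", "/", {"Accept-Encoding": "gzip, deflate"})
print(gzipped)
```

Keying the extra output off the request header keeps the other test cases unchanged, while the emitted code still makes the final decision from the response at runtime.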

Added that, and updated the test case.

Fr0stM0urne · 2024-04-08T14:38:21Z

Nice, the maybeGziped approach with a runtime check seems to be the best solution.

pimterry · 2024-04-08T14:47:29Z

Yep! I've simplified that a little, updated all the other tests, added support for a few other languages in here too, and squashed all these commits down a bit. Now merged! Thanks for this 👍. I'll pull it through into HTTP Toolkit itself shortly.

pimterry reviewed Mar 27, 2024


pimterry force-pushed the gzip_decode_fix branch from c38aac2 to 26b2e46 on April 8, 2024 10:25

Fr0stM0urne and others added 2 commits April 8, 2024 16:44

Added handling of gzip encoding for python3 http client.

d63c0c1

Add libcurl & Go compression handling too

8432fac

pimterry force-pushed the gzip_decode_fix branch from dd795bd to 8432fac on April 8, 2024 14:45

pimterry merged commit 74cbad4 into httptoolkit:main Apr 8, 2024
7 checks passed


Fr0stM0urne commented Mar 27, 2024

pimterry left a comment

pimterry Mar 27, 2024

Fr0stM0urne Mar 28, 2024

pimterry Mar 28, 2024

Fr0stM0urne Mar 28, 2024 (edited)

Fr0stM0urne Apr 5, 2024

pimterry Apr 5, 2024

Fr0stM0urne Apr 5, 2024

Fr0stM0urne commented Apr 8, 2024

pimterry commented Apr 8, 2024


