Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.0.2 and older] Function _download seems half-broken with regard to following re-directs #203

Closed
hartwork opened this issue Oct 28, 2024 · 1 comment · Fixed by #205
Closed

Comments

@hartwork
Copy link
Collaborator

Hi @peterbe,

the code at …

hashin/hashin.py

Lines 67 to 72 in a64f778

location, _ = cgi.parse_header(r.headers.get("location", ""))
if not location:
raise PackageError(
"No 'Location' header on {0} ({1})".format(url, status_code)
)
return _download(location)

…caught my attention with regard to two things:

The first is that return _download(location) should probably be return _download(location, binary=binary) or that choice will just be lost.

The second is that while in the past the location of the redirect URI was defined and guaranteed to be absolute, in the sense that

absolute URIs always begin with a scheme name followed by a colon

…and hence recursing back into _download would work (if we ignore endless loops for a second) but HTTP was changed in 2014 to allow URI references e.g. Location: /People.html#tim and then to be able to recurse back into _download we would now either (1) need to do URI resolution to always end up with a scheme and a domain name — an absolute URI — or (2) make urlopen follow redirects or (3) use something else that does.

It turns out that urlopen already handles redirects for us:

# wget -O/dev/null https://httpbin.org/status/302
--2024-10-28 16:14:58--  https://httpbin.org/status/302
Resolving httpbin.org... 54.84.141.5, 34.228.248.173
Connecting to httpbin.org|54.84.141.5|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: /redirect/1 [following]
--2024-10-28 16:15:00--  https://httpbin.org/redirect/1
Reusing existing connection to httpbin.org:443.
HTTP request sent, awaiting response... 302 FOUND
Location: /get [following]
--2024-10-28 16:15:00--  https://httpbin.org/get
Reusing existing connection to httpbin.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 291 [application/json]
Saving to: ‘/dev/null’

/dev/null                                 100%[=====================================================================================>]     291  --.-KB/s    in 0s      

2024-10-28 16:15:01 (5.29 MB/s) - ‘/dev/null’ saved [291/291]

# python3 -c 'from urllib.request import urlopen; print(urlopen("https://httpbin.org/status/302").getcode())'
200

So maybe that's dead code then and we can just rip the broken redirect handling out. @peterbe what do you think? Should I make a pull request?

PS: This started out at #195 (review) .

@hartwork
Copy link
Collaborator Author

Fixed by #205 — closing…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant