
requests removes characters from valid url #2404

Closed
NikolaiT opened this issue Jan 10, 2015 · 3 comments

Comments

@NikolaiT

Requests strips the question mark when a hash (#) follows it; urlopen doesn't. Why?

Recreate bug with:

#!/usr/bin/python3

from urllib.request import urlopen
from requests import get


url = 'http://incolumitas.com?#param=value'

native = urlopen(url)
req = get(url)

assert native.url == req.url, '{} vs {}'.format(native.url, req.url)

"""
AssertionError: http://incolumitas.com?#param=value vs http://incolumitas.com/#param=value

Requests strips the question mark when a hash tag follows. urlopen doesn't do it. Why?
"""
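The reproduction above needs network access. As a quicker offline check, here is a sketch that uses only the standard library. It inspects the URL that urllib.request would use before any request is sent: `Request.full_url` keeps the string exactly as given. The response's `url` attribute is normally taken from this same value, so unless the server redirects, `native.url` comes back unchanged. (If requests is installed, `requests.Request('GET', url).prepare().url` shows the URL requests would send. That call is left out so the snippet stays dependency-free.)

```python
from urllib.request import Request

url = 'http://incolumitas.com?#param=value'

# Building a Request does not touch the network. full_url reattaches the
# fragment that Request split off, giving back the original string.
native = Request(url)
print(native.full_url)  # http://incolumitas.com?#param=value
print(native.fragment)  # param=value
```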
@Lukasa
Member

Lukasa commented Jan 10, 2015

The question mark indicates the beginning of the query portion of the URL. The octothorpe indicates the beginning of the fragment portion of the URL. We conclude that you have an empty query string and normalise it by removing the section entirely.
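Below is a minimal sketch of this kind of normalization, written with urllib.parse. The helper name `normalize_url` is hypothetical, and this is not requests' actual code. It only mimics the two effects visible in the assertion error: the empty query (and its `?`) disappears, and an empty path becomes `/`, which is why a slash appears before the `#`.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical helper: drop an empty query and default an empty path to '/'."""
    parts = urlsplit(url)
    # urlsplit reports an empty query as '' whether or not a '?' was present,
    # and urlunsplit only writes '?' when the query is non-empty.
    path = parts.path or '/'
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(normalize_url('http://incolumitas.com?#param=value'))
# http://incolumitas.com/#param=value
```

URLs that already have a path and a non-empty query pass through unchanged. For example, `normalize_url('http://example.com/a?b=1#c')` returns the same string.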

If the octothorpe is intended to be part of the query string you need to percent-encode it to avoid ambiguity.
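For example, suppose `#param=value` was meant as query data rather than a fragment. Percent-encoding it first keeps it inside the query. This sketch uses `urllib.parse.quote`, and the parameter name `q` is only an illustrative assumption.

```python
from urllib.parse import parse_qs, quote, urlsplit

# A literal '#' in the query would start a fragment, so encode it as %23.
# safe='' also encodes '=' (as %3D) so it cannot be mistaken for a separator.
value = '#param=value'
url = 'http://incolumitas.com/?q=' + quote(value, safe='')
print(url)  # http://incolumitas.com/?q=%23param%3Dvalue

parts = urlsplit(url)
print(parts.query)            # q=%23param%3Dvalue
print(repr(parts.fragment))   # ''
print(parse_qs(parts.query))  # {'q': ['#param=value']}
```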


@NikolaiT
Author

I understand. Many thanks for the quick response. Will close.

@sigmavirus24
Contributor

@NikolaiT just to give you some more detail about this:

The generic URI syntax, which URLs follow, has the form {scheme}://{authority}{/path}{?query}{#fragment}. On Python 3, the urllib.parse module provides the urlparse function for inspecting these components of a URL.

>>> import urllib.parse
>>> uri = urllib.parse.urlparse('http://incolumitas.com?#param=value')
>>> uri
ParseResult(scheme='http', netloc='incolumitas.com', path='', params='', query='', fragment='param=value')
>>> uri.geturl()
'http://incolumitas.com#param=value'

According to this module, these two URLs are equivalent, and you can see that servers treat them the same way by visiting both in your browser. In other words, leaving the URL unmodified would be just as valid as our current approach. We do normalize URLs, though, because servers (and sometimes users) give us some rather bizarre URLs that only "just work" once normalized. As core developers of a library whose central design goal is to make users' lives better, we need to take this approach to meet that goal.
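To see the equivalence the module reports, compare the split components of both spellings. This sketch uses `urllib.parse.urlsplit`, the variant of urlparse without the legacy `params` field. An empty query is reported as `''` either way, so the two results compare equal and reassemble into the same string.

```python
from urllib.parse import urlsplit

with_question_mark = urlsplit('http://incolumitas.com?#param=value')
without_question_mark = urlsplit('http://incolumitas.com#param=value')

# An empty query and an absent query both come back as query=''.
print(with_question_mark == without_question_mark)  # True
print(with_question_mark.geturl())  # http://incolumitas.com#param=value
```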

I hope that helps give you a deeper understanding of what's happening, why it is okay, and why it should happen.
