Correct urllib.parse functions dropping the delimiters of empty URI components #82150
The Python library documentation of the
So with the <http://example.com/?> URI:
>>> import urllib.parse
>>> urllib.parse.urlunparse(urllib.parse.urlparse("http://example.com/?"))
'http://example.com/'
>>> urllib.parse.urlunsplit(urllib.parse.urlsplit("http://example.com/?"))
'http://example.com/'
But
So maybe
>>> import urllib.parse
>>> urllib.parse.urlparse("http://example.com/?") == urllib.parse.urlparse("http://example.com/")
True
>>> urllib.parse.urlsplit("http://example.com/?") == urllib.parse.urlsplit("http://example.com/")
True
P.S.: Is there a syntax-based normalization function for URIs in the Python library?
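To make the lost distinction visible, here is a minimal sketch based on the generic URI regular expression from RFC 3986, Appendix B, and the recomposition rules of its section 5.3. It is not how urllib.parse works, and the names `split_rfc3986` and `unsplit_rfc3986` are hypothetical helpers invented for this illustration. An absent component comes back as None, while a present but empty one comes back as an empty string, so the delimiter survives the round trip.

```python
import re

# Generic URI regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_rfc3986(uri):
    """Hypothetical helper: split a URI into its five components.

    Components that are absent are None; components that are present
    but empty (a bare '?' or '#') are ''.
    """
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def unsplit_rfc3986(parts):
    """Hypothetical helper: recompose following RFC 3986, section 5.3."""
    result = ""
    if parts["scheme"] is not None:
        result += parts["scheme"] + ":"
    if parts["authority"] is not None:
        result += "//" + parts["authority"]
    result += parts["path"]
    if parts["query"] is not None:
        result += "?" + parts["query"]
    if parts["fragment"] is not None:
        result += "#" + parts["fragment"]
    return result


for uri in ("http://example.com/", "http://example.com/?", "http://example.com/#"):
    parts = split_rfc3986(uri)
    # The recomposed string should equal the input, delimiters included.
    print(uri, parts["query"], parts["fragment"], unsplit_rfc3986(parts) == uri)
```

With this representation the two URIs compared above no longer split to equal results, because one has a query of None and the other a query of ''.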
Looking at the history, the line in the docs used to say
which was changed to "the RFC" in April 2006 (ad5177cf8da#diff-5b4cef771c997754f9e2feeae11d3b1eL68-R95).

The original language was added in February 1995 (a12ef9433baf#diff-5b4cef771c997754f9e2feeae11d3b1eR48-R51), so "the draft" probably meant the draft of RFC 1738 (https://tools.ietf.org/html/rfc1738#section-3.3), which is rather vague on the point. Rewording it as "the RFC" later did not help, since the library docs reference 3+ RFCs, one of which obsoleted another and definitely changed the meaning of the loose "?".

The draft revising RFC 2396 (rfc2396bis) seems to have had the opposite wording, as you point out, since at least draft 07 (September 2004): https://tools.ietf.org/html/draft-fielding-uri-rfc2396bis-07#section-6.2.3
Draft 06 (April 2004) was silent on the matter: https://tools.ietf.org/html/draft-fielding-uri-rfc2396bis-06#section-6.2.3
@nicktimko Thanks for the historical track. Here is a patch that solves this issue by updating the
That way we get the correct behavior:
>>> import urllib.parse
>>> urllib.parse.urlunsplit(urllib.parse.urlsplit("http://example.com/?"))
'http://example.com/?'
>>> urllib.parse.urlunsplit(urllib.parse.urlsplit("http://example.com/#"))
'http://example.com/#'
Any feedback is welcome.
This is a duplicate of bpo-22852 ('urllib.parse wrongly strips empty #fragment, ?query, //netloc'). Also note that three alternative solutions have already been proposed:
(1) Add a 'None' type to the Result objects' members, like this one.
(2) Add 'has_netloc', 'has_query' and 'has_fragment' attributes.
(3) Like (1), but conditional on an 'allow_none' argument (similar to 'allow_fragments').
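As a rough illustration of alternative (2), the presence flags can be approximated outside the library by inspecting the original string next to the `urlsplit` result. This is only a sketch under stated assumptions: `split_with_flags` and `unsplit_with_flags` are hypothetical names, not part of urllib.parse or of any of the proposed patches, and the netloc case is left out for brevity.

```python
from urllib.parse import urlsplit, urlunsplit


def split_with_flags(url):
    """Hypothetical helper: return (parts, has_query, has_fragment).

    The flags record whether the '?' and '#' delimiters appeared in
    the input, even when the component after them is empty.
    """
    parts = urlsplit(url)
    # The fragment starts at the first '#'; the query starts at the
    # first '?' that comes before it.
    before_fragment, hash_sep, _ = url.partition("#")
    has_fragment = bool(hash_sep)
    has_query = "?" in before_fragment
    return parts, has_query, has_fragment


def unsplit_with_flags(parts, has_query, has_fragment):
    """Hypothetical helper: rebuild the URL, keeping empty delimiters."""
    # Let urlunsplit handle scheme, netloc and path only, then append
    # the query and fragment ourselves so empty ones are not dropped.
    url = urlunsplit(parts._replace(query="", fragment=""))
    if has_query or parts.query:
        url += "?" + parts.query
    if has_fragment or parts.fragment:
        url += "#" + parts.fragment
    return url


for original in ("http://example.com/?", "http://example.com/#", "http://example.com/?#"):
    rebuilt = unsplit_with_flags(*split_with_flags(original))
    print(original, rebuilt == original)
```

Keeping the flags separate from the result tuple leaves existing equality comparisons on the `urlsplit` result unchanged, which is the main trade-off between this approach and returning None for absent components.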
Closing as superseded by #99962, as per #15642 (comment)