gh-82150: Make urllib.parse.urlsplit and urllib.parse.urlunsplit preserve the '?' and '#' delimiters of empty query and fragment components #15642
It's maybe a bit surprising to have some of the tuple fields sometimes be The other alternative I thought about was to just explicitly dump in the delimiter if it's empty (e.g. I think you need to also describe the breaking change very clearly (haven't done it before, but I think that's what bedevere/news is for, i.e. these things), and leave hints in the actual documentation about the change ("changed in 3.9") Housekeeping: I'd squash all the commits. |
Thank you for reviewing this @nicktimko! Yes the I have updated the PR description to detail the exact changes. Nice suggestion, I will make the news entry, documentation version note and commit squash. But before I would like to fix an issue: the documentation tests in Travis CI failed for an obscure reason (see below). Do you have any idea why? |
I don't know, but the docs build looks like it's installing |
Thanks @nicktimko, I have added a news entry, but documentation tests still fail in Travis-CI. |
I don't think the documentation failure is related to the code in this PR. Perhaps this PR needs to be rebased? |
This is going to be a breaking change and will affect plenty of downstream libraries and frameworks that rely on the previous behavior.
- I don't have any code comments, and the code changes look good to me.
- I find the rationale OK.
- I will request reviews from more active core developers and want to hear their opinion on this change too.
Thanks for reviewing this @orsenthil. |
@nicktimko @orsenthil I have identified why the documentation tests and code coverage fail in Travis CI when running This is because the The
So as you said @orsenthil, this PR is a breaking change. |
Adding a note again: this breaking change can be very costly.
A lot of packages depend on urlparse for parsing, and suddenly getting a None value instead of the expected '' empty string will break a significant number of packages and applications. I am still evaluating whether this brings any benefit or whether there is a backwards-compatible way. If there is none, I will be -1 on this patch. |
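To make the compatibility concern above concrete, here is a minimal sketch. The helper `first_query_pair` is hypothetical and not taken from any real package; it only stands in for downstream code that assumes the query field is always a string.

```python
def first_query_pair(query):
    # Hypothetical downstream helper: assumes `query` is always a str,
    # which is what urlsplit currently returns for str input.
    return query.split('&')[0]


# Current behavior: an absent query is '', so the helper keeps working.
print(repr(first_query_pair('')))

# Behavior proposed in this PR: an absent query would be None instead.
try:
    first_query_pair(None)
except AttributeError as exc:
    print('AttributeError:', exc)
```

Any caller that invokes string methods on the query or fragment without a `None` check would fail in the same way.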
@orsenthil Yes since this PR requires What is the result of your analysis? Do you think we should move this PR forward or we should close it because its benefits (RFC 3986 compliance) do not outweigh its costs? |
Based on Senthil's last comment, I suggest closing this PR. |
Can always engage in some fence-sitting and support both compatible and compliance options. Pass/set a flag, use a different function/method. How about a new Re: returning |
@nicktimko Yep, discussing possible solutions is always welcome, but please do so on the issue. Discussions on the PR should be focused on implementation specific issues. (Note that I only suggested to close this PR, not the linked issue.) |
Like @erlend-aasland suggested, I am closing this PR because it is backward incompatible. I am also closing the associated issue #82150 which was too restricted (the problem also affects the authority component of URIs, as well as the I have just opened a new issue describing the problem more generally and clearly with a candidate solution based on @nicktimko’s last comment (feedback is welcome): #99962 Thank you @orsenthil and @nicktimko for reviewing this PR. |
This PR provides the following changes:

- update the `urlsplit` and `urlunsplit` functions of the `urllib.parse` submodule for making them preserve the delimiters of the query and fragment components when applied in sequence on a URI with an empty query component (i.e. `'?'`) or empty fragment component (i.e. `'#'`), not to be confused with an undefined query component (i.e. `''`) nor an undefined fragment component (i.e. `''`):

  This is required by RFC 3986:

  Currently delimiters are dropped:

  To do so:

  - `urlsplit` function now decodes an undefined query component as `None` and an undefined fragment component as `None` (e.g. `urlsplit('http://example.com/') == ('http', 'example.com', '/', None, None)`), and still decodes an empty query component as `''` and an empty fragment component as `''` (e.g. `urlsplit('http://example.com/?#') == ('http', 'example.com', '/', '', '')`);
  - `urlunsplit` function now encodes a `None` query component as an undefined query component and a `None` fragment component as an undefined fragment component (e.g. `urlunsplit(('http', 'example.com', '/', None, None)) == 'http://example.com/'`), and now encodes a `''` query component as an empty query component and a `''` fragment component as an empty fragment component (e.g. `urlunsplit(('http', 'example.com', '/', '', '')) == 'http://example.com/?#'`);

- add and update the corresponding unit tests in the `test.test_urlparse` module;
- update a unit test in the `test.test_urllib2` module;
- update the `urllib.parse` documentation accordingly.
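The difference between the current round trip and the one described above can be sketched as follows. This is an illustration only, not the code of this PR: `split_preserving` and `unsplit_preserving` are hypothetical names, and the parsing relies on the generic regular expression from RFC 3986, Appendix B rather than on the standard-library implementation.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Current standard-library behavior: the '?' and '#' delimiters of empty
# components are lost when urlsplit and urlunsplit are applied in sequence.
print(urlunsplit(urlsplit('http://example.com/?#')))

# Generic URI regular expression from RFC 3986, Appendix B.
_URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?',
    re.DOTALL,
)


def split_preserving(uri):
    """Hypothetical splitter: undefined query/fragment -> None, empty -> ''."""
    match = _URI_RE.match(uri)  # every group is optional, so this always matches
    scheme, authority, path, query, fragment = match.group(2, 4, 5, 7, 9)
    # Only query and fragment keep the None/'' distinction in this sketch.
    return (scheme or '', authority or '', path, query, fragment)


def unsplit_preserving(parts):
    """Hypothetical inverse: emit '?' or '#' whenever the component is not None."""
    scheme, authority, path, query, fragment = parts
    url = ''
    if scheme:
        url += scheme + ':'
    if authority:
        url += '//' + authority
    url += path
    if query is not None:
        url += '?' + query
    if fragment is not None:
        url += '#' + fragment
    return url


for uri in ('http://example.com/', 'http://example.com/?#'):
    parts = split_preserving(uri)
    print(parts, unsplit_preserving(parts) == uri)
```

With the `None` versus `''` distinction, both URIs survive the round trip unchanged, which is the property the change list above aims for.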