Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC3986 compatible URL.join honoring empty segments #1039

Merged
merged 36 commits into from
Aug 30, 2024
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
778b5ac
rfc3986 - URL.join
commonism Jun 15, 2024
aa570ff
URL.join - align tests with empty parameters
commonism Jul 18, 2024
eeacc09
tests - URL.join with empty segments
commonism Jul 19, 2024
efea232
CHANGES
commonism Jul 19, 2024
14a0b18
CHANGES - links
commonism Jul 19, 2024
0fcb819
CHANGES - intersphinx ref
commonism Jul 19, 2024
d9acbe2
CHANGES - intersphinx
commonism Jul 19, 2024
b33a959
CHANGES - intersphinx
commonism Jul 19, 2024
66b7f7f
CHANGES - no intersphinx
commonism Jul 19, 2024
d379d8f
CHANGES - intersphinx
commonism Jul 19, 2024
4979bd6
URL.join - uncrustify joining other.path
commonism Jul 20, 2024
d4b689a
tests - adding cpython urljoin cases not covered yet
commonism Jul 22, 2024
68f2e55
import at the top
commonism Jul 23, 2024
f4c52dd
query parameters - the other way around
commonism Jul 24, 2024
c1b8a54
query - using query_string instead …
commonism Jul 24, 2024
e650ccf
CHANGES - no changes to parameters
commonism Jul 24, 2024
f9d24eb
Update CHANGES/1039.bugfix.rst
commonism Jul 25, 2024
40bf571
Update CHANGES/1039.bugfix.rst
commonism Jul 25, 2024
6275035
Update CHANGES/1039.bugfix.rst
commonism Jul 25, 2024
6c0966f
tests - …
commonism Jul 25, 2024
537fe57
fix URL.join for URL without authority
commonism Aug 8, 2024
bb7881b
Merge branch 'master' into rfc3986-urljoin
commonism Aug 8, 2024
c98cd58
Merge branch 'master' into rfc3986-urljoin
bdraco Aug 14, 2024
10cf27d
Merge branch 'master' into rfc3986-urljoin
bdraco Aug 14, 2024
e8c5829
Merge branch 'master' into rfc3986-urljoin
commonism Aug 14, 2024
e136d40
codecov was right with this one
commonism Aug 14, 2024
cb0d678
codecov was right with this one … as well
commonism Aug 14, 2024
1b1c078
Update yarl/_url.py
commonism Aug 15, 2024
46daaa0
Update yarl/_url.py
commonism Aug 15, 2024
4729518
lost change
commonism Aug 15, 2024
a93abd8
as suggested
commonism Aug 15, 2024
531ff90
pycon test
commonism Aug 15, 2024
46bc760
pycon - expected result
commonism Aug 16, 2024
5ef4396
Update tests/test_url.py
bdraco Aug 30, 2024
defbc90
Merge branch 'master' into rfc3986-urljoin
bdraco Aug 30, 2024
f94589c
lint
bdraco Aug 30, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 41 additions & 0 deletions CHANGES/1039.bugfix.rst
bdraco marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
:meth:`URL.join <yarl.URL.join>` has been changed to match
commonism marked this conversation as resolved.
Show resolved Hide resolved
:rfc:`3986` and align with
:meth:`/ operation <yarl.URL.__truediv__>` and :meth:`URL.joinpath() <yarl.URL.joinpath>`
when joining URLs with empty segments.
Previously :py:func:`urllib.parse.urljoin` was used,
which has known issues with empty segments
(`python/cpython#84774 <https://github.com/python/cpython/issues/84774>`_).

Due to the semantics of :meth:`URL.join <yarl.URL.join>`, joining an
commonism marked this conversation as resolved.
Show resolved Hide resolved
URL with scheme requires making it relative, prefixing with ./.
commonism marked this conversation as resolved.
Show resolved Hide resolved

.. code-block:: python

URL("https://web.archive.org/web/").join(URL("./https://github.com/aio-libs/yarl"))
commonism marked this conversation as resolved.
Show resolved Hide resolved


Empty segments are honored in the base as well as the joined part.

.. code-block:: python

URL("https://web.archive.org/web/https://").join(URL("github.com/aio-libs/yarl"))

The required changes for URL.join affect the rendering of empty URL
parameters.
Previously empty URL parameters been presented as an assignment
without value.

e.g. :

.. code-block:: text

http://b/p?a=

Now empty parameters are presented without assignment.

.. code-block:: text

http://b/p?a


-- by :user:`commonism`
72 changes: 72 additions & 0 deletions tests/test_url.py
Original file line number Diff line number Diff line change
Expand Up @@ -1657,6 +1657,78 @@
assert base.join(url) == expected


EMPTY_SEGMENTS = [
(
"https://web.archive.org/web/",
"./https://github.com/aio-libs/yarl",
"https://web.archive.org/web/https://github.com/aio-libs/yarl",
),
(
"https://web.archive.org/web/https://github.com/",
"aio-libs/yarl",
"https://web.archive.org/web/https://github.com/aio-libs/yarl",
),
]


@pytest.mark.parametrize("base,url,expected", EMPTY_SEGMENTS)
def test_join_empty_segments(base, url, expected):
base = URL(base)
url = URL(url)
expected = URL(expected)
joined = base.join(url)
assert joined == expected

Check warning on line 1680 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1675-L1680

Added lines #L1675 - L1680 were not covered by tests


SIMPLE_BASE = "http://a/b/c/d"
URLLIB_URLJOIN = [
("", "http://a/b/c/g?y/./x", "http://a/b/c/g?y/./x"),
("", "http://a/./g", "http://a/./g"),
("svn://pathtorepo/dir1", "dir2", "svn://pathtorepo/dir2"),
("svn+ssh://pathtorepo/dir1", "dir2", "svn+ssh://pathtorepo/dir2"),
("ws://a/b", "g", "ws://a/g"),
("wss://a/b", "g", "wss://a/g"),
# test for issue22118 duplicate slashes
(SIMPLE_BASE + "/", "foo", SIMPLE_BASE + "/foo"),
# Non-RFC-defined tests, covering variations of base and trailing
# slashes
("http://a/b/c/d/e/", "../../f/g/", "http://a/b/c/f/g/"),
("http://a/b/c/d/e", "../../f/g/", "http://a/b/f/g/"),
("http://a/b/c/d/e/", "/../../f/g/", "http://a/f/g/"),
("http://a/b/c/d/e", "/../../f/g/", "http://a/f/g/"),
("http://a/b/c/d/e/", "../../f/g", "http://a/b/c/f/g"),
("http://a/b/", "../../f/g/", "http://a/f/g/"),
]


@pytest.mark.parametrize("base,url,expected", URLLIB_URLJOIN)
def test_join_cpython_urljoin(base, url, expected):

Check warning on line 1705 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1705

Added line #L1705 was not covered by tests
# tests from cpython urljoin
base = URL(base)
url = URL(url)
expected = URL(expected)
joined = base.join(url)
assert joined == expected

Check warning on line 1711 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1707-L1711

Added lines #L1707 - L1711 were not covered by tests


URLLIB_URLJOIN_FAIL = [
# FIXME? - host is None
("http:///", "..", "http:///"),
# issue 23703: don't duplicate filename
# FIXME? path without schema is not normalized
("a", "b", "b"),
]


@pytest.mark.xfail(strict=True)
commonism marked this conversation as resolved.
Show resolved Hide resolved
@pytest.mark.parametrize("base,url,expected", URLLIB_URLJOIN_FAIL)
def test_join_cpython_urljoin_fail(base, url, expected):

Check warning on line 1725 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1725

Added line #L1725 was not covered by tests
# non portable tests from cpython urljoin
base = URL(base)
url = URL(url)
base.join(url)

Check warning on line 1729 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1727-L1729

Added lines #L1727 - L1729 were not covered by tests


def test_split_result_non_decoded():
with pytest.raises(ValueError):
URL(SplitResult("http", "example.com", "path", "qs", "frag"))
Expand Down
62 changes: 54 additions & 8 deletions yarl/_url.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,9 @@
from contextlib import suppress
from ipaddress import ip_address
from typing import Union
from urllib.parse import SplitResult, parse_qsl, quote, urljoin, urlsplit, urlunsplit
from urllib.parse import SplitResult, parse_qsl, quote, urlsplit, urlunsplit
from urllib.parse import uses_netloc as uses_authority
from urllib.parse import uses_relative

import idna
from multidict import MultiDict, MultiDictProxy
Expand Down Expand Up @@ -259,12 +261,14 @@
netloc = authority
else:
tmp = SplitResult("", authority, "", "", "")
port = None if tmp.port == cls._default_port(scheme) else tmp.port
netloc = cls._make_netloc(
tmp.username, tmp.password, tmp.hostname, tmp.port, encode=True
tmp.username, tmp.password, tmp.hostname, port, encode=True

Check warning on line 266 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L266

Added line #L266 was not covered by tests
)
elif not user and not password and not host and not port:
netloc = ""
else:
port = None if port == cls._default_port(scheme) else port
netloc = cls._make_netloc(
user, password, host, port, encode=not encoded, encode_host=not encoded
)
Expand Down Expand Up @@ -455,12 +459,15 @@
def _get_default_port(self) -> Union[int, None]:
if not self.scheme:
return None
return self._default_port(self.scheme)

@staticmethod
def _default_port(scheme: str) -> Union[int, None]:
with suppress(KeyError):
return DEFAULT_PORTS[self.scheme]
return DEFAULT_PORTS[scheme]

with suppress(OSError):
return socket.getservbyname(self.scheme)
return socket.getservbyname(scheme)

return None

Expand Down Expand Up @@ -762,9 +769,7 @@
f"Appending path {path!r} starting from slash is forbidden"
)
path = path if encoded else self._PATH_QUOTER(path)
segments = [
segment for segment in reversed(path.split("/")) if segment != "."
]
segments = list(reversed(path.split("/")))
if not segments:
continue
# remove trailing empty segment for all but the last path
Expand Down Expand Up @@ -1153,7 +1158,48 @@
# See docs for urllib.parse.urljoin
if not isinstance(url, URL):
raise TypeError("url should be URL")
return URL(urljoin(str(self), str(url)), encoded=True)
other: URL = url
scheme = other.scheme or self.scheme
parts = {
k: getattr(self, k)
for k in ["authority", "path", "query_string", "fragment"]

Check warning on line 1165 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L1164-L1165

Added lines #L1164 - L1165 were not covered by tests
commonism marked this conversation as resolved.
Show resolved Hide resolved
}
parts["scheme"] = scheme

if scheme != self.scheme or scheme not in uses_relative:
return URL(str(other))

if scheme in uses_authority:
commonism marked this conversation as resolved.
Show resolved Hide resolved
if other.authority:
parts.update(
{
k: getattr(other, k)
commonism marked this conversation as resolved.
Show resolved Hide resolved
for k in ["authority", "path", "query_string", "fragment"]

Check warning on line 1177 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L1176-L1177

Added lines #L1176 - L1177 were not covered by tests
}
)
return URL.build(**parts)

if other.path or other.fragment:
parts["fragment"] = other.fragment
if other.path or other.query:
parts["query_string"] = other.query_string

if not other.path:
return URL.build(**parts)

if other.path[0] == "/":
parts["path"] = other.path
return URL.build(**parts)

if self.path[-1] == "/":
# using an intermediate to avoid URL.joinpath dropping query & fragment
parts["path"] = URL(self.path).joinpath(other.path).path
else:
# …
# and relativizing ".."
parts["path"] = URL(self.path).joinpath("..", other.path).path

return URL.build(**parts)

def joinpath(self, *other, encoded=False):
"""Return a new URL with the elements in other appended to the path."""
Expand Down
Loading