Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC3986 compatible URL.join honoring empty segments #1039

Merged
merged 36 commits into from
Aug 30, 2024
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
778b5ac
rfc3986 - URL.join
commonism Jun 15, 2024
aa570ff
URL.join - align tests with empty parameters
commonism Jul 18, 2024
eeacc09
tests - URL.join with empty segments
commonism Jul 19, 2024
efea232
CHANGES
commonism Jul 19, 2024
14a0b18
CHANGES - links
commonism Jul 19, 2024
0fcb819
CHANGES - intersphinx ref
commonism Jul 19, 2024
d9acbe2
CHANGES - intersphinx
commonism Jul 19, 2024
b33a959
CHANGES - intersphinx
commonism Jul 19, 2024
66b7f7f
CHANGES - no intersphinx
commonism Jul 19, 2024
d379d8f
CHANGES - intersphinx
commonism Jul 19, 2024
4979bd6
URL.join - uncrustify joining other.path
commonism Jul 20, 2024
d4b689a
tests - adding cpython urljoin cases not covered yet
commonism Jul 22, 2024
68f2e55
import at the top
commonism Jul 23, 2024
f4c52dd
query parameters - the other way around
commonism Jul 24, 2024
c1b8a54
query - using query_string instead …
commonism Jul 24, 2024
e650ccf
CHANGES - no changes to parameters
commonism Jul 24, 2024
f9d24eb
Update CHANGES/1039.bugfix.rst
commonism Jul 25, 2024
40bf571
Update CHANGES/1039.bugfix.rst
commonism Jul 25, 2024
6275035
Update CHANGES/1039.bugfix.rst
commonism Jul 25, 2024
6c0966f
tests - …
commonism Jul 25, 2024
537fe57
fix URL.join for URL without authority
commonism Aug 8, 2024
bb7881b
Merge branch 'master' into rfc3986-urljoin
commonism Aug 8, 2024
c98cd58
Merge branch 'master' into rfc3986-urljoin
bdraco Aug 14, 2024
10cf27d
Merge branch 'master' into rfc3986-urljoin
bdraco Aug 14, 2024
e8c5829
Merge branch 'master' into rfc3986-urljoin
commonism Aug 14, 2024
e136d40
codecov was right with this one
commonism Aug 14, 2024
cb0d678
codecov was right with this one … as well
commonism Aug 14, 2024
1b1c078
Update yarl/_url.py
commonism Aug 15, 2024
46daaa0
Update yarl/_url.py
commonism Aug 15, 2024
4729518
lost change
commonism Aug 15, 2024
a93abd8
as suggested
commonism Aug 15, 2024
531ff90
pycon test
commonism Aug 15, 2024
46bc760
pycon - expected result
commonism Aug 16, 2024
5ef4396
Update tests/test_url.py
bdraco Aug 30, 2024
defbc90
Merge branch 'master' into rfc3986-urljoin
bdraco Aug 30, 2024
f94589c
lint
bdraco Aug 30, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 41 additions & 0 deletions CHANGES/1039.bugfix.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
:meth:`URL.join <yarl.URL.join>` has been changed to match
:rfc:`3986` and align with
:meth:`/ operation <yarl.URL.__truediv__>` and :meth:`URL.joinpath() <yarl.URL.joinpath>`
when joining URLs with empty segments.
Previously :py:func:`urllib.parse.urljoin` was used,
which has known issues with empty segments
(`python/cpython#84774 <https://github.com/python/cpython/issues/84774>`_).

Due to the semantics of :meth:`URL.join <yarl.URL.join>`, joining an
URL with scheme requires making it relative, prefixing with ./.

.. code-block:: python

URL("https://web.archive.org/web/").join(URL("./https://github.com/aio-libs/yarl"))


Empty segments are honored in the base as well as the joined part.

.. code-block:: python

URL("https://web.archive.org/web/https://").join(URL("github.com/aio-libs/yarl"))

The required changes for URL.join affect the rendering of empty URL
parameters.
Previously empty URL parameters been presented as an assignment
without value.

e.g. :

.. code-block:: text

http://b/p?a=

Now empty parameters are presented without assignment.

.. code-block:: text

http://b/p?a


-- by :user:`commonism`
2 changes: 1 addition & 1 deletion tests/test_update_query.py
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@

def test_with_query_empty_value():
url = URL("http://example.com/")
assert str(url.with_query({"a": ""})) == "http://example.com/?a="
assert str(url.with_query({"a": ""})) == "http://example.com/?a"

Check warning on line 100 in tests/test_update_query.py

View check run for this annotation

Codecov / codecov/patch

tests/test_update_query.py#L100

Added line #L100 was not covered by tests


def test_with_query_str():
Expand Down
74 changes: 73 additions & 1 deletion tests/test_url.py
Original file line number Diff line number Diff line change
Expand Up @@ -1549,7 +1549,7 @@

def test_empty_value_for_query():
url = URL("http://example.com/path").with_query({"a": ""})
assert str(url) == "http://example.com/path?a="
assert str(url) == "http://example.com/path?a"

Check warning on line 1552 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1552

Added line #L1552 was not covered by tests


def test_none_value_for_query():
Expand Down Expand Up @@ -1657,6 +1657,78 @@
assert base.join(url) == expected


EMPTY_SEGMENTS = [
(
"https://web.archive.org/web/",
"./https://github.com/aio-libs/yarl",
"https://web.archive.org/web/https://github.com/aio-libs/yarl",
),
(
"https://web.archive.org/web/https://github.com/",
"aio-libs/yarl",
"https://web.archive.org/web/https://github.com/aio-libs/yarl",
),
]


@pytest.mark.parametrize("base,url,expected", EMPTY_SEGMENTS)
def test_join_empty_segments(base, url, expected):
base = URL(base)
url = URL(url)
expected = URL(expected)
joined = base.join(url)
assert joined == expected

Check warning on line 1680 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1675-L1680

Added lines #L1675 - L1680 were not covered by tests


SIMPLE_BASE = "http://a/b/c/d"
URLLIB_URLJOIN = [
("", "http://a/b/c/g?y/./x", "http://a/b/c/g?y/./x"),
("", "http://a/./g", "http://a/./g"),
("svn://pathtorepo/dir1", "dir2", "svn://pathtorepo/dir2"),
("svn+ssh://pathtorepo/dir1", "dir2", "svn+ssh://pathtorepo/dir2"),
("ws://a/b", "g", "ws://a/g"),
("wss://a/b", "g", "wss://a/g"),
# test for issue22118 duplicate slashes
(SIMPLE_BASE + "/", "foo", SIMPLE_BASE + "/foo"),
# Non-RFC-defined tests, covering variations of base and trailing
# slashes
("http://a/b/c/d/e/", "../../f/g/", "http://a/b/c/f/g/"),
("http://a/b/c/d/e", "../../f/g/", "http://a/b/f/g/"),
("http://a/b/c/d/e/", "/../../f/g/", "http://a/f/g/"),
("http://a/b/c/d/e", "/../../f/g/", "http://a/f/g/"),
("http://a/b/c/d/e/", "../../f/g", "http://a/b/c/f/g"),
("http://a/b/", "../../f/g/", "http://a/f/g/"),
]


@pytest.mark.parametrize("base,url,expected", URLLIB_URLJOIN)
def test_join_cpython_urljoin(base, url, expected):

Check warning on line 1705 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1705

Added line #L1705 was not covered by tests
# tests from cpython urljoin
base = URL(base)
url = URL(url)
expected = URL(expected)
joined = base.join(url)
assert joined == expected

Check warning on line 1711 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1707-L1711

Added lines #L1707 - L1711 were not covered by tests


URLLIB_URLJOIN_FAIL = [
# FIXME? - host is None
("http:///", "..", "http:///"),
# issue 23703: don't duplicate filename
# FIXME? path without schema is not normalized
("a", "b", "b"),
]


@pytest.mark.xfail(strict=True)
@pytest.mark.parametrize("base,url,expected", URLLIB_URLJOIN_FAIL)
def test_join_cpython_urljoin_fail(base, url, expected):

Check warning on line 1725 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1725

Added line #L1725 was not covered by tests
# non portable tests from cpython urljoin
base = URL(base)
url = URL(url)
base.join(url)

Check warning on line 1729 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1727-L1729

Added lines #L1727 - L1729 were not covered by tests


def test_split_result_non_decoded():
with pytest.raises(ValueError):
URL(SplitResult("http", "example.com", "path", "qs", "frag"))
Expand Down
2 changes: 1 addition & 1 deletion tests/test_url_query.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
MultiDict(),
),
(
URL("http://example.com?a="),
URL("http://example.com?a"),
MultiDict([("a", "")]),
),
# ASCII chars
Expand Down
92 changes: 75 additions & 17 deletions yarl/_url.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
from contextlib import suppress
from ipaddress import ip_address
from typing import Union
from urllib.parse import SplitResult, parse_qsl, quote, urljoin, urlsplit, urlunsplit
from urllib.parse import SplitResult, parse_qsl, quote, urlsplit, urlunsplit

import idna
from multidict import MultiDict, MultiDictProxy
Expand Down Expand Up @@ -259,12 +259,14 @@
netloc = authority
else:
tmp = SplitResult("", authority, "", "", "")
port = None if tmp.port == cls._default_port(scheme) else tmp.port
netloc = cls._make_netloc(
tmp.username, tmp.password, tmp.hostname, tmp.port, encode=True
tmp.username, tmp.password, tmp.hostname, port, encode=True

Check warning on line 264 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L264

Added line #L264 was not covered by tests
)
elif not user and not password and not host and not port:
netloc = ""
else:
port = None if port == cls._default_port(scheme) else port
netloc = cls._make_netloc(
user, password, host, port, encode=not encoded, encode_host=not encoded
)
Expand Down Expand Up @@ -455,12 +457,15 @@
def _get_default_port(self) -> Union[int, None]:
if not self.scheme:
return None
return self._default_port(self.scheme)

@staticmethod
def _default_port(scheme: str) -> Union[int, None]:
with suppress(KeyError):
return DEFAULT_PORTS[self.scheme]
return DEFAULT_PORTS[scheme]

with suppress(OSError):
return socket.getservbyname(self.scheme)
return socket.getservbyname(scheme)

return None

Expand Down Expand Up @@ -762,9 +767,7 @@
f"Appending path {path!r} starting from slash is forbidden"
)
path = path if encoded else self._PATH_QUOTER(path)
segments = [
segment for segment in reversed(path.split("/")) if segment != "."
]
segments = list(reversed(path.split("/")))
if not segments:
continue
# remove trailing empty segment for all but the last path
Expand Down Expand Up @@ -966,13 +969,13 @@
return URL(self._val._replace(path=path, query="", fragment=""), encoded=True)

@classmethod
def _query_seq_pairs(cls, quoter, pairs):
def _query_seq_pairs(cls, pairs):
for key, val in pairs:
if isinstance(val, (list, tuple)):
for v in val:
yield quoter(key) + "=" + quoter(cls._query_var(v))
yield key, v
else:
yield quoter(key) + "=" + quoter(cls._query_var(val))
yield key, val

@staticmethod
def _query_var(v):
Expand All @@ -993,6 +996,15 @@
"of type {}".format(v, cls)
)

def _get_str_param(self, k, v, q=None):
if q:
k = q(k)
v = q(self._query_var(v))
if v:
return f"{k}={v}"
else:
return k

def _get_str_query(self, *args, **kwargs):
if kwargs:
if len(args) > 0:
Expand All @@ -1009,7 +1021,10 @@
query = None
elif isinstance(query, Mapping):
quoter = self._QUERY_PART_QUOTER
query = "&".join(self._query_seq_pairs(quoter, query.items()))
query = "&".join(
self._get_str_param(k, v, quoter)
for k, v in self._query_seq_pairs(query.items())

Check warning on line 1026 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L1025-L1026

Added lines #L1025 - L1026 were not covered by tests
)
elif isinstance(query, str):
query = self._QUERY_QUOTER(query)
elif isinstance(query, (bytes, bytearray, memoryview)):
Expand All @@ -1022,9 +1037,7 @@
# already; only mappings like builtin `dict` which can't have the
# same key pointing to multiple values are allowed to use
# `_query_seq_pairs`.
query = "&".join(
quoter(k) + "=" + quoter(self._query_var(v)) for k, v in query
)
query = "&".join(self._get_str_param(k, v, quoter) for k, v in query)
else:
raise TypeError(
"Invalid query type: only str, mapping or "
Expand Down Expand Up @@ -1153,7 +1166,49 @@
# See docs for urllib.parse.urljoin
if not isinstance(url, URL):
raise TypeError("url should be URL")
return URL(urljoin(str(self), str(url)), encoded=True)
other: URL = url
scheme = other.scheme or self.scheme
parts = {
k: getattr(self, k) for k in ["authority", "path", "query", "fragment"]

Check warning on line 1172 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L1172

Added line #L1172 was not covered by tests
}
parts["scheme"] = scheme
from urllib.parse import uses_netloc as uses_authority
from urllib.parse import uses_relative

if scheme != self.scheme or scheme not in uses_relative:
return URL(str(other))

if scheme in uses_authority:
if other.authority:
parts.update(
{
k: getattr(other, k)
for k in ["authority", "path", "query", "fragment"]

Check warning on line 1186 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L1185-L1186

Added lines #L1185 - L1186 were not covered by tests
}
)
return URL.build(**parts)

if other.path or other.fragment:
parts["fragment"] = other.fragment
if other.path or other.query:
parts["query"] = other.query

if not other.path:
return URL.build(**parts)

if other.path[0] == "/":
parts["path"] = other.path
return URL.build(**parts)

if self.path[-1] == "/":
# using an intermediate to avoid URL.joinpath dropping query & fragment
parts["path"] = URL(self.path).joinpath(other.path).path
else:
# …
# and relativizing ".."
parts["path"] = URL(self.path).joinpath("..", other.path).path

return URL.build(**parts)

def joinpath(self, *other, encoded=False):
"""Return a new URL with the elements in other appended to the path."""
Expand All @@ -1167,9 +1222,12 @@
if host:
host = self._encode_host(self.host, human=True)
path = _human_quote(self.path, "#?")

def query_quote(x):
return _human_quote(x, "#&+;=")

query_string = "&".join(
"{}={}".format(_human_quote(k, "#&+;="), _human_quote(v, "#&+;="))
for k, v in self.query.items()
self._get_str_param(k, v, query_quote) for k, v in self.query.items()

Check warning on line 1230 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L1230

Added line #L1230 was not covered by tests
)
fragment = _human_quote(self.fragment, "")
return urlunsplit(
Expand Down
Loading