Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Improve RawHeaders type hints #14303

Merged
merged 4 commits into from
Oct 28, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/14303.misc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Improve type hinting of `RawHeaders`.
8 changes: 4 additions & 4 deletions synapse/app/generic_worker.py
Original file line number Diff line number Diff line change
Expand Up @@ -178,13 +178,13 @@ async def on_POST(
# Proxy headers from the original request, such as the auth headers
# (in case the access token is there) and the original IP /
# User-Agent of the request.
headers = {
header: request.requestHeaders.getRawHeaders(header, [])
headers: Dict[bytes, List[bytes]] = {
header: list(request.requestHeaders.getRawHeaders(header, []))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS getRawHeaders always returns a list, so I think this is an unfortunate pointless copy. It's needed because Twisted's annotations only promise it returns a Sequence.

IMO this is a bug in Twisted's type hints: as I understand it, the rule of thumb is that functions ought to be liberal in what they accept but precise in what they return.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, that rule certainly makes sense but it's also kind of fair enough to give minimal guarantees :/.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a tuple instead? My understanding is that lists usually have extra allocated space so that you can extend them, but presumably we're not interested in growing these and we'll treat them as immutable anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, that rule certainly makes sense but it's also kind of fair enough to give minimal guarantees :/.

I sympathise. I suppose there are different trade-offs to be made at the library and at the application level.

Shout-out to Hyrum's Law while we're at it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a tuple instead? My understanding is that lists usually have extra allocated space so that you can extend them, but presumably we're not interested in growing these and we'll treat them as immutable anyway.

No strong opinions. I doubt it makes a perceivable difference: this isn't a hot path, and in either case the interpreter will have to alloc a new container + underlying storage and bump the refcounts for every element.

for header in (b"Authorization", b"User-Agent")
}
# Add the previous hop to the X-Forwarded-For header.
x_forwarded_for = request.requestHeaders.getRawHeaders(
b"X-Forwarded-For", []
x_forwarded_for = list(
request.requestHeaders.getRawHeaders(b"X-Forwarded-For", [])
)
# we use request.client here, since we want the previous hop, not the
# original client (as returned by request.getClientAddress()).
Expand Down
24 changes: 19 additions & 5 deletions synapse/http/client.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,6 @@
List,
Mapping,
Optional,
Sequence,
Tuple,
Union,
)
Expand Down Expand Up @@ -90,14 +89,29 @@
"synapse_http_client_responses", "", ["method", "code"]
)

# the type of the headers list, to be passed to the t.w.h.Headers.
# Actually we can mix str and bytes keys, but Mapping treats 'key' as invariant so
# we simplify.
# the type of the headers map, to be passed to the t.w.h.Headers.
#
# The actual type accepted by Twisted is
# Mapping[Union[str, bytes], Sequence[Union[str, bytes]] ,
# allowing us to mix and match str and bytes freely. However: any str is also a
# Sequence[str]; passing a header string value which is a
# standalone str is interpreted as a sequence of 1-codepoint strings. This is a disastrous footgun.
# We use a narrower value type (RawHeaderValue) to avoid this footgun.
#
# We also simplify the keys to be either all str or all bytes. This helps because
# Dict[K, V] is invariant in K (and indeed V).
RawHeaders = Union[Mapping[str, "RawHeaderValue"], Mapping[bytes, "RawHeaderValue"]]

# the value actually has to be a List, but List is invariant so we can't specify that
# the entries can either be Lists or bytes.
RawHeaderValue = Sequence[Union[str, bytes]]
RawHeaderValue = Union[
List[str],
List[bytes],
List[Union[str, bytes]],
Tuple[str, ...],
Tuple[bytes, ...],
Tuple[Union[str, bytes], ...],
]


def check_against_blacklist(
Expand Down