Skip to content

Commit

Permalink
Forwarded headers and otherwise improved proxy handling (#1638)
Browse files Browse the repository at this point in the history
* Added support for HTTP Forwarded header and combined parsing of other proxy headers.

- Accessible via request.forwarded that tries parse_forwarded and then parse_xforwarded
- parse_forwarded uses the Forwarded header, if config.FORWARDED_SECRET is provided and a matching header field is found
- parse_xforwarded uses X-Real-IP and X-Forwarded-* much alike the existing implementation
- This commit does not change existing request properties that still use the old code and won't make use of Forwarded headers.

* Use req.forwarded in req properties server_name, server_port, scheme and remote_addr.

X-Scheme handling moved to parse_xforwarded.

* Cleanup and fix req.server_port; no longer reports socket port if any forwards headers are used.

* Update docstrings to incidate that forwarded header is used first.

* Remove testing function.

* Fix tests and linting.

- One test removed due to change of semantics - no socket port will be used if any forwarded headers are in effect.
- Other tests augmented with X-Forwarded-For, to allow the header being tested take effect (shouldn't affect old implementation).

* Try to workaround buggy tools complaining about incorrect ordering of imports.

* Cleanup forwarded processing, add comments. secret is now also returned.

* Added tests, fixed quoted string handling, cleanup.

* Further tests for full coverage.

* Try'n make linter happy.

* Add support for multiple Forwarded headers. Unify parse_forwarded parameters with parse_xforwarded.

* Implement multiple headers support for X-Forwarded-For.

- Previously only the first header was used, so this BUGFIX may affect functionality.

* Bugfix for request.server_name: strip port and other parts.

- request.server_name docs claim that it returns the hostname only (no port).
- config.SERVER_NAME may be full URL, so strip scheme, port and path
- HTTP Host and consequently forwarded Host may include port number, so
  strip that also for forwarded hosts (previously only done for HTTP Host).
- Possible performance benefit of limiting to one split.

* Fallback to app.url_for and let it handle SERVER_NAME if defined (until a proper solution is implemented).

* Revise previous commit. Only fallback for full URL SERVER_NAMEs; allows host to be defined and proxied information still being used.

* Heil lintnazi.

* Modify testcase not to use underscores in URLs. Use hyphens which the spec allows for.

* Forwarded and Host header parsing improved.

- request.forwarded lowercases hosts, separates host:port into their own fields and lowercases addresses
- forwarded.parse_host helper function added and used for parsing all host-style headers (IPv6 cannot be simply split(":")).
- more tests fixed not to use underscores in hosts as those are no longer accepted and lead to the field being rejected

* Fixed typo in docstring.

* Added IPv6 address tests for Host header.

* Fix regex.

* Further tests and stricter forwarded handling.

* Fix merge commit

* Linter

* Linter

* Linter

* Add  to avoid re-using the  variable. Make a few raw strings non-raw.

* Remove unnecessary or

* Updated docs (work in progress).

* Enable REAL_IP_HEADER parsing irregardless of PROXIES_COUNT setting.

- Also cleanup and added comments

* New defaults for PROXIES_COUNT and REAL_IP_HEADER, updated tests.

* Remove support for PROXIES_COUNT=-1.

* Linter errors.

- This is getting ridiculous: cannot fit an URL on one line, linter requires
  splitting the string literal!

* Add support for by=_proxySecret, updated docs, updated tests.

* Forwarded headers' semantics tuning.

- Forwarded host is now preserved in original format
- request.host now returns a forwarded host if available, else the Host header
- Forwarded options are preserved in original order, and later keys override earlier ones
- Forwarded path is automatically URL-unquoted
- Forwarded 'by' and 'for' are omitted if their value is unknown
- Tests modified accordingly
- Cleanup and improved documentation

* Add ASGI test.

* Linter

* Linter #2
  • Loading branch information
Tronic authored and sjsadowski committed Sep 2, 2019
1 parent ae91852 commit 1e4b1c4
Show file tree
Hide file tree
Showing 6 changed files with 545 additions and 166 deletions.
90 changes: 66 additions & 24 deletions docs/sanic/config.md
Original file line number Diff line number Diff line change
Expand Up @@ -110,37 +110,37 @@ Out of the box there are just a few predefined values which can be overwritten w

#### `REQUEST_TIMEOUT`

A request timeout measures the duration of time between the instant when a new open TCP connection is passed to the
Sanic backend server, and the instant when the whole HTTP request is received. If the time taken exceeds the
`REQUEST_TIMEOUT` value (in seconds), this is considered a Client Error so Sanic generates an `HTTP 408` response
and sends that to the client. Set this parameter's value higher if your clients routinely pass very large request payloads
A request timeout measures the duration of time between the instant when a new open TCP connection is passed to the
Sanic backend server, and the instant when the whole HTTP request is received. If the time taken exceeds the
`REQUEST_TIMEOUT` value (in seconds), this is considered a Client Error so Sanic generates an `HTTP 408` response
and sends that to the client. Set this parameter's value higher if your clients routinely pass very large request payloads
or upload requests very slowly.

#### `RESPONSE_TIMEOUT`

A response timeout measures the duration of time between the instant the Sanic server passes the HTTP request to the
Sanic App, and the instant a HTTP response is sent to the client. If the time taken exceeds the `RESPONSE_TIMEOUT`
value (in seconds), this is considered a Server Error so Sanic generates an `HTTP 503` response and sends that to the
client. Set this parameter's value higher if your application is likely to have long-running process that delay the
A response timeout measures the duration of time between the instant the Sanic server passes the HTTP request to the
Sanic App, and the instant a HTTP response is sent to the client. If the time taken exceeds the `RESPONSE_TIMEOUT`
value (in seconds), this is considered a Server Error so Sanic generates an `HTTP 503` response and sends that to the
client. Set this parameter's value higher if your application is likely to have long-running process that delay the
generation of a response.

#### `KEEP_ALIVE_TIMEOUT`

##### What is Keep Alive? And what does the Keep Alive Timeout value do?

`Keep-Alive` is a HTTP feature introduced in `HTTP 1.1`. When sending a HTTP request, the client (usually a web browser application)
can set a `Keep-Alive` header to indicate the http server (Sanic) to not close the TCP connection after it has send the response.
This allows the client to reuse the existing TCP connection to send subsequent HTTP requests, and ensures more efficient
`Keep-Alive` is a HTTP feature introduced in `HTTP 1.1`. When sending a HTTP request, the client (usually a web browser application)
can set a `Keep-Alive` header to indicate the http server (Sanic) to not close the TCP connection after it has send the response.
This allows the client to reuse the existing TCP connection to send subsequent HTTP requests, and ensures more efficient
network traffic for both the client and the server.

The `KEEP_ALIVE` config variable is set to `True` in Sanic by default. If you don't need this feature in your application,
set it to `False` to cause all client connections to close immediately after a response is sent, regardless of
The `KEEP_ALIVE` config variable is set to `True` in Sanic by default. If you don't need this feature in your application,
set it to `False` to cause all client connections to close immediately after a response is sent, regardless of
the `Keep-Alive` header on the request.

The amount of time the server holds the TCP connection open is decided by the server itself.
In Sanic, that value is configured using the `KEEP_ALIVE_TIMEOUT` value. By default, it is set to 5 seconds.
This is the same default setting as the Apache HTTP server and is a good balance between allowing enough time for
the client to send a new request, and not holding open too many connections at once. Do not exceed 75 seconds unless
The amount of time the server holds the TCP connection open is decided by the server itself.
In Sanic, that value is configured using the `KEEP_ALIVE_TIMEOUT` value. By default, it is set to 5 seconds.
This is the same default setting as the Apache HTTP server and is a good balance between allowing enough time for
the client to send a new request, and not holding open too many connections at once. Do not exceed 75 seconds unless
you know your clients are using a browser which supports TCP connections held open for that long.

For reference:
Expand All @@ -154,16 +154,58 @@ Opera 11 client hard keepalive limit = 120 seconds
Chrome 13+ client keepalive limit > 300+ seconds
```

### About proxy servers and client ip
### Proxy configuration

When you use a reverse proxy server (e.g. nginx), the value of `request.ip` will contain ip of a proxy, typically `127.0.0.1`. To determine the real client ip, `X-Forwarded-For` and `X-Real-IP` HTTP headers are used. But client can fake these headers if they have not been overridden by a proxy. Sanic has a set of options to determine the level of confidence in these headers.
When you use a reverse proxy server (e.g. nginx), the value of `request.ip` will contain ip of a proxy, typically `127.0.0.1`. Sanic may be configured to use proxy headers for determining the true client IP, available as `request.remote_addr`. The full external URL is also constructed from header fields if available.

* If you have a single proxy, set `PROXIES_COUNT` to `1`. Then Sanic will use `X-Real-IP` if available or the last ip from `X-Forwarded-For`.
Without proper precautions, a malicious client may use proxy headers to spoof its own IP. To avoid such issues, Sanic does not use any proxy headers unless explicitly enabled.

* If you have multiple proxies, set `PROXIES_COUNT` equal to their number to allow Sanic to select the correct ip from `X-Forwarded-For`.
Services behind reverse proxies must configure `FORWARDED_SECRET`, `REAL_IP_HEADER` and/or `PROXIES_COUNT`.

* If you don't use a proxy, set `PROXIES_COUNT` to `0` to ignore these headers and prevent ip falsification.
#### Forwarded header

* If you don't use `X-Real-IP` (e.g. your proxy sends only `X-Forwarded-For`), set `REAL_IP_HEADER` to an empty string.
```
Forwarded: for="1.2.3.4"; proto="https"; host="yoursite.com"; secret="Pr0xy",
for="10.0.0.1"; proto="http"; host="proxy.internal"; by="_1234proxy"
```

* Set `FORWARDED_SECRET` to an identifier used by the proxy of interest.

The secret is used to securely identify a specific proxy server. Given the above header, secret `Pr0xy` would use the information on the first line and secret `_1234proxy` would use the second line. The secret must exactly match the value of `secret` or `by`. A secret in `by` must begin with an underscore and use only characters specified in [RFC 7239 section 6.3](https://tools.ietf.org/html/rfc7239#section-6.3), while `secret` has no such restrictions.

Sanic ignores any elements without the secret key, and will not even parse the header if no secret is set.

All other proxy headers are ignored once a trusted forwarded element is found, as it already carries complete information about the client.

#### Traditional proxy headers

```
X-Real-IP: 1.2.3.4
X-Forwarded-For: 1.2.3.4, 10.0.0.1
X-Forwarded-Proto: https
X-Forwarded-Host: yoursite.com
```

* Set `REAL_IP_HEADER` to `x-real-ip`, `true-client-ip`, `cf-connecting-ip` or other name of such header.
* Set `PROXIES_COUNT` to the number of entries expected in `x-forwarded-for` (name configurable via `FORWARDED_FOR_HEADER`).

If client IP is found by one of these methods, Sanic uses the following headers for URL parts:

* `x-forwarded-proto`, `x-forwarded-host`, `x-forwarded-port`, `x-forwarded-path` and if necessary, `x-scheme`.

#### Proxy config if using ...

* a proxy that supports `forwarded`: set `FORWARDED_SECRET` to the value that the proxy inserts in the header
* Apache Traffic Server: `CONFIG proxy.config.http.insert_forwarded STRING for|proto|host|by=_secret`
* NGHTTPX: `nghttpx --add-forwarded=for,proto,host,by --forwarded-for=ip --forwarded-by=_secret`
* NGINX: after [the official instructions](https://www.nginx.com/resources/wiki/start/topics/examples/forwarded/), add anywhere in your config:

proxy_set_header Forwarded "$proxy_add_forwarded;by=\"_$server_name\";proto=$scheme;host=\"$http_host\";path=\"$request_uri\";secret=_secret";

* a custom header with client IP: set `REAL_IP_HEADER` to the name of that header
* `x-forwarded-for`: set `PROXIES_COUNT` to `1` for a single proxy, or a greater number to allow Sanic to select the correct IP
* no proxies: no configuration required!

#### Changes in Sanic 19.9

The real ip will be available in `request.remote_addr`. If HTTP headers are unavailable or untrusted, `request.remote_addr` will be an empty string; in this case use `request.ip` instead.
Earlier Sanic versions had unsafe default settings. From 19.9 onwards proxy settings must be set manually, and support for negative PROXIES_COUNT has been removed.
6 changes: 6 additions & 0 deletions sanic/app.py
Original file line number Diff line number Diff line change
Expand Up @@ -1309,6 +1309,12 @@ def _helper(
"stop_event will be removed from future versions.",
DeprecationWarning,
)
if self.config.PROXIES_COUNT and self.config.PROXIES_COUNT < 0:
raise ValueError(
"PROXIES_COUNT cannot be negative. "
"https://sanic.readthedocs.io/en/latest/sanic/config.html"
"#proxy-configuration"
)

self.error_handler.debug = debug
self.debug = debug
Expand Down
5 changes: 3 additions & 2 deletions sanic/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,9 +26,10 @@
"WEBSOCKET_WRITE_LIMIT": 2 ** 16,
"GRACEFUL_SHUTDOWN_TIMEOUT": 15.0, # 15 sec
"ACCESS_LOG": True,
"PROXIES_COUNT": -1,
"FORWARDED_SECRET": None,
"REAL_IP_HEADER": None,
"PROXIES_COUNT": None,
"FORWARDED_FOR_HEADER": "X-Forwarded-For",
"REAL_IP_HEADER": "X-Real-IP",
}


Expand Down
146 changes: 138 additions & 8 deletions sanic/headers.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,20 @@
import re
import typing

from typing import Dict, Iterable, Optional, Tuple
from urllib.parse import unquote

Options = typing.Dict[str, str] # key=value fields in various headers

token, quoted = r"([\w!#$%&'*+\-.^_`|~]+)", r'"([^"]*)"'
parameter = re.compile(fr";\s*{token}=(?:{token}|{quoted})", re.ASCII)
firefox_quote_escape = re.compile(r'\\"(?!; |\s*$)')
Options = Dict[str, str] # key=value fields in various headers
OptionsIterable = Iterable[Tuple[str, str]] # May contain duplicate keys

_token, _quoted = r"([\w!#$%&'*+\-.^_`|~]+)", r'"([^"]*)"'
_param = re.compile(fr";\s*{_token}=(?:{_token}|{_quoted})", re.ASCII)
_firefox_quote_escape = re.compile(r'\\"(?!; |\s*$)')
_ipv6 = "(?:[0-9A-Fa-f]{0,4}:){2,7}[0-9A-Fa-f]{0,4}"
_ipv6_re = re.compile(_ipv6)
_host_re = re.compile(
r"((?:\[" + _ipv6 + r"\])|[a-zA-Z0-9.\-]{1,253})(?::(\d{1,5}))?"
)

# RFC's quoted-pair escapes are mostly ignored by browsers. Chrome, Firefox and
# curl all have different escaping, that we try to handle as well as possible,
Expand All @@ -15,7 +23,7 @@
# For more information, consult ../tests/test_requests.py


def parse_content_header(value: str) -> typing.Tuple[str, Options]:
def parse_content_header(value: str) -> Tuple[str, Options]:
"""Parse content-type and content-disposition header values.
E.g. 'form-data; name=upload; filename=\"file.txt\"' to
Expand All @@ -24,14 +32,136 @@ def parse_content_header(value: str) -> typing.Tuple[str, Options]:
Mostly identical to cgi.parse_header and werkzeug.parse_options_header
but runs faster and handles special characters better. Unescapes quotes.
"""
value = firefox_quote_escape.sub("%22", value)
value = _firefox_quote_escape.sub("%22", value)
pos = value.find(";")
if pos == -1:
options = {}
else:
options = {
m.group(1).lower(): m.group(2) or m.group(3).replace("%22", '"')
for m in parameter.finditer(value[pos:])
for m in _param.finditer(value[pos:])
}
value = value[:pos]
return value.strip().lower(), options


# https://tools.ietf.org/html/rfc7230#section-3.2.6 and
# https://tools.ietf.org/html/rfc7239#section-4
# This regex is for *reversed* strings because that works much faster for
# right-to-left matching than the other way around. Be wary that all things are
# a bit backwards! _rparam matches forwarded pairs alike ";key=value"
_rparam = re.compile(f"(?:{_token}|{_quoted})={_token}\\s*($|[;,])", re.ASCII)


def parse_forwarded(headers, config) -> Optional[Options]:
"""Parse RFC 7239 Forwarded headers.
The value of `by` or `secret` must match `config.FORWARDED_SECRET`
:return: dict with keys and values, or None if nothing matched
"""
header = headers.getall("forwarded", None)
secret = config.FORWARDED_SECRET
if header is None or not secret:
return None
header = ",".join(header) # Join multiple header lines
if secret not in header:
return None
# Loop over <separator><key>=<value> elements from right to left
sep = pos = None
options = []
found = False
for m in _rparam.finditer(header[::-1]):
# Start of new element? (on parser skips and non-semicolon right sep)
if m.start() != pos or sep != ";":
# Was the previous element (from right) what we wanted?
if found:
break
# Clear values and parse as new element
del options[:]
pos = m.end()
val_token, val_quoted, key, sep = m.groups()
key = key.lower()[::-1]
val = (val_token or val_quoted.replace('"\\', '"'))[::-1]
options.append((key, val))
if key in ("secret", "by") and val == secret:
found = True
# Check if we would return on next round, to avoid useless parse
if found and sep != ";":
break
# If secret was found, return the matching options in left-to-right order
return fwd_normalize(reversed(options)) if found else None


def parse_xforwarded(headers, config) -> Optional[Options]:
"""Parse traditional proxy headers."""
real_ip_header = config.REAL_IP_HEADER
proxies_count = config.PROXIES_COUNT
addr = real_ip_header and headers.get(real_ip_header)
if not addr and proxies_count:
assert proxies_count > 0
try:
# Combine, split and filter multiple headers' entries
forwarded_for = headers.getall(config.FORWARDED_FOR_HEADER)
proxies = (p.strip() for h in forwarded_for for p in h.split(","))
proxies = [p for p in proxies if p]
addr = proxies[-proxies_count]
except (KeyError, IndexError):
pass
# No processing of other headers if no address is found
if not addr:
return None

def options():
yield "for", addr
for key, header in (
("proto", "x-scheme"),
("proto", "x-forwarded-proto"), # Overrides X-Scheme if present
("host", "x-forwarded-host"),
("port", "x-forwarded-port"),
("path", "x-forwarded-path"),
):
yield key, headers.get(header)

return fwd_normalize(options())


def fwd_normalize(fwd: OptionsIterable) -> Options:
"""Normalize and convert values extracted from forwarded headers."""
ret = {}
for key, val in fwd:
if val is not None:
try:
if key in ("by", "for"):
ret[key] = fwd_normalize_address(val)
elif key in ("host", "proto"):
ret[key] = val.lower()
elif key == "port":
ret[key] = int(val)
elif key == "path":
ret[key] = unquote(val)
else:
ret[key] = val
except ValueError:
pass
return ret


def fwd_normalize_address(addr: str) -> str:
"""Normalize address fields of proxy headers."""
if addr == "unknown":
raise ValueError() # omit unknown value identifiers
if addr.startswith("_"):
return addr # do not lower-case obfuscated strings
if _ipv6_re.fullmatch(addr):
addr = f"[{addr}]" # bracket IPv6
return addr.lower()


def parse_host(host: str) -> Tuple[Optional[str], Optional[int]]:
"""Split host:port into hostname and port.
:return: None in place of missing elements
"""
m = _host_re.fullmatch(host)
if not m:
return None, None
host, port = m.groups()
return host.lower(), port and int(port)
Loading

0 comments on commit 1e4b1c4

Please sign in to comment.