Forwarded headers and otherwise improved proxy handling #1638
Conversation
… proxy headers.
- Accessible via request.forwarded, which tries parse_forwarded and then parse_xforwarded
- parse_forwarded uses the Forwarded header, if config.FORWARDED_SECRET is provided and a matching header field is found
- parse_xforwarded uses X-Real-IP and X-Forwarded-*, much like the existing implementation
- This commit does not change existing request properties, which still use the old code and won't make use of Forwarded headers.
…and remote_addr. X-Scheme handling moved to parse_xforwarded.
… forwards headers are used.
- One test removed due to a change of semantics: no socket port is used if any forwarded headers are in effect.
- Other tests augmented with X-Forwarded-For, so that the header being tested takes effect (shouldn't affect the old implementation).
Codecov Report
@@ Coverage Diff @@
## master #1638 +/- ##
==========================================
+ Coverage 91.61% 91.75% +0.13%
==========================================
Files 19 20 +1
Lines 2099 2134 +35
Branches 391 399 +8
==========================================
+ Hits 1923 1958 +35
Misses 138 138
Partials 38 38
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #1638 +/- ##
==========================================
+ Coverage 91.65% 91.96% +0.31%
==========================================
Files 22 22
Lines 2120 2202 +82
Branches 394 414 +20
==========================================
+ Hits 1943 2025 +82
Misses 138 138
Partials 39 39
Continue to review full report at Codecov.
…ameters with parse_xforwarded.
- Previously only the first header was used, so this BUGFIX may affect functionality.
- request.server_name docs claim that it returns the hostname only (no port).
- config.SERVER_NAME may be a full URL, so strip scheme, port and path.
- HTTP Host, and consequently forwarded Host, may include a port number, so strip that for forwarded hosts too (previously only done for HTTP Host).
- Possible performance benefit from limiting to one split.
…il a proper solution is implemented).
…ws host to be defined and proxied information still being used.
… spec allows for.
- request.forwarded lowercases hosts, separates host:port into their own fields and lowercases addresses.
- forwarded.parse_host helper function added and used for parsing all host-style headers (IPv6 cannot simply be split on ":").
- More tests fixed not to use underscores in hosts, as those are no longer accepted and lead to the field being rejected.
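The commit above mentions a `forwarded.parse_host` helper for splitting host-style headers. Here is a minimal sketch of such a helper (hypothetical, not Sanic's actual code) that handles bracketed IPv6 literals, which a naive `split(":")` would break:

```python
import re

# Hypothetical sketch of a parse_host-style helper: split "host:port" while
# handling bracketed IPv6 literals, which cannot be split on ":" naively.
_host_re = re.compile(r"((?:\[.*?\])|[a-zA-Z0-9.\-]+)(?::(\d+))?$")

def parse_host(host: str):
    """Return (lowercased host, port or None); (None, None) if unparseable."""
    m = _host_re.fullmatch(host)
    if not m:
        return None, None
    name, port = m.groups()
    return name.lower(), int(port) if port else None
```

Note that a host containing an underscore falls outside the character class and is rejected, matching the test changes described above.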
nice work, couple aesthetic changes and questions but this is mostly good with me.
or self.headers.get("x-forwarded-host")
or self.host.split(":")[0]
)
server_name = self.app.config.get("SERVER_NAME")
not a blocker here because this is existing logic, but a general question: does this make sense? SERVER_NAME seems like it's relevant to the application server but is being used to override request-specific headers, which seems weird. @harshanarayana I noticed you were on the review for #1465; is there something I'm missing here?
I gather that SERVER_NAME is supposed to be a rarely specified override, not a fallback. Not sure how useful that actually is. One use would be to actively prevent any headers being used for URL construction, but this is also limited in that it cannot handle the same Sanic instance serving multiple hosts. Also worth noting is that this is supposed to be in the same format as Host, and thus includes any non-default port.
Another consideration is what to do if none of these match. One could possibly still match against SSL SNI but if that is missing, I would not go further. In particular, sockname (ip/port or unix socket path) should not be exposed in external URLs. OTOH, all reasonable clients include Host or its HTTP/2 equivalent :authority header.
@abuckenheimer I think SERVER_NAME here should be fine. I see how it might be a bit tricky to look at, but this is generally never used, as @Tronic mentioned. At least in all of my work so far, I have never had to use a parameter like this to override anything.
As far as SSL SNI goes, do we really need to do that in the core of sanic at all? Considering it's a client-supported entity, and depending on the browser you are using it might or might not work, I think at this point we might be better off not worrying about SNIs. This is just a personal opinion.
I believe that getting an SNI is practically certain at this point whenever SSL is used, as vhosts would not work without it. Getting the value is quite simple with a callback in the SSL context. How it would be used in Sanic is another question, especially since SNI doesn't contain a port (which is present in Host/:authority, and is presumed to exist in SERVER_NAME as well whenever a non-standard port is being used).
Whatever is going to be done with SERVER_NAME, I propose postponing to another pull request.
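For reference, the resolution order under discussion can be sketched as follows. This is an illustrative toy, not Sanic's implementation; `resolve_server_name` and its arguments are hypothetical names:

```python
# Illustrative sketch (not Sanic's actual code) of the server_name resolution
# order discussed above: an explicit SERVER_NAME config wins, then any
# forwarded host from a trusted proxy, then the plain HTTP Host header.
def resolve_server_name(config: dict, forwarded: dict, host_header: str) -> str:
    name = (
        config.get("SERVER_NAME")
        or forwarded.get("host")
        or host_header
    )
    if name.startswith("["):
        # Bracketed IPv6 literal: keep the brackets, drop any port suffix.
        return name.split("]")[0] + "]"
    # Docs say server_name is hostname only, so strip any port (one split).
    return name.split(":", 1)[0]
```

The bracketed-IPv6 branch exists because, as noted elsewhere in this thread, IPv6 hosts cannot simply be split on ":".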
sanic/request.py
Outdated
self._remote_addr = ""
else:
self._remote_addr = ""
self._remote_addr = self.forwarded.get("for") or ""
first of all 👏 much cleaner moving this logic out into headers.py. Second, I have a slight preference for the more straightforward self.forwarded.get("for", ""). But I get that this is a bit of a bikeshedding exercise, and as is this matches the convention of the module, so I defer to whatever you prefer.
@@ -372,86 +379,70 @@ def _get_address(self):
def server_name(self):
"""
Attempt to get the server's hostname in this order:
`config.SERVER_NAME`, `x-forwarded-host` header, :func:`Request.host`
`config.SERVER_NAME`, `forwarded` header, `x-forwarded-host` header,
:func:`Request.host`
""" | ||
header = headers.getall("forwarded", None) | ||
secret = config.FORWARDED_SECRET | ||
if header is None or not secret: |
I'm not super familiar with the RFC, but why can we only accept Forwarded headers with matching secrets? Seems like a decision that could be left to the application?
The alternative is to use PROXIES_COUNT, as is done with X-Forwarded headers in both the existing and the new implementation. If this same setting were applied to Forwarded as well, any existing installed applications using X-Forwarded would instantly become exploitable (send a request with a fake Forwarded header), so if we wanted to use that, we'd have to add a new config option for it.

Using a secret key, as suggested by the Nginx folks, solves the entire issue in a smart and secure manner. This key is not part of the RFC, but such extensions are explicitly allowed by https://tools.ietf.org/html/rfc7239#section-5.5
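The idea can be sketched with a toy parser (hypothetical code, much simpler than the PR's right-to-left implementation): only the comma-separated element carrying a matching `secret` parameter is trusted, so a client-supplied `Forwarded` header without the secret is ignored:

```python
def parse_forwarded_simplified(header: str, secret: str):
    """Toy model (not Sanic's implementation): split a Forwarded header into
    comma-separated elements and return the one whose 'secret' parameter
    matches the configured FORWARDED_SECRET, or None if no element matches."""
    for element in header.split(","):
        params = {}
        for pair in element.split(";"):
            if "=" in pair:
                key, _, value = pair.partition("=")
                params[key.strip().lower()] = value.strip().strip('"')
        if params.get("secret") == secret:
            return params
    return None
```

A client can freely prepend elements, but without knowledge of the secret none of them are ever selected.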
app.config.PROXIES_COUNT = -1
headers = {
"Forwarded": (
'for=1.1.1.1, for=injected;host="'
so this is a bit sketchy to me; again I'm not super familiar with the RFC, but how did you come up with the behavior for the test, given that the dangling double quote makes the header seem malformed?
Each proxy will append its entry to the end, comma separated or as an additional header row. If proxies are not careful in examining the prior entries for correct syntax (and you can be sure that many of them will just blindly append to anything), malformed results like this may occur. We mitigate such problems by:
- Parsing right-to-left, starting with the more trustworthy most-local proxies.
- Parse errors clear any keys found (and then continue on the next-to-left correctly formed element) so that malicious for=injected is not combined with secret=mySecret added by the true proxy and accidentally accepted.
tests/test_requests.py
Outdated
assert response.json == {"for": "test", "secret": "mySecret"}

# Secret insulated by malformed field #2
headers = {"Forwarded": r'for=test;b0rked;secret=mySecret;proto=wss'}
this doesn't need to be a raw string
tests/test_requests.py
Outdated
assert response.json == {"proto": "wss", "secret": "mySecret"}

# Unexpected termination should not lose existing acceptable values
headers = {"Forwarded": r'b0rked;secret=mySecret;proto=wss'}
this doesn't need to be a raw string
@@ -26,6 +26,7 @@
"WEBSOCKET_WRITE_LIMIT": 2 ** 16,
"GRACEFUL_SHUTDOWN_TIMEOUT": 15.0,  # 15 sec
"ACCESS_LOG": True,
"FORWARDED_SECRET": None,
add documentation to https://github.com/huge-success/sanic/blob/master/docs/sanic/config.md
# Secret insulated by malformed field #2
headers = {"Forwarded": r'for=test;b0rked;secret=mySecret;proto=wss'}
request, response = app.test_client.get("/", headers=headers)
assert response.json == {"proto": "wss", "secret": "mySecret"}
so I'm a bit confused here: why am I able to read secret=mySecret in the test for 481 even though it is on the left-hand side of ;b0rked;, while in this test we are not supposed to parse for=test because it's on the left-hand side of ;b0rked;?
Parse errors clear any keys found and then continue on the next (to left) correctly formed option as if it was another comma-separated element. This insulation is important because we cannot know if there was supposed to be a comma within what the parse error made us skip. Other options would be:
- Keep skipping until the next comma (somewhat harder to implement in this approach)
- Ignore everything on the left side and quit parsing
The current solution is more resilient against accidental errors and I don't see security issues with it.
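The insulation rule can be modeled with a toy right-to-left walk (hypothetical code, not the PR's regex-based parser): a malformed token either terminates parsing, if the secret was already matched to its right, or discards everything collected so far:

```python
def parse_forwarded_rtl(header: str, secret: str):
    """Toy model of the right-to-left parse with insulation described above
    (not Sanic's actual implementation)."""
    ret = {}
    found = False
    for token in reversed(header.split(";")):
        token = token.strip()
        if "=" in token:
            key, _, value = token.partition("=")
            key, value = key.strip().lower(), value.strip().strip('"')
            ret[key] = value
            if key == "secret" and value == secret:
                found = True
        else:
            # Malformed token: anything to its right of a matched secret is
            # trusted, so stop there. Otherwise discard what was collected,
            # since it may be client-injected, and continue leftwards.
            if found:
                return ret
            ret = {}
    return ret if found else None
```

This reproduces both behaviors questioned above: `secret=mySecret` left of `;b0rked;` is still readable (parsing continues after clearing), while `for=test` left of `;b0rked;` is dropped once the secret has been found.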
sanic/headers.py
Outdated
secret = True
if secret is True and sep != ";":
return normalize(ret)
return normalize(ret) if secret is True else None
having 3 return sites is a bit complicated, I buy that this is necessary for the way the parsing is done but can you add some comments on what state leads to each type of exit?
Trying to use backticks in a git commit -m "..." is a bad idea. That was supposed to say "add found to avoid re-using the secret variable".
I am updating the docs right now. It is becoming ever more clear that PROXIES_COUNT should default to 0 to avoid a security hole (clients can easily fake their IPs). It has not previously been fixed because of backward compatibility concerns affecting services running behind proxies that use the default value. If the default were changed, such services would see their proxy as the client IP, which is actually less dangerous than allowing the client to specify ANY IP. They would also begin to see the Host header as server_name, instead of X-Forwarded-Host. This may or may not change anything, depending on how their proxy is configured.

My recommendation is to change the PROXIES_COUNT default to 0 (disabling X-Forwarded-*, X-Real-IP etc.), and to remove the feature that allows negative values (because it is always subject to client spoofing). Proxy headers would then be used only if configured, and only in a secure manner.

Request for comment: how to proceed with this?
@Tronic This is some solid work. Absolutely love the usage of RegEx over the regular methods. Thanks a lot for this work. I have a few minor comments; otherwise, I am good.

Please make sure you update the documents in the docs/ folder as applicable with the right details, and would you terribly mind me asking for a few example cases of right and wrong headers in the document as well, so that end-users can get a better grasp of why sanic would drop certain things from the forwarded headers?
# This regex is for *reversed* strings because that works much faster for
# right-to-left matching than the other way around. Be wary that all things are
# a bit backwards! _rparam matches forwarded pairs alike ";key=value"
_rparam = re.compile(f"(?:{_token}|{_quoted})={_token}\\s*($|[;,])", re.ASCII)
👍 I like this one.
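For anyone puzzled by the trick, here is a minimal demonstration with a simplified pattern (not the `_rparam` regex above): reverse the string once, then a regex anchored at the start reads pairs right-to-left. Everything comes out backwards, so the value group appears before the key group:

```python
import re

# Simplified demo of matching a *reversed* header string. In the reversed
# string, a trailing "key=value" pair appears as "eulav=yek" at the start,
# so group 1 is the reversed value and group 2 the reversed key.
_rev_pair = re.compile(r"(\w+)=(\w+)(?:$|[;,])")

def last_pair(header: str):
    """Return the right-most key=value pair of a ';'/',' separated header."""
    m = _rev_pair.match(header[::-1])
    if not m:
        return None
    return m.group(2)[::-1], m.group(1)[::-1]
```

The win is that `re.match` on the reversed string is anchored and cheap, whereas scanning the original string right-to-left would require either repeated `re.search` calls or an unanchored pattern.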
sanic/headers.py
Outdated
proxies_count = config.PROXIES_COUNT
if not proxies_count:
return None
h1, h2 = config.REAL_IP_HEADER, config.FORWARDED_FOR_HEADER
Can we please use better names for the variables instead of h1 and h2? Sorry for being so picky.
sanic/headers.py
Outdated
addr = h1 and headers.get(h1)
forwarded_for = h2 and headers.getall(h2, None)
if not addr and forwarded_for:
assert proxies_count == -1 or proxies_count > 0, config.PROXIES_COUNT
Would it be a good idea to have a clearly defined error message with the assert? Or would it be easier to handle it the way the addr extraction is handled instead, returning None even if the proxies_count matching fails?
sanic/headers.py
Outdated
return normalize({"for": addr, **{k: v for k, v in other if v}})

_ipv6 = "(?:[0-9A-Fa-f]{0,4}:){2,7}[0-9A-Fa-f]{0,4}"
We might need to make this regex a bit stronger if we are to rely on it. (I am sure you have thought of this already.) This would be good to have. Otherwise, we might as well use the ipaddress module itself.
xref: http://vernon.mauery.com/content/projects/linux/ipv6_regex
I would use ipaddress but its implementation is awfully slow.
@Tronic Really? I wasn't aware of that. Internally at work, I have a custom class called IP which does most of the things for me, and it's based on RegEx. But since ipaddress is part of the Python core, I thought I might as well try. But if it's really slow, then we can stick to regex.
These are not entirely comparable as they do different things, but for reference:
%timeit ipaddress.ip_address("2606:4700:4700::1001")
8.76 µs ± 48.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit ipv6_re.match("2606:4700:4700::1001")
588 ns ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit parse_forwarded(headers, config)
# returns {'secret': 'mySecret', 'for': '[2606:4700:4700::1001]'}
5.75 µs ± 26.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
@Tronic That's interesting. Well, if we are using the RegEx, can we please have a bit stricter version of it? This regex can easily return false positives.
I don't see any practical way to make it more strict than it already is. Also, this is only meant to tell IPv4 addresses, IPv6 addresses and hostnames apart, and as far as I can tell, there cannot be any false positives there. Neither IPv4 addresses nor hostnames can have 2-7 colons in them. The closest I can think of is a hexadecimal-looking hostname in a Host header like cafe:8000, which still only has one colon, and an IPv6 host should anyway be bracketed like [f00d::cafe]:8000.
A somewhat bigger concern is that IPv6 addresses are not normalized to compressed form. For this it might be justified to use the ipaddress module, even though running an address through it takes more time than parsing the entire headers.
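As a reference point, the stdlib can do that normalization. A small sketch, assuming the parsed address is a bare IPv6 literal without brackets:

```python
import ipaddress

def normalize_ipv6(addr: str) -> str:
    """Compress an IPv6 literal to its canonical short form using the stdlib
    ipaddress module (exact, but slower than a regex check)."""
    return ipaddress.ip_address(addr).compressed
```

Raises ValueError for anything that is not a valid IP literal, so callers would need to guard it with the cheap regex check first.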
- Also cleanup and added comments
- This is getting ridiculous: cannot fit a URL on one line, yet the linter requires splitting the string literal!
I believe that all concerns raised are now addressed. A rather big and breaking change resulted, in the form of new REAL_IP_HEADER/PROXIES_COUNT default values and semantics, and all-new proxy configuration documentation was written. Both of these aspects require further review.
@Tronic Can you please add a file under xref: https://sanic.readthedocs.io/en/latest/sanic/contributing.html#changelog
- Forwarded host is now preserved in original format
- request.host now returns a forwarded host if available, else the Host header
- Forwarded options are preserved in original order, and later keys override earlier ones
- Forwarded path is automatically URL-unquoted
- Forwarded 'by' and 'for' are omitted if their value is unknown
- Tests modified accordingly
- Cleanup and improved documentation
I don't really know how to do it, and there don't seem to be any existing examples. It's all documented in git commits and in the updated config documentation. In particular, I do not know how you wish to announce the security changes that fix gaping security holes in earlier releases but may also break some existing installations.
All work done for now, and I'm going to be off keyboard. See you next week!
I'm going to approve this to get it in. I'll submit a PR to handle the changelog file.
sorry folks, was out of town over the long weekend. I see this got merged, which I'm fine with. I do have one lingering question here about changing the PROXIES_COUNT default, namely the new claim in the docs: I can't really speak to how the community uses this, but I assume this feature was there for a reason, so I'd be inclined to change the default but let proxies_count=-1 continue to be a feature, preserving backwards compatibility. Obviously we should add some documentation on its ability to be manipulated, but I think allowing the developer to make their own call seems more right. That said, I may be underestimating the security risk here and am fine being overruled if you folks think otherwise.
It is a very common practice to rely on IP addresses for rate limiting (failed logins etc.), temporary bans, and reporting to the police which address something was posted from. This being so easily spoofed is completely irresponsible, as it would also allow creating false records that would get someone else in trouble.

I am not aware of other ways to spoof your IP. Of course you can have a large IP block, switch dynamic IPs or use VPN/Tor, but even then the IP registered at a site more or less links back to where the request came from, and won't point to any computer of your choosing.

If using The easy migration in most cases is
I'm 100% for changing the default to
Internally at work, we have a custom webserver extended out of However, in this context, I do like the default value of
@abuckenheimer @harshanarayana Is there anything in those setups stopping you from using PROXIES_COUNT=1? That sounds like a small setup that cannot have more than one proxy. Or if you have two, use PROXIES_COUNT=2 or REAL_IP_HEADER, whichever suits you better. It is very hard to think of actual use cases where (1) you need access to remote_addr, (2) it doesn't need to be trusted (or you can be absolutely sure that your proxy deletes any such headers from the client), and (3) it wouldn't be trivially solved by setting one or two of these proxy settings. If you have one, I will reconsider, but otherwise I am strongly against adding that security hole back.
It might be helpful to add the options needed by Cloudflare, AWS CloudFront and other big ones to our docs. What others are there that should be included?
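As a starting point for such docs, some hedged configuration sketches using this PR's option names. The Cloudflare header name is the one Cloudflare documents for the client address, but verify against your proxy's current documentation before relying on any of these:

```python
# Hypothetical configuration sketches; verify against your proxy's docs.

# One trusted reverse proxy that appends X-Forwarded-For:
# app.config.PROXIES_COUNT = 1

# Behind Cloudflare, which supplies the client address in CF-Connecting-IP:
# app.config.REAL_IP_HEADER = "CF-Connecting-IP"

# A proxy that appends a Forwarded header carrying a configured secret:
# app.config.FORWARDED_SECRET = "mySecret"

# No proxy at all (the secure default proposed in this PR):
# app.config.PROXIES_COUNT = 0
```

The common theme: pick exactly one trust mechanism and configure it explicitly, rather than trusting whatever headers arrive.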
It doesn't sound hard to work around the new restriction; I'm just arguing on behalf of those who aren't in this thread that we should preserve backwards compatibility and move it into deprecation mode. This is mostly predicated on my analysis that calling this a "security hole" is a bit harsh; really, the security hole is assuming you can trust the client (when you can't). If this were more definitively a security hole by itself, then we could forgo the deprecation process. Happy to post the follow-up PR to move this into deprecation mode, just want to see if I'm missing something.
This is a work in progress regarding issue #1636.