Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forwarded headers and otherwise improved proxy handling #1638

Merged
merged 41 commits into from
Sep 2, 2019

Conversation

Tronic
Copy link
Member

@Tronic Tronic commented Jul 20, 2019

This is a work in progress regarding issue #1636.

Tronic added 7 commits July 20, 2019 15:46
… proxy headers.

- Accessible via request.forwarded that tries parse_forwarded and then parse_xforwarded
- parse_forwarded uses the Forwarded header, if config.FORWARDED_SECRET is provided and a matching header field is found
- parse_xforwarded uses X-Real-IP and X-Forwarded-* much alike the existing implementation
- This commit does not change existing request properties that still use the old code and won't make use of Forwarded headers.
…and remote_addr.

X-Scheme handling moved to parse_xforwarded.
- One test removed due to change of semantics - no socket port will be used if any forwarded headers are in effect.
- Other tests augmented with X-Forwarded-For, to allow the header being tested take effect (shouldn't affect old implementation).
@codecov
Copy link

codecov bot commented Jul 21, 2019

Codecov Report

Merging #1638 into master will increase coverage by 0.13%.
The diff coverage is 96.29%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1638      +/-   ##
==========================================
+ Coverage   91.61%   91.75%   +0.13%     
==========================================
  Files          19       20       +1     
  Lines        2099     2134      +35     
  Branches      391      399       +8     
==========================================
+ Hits         1923     1958      +35     
  Misses        138      138              
  Partials       38       38
Impacted Files Coverage Δ
sanic/config.py 100% <ø> (ø) ⬆️
sanic/request.py 98.35% <100%> (+0.77%) ⬆️
sanic/forwarded.py 95% <95%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 84b4112...a04617f. Read the comment docs.

@codecov
Copy link

codecov bot commented Jul 21, 2019

Codecov Report

Merging #1638 into master will increase coverage by 0.31%.
The diff coverage is 98.19%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1638      +/-   ##
==========================================
+ Coverage   91.65%   91.96%   +0.31%     
==========================================
  Files          22       22              
  Lines        2120     2202      +82     
  Branches      394      414      +20     
==========================================
+ Hits         1943     2025      +82     
  Misses        138      138              
  Partials       39       39
Impacted Files Coverage Δ
sanic/config.py 100% <ø> (ø) ⬆️
sanic/app.py 92.25% <0%> (-0.43%) ⬇️
sanic/request.py 98.36% <100%> (+0.78%) ⬆️
sanic/headers.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2011f3a...3548d6a. Read the comment docs.

Tronic added 11 commits July 21, 2019 14:12
- Previously only the first header was used, so this BUGFIX may affect functionality.
- request.server_name docs claim that it returns the hostname only (no port).
- config.SERVER_NAME may be full URL, so strip scheme, port and path
- HTTP Host and consequently forwarded Host may include port number, so
  strip that also for forwarded hosts (previously only done for HTTP Host).
- Possible performance benefit of limiting to one split.
…ws host to be defined and proxied information still being used.
- request.forwarded lowercases hosts, separates host:port into their own fields and lowercases addresses
- forwarded.parse_host helper function added and used for parsing all host-style headers (IPv6 cannot be simply split(":")).
- more tests fixed not to use underscores in hosts as those are no longer accepted and lead to the field being rejected
Copy link
Contributor

@abuckenheimer abuckenheimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work, couple aesthetic changes and questions but this is mostly good with me.

or self.headers.get("x-forwarded-host")
or self.host.split(":")[0]
)
server_name = self.app.config.get("SERVER_NAME")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a blocker here because this is existing logic but general question, does this make sense? SERVER_NAME seems like its relevant to the application server but is being used to override request specific headers which seems weird. @harshanarayana I noticed you were on the review for #1465 is there something I'm missing here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gather that SERVER_NAME is supposed to be a rarely specified override, not a fallback. Not sure how useful that actually is. One use would be to actively prevent any headers being used for URL construction, but this is also limited in that it cannot handle the same Sanic instance serving multiple hosts. Also worth noting is that this is supposed to be in the same format as Host, and thus includes any non-default port.

Another consideration is what to do if none of these match. One could possibly still match against SSL SNI but if that is missing, I would not go further. In particular, sockname (ip/port or unix socket path) should not be exposed in external URLs. OTOH, all reasonable clients include Host or its HTTP/2 equivalent :authority header.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abuckenheimer I think SERVER_NAME here should be fine. I see how it might be a bit tricky to look at but this is generally never used like @Tronic mentioned. At least in all of my work so far, never had to use a parameter like this to override anything.

As for the SSL SNI goes, do we really need to do that in the core of sanic at all ? Considering it's a client supported entity and depending on the browsers you are using it might or might not work. I think at this point, we might be better off not worrying about the SNIs. This is just a personal opinion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that getting an SNI is practically certain at this point, whenever SSL is used, as vhosts would not work without. Getting the value is quite simple with a callback in SSL context. How that would be used in Sanic is another question, especially since it doesn't contain port (which is present in host/:authority, and is presumed to exist in SERVER_NAME as well, whenever a non-standard port is being used).

What ever is going to be done with SERVER_NAME, I propose being postponed to another pull request.

sanic/request.py Outdated
self._remote_addr = ""
else:
self._remote_addr = ""
self._remote_addr = self.forwarded.get("for") or ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first of all 👏 much cleaner moving this logic out into headers.py, second I have a slight preference for the more straight forward self.forwarded.get("for", ""). But I get this is a bit of a bikesheding exercise and as is this matches the convention of the module so I defer to whatever you prefer

@@ -372,86 +379,70 @@ def _get_address(self):
def server_name(self):
"""
Attempt to get the server's hostname in this order:
`config.SERVER_NAME`, `x-forwarded-host` header, :func:`Request.host`
`config.SERVER_NAME`, `forwarded` header, `x-forwarded-host` header,
:func:`Request.host`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""
header = headers.getall("forwarded", None)
secret = config.FORWARDED_SECRET
if header is None or not secret:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with the RFC but why can we only accept forwaded headers with matching secrets? Seems like a decision that could be left to the application?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative is to use PROXIES_COUNT, as is done with X-Forwarded headers in the existing as well as the new implementation. If this same setting was applied to Forwarded as well, any existing installed applications using X-Forwarded would instantly become exploitable (send a request with fake Forwarded header), so if we wanted to use that, we'd have to add a new config option for that.

Using a secret key, as suggested by Nginx folks, solves the entire issue in a smart and secure manner. This key is not part of RFC but such extensions are explicitly allowed by https://tools.ietf.org/html/rfc7239#section-5.5

app.config.PROXIES_COUNT = -1
headers = {
"Forwarded": (
'for=1.1.1.1, for=injected;host="'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is a bit sketchy to me, again im not super familiar with the RFC but how did you come up with the behavior for the test given that the dangling double quote makes the header seem malformed?

Copy link
Member Author

@Tronic Tronic Aug 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each proxy will append its entry to the end, comma separated or as an additional header row. If proxies are not careful in examining the prior entries for correct syntax (and you can be sure that many of them will just blindly append to anything), malformed results like this may occur. We mitigate such problems by:

  • Parsing right-to-left, starting with the more trustworthy most-local proxies.
  • Parse errors clear any keys found (and then continue on the next-to-left correctly formed element) so that malicious for=injected is not combined with secret=mySecret added by the true proxy and accidentally accepted.

assert response.json == {"for": "test", "secret": "mySecret"}

# Secret insulated by malformed field #2
headers = {"Forwarded": r'for=test;b0rked;secret=mySecret;proto=wss'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't need to be a raw string

assert response.json == {"proto": "wss", "secret": "mySecret"}

# Unexpected termination should not lose existing acceptable values
headers = {"Forwarded": r'b0rked;secret=mySecret;proto=wss'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't need to be a raw string

@@ -26,6 +26,7 @@
"WEBSOCKET_WRITE_LIMIT": 2 ** 16,
"GRACEFUL_SHUTDOWN_TIMEOUT": 15.0, # 15 sec
"ACCESS_LOG": True,
"FORWARDED_SECRET": None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Secret insulated by malformed field #2
headers = {"Forwarded": r'for=test;b0rked;secret=mySecret;proto=wss'}
request, response = app.test_client.get("/", headers=headers)
assert response.json == {"proto": "wss", "secret": "mySecret"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I'm a bit confused here, why am I able to read secret=mySecret in the test for 481 even though it is on the left hand side of ;b0rked; while in this test we are not supposed to parse for=test because its on the left hand side of ;b0rked;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parse errors clear any keys found and then continue on the next (to left) correctly formed option as if it was another comma-separated element. This insulation is important because we cannot know if there was supposed to be a comma within what the parse error made us skip. Other options would be:

  • Keep skipping until the next comma (somewhat harder to implement in this approach)
  • Ignore everything on the left side and quit parsing

The current solution is more resilient against accidental errors and I don't see security issues with it.

sanic/headers.py Outdated
secret = True
if secret is True and sep != ";":
return normalize(ret)
return normalize(ret) if secret is True else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having 3 return sites is a bit complicated, I buy that this is necessary for the way the parsing is done but can you add some comments on what state leads to each type of exit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to use backticks in a git commit -m "..." is a bad idea. That was supposed to say "add found to avoid re-using the secret variable".

@Tronic
Copy link
Member Author

Tronic commented Aug 30, 2019

I am updating the docs right now.

It is becoming ever more clear that PROXIES_COUNT should default to 0 to avoid a security hole (clients can easily fake their IPs). It has not been previously fixed because of backward compatibility concerns that affect services running behind proxies using the default value. If the default was changed, such services would see their proxy as client IP, which is actually less dangerous than allowing the client to specify ANY IP. Also, they would begin to see the Host header as server_name, instead of X-Forwarded-Host. This may or may not change anything, depending on how their proxy is configured.

My recommendation is to change PROXIES_COUNT default to 0 (disable X-Forwarded-*, X-Real-IP etc), and to remove the feature that allows negative values (because it is always subject to client spoofing). Proxy headers would then be used only if configured and only in a secure manner.

Request for comment: how to proceed with this?

Copy link
Contributor

@harshanarayana harshanarayana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tronic This is some solid work. Absolutely love the usage of RegEx over the regular methods. Thanks a lot of this work. I have a few minor comments, otherwise, I am good.

Please make sure you update the documents in the docs/folder as applicable with right details and would you terribly mind me asking for a few example cases or right and wrong headers in the document as well so that the end-users can get a better grasp of why sanic would drop certain tings from the forward headers.

# This regex is for *reversed* strings because that works much faster for
# right-to-left matching than the other way around. Be wary that all things are
# a bit backwards! _rparam matches forwarded pairs alike ";key=value"
_rparam = re.compile(f"(?:{_token}|{_quoted})={_token}\\s*($|[;,])", re.ASCII)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I like this one.

sanic/headers.py Outdated
proxies_count = config.PROXIES_COUNT
if not proxies_count:
return None
h1, h2 = config.REAL_IP_HEADER, config.FORWARDED_FOR_HEADER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please use a better name for the variables instead of h1 and h2? Sorry for being so picky.

sanic/headers.py Outdated
addr = h1 and headers.get(h1)
forwarded_for = h2 and headers.getall(h2, None)
if not addr and forwarded_for:
assert proxies_count == -1 or proxies_count > 0, config.PROXIES_COUNT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a good idea to have a bit of a clearly defined error message with the assert ?

Or would it be easier to handle it the way the addr extraction is handled instead? Return None even if the proxies_count matching fails.

sanic/headers.py Outdated
return normalize({"for": addr, **{k: v for k, v in other if v}})


_ipv6 = "(?:[0-9A-Fa-f]{0,4}:){2,7}[0-9A-Fa-f]{0,4}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to make this regex a bit stronger if we are to rely on the regex. (I am sure you have thought of this already) This would be a good to have. Otherwise, we might as well use the ipaddress module itself.

xref: http://vernon.mauery.com/content/projects/linux/ipv6_regex

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use ipaddress but its implementation is awfully slow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tronic Really? I wasn't aware of that. Internally at work, I have a custom class called IPwhich does most of the things for me and it's based out of RegEx. But since ipaddress was part of python core I thought might as well try. But if it's really slow, then we can stick to regex.

Copy link
Member Author

@Tronic Tronic Aug 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not entirely comparable as they do different things, but for reference:

%timeit ipaddress.ip_address("2606:4700:4700::1001")                   
8.76 µs ± 48.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit ipv6_re.match("2606:4700:4700::1001")                            
588 ns ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit parse_forwarded(headers, config)
# returns {'secret': 'mySecret', 'for': '[2606:4700:4700::1001]'}
5.75 µs ± 26.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tronic That's interesting. Well, if we are using the RegEx, can we please have a bit more strict version of this RegEx? This regex can easily return false positives

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any practical way to make it more strict than it already is. Also, this is only meant to tell IPv4 and IPv6 addresses and hostnames apart, and as far as I can tell, there cannot be any false positives there. Neither IPv4 or hostnames can have 2-7 colons in them. The closest that I can think of is a hexadecimal-looking hostname in a host header like cafe:8000 which still only has one colon, and IPv6 host should anyway be bracketed like [f00d::cafe]:8000.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A somewhat bigger concern is that IPv6 are not normalized to compressed form. For this it might be justified to use ipaddress module even though running an address through it takes more time than parsing the entire headers.

sanic/headers.py Outdated Show resolved Hide resolved
sanic/request.py Outdated Show resolved Hide resolved
tests/test_requests.py Outdated Show resolved Hide resolved
@Tronic
Copy link
Member Author

Tronic commented Aug 31, 2019

I believe that all concerns raised are now addressed.

A rather big and breaking change resulted in form of new REAL_IP_HEADER/PROXIES_COUNT default values and semantics, and all new proxy configuration documentation was written. Both of these aspects require further review.

@harshanarayana
Copy link
Contributor

@Tronic Can you please add a file under changelogs directory with the name 1638.feature.rst and a few details about the PR so that it can be pulled into the change log during the next routine release?

xref : https://sanic.readthedocs.io/en/latest/sanic/contributing.html#changelog

- Forwarded host is now preserved in original format
- request.host now returns a forwarded host if available, else the Host header
- Forwarded options are preserved in original order, and later keys override earlier ones
- Forwarded path is automatically URL-unquoted
- Forwarded 'by' and 'for' are omitted if their value is unknown
- Tests modified accordingly
- Cleanup and improved documentation
@Tronic
Copy link
Member Author

Tronic commented Sep 1, 2019

@Tronic Can you please add a file under changelogs directory with the name 1638.feature.rst and a few details about the PR so that it can be pulled into the change log during the next routine release?

xref : https://sanic.readthedocs.io/en/latest/sanic/contributing.html#changelog

I don't really know how to do it, and there don't seem to be any existing examples. It's all documented in git commits and in updated config documentation. Especially I do not know how you wish to announce the security changes that fix gaping security holes of earlier releases but that also may break some existing installations.

@Tronic
Copy link
Member Author

Tronic commented Sep 1, 2019

All work done for now, and I'm going to be off keyboard. See you next week!

Copy link
Contributor

@sjsadowski sjsadowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to approve this to get it in. I'll submit a PR to handle the changelog file.

@sjsadowski sjsadowski merged commit 1e4b1c4 into sanic-org:master Sep 2, 2019
@abuckenheimer
Copy link
Contributor

sorry folks was out of town over the long weekend, I see this got merged which I'm fine with. I do have one lingering question here about changing the PROXIES_COUNT default, namely the new claim in the docs: Earlier Sanic versions had unsafe default settings. From 19.9 onwards proxy settings must be set manually, and support for negative PROXIES_COUNT has been removed. I buy that forcing the user to evaluate these options explicitly now probably improves the default state of affairs but I am not convinced that there is no place for a -1 option or that the default was even "unsafe" by itself. I think in reality an attacker can spoof their actual src IP address anyway so relying on an ip address alone to do security critical work never seemed to be the right solution, whether it was from the headers or spoofed tcp fields.

I can't really speak towards how the community uses this but I assume this feature was their for a reason so I'd be inclined to change the default but let proxies_count=-1 continue to be a feature preserving backwards compatibility. Obviously we should add some documentation on its ability to be manipulated but I think allowing the developer to make their own call seems more right. That said I may be underestimating the security risk here and am fine being overruled if you folks think otherwise.

@Tronic
Copy link
Member Author

Tronic commented Sep 3, 2019

It is a very common practice to rely on IP addresses for rate limitation (failed logins etc), temporary bans and reporting to police from which address something was posted. This being so easily spoofed is completely irresponsible, as that would also allow creating false records that would get someone else in trouble.

I am not aware of other ways to spoof your IP. Of course you can have a large IP block, switch dynamic IPs or use VPN/Tor, but even then the IP registered at a site more or less links back to where the request came, and won't point to any computer of your choosing.

If using PROXIES_COUNT=-1 behind any standard proxy (that append rather than replace), or directly on Internet, browser extensions and curl -H "x-forwarded-for: 1.2.3.4" make it too easy to spoof any address.

The easy migration in most cases is PROXIES_COUNT=1. Multi-proxy setups aren't that much harder, (e.g. Cloudflare REAL_IP_HEADER="CF-Connecting-IP").

@abuckenheimer
Copy link
Contributor

I'm 100% for changing the default to None, it just seems like there are plenty of cases where you know you can trust the client so PROXIES_COUNT=-1 may be a reasonable thing for a developer to explicitly set. This is mostly a "were all consenting adults here" approach, I get how this can go wrong in practice but I'm hesitant to go further than a stern warning since this breaks backwards compatibility. @harshanarayana @vltr any thoughts?

@harshanarayana
Copy link
Contributor

@abuckenheimer @Tronic

Internally at work, we have a custom webserver extended out of aiophttop and we use proxy_count behavior that defaults to -1. In our case, the deployment is completely in a private network and guarded by the internal proxy + firewalls. So using -1 works perfectly.

However, in this context, I do like the default value of None over -1. But keeping the backward-compatibility in the next release does sound like a good idea. That also falls in line with the idea of giving enough deprecation message so that users can migrate things first.

@Tronic
Copy link
Member Author

Tronic commented Sep 3, 2019

@abuckenheimer @harshanarayana Is there anything in those setups stopping you from using PROXIES_COUNT=1? Sounds like a small setup that cannot have more than one proxy. Or if you have two, use PROXIES_COUNT=2 or REAL_IP_HEADER, which ever suits you better.

It is very hard to think of actual use cases where (1) you need access to remote_addr, (2) it doesn't need to be trusted (or you can be absolutely sure that your proxy deletes any such headers from client), and (3) it wouldn't be trivially solved by setting one or two of these proxy settings

If you have one, I will reconsider, but otherwise I am strongly against adding that security hole back.

@Tronic
Copy link
Member Author

Tronic commented Sep 4, 2019

It might be helpful to add the options needed by Cloudflare, AWS Cloudfront and other big ones to our docs. What others there are that should be included?

@abuckenheimer
Copy link
Contributor

abuckenheimer commented Sep 4, 2019

It doesn't sound hard to work around the new restriction I'm just arguing on the behalf of those that aren't in this thread that we should preserve backwards compatibility and move it into deprecation mode.

This is mostly predicated on my analysis that calling this a "security hole" is a bit harsh, really the security hole is assuming you can trust the client (when you can't). If this was more definitively a security hole by itself then we could forgo the deprecation process.

Happy to post the follow up PR to move this into deprecation mode, just want to see if I'm missing something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants