http silently corrupts the request URL when it contains non-Latin-1 codepoints #13296
Comments
This test currently fails. It illustrates that Unicode in the URL does not arrive intact to the server, there is silent data corruption along the way at some point. This test is for the issue nodejs#13296.
For the sake of performance, the implementation does not transcode or escape the input into valid ASCII. It assumes the user has done the necessary escaping. That is unlikely to change, but it can be documented.
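As a rough illustration of the escaping the caller is expected to do, here is a minimal sketch that uses the standard `encodeURI` and `decodeURI` globals on the path from this issue. The variable names are only for illustration, and it assumes the server decodes percent-escapes as UTF-8.

```javascript
'use strict';

// The example path from this issue.
const path = '/café🐶';

// encodeURI() percent-encodes the UTF-8 bytes of every non-ASCII character,
// so the result contains only ASCII.
const escaped = encodeURI(path);

// True when every code unit of the escaped path is in the ASCII range.
const isAscii = /^[\x00-\x7f]*$/.test(escaped);

// A server can recover the original path by decoding the percent-escapes.
const restored = decodeURI(escaped);

console.log(escaped, isAscii, restored === path);
```

Passing `escaped` rather than `path` as the request path keeps the request line ASCII-only, so the choice of header encoding should no longer matter.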
@jasnell That's great, but that's not what this issue is about. I'm not asking in this issue to transcode or escape the input into valid ASCII. All I'm asking is that silent data corruption not happen. There are solutions for that: one of them is to pass the data through without corrupting it. Another, of course, would be to throw an exception when Unicode strings are passed in. I'll leave it up to core contributors to decide on the best solution to fix the bug that data is getting corrupted silently. When So when is the silent data corruption happening? When (It's worth noting that
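To make the "throw an exception" option concrete, here is a minimal sketch of a check that rejects paths which cannot be represented in Latin-1. `assertLatin1Path` is a hypothetical helper written for this illustration, not a Node.js API, and the error type and message are assumptions.

```javascript
'use strict';

// Hypothetical helper (not part of Node.js): throws instead of letting a
// path be silently truncated when it is written out as Latin-1.
function assertLatin1Path(path) {
  for (let i = 0; i < path.length; i++) {
    // Any UTF-16 code unit above 0xFF does not fit in a single Latin-1 byte.
    if (path.charCodeAt(i) > 0xff) {
      throw new TypeError(
        `Request path contains a non-Latin-1 character at index ${i}`);
    }
  }
  return path;
}

// '/café' passes, because 'é' (U+00E9) fits in one byte.
console.log(assertLatin1Path('/café'));

// '/café🐶' is rejected, because the emoji is outside the Latin-1 range.
try {
  assertLatin1Path('/café🐶');
} catch (err) {
  console.log(err.message);
}
```

A caller could run such a check before handing the path to `http.get`, trading silent corruption for an explicit error.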
I agree, this behavior is unnecessarily unintuitive. It should either:
Options 1 or 2 could be breaking if someone is deliberately using "binary" encoding to send UTF-8 encoded paths, e.g. They could also be breaking if someone actually wants Latin-1 encoded paths (or the server they are talking to is smart enough to recognize it's not UTF-8) and they just happen to use only the Latin-1 range of characters. So Here's my proposal to fix this mess:
In a later major version of Node.js, we could consider one of the following:
cc @nodejs/collaborators feel free to criticize.
@seishun See #3062 (comment). Allow me to paraphrase myself:
IOW, the following statement:
Is not true (or only conditionally true.) As well, UTF-8 URLs are used widely enough that rejecting them outright is probably not going to fly. I think the first order of business is to untangle the conflation of headers and body somehow. Unfortunately, the naive approach is riddled with performance pitfalls and some backwards compatibility concerns.
@bnoordhuis
I think both of these could be fixed without touching the conflation of headers and body.
So it boils down to two questions:
I think most would agree that defaulting to/using the encoding of the first data chunk or 'latin1' is broken.
Indeed. Encoding of the headers should have absolutely nothing to do with the encoding of the payload.
Great, if we agree on that, then the next question is which assumptions we can make.
Title changed from "http silently corrupts the request URL when it contains Unicode" to "http silently corrupts the request URL when it contains non-Latin-1 codepoints"
This test currently fails. It illustrates that Unicode in the URL does not arrive intact to the server, there is silent data corruption along the way at some point. This test is for the issue #13296. PR-URL: #13297 Reviewed-By: James M Snell <[email protected]>
If you aren't sending any payload, then it should be encoded in Latin-1. Could you re-check?
@seishun Sorry, that was just an assumption on my part. I've edited my previous comment to add "I assume".
assert.strictEqual message argument removed to replace with default assert message to show the expected vs actual values Refs: nodejs#13296
assert.strictEqual message argument removed to replace with default assert message to show the expected vs actual values PR-URL: nodejs#18259 Refs: nodejs#13296 Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Shingo Inoue <[email protected]> Reviewed-By: Jon Moss <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: James M Snell <[email protected]>
@nodejs/http Thoughts on what to do with this? Doc update? Code change? Neither? Something else?
This has more or less been fixed with #20270, which will be out in v11.x.
I experience the bug on Node commit 399cb25
Darwin Tephs-Mac-Pro.local 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789.60.24~6/RELEASE_X86_64 x86_64 .
Subsystem: http
Issue description
When making a request using http.get with the path set to /café🐶, the server receives /café=6. This is not the URL that was sent; it's not even the percent-encoded version of the URL (with encodeURI), which would be /caf%C3%A9%F0%9F%90%B6. I expected the URL to be passed along without data corruption.
I've created a pull request which contains a test case that currently fails, illustrating this bug. #13297
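The received value /café=6 is consistent with the path being written as Latin-1, as suggested elsewhere in this thread. The sketch below reproduces the truncation with `Buffer` alone, without making a request. That the `http` module encodes the request line this way is an assumption here, and the variable names are only for illustration.

```javascript
'use strict';

// The path from the issue description.
const path = '/café🐶';

// Encoding as 'latin1' keeps only the low byte of each UTF-16 code unit.
// The emoji U+1F436 is the surrogate pair 0xD83D 0xDC36, so its two code
// units are truncated to the bytes 0x3D ('=') and 0x36 ('6').
const wireBytes = Buffer.from(path, 'latin1');

// What a receiver sees when it decodes those bytes as Latin-1 again.
const received = wireBytes.toString('latin1');

console.log(wireBytes);
console.log(received);
```

The 'é' survives because U+00E9 fits in a single byte, which is why only the emoji is visibly damaged.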