This repository has been archived by the owner on Nov 6, 2022. It is now read-only.

Can no longer parse header "X-Custom-Header-Bytes: …" despite browsers doing this fine #281

Closed
domenic opened this issue Feb 14, 2016 · 15 comments

Comments

@domenic

domenic commented Feb 14, 2016

This is causing jsdom to fail several web-platform-tests where it is tested that its behavior emulates XMLHttpRequest (jsdom/jsdom#1380). I believe this is a regression as of Node.js 5.6.0.

The test is a combination of https://github.com/w3c/web-platform-tests/blob/master/XMLHttpRequest/resources/headers.py and https://github.com/w3c/web-platform-tests/blob/master/XMLHttpRequest/getresponseheader-special-characters.htm, where it tests that the server setting

X-Custom-Header-Bytes: …

results in the client receiving "\xE2\x80\xA6" (i.e. the UTF-8 bytes of "…", exposed byte-for-byte rather than decoded).

We would like to be able to achieve this behavior in jsdom just like browsers are specified to do.
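For reference, here is a minimal standalone reproduction sketch (my own, not the actual WPT harness); it assumes a recent Node.js, and the exact client-side error may vary by version:

// A bare TCP server writes a response whose header value contains the raw
// octets 0xE2 0x80 0xA6 ("…" as UTF-8); an http client then tries to read it.
const net = require('net');
const http = require('http');

const server = net.createServer((socket) => {
  socket.end(Buffer.concat([
    Buffer.from('HTTP/1.1 200 OK\r\nX-Custom-Header-Bytes: '),
    Buffer.from([0xe2, 0x80, 0xa6]),            // the bytes under test
    Buffer.from('\r\nContent-Length: 0\r\n\r\n'),
  ]));
});

server.listen(0, () => {
  const req = http.get({ port: server.address().port }, (res) => {
    // Pre-5.6 behavior (and what browsers expose): the value comes through
    // byte-for-byte.
    console.log('header:', res.headers['x-custom-header-bytes']);
    res.resume();
    server.close();
  });
  req.on('error', (err) => {
    // Current behavior: the stricter parser rejects the header and the
    // request errors out instead (e.g. a "Parse Error").
    console.error('client error:', err.message);
    server.close();
  });
});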

/cc @annevk for the standards-based perspective. I imagine it's something like "HTTP underspecifies interoperable behavior" but couldn't find anything in Fetch that specifically addresses the issue of what to do when headers are not valid according to HTTP's field-content production. Maybe the test is wrong?

@indutny
Member

indutny commented Feb 14, 2016

Quote from RFC 7230:

field-value    = *( field-content / obs-fold )
field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar    = VCHAR / obs-text
obs-text       = %x80-FF
VCHAR (any visible [USASCII] character)

Thus only ASCII characters are allowed in header values. We were more permissive in the past, but it turns out that this could be used as an attack vector against intermediate proxies.

IMO, this is a won't-fix, since the parser is implemented according to the spec.

Thanks for asking, though!

@indutny closed this as completed Feb 14, 2016
@indutny
Member

indutny commented Feb 14, 2016

More from the spec:

A recipient MUST parse an HTTP message as a sequence of octets in an encoding that is a superset of US-ASCII [USASCII]. Parsing an HTTP message as a stream of Unicode characters, without regard for the specific encoding, creates security vulnerabilities due to the varying ways that string processing libraries handle invalid multibyte character sequences that contain the octet LF (%x0A).

@domenic
Author

domenic commented Feb 14, 2016

I think we should endeavor to be as permissive as browsers. I don't suggest parsing it as a sequence of Unicode characters, but we should at least be interoperable with the most common HTTP processing libraries on the planet. Otherwise, people can't write software (like jsdom) which has the same capabilities as browsers.

@domenic
Author

domenic commented Feb 14, 2016

At the very least, I need some sort of option. Is there a way to plug in browser-like header parsing to Node.js, without reimplementing all of require("http")?

@domenic
Author

domenic commented Feb 14, 2016

It's also not clear to me that the section you quoted supports erroring. All it says is that you should process the byte sequence 0xE2 0x80 0xA6 as a sequence of octets in a superset of US-ASCII. The correct way to do that is to treat it as three single characters, as the test requires.

So I think the current behavior of erroring is wrong per spec, and the spec actually is fairly clear about what should happen here. I agree the field-value production does not allow this, but that's the section for HTTP server writers to make sure they write compliant servers. Your second quote is about HTTP response processors, who MUST accept any value and process it as a superset of US-ASCII.
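As a rough illustration (not the parser's actual code) of what "process it as a superset of US-ASCII" means in practice, using Node's Buffer:

// Each byte becomes one character, which is what the WPT test expects
// getResponseHeader() to return.
const bytes = Buffer.from([0xe2, 0x80, 0xa6]);
const value = bytes.toString('latin1');   // 'binary' on older Node.js
console.log(value.length);                // 3 – three single characters
console.log(value === '\xE2\x80\xA6');    // true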

@indutny
Member

indutny commented Feb 14, 2016

@domenic idk, maybe @mscdex or @bnoordhuis has an opinion on this.

I wonder if it should be fixed in browsers instead? It is really strange that browsers implement behavior that is strictly prohibited by the HTTP spec.

@indutny
Member

indutny commented Feb 14, 2016

@domenic yeah, I agree, the superset quote was kind of off-topic. Anyway, field-value is strict enough to be grounds for a fatal error and connection termination.

@domenic
Author

domenic commented Feb 14, 2016

(EDIT: upon further reading of the error handling section of the RFC, this post is mostly incorrect.)

Mid-air comment collision, but to reiterate my main point:

It is really strange that browsers implement behavior that is strictly prohibited by the HTTP spec.

The HTTP spec is pretty clear that clients are required to implement this behavior. http-parser's current behavior is not spec-compliant, and in fact violates the MUST in

A recipient MUST parse an HTTP message as a sequence of octets in an
encoding that is a superset of US-ASCII [USASCII].

Anyway, field-value is strict enough to be grounds for a fatal error and connection termination.

No, the HTTP RFC specifically disallows this kind of behavior. A server must only emit valid field-value values. A client MUST process everything as a byte sequence in a superset of US-ASCII.

@indutny
Member

indutny commented Feb 14, 2016

@domenic one more question, could you please quote the part of spec that says:

...that's the section for HTTP server writers to make sure they write compliant servers

I didn't really see anything like this there, but maybe I am missing something.

@domenic
Author

domenic commented Feb 14, 2016

In https://tools.ietf.org/html/rfc7230#section-2.5,

A sender MUST NOT generate protocol elements that do not match the grammar defined by the corresponding ABNF rules.

In contrast, the only applicability of ABNF to the receiver is

When a received protocol element is parsed, the recipient MUST be able to parse any value of reasonable length that is applicable to the recipient's role and that matches the grammar defined by the corresponding ABNF rules.

(i.e. you MUST be able to parse valid header values, but nothing is said about invalid header values.)

Later:

HTTP does not define specific error handling mechanisms except when they have a direct impact on security, since different applications of the protocol require different error handling strategies. For example, a Web browser might wish to transparently recover from a response where the Location header field doesn't parse according to the ABNF, whereas a systems control client might consider any form of error recovery to be dangerous.

So I agree that it is within Node.js's rights to reject invalid values. (My above post is wrong; I've edited it with a disclaimer at the top.) But I'd really appreciate some way to do browser-like header parsing in Node.js, so that I can interface with real-world web servers that work in all browsers today.

@bnoordhuis
Member

Untested but perhaps https://github.com/creationix/http-parser-js works for you? It's more lenient than node's parser.
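If its README is still accurate, it is wired in roughly like this (note that it swaps the parser process-wide, not per request):

// Replace the binding before 'http' is first used so that subsequent
// requests go through the more lenient JS parser.
process.binding('http_parser').HTTPParser = require('http-parser-js').HTTPParser;
const http = require('http');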

@domenic
Author

domenic commented Feb 15, 2016

@bnoordhuis that seems like it would be an option, if there were a way to scope it to a particular HTTP request instead of monkeypatching private APIs. Is that something Node.js could consider allowing, in order to provide a mitigation for those of us affected by this regression?

@domenic
Author

domenic commented Feb 15, 2016

Even better of course would be a targeted fix that allows per-HTTP-request changes to the header validation policy, since then I wouldn't have to parse the entire request.

@bnoordhuis
Member

I think an undocumented { parser: ... } option to http.request() is acceptable but it's TBD who is responsible for cleanup. The parser API is not stable either and I'm not really willing to commit to that.

Even better of course would be a targeted fix that allows per-HTTP-request changes to the header validation policy, since then I wouldn't have to parse the entire request.

That's not so easy without a major http_parser bump. I'm also a little worried about performance regressions; http_parser_execute() is agonizingly sensitive to register spillage.
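For the sake of discussion, a purely hypothetical sketch of what that undocumented option could look like from the caller's side (the parser option does not exist today; the names here are illustrative only):

// Hypothetical per-request parser override – NOT an existing API.
const http = require('http');
const { HTTPParser } = require('http-parser-js');  // lenient parser (assumption)

const req = http.request({ host: 'example.com', parser: HTTPParser }, (res) => {
  console.log(res.headers['x-custom-header-bytes']);
  res.resume();
});
req.end();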

@craig65535

Could be fixed in #283?

domenic added a commit to jsdom/jsdom that referenced this issue Feb 16, 2016
This is due to a regression in Node's HTTP parser that makes it less capable than browsers: nodejs/http-parser#281. Closes #1380. We probably won't be able to fix this until Node either fixes the regression or makes HTTP header parsing customizable.
lehni pushed a commit to lehni/jsdom that referenced this issue Mar 27, 2016
This is due to a regression in Node's HTTP parser that makes it less capable than browsers: nodejs/http-parser#281. Closes jsdom#1380. We probably won't be able to fix this until Node either fixes the regression or makes HTTP header parsing customizable.