This repository has been archived by the owner on Nov 6, 2022. It is now read-only.

Can no longer parse header "X-Custom-Header-Bytes: …" despite browsers doing this fine #281

Closed
domenic opened this issue Feb 14, 2016 · 15 comments

Comments

@domenic

domenic commented Feb 14, 2016

This is causing jsdom to fail several web-platform-tests where it is tested that its behavior emulates XMLHttpRequest (jsdom/jsdom#1380). I believe this is a regression as of Node.js 5.6.0.

The test is a combination of https://github.com/w3c/web-platform-tests/blob/master/XMLHttpRequest/resources/headers.py and https://github.com/w3c/web-platform-tests/blob/master/XMLHttpRequest/getresponseheader-special-characters.htm, where it tests that the server setting

X-Custom-Header-Bytes: …

results in the client receiving "\xE2\x80\xA6" (i.e. the UTF-8 bytes of "…", exposed byte-for-byte rather than decoded).

We would like to be able to achieve this behavior in jsdom just like browsers are specified to do.
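For reference, here is a minimal standalone reproduction sketch (my own, not the actual WPT harness); it assumes a recent Node.js, and the exact client-side error may vary by version:

// A bare TCP server writes a response whose header value contains the raw
// octets 0xE2 0x80 0xA6 ("…" as UTF-8); an http client then tries to read it.
const net = require('net');
const http = require('http');

const server = net.createServer((socket) => {
  socket.end(Buffer.concat([
    Buffer.from('HTTP/1.1 200 OK\r\nX-Custom-Header-Bytes: '),
    Buffer.from([0xe2, 0x80, 0xa6]),            // the bytes under test
    Buffer.from('\r\nContent-Length: 0\r\n\r\n'),
  ]));
});

server.listen(0, () => {
  const req = http.get({ port: server.address().port }, (res) => {
    // Pre-5.6 behavior (and what browsers expose): the value comes through
    // byte-for-byte.
    console.log('header:', res.headers['x-custom-header-bytes']);
    res.resume();
    server.close();
  });
  req.on('error', (err) => {
    // Current behavior: the stricter parser rejects the header and the
    // request errors out instead (e.g. a "Parse Error").
    console.error('client error:', err.message);
    server.close();
  });
});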

/cc @annevk for the standards-based perspective. I imagine it's something like "HTTP underspecifies interoperable behavior" but couldn't find anything in Fetch that specifically addresses the issue of what to do when headers are not valid according to HTTP's field-content production. Maybe the test is wrong?

@indutny
Member

indutny commented Feb 14, 2016

Quote from RFC 7230:

field-value    = *( field-content / obs-fold )
field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar    = VCHAR / obs-text
obs-text       = %x80-FF
VCHAR (any visible [USASCII] character)

Thus only ASCII characters are allowed in header values. We were more permissive in the past, but it turns out that this could be used as an attack vector against intermediate proxies.

IMO, this is a won't-fix, since the parser is implemented according to the spec.

Thanks for asking, though!

@indutny closed this as completed Feb 14, 2016
@indutny
Member

indutny commented Feb 14, 2016

More from the spec:

A recipient MUST parse an HTTP message as a sequence of octets in an encoding that is a superset of US-ASCII [USASCII]. Parsing an HTTP message as a stream of Unicode characters, without regard for the specific encoding, creates security vulnerabilities due to the varying ways that string processing libraries handle invalid multibyte character sequences that contain the octet LF (%x0A).

@domenic
Author

domenic commented Feb 14, 2016

I think we should endeavor to be as permissive as browsers. I don't suggest parsing it as a sequence of Unicode characters, but we should at least be interoperable with the most common HTTP processing libraries on the planet. Otherwise, people can't write software (like jsdom) which has the same capabilities as browsers.

@domenic
Author

domenic commented Feb 14, 2016

At the very least, I need some sort of option. Is there a way to plug in browser-like header parsing to Node.js, without reimplementing all of require("http")?

@domenic
Author

domenic commented Feb 14, 2016

It's also not clear to me that the section you quoted supports erroring. All it says is that you should process the byte sequence 0xE2 0x80 0xA6 as a sequence of octets in a superset of US-ASCII. The correct way to do that is to treat it as three single characters, as the test requires.

So I think the current behavior of erroring is wrong per spec, and the spec actually is fairly clear about what should happen here. I agree the field-value production does not allow this, but that's the section for HTTP server writers to make sure they write compliant servers. Your second quote is about HTTP response processors, who MUST accept any value and process it as a superset of US-ASCII.
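As a rough illustration (not the parser's actual code) of what "process it as a superset of US-ASCII" means in practice, using Node's Buffer:

// Each byte becomes one character, which is what the WPT test expects
// getResponseHeader() to return.
const bytes = Buffer.from([0xe2, 0x80, 0xa6]);
const value = bytes.toString('latin1');   // 'binary' on older Node.js
console.log(value.length);                // 3 – three single characters
console.log(value === '\xE2\x80\xA6');    // true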

@indutny
Member

indutny commented Feb 14, 2016

@domenic idk, maybe @mscdex or @bnoordhuis has an opinion on this.

I wonder if it should be fixed in browsers instead? It is really strange that browsers implement behavior that is strictly prohibited by the HTTP spec.

@indutny
Member

indutny commented Feb 14, 2016

@domenic yeah, I agree, the superset quote was kind of off-topic. Anyway, field-value is strict enough to be grounds for a fatal error and connection termination.

@domenic
Author

domenic commented Feb 14, 2016

(EDIT: upon further reading of the error handling section of the RFC, this post is mostly incorrect.)

Mid-air comment collision, but to reiterate my main point:

It is really strange that browsers implement behavior that is strictly prohibited by the HTTP spec.

The HTTP spec is pretty clear that clients are required to implement this behavior. http-parser's current behavior is not spec-compliant, and in fact violates the MUST in

A recipient MUST parse an HTTP message as a sequence of octets in an
encoding that is a superset of US-ASCII [USASCII].

Anyway, field-value is strict enough to be grounds for a fatal error and connection termination.

No, the HTTP RFC specifically disallows this kind of behavior. A server must only emit valid field-value values. A client MUST process everything as a byte sequence in a superset of US-ASCII.

@indutny
Member

indutny commented Feb 14, 2016

@domenic one more question, could you please quote the part of spec that says:

...that's the section for HTTP server writers to make sure they write compliant servers

I didn't really see anything like this there, but maybe I am missing something.

@domenic
Author

domenic commented Feb 14, 2016

In https://tools.ietf.org/html/rfc7230#section-2.5,

A sender MUST NOT generate protocol elements that do not match the grammar defined by the corresponding ABNF rules.

In contrast, the only applicability of ABNF to the receiver is

When a received protocol element is parsed, the recipient MUST be able to parse any value of reasonable length that is applicable to the recipient's role and that matches the grammar defined by the corresponding ABNF rules.

(i.e. you MUST be able to parse valid header values, but nothing is said about invalid header values.)

Later:

HTTP does not define specific error handling mechanisms except when they have a direct impact on security, since different applications of the protocol require different error handling strategies. For example, a Web browser might wish to transparently recover from a response where the Location header field doesn't parse according to the ABNF, whereas a systems control client might consider any form of error recovery to be dangerous.

So I agree that it is within Node.js's rights to reject invalid values. (My above post is wrong; I've edited it with a disclaimer at the top.) But I'd really appreciate some way to do browser-like header parsing in Node.js, so that I can interface with real-world web servers that work in all browsers today.

@bnoordhuis
Member

Untested but perhaps https://github.com/creationix/http-parser-js works for you? It's more lenient than node's parser.
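If its README is still accurate, it is wired in roughly like this (note that it swaps the parser process-wide, not per request):

// Replace the binding before 'http' is first used so that subsequent
// requests go through the more lenient JS parser.
process.binding('http_parser').HTTPParser = require('http-parser-js').HTTPParser;
const http = require('http');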

@domenic
Author

domenic commented Feb 15, 2016

@bnoordhuis that seems like it would be an option, if there were a way to scope it to a particular HTTP request instead of monkeypatching private APIs. Is that something Node.js could consider allowing, in order to provide a mitigation for those of us affected by this regression?

@domenic
Author

domenic commented Feb 15, 2016

Even better of course would be a targeted fix that allows per-HTTP-request changes to the header validation policy, since then I wouldn't have to parse the entire request.

@bnoordhuis
Member

I think an undocumented { parser: ... } option to http.request() is acceptable but it's TBD who is responsible for cleanup. The parser API is not stable either and I'm not really willing to commit to that.

Even better of course would be a targeted fix that allows per-HTTP-request changes to the header validation policy, since then I wouldn't have to parse the entire request.

That's not so easy without a major http_parser bump. I'm also a little worried about performance regressions; http_parser_execute() is agonizingly sensitive to register spillage.
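For the sake of discussion, a purely hypothetical sketch of what that undocumented option could look like from the caller's side (the parser option does not exist today; the names here are illustrative only):

// Hypothetical per-request parser override – NOT an existing API.
const http = require('http');
const { HTTPParser } = require('http-parser-js');  // lenient parser (assumption)

const req = http.request({ host: 'example.com', parser: HTTPParser }, (res) => {
  console.log(res.headers['x-custom-header-bytes']);
  res.resume();
});
req.end();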

@craig65535

Could be fixed in #283?

domenic added a commit to jsdom/jsdom that referenced this issue Feb 16, 2016
This is due to a regression in Node's HTTP parser that makes it less capable than browsers: nodejs/http-parser#281. Closes #1380. We probably won't be able to fix this until Node either fixes the regression or makes HTTP header parsing customizable.
lehni pushed a commit to lehni/jsdom that referenced this issue Mar 27, 2016
This is due to a regression in Node's HTTP parser that makes it less capable than browsers: nodejs/http-parser#281. Closes jsdom#1380. We probably won't be able to fix this until Node either fixes the regression or makes HTTP header parsing customizable.