httparse should skip invalid headers #61

Coder-256 · 2019-09-10T02:44:38Z

I am trying to use reqwest to parse a response from a server I don't control. reqwest uses hyper which uses httparse for parsing HTTP/1.x headers. Anyway, this server has a weird bug where it consistently returns a single corrupted header line in an otherwise completely valid response (the header contains unescaped non-token characters). Specifically, for some reason it tries to send the DOCTYPE as a header. The bug is unlikely to be fixed (this is old software), but it isn't really a problem because the page displays fine in all major browsers.

It seems that all major browsers simply ignore invalid header lines. However, httparse returns an error that aborts the entire parsing process. IMO this is a problem and should be fixed.

Here's a screenshot from Chrome that shows the invalid header being ignored:

In fact, Chrome's behavior is commented as: "skip malformed header".

Although technically changing this could be breaking, in this case, I can't imagine that any code would rely on response parsing to fail in this particular case.

Here are the relevant lines:

httparse/src/lib.rs

Lines 594 to 595 in 6f696f5

    
           } else if !is_header_name_token(b) { 
        
               return Err(Error::HeaderName);

httparse/src/lib.rs

Lines 613 to 614 in 6f696f5

    
           } else if !is_header_name_token(b) { 
        
               return Err(Error::HeaderName);

httparse/src/lib.rs

Line 672 in 6f696f5

expect!(bytes.next() == b'\n' => Err(Error::HeaderValue));

httparse/src/lib.rs

Lines 613 to 614 in 6f696f5

    
           } else if !is_header_name_token(b) { 
        
               return Err(Error::HeaderName);

I think all of these would be fixed by consuming b until the next newline then continue 'headers. I would open a PR but I just want to check that you agree that this change should be made.

The text was updated successfully, but these errors were encountered:

seanmonstar · 2019-09-10T17:24:17Z

Hm, I'm surprised at the results, but then again, I shouldn't be, the HTTP1 spec has been ignored at every turn :D

Curious, do you know what other "libraries" do, like curl or golang?

Coder-256 · 2019-09-10T19:32:00Z

I haven't tested this, but it looks like golang errors on headers missing a colon but otherwise just parses it as a valid header

I also have confirmed that curl actually parses it as a valid header:

$ curl "$URL" -s -o /dev/null -D -
HTTP/1.1 200 OK
Date: Tue, 10 Sep 2019 18:50:02 GMT
Server: Apache/2.4.xx (Debian)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http: //www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Transfer-Encoding: chunked

I took a quick look but its code is very dense. The header parsing logic is around Curl_http_readwrite_headers if you want to look further. It should be noted that curl doesn't actually do anything with headers except for important ones (ex. Transfer-Encoding). In fact AFAIK the most it will do in terms of parsing arbitrary headers is give you the entire message header with -D like above.

It was a good idea to check these libraries, I didn't realize that they would handle things differently. I guess there are 3 behaviors to choose from:

Ignore invalid headers (like Chrome, Firefox, etc.)
Pass on invalid headers (like curl and golang)
Error on invalid headers (like httparse)

I'll need to think a bit more about which would be better. What do you think?

Edit: After thinking about it a bit more, I think that especially since this project is used by apps like servo, it should definitely mimic Chrome/Firefox. This is a real example of a page that wouldn't load even though it works in major browsers.

Coder-256 · 2019-10-09T14:43:35Z

@seanmonstar Have you had a chance to look into this? I'll start working on a pull request, I just wanted to check if you agree that this is the correct behavior.

seanmonstar · 2019-10-10T16:04:16Z

Oh my, I completely forgot about this issue, sorry!

I suppose it makes sense to do what browsers do. I have a nagging worry that if ignore invalid headers, is there a point where we're parsing things that really aren't HTTP and saying it is?

Coder-256 · 2019-10-10T19:19:00Z

Well if the status line is intact, including HTTP version and all, I think it's definitely supposed to be HTTP. The headers are only parsed if the status line is valid.

I'll start making the PR.

nox · 2021-04-03T09:19:22Z

FWIW this behaviour was completely killed from Squid last year.

squid-cache/squid@b453677

nox · 2022-04-04T10:31:55Z

@Coder-256 Did you ever make a patch for this?

Found another website affected by this.

curl -I https://dbegoals.azdot.gov/

00000000  58 e2 80 90 46 72 61 6d  65 e2 80 90 4f 70 74 69  |X...Frame...Opti|
00000010  6f 6e 73 3a 20 44 45 4e  59                       |ons: DENY|

The hyphens aren't ASCII in the first X-Frame-Options entry.

Coder-256 · 2022-04-04T14:27:12Z

@nox I completely forgot! Also incredible catch, U+2010 strikes again. Anyway, after looking at this further, I am convinced that simply skipping invalid headers is the right option. In this case, Chromium and Firefox both ignore the invalid header while it looks like Safari actually treats the header name as a valid Latin 1-encoded header for whatever reason; none of them fail parsing entirely, as httparse does. I'll open a PR.

…r#83)

nox · 2022-04-22T11:12:37Z

#114 will let you ignore the invalid header.

Coder-256 · 2022-04-25T01:44:22Z

@nox Thank you very much, lgtm! Sorry I didn’t get to this sooner.

nox · 2022-04-25T08:05:28Z

@Coder-256 No worries.

…r#83)

nox added a commit to nox/httparse that referenced this issue Apr 22, 2022

Allow ignoring invalid header lines (fixes seanmonstar#61, seanmonsta…

96cda55

…r#83)

nox added a commit to nox/httparse that referenced this issue Apr 22, 2022

Allow ignoring invalid header lines (fixes seanmonstar#61, seanmonsta…

7244d92

…r#83)

nox added a commit to nox/httparse that referenced this issue Apr 26, 2022

Allow ignoring invalid header lines (fixes seanmonstar#61, seanmonsta…

cbc6f1f

…r#83)

nox added a commit to nox/httparse that referenced this issue Apr 26, 2022

Allow ignoring invalid header lines (fixes seanmonstar#61, seanmonsta…

67a2898

…r#83)

nox added a commit to nox/httparse that referenced this issue Apr 27, 2022

Allow ignoring invalid header lines (fixes seanmonstar#61, seanmonsta…

81cf158

…r#83)

nox added a commit to nox/httparse that referenced this issue Apr 27, 2022

Allow ignoring invalid header lines (fixes seanmonstar#61, seanmonsta…

e19ef40

…r#83)

nox added a commit to nox/httparse that referenced this issue Apr 27, 2022

Allow ignoring invalid header lines (fixes seanmonstar#61, seanmonsta…

a3bd3ae

…r#83)

nox added a commit to nox/httparse that referenced this issue May 6, 2022

Allow ignoring invalid header lines (fixes seanmonstar#61, seanmonsta…

413475b

…r#83)

nox added a commit to nox/httparse that referenced this issue May 6, 2022

Allow ignoring invalid header lines (fixes seanmonstar#61, seanmonsta…

808963b

…r#83)

seanmonstar closed this as completed in 0e7d95c May 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

httparse should skip invalid headers #61

httparse should skip invalid headers #61

Coder-256 commented Sep 10, 2019

seanmonstar commented Sep 10, 2019

Coder-256 commented Sep 10, 2019 •

edited

Loading

Coder-256 commented Oct 9, 2019 •

edited

Loading

seanmonstar commented Oct 10, 2019

Coder-256 commented Oct 10, 2019

nox commented Apr 3, 2021

nox commented Apr 4, 2022

Coder-256 commented Apr 4, 2022

nox commented Apr 22, 2022

Coder-256 commented Apr 25, 2022

nox commented Apr 25, 2022

httparse should skip invalid headers #61

httparse should skip invalid headers #61

Comments

Coder-256 commented Sep 10, 2019

seanmonstar commented Sep 10, 2019

Coder-256 commented Sep 10, 2019 • edited Loading

Coder-256 commented Oct 9, 2019 • edited Loading

seanmonstar commented Oct 10, 2019

Coder-256 commented Oct 10, 2019

nox commented Apr 3, 2021

nox commented Apr 4, 2022

Coder-256 commented Apr 4, 2022

nox commented Apr 22, 2022

Coder-256 commented Apr 25, 2022

nox commented Apr 25, 2022

Coder-256 commented Sep 10, 2019 •

edited

Loading

Coder-256 commented Oct 9, 2019 •

edited

Loading