Bad Request when sending utf-8 encoded http path under python3 #1577

rslinckx · 2017-08-24T16:02:14Z

This is the sent data:

import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('127.0.0.1', 8080))
print(s.send('GET /à%20k HTTP/1.0\r\n\r\n'.encode('utf-8')))
print(s.recv(100000).decode('ascii'))

This is the response

HTTP/1.1 400 Bad Request
Connection: close
Content-Type: text/html
Content-Length: 181

<html>
  <head>
    <title>Bad Request</title>
  </head>
  <body>
    <h1><p>Bad Request</p></h1>
    Invalid HTTP Version 'Invalid HTTP Version: '%20k HTTP/1.0''
  </body>
</html>

This happens because the request line is first decoded as latin1 by _compat:bytes_to_str, which turns the UTF-8 bytes of "à" into the two characters "\xc3\xa0". The request line is then split with line.split(None, 2), which treats \xa0 (non-breaking space) as whitespace and splits on it, so the request line becomes invalid.

A first attempt would be to use line.split(' ', 2), but then the split would no longer collapse consecutive whitespace and might introduce other bugs.
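To make the trade-off concrete, here is a minimal sketch. The request lines are made-up examples, not taken from gunicorn's test suite:

```python
# With an explicit ' ' separator, the non-breaking space stays inside the path...
line = b'GET /\xc3\xa0%20k HTTP/1.0'.decode('latin1')
explicit = line.split(' ', 2)

# ...but runs of spaces are no longer collapsed, so an empty field appears.
doubled = 'GET  /index.html HTTP/1.0'.split(' ', 2)

# The default separator collapses the run of spaces.
default = 'GET  /index.html HTTP/1.0'.split(None, 2)

print(explicit)
print(doubled)
print(default)
```

The second result contains an empty string where the URI is expected, which is the kind of regression I am worried about.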

I'm not sure what would be the best solution here.

rslinckx · 2017-08-24T16:05:06Z

Note that python2 is unaffected because bytes_to_str is a no-op in this case:

python2:

>>> 'GET /\xc3\xa0%20 HTTP'.split(None, 2)
['GET', '/\xc3\xa0%20', 'HTTP']

python3:

>>> b'GET /\xc3\xa0%20 HTTP'.decode('latin1').split(None, 2)
['GET', '/Ã', '%20 HTTP']
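The underlying difference can be checked directly. This is a small sketch using only built-in str and bytes methods: in Python 3, str treats U+00A0 as whitespace, while bytes only recognises ASCII whitespace.

```python
# U+00A0 (NO-BREAK SPACE) is whitespace for str, but byte 0xA0 is not for bytes.
str_is_space = '\xa0'.isspace()
bytes_is_space = b'\xa0'.isspace()

raw = b'GET /\xc3\xa0%20 HTTP'
as_str = raw.decode('latin1').split(None, 2)   # splits on the decoded \xa0
as_bytes = raw.split(None, 2)                  # splits on ASCII whitespace only

print(str_is_space, bytes_is_space)
print(as_str)
print(as_bytes)
```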

tilgovi · 2017-08-24T18:35:17Z

Could we split before decoding? [part.decode('latin1') for part in b'GET /\xc3\xa0%20 HTTP'.split(None, 2)]?
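Spelled out as a function, the idea might look like the sketch below. The name split_request_line is hypothetical and not part of gunicorn:

```python
def split_request_line(line):
    """Split a raw request line on ASCII whitespace, then decode each part.

    `line` is the request line as bytes, without the trailing CRLF.
    """
    return [part.decode('latin1') for part in line.split(None, 2)]


parts = split_request_line(b'GET /\xc3\xa0%20k HTTP/1.0')
print(parts)
```

Because the split happens on bytes, the 0xA0 octet stays inside the path and only becomes a character after the three fields have been separated.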

rslinckx · 2017-08-24T19:58:40Z

That would at least preserve the Python 2 behavior. I don't know whether there's an encoding that might break when split on spaces; I guess not.

tilgovi · 2017-08-24T20:54:44Z

I would think not or else the HTTP request line would be malformed, since the encoding there is pretty strict.

tilgovi · 2017-08-24T20:55:01Z

Thanks for raising this issue. Would you be willing to make a PR for it?

benoitc · 2017-08-24T21:03:08Z

Hmm, shouldn't the start line be encoded in US-ASCII, though?

http://httpwg.org/specs/rfc7230.html#rfc.section.3

rslinckx · 2017-08-25T07:47:07Z

This section describes exactly the problem gunicorn has in this case:

A recipient MUST parse an HTTP message as a sequence of octets in an encoding that is a superset of US-ASCII [USASCII]. Parsing an HTTP message as a stream of Unicode characters, without regard for the specific encoding, creates security vulnerabilities due to the varying ways that string processing libraries handle invalid multibyte character sequences that contain the octet LF (%x0A). String-based parsers can only be safely used within protocol elements after the element has been extracted from the message, such as within a header field-value after message parsing has delineated the individual fields.

In any case the parsing of the request line should be done on bytes.

It's debatable whether the request line should be ASCII-only; the spec is a bit unclear. However, there are clients in the wild that send such request lines, and I'd rather handle them in the application, which can do further processing to handle them properly.
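As an illustration of what that application-side processing could be, here is a sketch. It assumes the server hands the path to the application as a latin1-decoded string (the WSGI convention from PEP 3333) and that the client actually sent UTF-8. The helper name recover_utf8_path is hypothetical:

```python
def recover_utf8_path(path_info):
    """Re-interpret a latin1-decoded path as UTF-8.

    Falls back to the original string when the underlying bytes are not
    valid UTF-8, or when the string cannot be encoded back to latin1.
    """
    try:
        return path_info.encode('latin1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return path_info


recovered = recover_utf8_path('/\xc3\xa0 k')   # bytes C3 A0 are UTF-8 for 'à'
unchanged = recover_utf8_path('/\xe0')         # lone E0 is not valid UTF-8
print(recovered)
print(unchanged)
```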

rslinckx · 2017-08-25T08:52:28Z

I made a pull request with a test case which failed under py3 and now passes.

I'm not sure about calling bytes_to_str() when raising InvalidRequestLine. I fear that a byte string as the exception message is not expected, hence the conversion.
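Roughly, the shape of the change is the sketch below. It is not the actual patch: the InvalidRequestLine class here is a stand-in defined only for the example, parse_request_line is a hypothetical name, and a plain latin1 decode is used in place of bytes_to_str.

```python
class InvalidRequestLine(Exception):
    """Stand-in for the real exception class, defined only for this sketch."""


def parse_request_line(line):
    """Split the request line as bytes, then decode the individual fields."""
    parts = line.split(None, 2)
    if len(parts) != 3:
        # Decode before raising so the exception message is a str, not bytes.
        raise InvalidRequestLine(line.decode('latin1'))
    return [part.decode('latin1') for part in parts]


print(parse_request_line(b'GET /\xc3\xa0%20k HTTP/1.0'))
```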

http/message: Split request line as bytes to avoid splitting on 0x0A. Fixes #1577

tilgovi added - Bugs - Feature/Http labels Aug 24, 2017

benoitc closed this as completed in 15e901a Sep 2, 2017

benoitc added a commit that referenced this issue Sep 2, 2017

Merge pull request #1578 from rslinckx/master

c171c15

http/message: Split request line as bytes to avoid splitting on 0x0A. Fixes #1577

mjjbell pushed a commit to mjjbell/gunicorn that referenced this issue Mar 16, 2018

http/message: Split request line as bytes to avoid splitting on 0x0A. Fixes benoitc#1577

e9810c8

mjjbell pushed a commit to mjjbell/gunicorn that referenced this issue Mar 16, 2018

Merge pull request benoitc#1578 from rslinckx/master

c8f7233

http/message: Split request line as bytes to avoid splitting on 0x0A. Fixes benoitc#1577
