Bad Request when sending utf-8 encoded http path under python3 #1577
Note that Python 2 is unaffected because bytes_to_str is a no-op there, so the request line is still split as a byte string.
Python 2:
Python 3:
Could we split before decoding?
That would at least preserve the Python 2 behavior. I don't know if there's an encoding that might break when split on space; I guess not.
I would think not, or else the HTTP request line would be malformed, since the encoding there is pretty strict.
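On whether some encoding could break when split on space: for UTF-8, at least, it cannot. Every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so ASCII whitespace bytes only ever stand for themselves. The brute-force check below is a sketch confirming this over all code points (the function name is made up for illustration). It says nothing about other encodings; UTF-16, for example, encodes U+2020 as the two bytes 0x20 0x20.

```python
def utf8_multibyte_bytes_are_non_ascii() -> bool:
    """Return True if every non-ASCII code point encodes to bytes >= 0x80 only."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be encoded as UTF-8
        if any(byte < 0x80 for byte in chr(cp).encode("utf-8")):
            return False
    return True


# ASCII whitespace (0x09-0x0D and 0x20) is below 0x80, so splitting UTF-8
# bytes on ASCII whitespace can never cut a multi-byte character in half.
print(utf8_multibyte_bytes_are_non_ascii())  # True
```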
Thanks for raising this issue. Would you be willing to make a PR for it?
Hmm, shouldn't the start line be encoded in US-ASCII, though?
This section describes exactly the problem gunicorn has in this case:
In any case, the request line should be parsed as bytes. Whether the request line must be ASCII-only is debatable; the spec is a bit unclear. However, there are clients in the wild that send such request lines anyway, and I'd rather let the application deal with them, since it can do further processing to handle them properly.
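A minimal sketch of parsing on bytes, assuming the choice of text encoding is left to the application. The helper parse_request_line and the BadRequestLine exception are hypothetical names, not gunicorn's code. The line is split while it is still bytes, the method and version are decoded strictly as ASCII, and the target is decoded as latin-1 so the application can recover the raw bytes.

```python
from typing import Tuple


class BadRequestLine(ValueError):
    """Hypothetical error type for this sketch (not gunicorn's InvalidRequestLine)."""


def parse_request_line(line: bytes) -> Tuple[str, str, str]:
    """Split an HTTP request line as bytes, then decode each part (hypothetical helper)."""
    # bytes.split() with no separator splits on runs of ASCII whitespace only,
    # so the 0xA0 continuation byte of a UTF-8 character is never a separator.
    parts = line.split(None, 2)
    if len(parts) != 3:
        raise BadRequestLine("invalid request line: %r" % (line,))
    method, target, version = parts
    # Method and version should be plain ASCII tokens; a strict decode rejects anything else.
    try:
        method_s = method.decode("ascii")
        version_s = version.decode("ascii")
    except UnicodeDecodeError:
        raise BadRequestLine("invalid request line: %r" % (line,)) from None
    # latin-1 maps every byte to exactly one code point, so this decode never
    # fails and the application can get the original bytes back with .encode("latin-1").
    target_s = target.decode("latin-1")
    return method_s, target_s, version_s


method, target, version = parse_request_line("GET /à/b HTTP/1.1".encode("utf-8"))
print(method, version)                            # GET HTTP/1.1
print(target.encode("latin-1").decode("utf-8"))   # /à/b
```

Decoding the target as latin-1 rather than UTF-8 keeps the conversion lossless for any byte sequence, which is also the convention PEP 3333 uses for WSGI environ strings on Python 3.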
I made a pull request with a test case that failed under Python 3 and now passes. I'm not sure about calling bytes_to_str() when raising InvalidRequestLine; I suspect a byte string as the exception message is not expected, hence the conversion.
http/message: Split request line as bytes to avoid splitting on 0x0A. Fixes #1577
This is the sent data:
This is the response:
This is because the request line is first decoded as latin-1 by _compat:bytes_to_str, which turns the two UTF-8 bytes of "à" (0xC3 0xA0) into the two characters "\xc3\xa0" ("Ã" followed by U+00A0). The request line is then split with line.split(None, 2), which treats \xa0 (a non-breaking space) as whitespace and splits on it, thus rendering the request line invalid.
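The mechanism can be reproduced in a few lines of Python 3. The request line below is hypothetical (the path "/à/b" is chosen for illustration, not taken from the report); it shows the decode-then-split order described above next to splitting the raw bytes.

```python
# Hypothetical request line with a UTF-8 path, as a client would send it.
raw_line = "GET /à/b HTTP/1.1".encode("utf-8")

# Decode first as latin-1: the bytes 0xC3 0xA0 become "Ã" and U+00A0.
decoded = raw_line.decode("latin-1")
print(repr(decoded))              # 'GET /Ã\xa0/b HTTP/1.1'

# str.split() treats U+00A0 as whitespace, so the path is cut in two and
# the third field is no longer an HTTP version.
print(decoded.split(None, 2))     # ['GET', '/Ã', '/b HTTP/1.1']

# bytes.split() only splits on ASCII whitespace, so the path stays intact.
print(raw_line.split(None, 2))    # [b'GET', b'/\xc3\xa0/b', b'HTTP/1.1']
```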
A first attempt would be to use line.split(' ', 2), but then the split would no longer collapse runs of consecutive whitespace, which may introduce other bugs.
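The trade-off is easy to see on two hypothetical request lines: one with a doubled space after the method, and one containing the decoded U+00A0 from the report. split(' ', 2) fixes the second case but mishandles the first.

```python
# Hypothetical request line with two spaces after the method.
doubled = "GET  /index.html HTTP/1.1"

# split(None, 2) collapses runs of whitespace, so the extra space is absorbed.
print(doubled.split(None, 2))   # ['GET', '/index.html', 'HTTP/1.1']

# split(" ", 2) treats every single space as a separator: the doubled space
# yields an empty field and the version gets glued onto the target.
print(doubled.split(" ", 2))    # ['GET', '', '/index.html HTTP/1.1']

# On the other hand, split(" ", 2) no longer cuts at U+00A0.
decoded = "GET /\u00c3\u00a0/b HTTP/1.1"
print(decoded.split(" ", 2))    # ['GET', '/Ã\xa0/b', 'HTTP/1.1']
```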
I'm not sure what would be the best solution here.