incorrect line splitting in HttpRequestParser #97
We had the same issue with …
The problem with splitting lines first and then decoding them is that parser performance degrades. I tested that in the early days of aiohttp development. @asvetlov @popravich @kxepal, any ideas?
I've done a few simple performance tests, and here are the results:
>>> raw = b'\r\n'.join([b'part1\x1c_part2\r\nline2\r\nline3'] * 50000)
>>> raw2 = b'\r\n'.join([b'some-not-very-short-header: and_its_verryyyyyy_looooooooong_value'+b'e'*100] * 10000)
>>> len(raw), len(raw2)
(1399998, 1669998)
# short lines
>>> %timeit raw.decode('ascii', 'surrogateescape').splitlines(True)
100 loops, best of 3: 14.5 ms per loop
>>> %timeit list(map(lambda b: b.decode('ascii', 'surrogateescape'), raw.splitlines(1)))
10 loops, best of 3: 81.2 ms per loop
>>> %timeit next(map(lambda b: b.decode('ascii', 'surrogateescape'), raw.splitlines(1)))
100 loops, best of 3: 7.68 ms per loop
>>> %timeit raw.decode('ascii', 'surrogateescape').split('\r\n')
100 loops, best of 3: 11.3 ms per loop
# longer lines
>>> %timeit raw2.decode('ascii', 'surrogateescape').splitlines(True)
100 loops, best of 3: 2.68 ms per loop
>>> %timeit list(map(lambda b: b.decode('ascii', 'surrogateescape'), raw2.splitlines(1)))
100 loops, best of 3: 7.97 ms per loop
>>> %timeit next(map(lambda b: b.decode('ascii', 'surrogateescape'), raw2.splitlines(1)))
100 loops, best of 3: 2.22 ms per loop
>>> %timeit raw2.decode('ascii', 'surrogateescape').split('\r\n')
100 loops, best of 3: 3.25 ms per loop
So maybe it makes sense to use …
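def test_next():
    # Build the iterator once: calling map() inside the loop would start
    # a fresh iterator on every pass and never raise StopIteration.
    it = map(...)
    try:
        while True:
            next(it)
    except StopIteration:
        pass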
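Note that `next(map(...))` decodes only the first line, so its timing above mostly reflects the cost of `bytes.splitlines()`. A like-for-like rerun would consume every decoded line. The script below is a hedged sketch of such a rerun (the function names are hypothetical, and the sample data is copied from the `raw` benchmark); it also shows that the approaches do not even return the same number of lines on input containing `\x1c`.

```python
import timeit

# Sample input from the benchmark above: every record contains a \x1c byte,
# which str.splitlines() treats as a line boundary but bytes.splitlines() does not.
RAW = b'\r\n'.join([b'part1\x1c_part2\r\nline2\r\nline3'] * 50000)


def decode_then_splitlines(raw):
    # Decode once, then split the str (this also breaks lines at \x1c).
    return raw.decode('ascii', 'surrogateescape').splitlines(True)


def decode_then_split_crlf(raw):
    # Decode once, then split on CRLF only (line endings are dropped).
    return raw.decode('ascii', 'surrogateescape').split('\r\n')


def splitlines_then_decode(raw):
    # Split the bytes first, then decode each line; list() forces the lazy
    # map() so that every line is actually decoded.
    return list(map(lambda b: b.decode('ascii', 'surrogateescape'),
                    raw.splitlines(True)))


if __name__ == '__main__':
    for fn in (decode_then_splitlines, decode_then_split_crlf,
               splitlines_then_decode):
        # Best of 3 repeats, 5 calls each, reported per call.
        best = min(timeit.repeat(lambda: fn(RAW), number=5, repeat=3)) / 5
        print('{}: {} lines, {:.2f} ms per call'.format(
            fn.__name__, len(fn(RAW)), best * 1000))
```

On this sample, `decode_then_splitlines` returns 200000 lines while the other two return 150000, which is the bug being discussed; absolute timings will vary by machine and Python version.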
@fafhrd91
Sorry for the confusion. I meant using an iterator that … But I think that splitting the bytes and then decoding each line where it's used might not hurt performance much. Anyway, I will do some tests.
@popravich what about regex splitting? Maybe it would both fix the split logic and preserve line endings, without a performance penalty.
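A minimal sketch of the regex-splitting idea, assuming Python 3.7+ (where `re.split()` accepts a pattern that can match an empty string) and assuming the parser only needs to break on CRLF. The helper names are hypothetical, and whether this avoids a performance penalty would still need to be measured.

```python
import re

# Zero-width split point immediately after each CRLF, so the line
# endings stay attached to the lines (requires Python 3.7+).
_CRLF_BOUNDARY = re.compile(rb'(?<=\r\n)')


def split_crlf_keepends(data):
    """Split bytes on CRLF only, keeping the CRLF at the end of each line."""
    # A trailing CRLF yields an empty final piece; drop empty pieces.
    return [line for line in _CRLF_BOUNDARY.split(data) if line]


def decode_lines(data):
    """Split on CRLF first, then decode each line to str."""
    return [line.decode('ascii', 'surrogateescape')
            for line in split_crlf_keepends(data)]
```

For `b'part1\x1c_part2\r\nline2\r\nline3'` this yields three lines, with `\x1c` left inside the first one. Unlike `bytes.splitlines()`, it does not split on a bare `\r` or `\n`; whether the parser should accept those is a separate decision.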
How often is header encoding broken?
It doesn't raise a decode error; it just splits extra lines by …
    try:
        name, value = line.split(':', 1)
    except ValueError:
        raise ValueError('Invalid header: {}'.format(line)) from None
Good idea, but hard to implement. PS: the exception happens for less than 0.1% of requests for me.
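The `try`/`except` above could live in a small header parser that splits the raw bytes on CRLF before decoding, so a stray `\x1c` stays inside the header value instead of producing an extra "line". This is a hypothetical sketch (`parse_header_lines` is not aiohttp's API); it formats the rejected line with `!r` so that control characters are visible in the error message.

```python
def parse_header_lines(data):
    """Parse a CRLF-separated header block into a list of (name, value) pairs.

    Hypothetical helper: it splits on CRLF only, then decodes each line.
    """
    headers = []
    for raw_line in data.split(b'\r\n'):
        if not raw_line:
            continue  # skip the empty line that terminates the header block
        line = raw_line.decode('ascii', 'surrogateescape')
        try:
            name, value = line.split(':', 1)
        except ValueError:
            # No ':' in the line: report exactly which line was rejected.
            raise ValueError('Invalid header: {!r}'.format(line)) from None
        headers.append((name.strip(), value.strip()))
    return headers
```

With this ordering, `b'User-Agent: UCBrowser\x1cX\r\n'` parses as a single header, while a line with no colon still raises `ValueError` with the offending text.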
Ah! I think this is a bug in .splitlines().
@tumb1er could you test the fix @a6a179c5ad1e011a73610588de3046487244bed1?
@asvetlov could you file a Python bug report for .splitlines()?
@fafhrd91, I've tested it, and it works now. BTW, …
Maybe :)
@fafhrd91 I don't clearly understand what exactly is wrong with .splitlines().
.splitlines() treats these chars …
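To see which characters count as line boundaries, you can probe both methods directly. This sketch uses hypothetical helper names and relies only on documented Python behaviour: `str.splitlines()` splits on `\x1c`, `\x1d`, `\x1e`, `\x85`, `\u2028` and a few others, while `bytes.splitlines()` splits only on `\n`, `\r` and `\r\n`.

```python
def str_boundaries(limit=0x10000):
    """Return the code points below `limit` that str.splitlines() splits on."""
    return [ch for ch in map(chr, range(limit))
            if len(('a' + ch + 'b').splitlines()) == 2]


def bytes_boundaries():
    """Return the single byte values that bytes.splitlines() splits on."""
    return [bytes([i]) for i in range(256)
            if len((b'a' + bytes([i]) + b'b').splitlines()) == 2]


if __name__ == '__main__':
    print('str:  ', [hex(ord(ch)) for ch in str_boundaries()])
    print('bytes:', bytes_boundaries())
```

Decoding with `surrogateescape` keeps `\x1c` as U+001C, so any header value containing that byte gains an extra boundary once it is a `str`.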
Well, …
We are receiving some HTTP requests that cause an Invalid Header error in aiohttp.
That's because of an issue with splitlines in
aiohttp.protocol.HttpRequestParser.
For example, this code produces 4 lines instead of 3:
In my case it was an invalid User-Agent header from UCBrowser. Any thoughts on how to fix it?
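The original example code is not preserved here. As a hedged reproduction, the snippet below assumes a header value that contains a `\x1c` byte. The header block is made up and is not the actual UCBrowser request. Decoding first and then calling `str.splitlines()` yields 4 lines, while splitting the bytes on CRLF first yields 3:

```python
# Made-up header block: the User-Agent value contains a \x1c byte.
HEADERS = b'Host: example.com\r\nUser-Agent: UCBrowser\x1c1.0\r\nAccept: */*'

# Decode first, then str.splitlines(): \x1c is also treated as a line break,
# so '1.0' ends up on its own line and has no ':' -> "Invalid header".
decoded_first = HEADERS.decode('ascii', 'surrogateescape').splitlines()

# Split the bytes on CRLF first, then decode: \x1c stays inside the value.
split_first = [line.decode('ascii', 'surrogateescape')
               for line in HEADERS.split(b'\r\n')]

print(len(decoded_first), len(split_first))  # prints: 4 3
```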