benchmarking script to time the different asyncio write options #2179
Conversation
Codecov Report

@@           Coverage Diff            @@
##           master    #2179   +/-   ##
========================================
  Coverage   97.12%   97.12%
========================================
  Files          39       39
  Lines        7884     7884
  Branches     1366     1366
========================================
  Hits         7657     7657
  Misses        101      101
  Partials      126      126

Continue to review full report at Codecov.
That is awesome! We can optimize the writer according to these benches. Btw, I have similar results for async-tokio, and that is one of the reasons why uvloop is faster.
> Lastly, the "10 chunks" lines are just to see how fast performance degrades. If I understand things correctly, the `http_writer.PayloadWriter` class only does buffering until the transport is set. So, keeping aside the headers chunk, we only get multiple chunks here when HTTP pipelining is happening and the user application does a lot of separate writes in the handler. So imo, the only use case that really matters is 2 chunks, one for the headers and one for the body. @fafhrd91 can you confirm this?
Yes, that is right. And that is the only case where we make a decision. I'd simplify this, but that should happen at the web response level.
Also consider #2109, it can reduce the StreamWriter and PayloadWriter implementations.
Subsequent writes do not really matter because the developer can optimize writes.
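To make the decision concrete: it is the choice between joining the headers with the first body chunk into a single buffer, or issuing two separate `Transport.write()` calls. A minimal sketch of the two paths, assuming a plain asyncio transport (illustrative only, not aiohttp's actual `PayloadWriter` code):

```python
# Illustrative only -- not aiohttp's PayloadWriter, just the two write
# strategies being compared for the "headers + first body chunk" case.
import asyncio


def write_joined(transport: asyncio.Transport, headers: bytes, body: bytes) -> None:
    # Current approach: copy headers and body into one buffer, single write() call.
    transport.write(b"".join((headers, body)))


def write_separately(transport: asyncio.Transport, headers: bytes, body: bytes) -> None:
    # #2126 approach: two write() calls, no copy of the (possibly large) body.
    transport.write(headers)
    transport.write(body)
```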
Interesting. So, ideally, we can choose the best algorithm depending on data size to avoid unwanted overhead, right?
In theory, probably. But in practice, imo, it would make the code way too complicated for almost no improvement. The use cases are as follows:
use case 3 probably being way rarer than the other two. So [everywhere in this I am using "body" to really mean "first chunk of the body written by the user", but I think in most cases it's actually the only chunk, so the difference does not matter]:
Option 1 is the current implementation; option 2 is #2126; option 3 could be added to #2126 if you think it's a necessary optimisation.
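If option 3 were pursued, it would presumably amount to a size cutoff applied when the first body chunk is written. A hypothetical sketch; the threshold value and function name are made up for illustration and are not taken from #2126:

```python
# Hypothetical "option 3": choose the strategy based on the size of the first
# body chunk. JOIN_THRESHOLD is an arbitrary placeholder, not a measured value.
JOIN_THRESHOLD = 8 * 1024


def write_headers_and_first_chunk(transport, headers: bytes, body: bytes) -> None:
    if len(body) < JOIN_THRESHOLD:
        # Small body: the extra copy is cheap, a single write() call wins.
        transport.write(headers + body)
    else:
        # Large body: skip the copy and accept a second write() call.
        transport.write(headers)
        transport.write(body)
```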
Worth having it; the benchmark is still relevant.
As discussed in #2126, I tried putting together a small benchmarking script to test the different write options in `http_writer` and their impact on different workloads. The script is in the PR (just a convenient way to link the file, I'm not sure at all it should be merged).

My results are below; I'm running a `_UnixSelectorEventLoop`: it would be interesting to get results from other configs. The use cases I used are:

The really useful numbers here are the 2 chunks: this is what happens when a server writes the headers first, then the body in one go. The current implementation waits to join the headers with the first chunk of the body, so it corresponds to the `b''.join` lines. With #2126, the headers would be sent in a separate `Transport.write` call, so it corresponds to the `multiple writes` lines. The `bytearray` lines are just for comparison, trying to see where it would be preferable to `b''.join`.

Lastly, the "10 chunks" lines are just to see how fast performance degrades. If I understand things correctly, the `http_writer.PayloadWriter` class only does buffering until the transport is set. So, keeping aside the headers chunk, we only get multiple chunks here when HTTP pipelining is happening and the user application does a lot of separate `write` calls in the handler. So imo, the only use case that really matters is 2 chunks, one for the headers and one for the body. @fafhrd91 can you confirm this?

Here is what I take from those results:

The `multiple writes` option is of course faster, since it avoids a ~large copy.
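For reference, here is a rough, self-contained sketch of the kind of micro-benchmark described above. It is not the script from the PR; the chunk sizes, iteration count, and socketpair setup are assumptions chosen only to illustrate the three strategies (`b''.join`, multiple writes, `bytearray`):

```python
# Rough sketch of this kind of micro-benchmark -- not the script from the PR.
# It times three write strategies against a real asyncio transport backed by a
# socketpair, with a thread draining the other end so writes never back up.
import asyncio
import socket
import threading
import time

HEADERS = b"H" * 200        # illustrative sizes, not the ones used in the PR
BODY = b"B" * (16 * 1024)
N = 10_000


def drain(sock: socket.socket) -> None:
    # Keep reading until the peer closes, so the writer's buffer stays empty.
    while sock.recv(65536):
        pass


async def bench(loop: asyncio.AbstractEventLoop) -> None:
    left, right = socket.socketpair()
    threading.Thread(target=drain, args=(right,), daemon=True).start()
    transport, _ = await loop.create_connection(asyncio.Protocol, sock=left)

    def joined() -> None:
        transport.write(b"".join((HEADERS, BODY)))

    def multiple() -> None:
        transport.write(HEADERS)
        transport.write(BODY)

    def with_bytearray() -> None:
        buf = bytearray(HEADERS)
        buf += BODY
        transport.write(buf)

    for name, fn in (("b''.join", joined),
                     ("multiple writes", multiple),
                     ("bytearray", with_bytearray)):
        start = time.perf_counter()
        for _ in range(N):
            fn()
            await asyncio.sleep(0)  # give the loop a chance to flush
        print(f"{name:16s} {time.perf_counter() - start:.3f}s")

    transport.close()


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(bench(loop))
```

Draining the peer socket in a thread keeps the transport's write buffer from filling up, so the timings mostly reflect the per-write Python overhead and the cost of the extra copy, which is the comparison that matters here.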