cleanup: remove Bytes8 abstraction #120
Also improves performance, mainly `parse_version`, where we observe a ~10% improvement on `req_short`:

```
req_short/req_short     time:   [29.214 ns 29.296 ns 29.391 ns]
                        thrpt:  [2.1548 GiB/s 2.1617 GiB/s 2.1678 GiB/s]
                 change:
                        time:   [-10.968% -9.6366% -7.9937%] (p = 0.00 < 0.05)
                        thrpt:  [+8.6882% +10.664% +12.319%]
                        Performance has improved.
```
Thanks for the PR! It's cool to see such an improvement! I wonder where it comes from: is it removing the increment on each access? I intuitively would have assumed this would be slower on the longer requests, since it's copying a lot of the bytes into a stack array... (I also notice that using const generics increases the MSRV significantly.)
I'll tweak it to avoid bumping MSRV (const generics aren't really core). Regarding performance, the previous code probably had ~7 instructions per byte (bounds checking, modifying The current code definitely isn't optimal, the PR started as a "drive by" cleanup, I'll possibly do a performance-centric follow-up PR
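To illustrate the per-byte cost being discussed, here is a minimal sketch contrasting a cursor-style parse with a single fixed-size comparison. It is not the code from this PR or from the library: the `Version` enum and both function names are hypothetical, and the sketch avoids const generics so it does not depend on a recent compiler.

```rust
/// Hypothetical version tag for this sketch; not the library's real type.
#[derive(Debug, PartialEq)]
enum Version {
    Http10,
    Http11,
}

/// Byte-at-a-time parse: every access is bounds-checked and bumps a cursor.
fn parse_version_bytewise(buf: &[u8]) -> Option<Version> {
    let expected = b"HTTP/1.";
    let mut pos = 0;
    for &want in expected.iter() {
        // `get` performs a bounds check on each byte.
        if *buf.get(pos)? != want {
            return None;
        }
        pos += 1;
    }
    match *buf.get(pos)? {
        b'0' => Some(Version::Http10),
        b'1' => Some(Version::Http11),
        _ => None,
    }
}

/// Chunked parse: one length check, one 8-byte copy, then whole-array compares.
fn parse_version_chunked(buf: &[u8]) -> Option<Version> {
    if buf.len() < 8 {
        return None;
    }
    let mut chunk = [0u8; 8];
    chunk.copy_from_slice(&buf[..8]);
    if chunk == *b"HTTP/1.0" {
        Some(Version::Http10)
    } else if chunk == *b"HTTP/1.1" {
        Some(Version::Http11)
    } else {
        None
    }
}
```

Both functions accept the same inputs, for example `b"HTTP/1.1 200 OK"`, and return the same result. Whether the chunked form is actually faster depends on the compiler and the surrounding code, so it would need to be benchmarked rather than assumed.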
To avoid bumping MSRV
I briefly played around with other perf improvements, and have so far achieved a ~20-25% improvement over
(using I have a few other ideas for further improvements, happy to throw up PRs if you're interested
Improved throughput by +50-60% relative to
Don't have too much time today to explore further, but a 50% reduction in time and thus a 2x in throughput might be possible
This is phenomenal work, thank you!
@seanmonstar we'd love to try this change in our new server that uses
Yep!
@bartlomieju This doesn't include all the improvements mentioned in the latest comment, those were additional improvements I've implemented locally. This PR mainly optimizes I've tested further changes that allow me to achieve over 5 GiB/s and roughly 2x faster than the previous
Ah makes sense, @seanmonstar I'm fine waiting another few days for more changes like this!
Though, it doesn't cost much to publish a release. I can publish one now, and merge more improvements later. Or if you nearly have them ready, I can hold off, whichever you prefer.
@seanmonstar thanks, if that's not a big deal then I'd kindly ask to release now :)