Another microbenchmark compared to Ruby #106
Interesting, I will look into this.
This line appears to be the offender: line 360 in commit 7bcb8fb.
Flamegraph (from an unoptimized build): https://sfackler.com/static/flamegraph.svg
I believe this commit should fix the issue: 9cd413d. I'm seeing ~0.66 second runtimes after that change.
Thanks for pointing this out! What a weird place to find such a huge performance issue. I've published a new release to crates.io with this fix. I'll leave this issue open until you can confirm that everything looks good on your end as well.
I’ve tested on a different machine: the Ruby version runs for 0.82s (avg) and the updated Rust version runs for 0.43s. This is awesome, thank you for the fix. I look forward to using rust-postgres in production.
Perfect!
with_end_to_cap is enormously expensive now that it's initializing memory, since it involves a 64k allocation + memset on every call. This is most noticeable when calling read_to_end on very small readers, where the new version is **4 orders of magnitude** faster. BufReader also depended on with_end_to_cap, so I've rewritten it in its original form. As a bonus, I converted the buffered IO structs' Debug impls to use the debug builders.

I first came across this in sfackler/rust-postgres#106, where a user reported a 10x performance regression. A call to read_to_end turned out to be the culprit: sfackler/rust-postgres@9cd413d.

The new version differs from the old in a couple of ways. The buffer size used is now adaptive: it starts at 32 bytes and doubles each time EOF hasn't been reached, up to a limit of 64k. In addition, the buffer is only truncated when EOF or an error has been reached, rather than after every call to `read`, as was the case in the old implementation.

I wrote up a benchmark to compare the old version and the new version: https://gist.github.com/sfackler/e979711b0ee2f2063462

It tests a couple of different cases: a high-bandwidth reader, a low-bandwidth reader, and a low-bandwidth reader that won't return more than 10k per call to `read`. The high-bandwidth reader should be analogous to reading from e.g. a `BufReader` or `Vec`, and the low-bandwidth readers should be analogous to reading from something like a `TcpStream`. Of special note, reads from a high-bandwidth reader containing 4 bytes are now *4,495 times faster*.

```
~/foo ❯ cargo bench
   Compiling foo v0.0.1 (file:///home/sfackler/foo)
     Running target/release/foo-7498d7dd7faecf5c

running 13 tests
test test_new         ... ignored
test new_delay_4      ... bench:    230768 ns/iter (+/- 14812)
test new_delay_4_cap  ... bench:    231421 ns/iter (+/- 7211)
test new_delay_5m     ... bench:  14495370 ns/iter (+/- 4008648)
test new_delay_5m_cap ... bench:  73127954 ns/iter (+/- 59908587)
test new_nodelay_4    ... bench:        83 ns/iter (+/- 2)
test new_nodelay_5m   ... bench:  12527237 ns/iter (+/- 335243)
test std_delay_4      ... bench:    373095 ns/iter (+/- 12613)
test std_delay_4_cap  ... bench:    374190 ns/iter (+/- 19611)
test std_delay_5m     ... bench:  17356012 ns/iter (+/- 15906588)
test std_delay_5m_cap ... bench: 883555035 ns/iter (+/- 205559857)
test std_nodelay_4    ... bench:    144937 ns/iter (+/- 2448)
test std_nodelay_5m   ... bench:  16095893 ns/iter (+/- 3315116)

test result: ok. 0 passed; 0 failed; 1 ignored; 12 measured
```

r? @alexcrichton
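The adaptive-growth strategy described above can be sketched as follows. This is a simplified illustration, not the actual standard library code: the buffer starts at 32 bytes, the growth chunk doubles up to a 64k cap, and the vector is only truncated once EOF or an error is hit. The `Capped` reader is a hypothetical helper mimicking the benchmark's "no more than 10k per call" case.

```rust
use std::cmp;
use std::io::{self, Read};

// Sketch of the adaptive read_to_end strategy (assumption: simplified
// from the description above, not the real std implementation).
fn read_to_end_adaptive<R: Read>(r: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> {
    const START: usize = 32;      // initial chunk size, per the text
    const CAP: usize = 64 * 1024; // growth limit (64k), per the text
    let start_len = buf.len();
    let mut len = start_len; // logical length of valid data in `buf`
    let mut chunk = START;
    loop {
        if len == buf.len() {
            // Out of room: zero-extend by the current chunk size,
            // doubling the chunk until the cap is reached.
            buf.resize(len + chunk, 0);
            if chunk < CAP {
                chunk *= 2;
            }
        }
        match r.read(&mut buf[len..]) {
            Ok(0) => {
                buf.truncate(len); // truncate only at EOF
                return Ok(len - start_len);
            }
            Ok(n) => len += n,
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => {}
            Err(e) => {
                buf.truncate(len);
                return Err(e);
            }
        }
    }
}

// Hypothetical reader that never returns more than `max` bytes per call,
// similar in spirit to the benchmark's capped low-bandwidth case.
struct Capped<R> {
    inner: R,
    max: usize,
}

impl<R: Read> Read for Capped<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = cmp::min(buf.len(), self.max);
        self.inner.read(&mut buf[..n])
    }
}

fn main() {
    let data = vec![7u8; 100_000];
    let mut src = Capped { inner: &data[..], max: 10 * 1024 };
    let mut out = Vec::new();
    let n = read_to_end_adaptive(&mut src, &mut out).unwrap();
    assert_eq!(n, data.len());
    assert_eq!(out, data);
    println!("read {} bytes", n);
}
```

Because the chunk size starts small, a 4-byte reader costs one 32-byte allocation instead of a 64k allocation + memset, which is where the order-of-magnitude win on tiny readers comes from.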
A couple of months ago I had a benchmark comparing the insert performance of the Ruby pg gem, the Go library, and the rust-postgres crate. The speed ranking was Rust < Go < Ruby, probably because the Ruby gem is a well-optimized C extension. Now this benchmark runs at almost the same speed as the Ruby version, and faster than the Go one.
However, I also had a select benchmark with rust-postgres 0.4. It used to run slightly slower than the Ruby one (I still have the old binary and could confirm this). The recent version, though, is orders of magnitude slower. Since I am a beginner with Rust, I can't tell whether this is an upstream issue or not.
The database has ten thousand records.
```
$ time cargo run --release
...
Done: 10000
Cycle: 9

real    0m21.885s
user    0m18.290s
sys     0m3.544s
```
Of course the compile time is not counted; I had already compiled it.
The Ruby version:
```
$ time ruby postgres2.rb
Result: 9999
Result: 10000
Done: 9

real    0m1.232s
user    0m0.783s
sys     0m0.349s
```
As for the Rust version, displaying the results from 1 to 10000 is very fast. There is, however, a pause of a couple of seconds between each cycle.
The old binary (which used rust-postgres version 0.4) doesn't have this pause between cycles. (Note that the old binary uses unix_socket, but I don't think that's the reason.)
This is the old version (compiled on Dec 29, 2014).
```
$ time ./target/postgres2
Done: 10000
Cycle: 100

real    0m15.154s
user    0m13.401s
sys     0m1.489s
```
The cycle count is set to 100, so it would run for about 1.5s with 10 cycles.