
Another microbenchmark compared to Ruby #106

Closed
PeterMozesMerl opened this issue Mar 26, 2015 · 5 comments

Comments

@PeterMozesMerl

A couple of months ago I ran a benchmark comparing the insert performance of the Ruby pg gem, the Go library, and the rust-postgres crate. The ranking was Rust < Go < Ruby, probably because of the well-optimized Ruby gem written in C. Now that benchmark runs at almost the same speed as the Ruby version, and faster than the Go one.

However, I also had a select benchmark against rust-postgres 0.4. It used to run slightly slower than the Ruby one (I still have the old binary and could confirm this). The recent version, however, is orders of magnitude slower. Since I am a beginner with Rust, I can't tell whether this is an upstream issue or not.

The table has tens of thousands of records.

```rust
extern crate postgres;

use postgres::{Connection, SslMode};

fn main() {
    println!("Hello, world!");

    // I could not figure out how to use the unix_socket with v0.7 (and didn't try hard)
    let conn = Connection::connect("postgres://mage@localhost/xxx", &SslMode::None).unwrap();

    let cycles = 10;

    let query = "select id from users order by id limit 10000";
    let stmt = conn.prepare(query).unwrap();

    let mut bar: i32;

    for cycle in 0..cycles {
        for row in stmt.query(&[]).unwrap() {
            bar = row.get(0);
            println!("Done: {}", bar);
        }
        println!("Cycle: {}", cycle);
    }
}
```
```
$ time cargo run --release
...
Done: 10000
Cycle: 9

real    0m21.885s
user    0m18.290s
sys     0m3.544s
```

Compile time is not counted, of course; the binary was already built.

The Ruby version:

```ruby
require 'pg'

conn = PG.connect(dbname: 'xxx', user: 'mage', host: 'localhost')
conn.prepare('hoi', "select id from users order by id limit 10000")

0.upto(9) do |cycle|
    results = conn.exec_prepared("hoi")
    results.each do |row|
        puts "Result: #{row['id']}"
    end
    puts "Done: #{cycle}"
end
```
```
$ time ruby postgres2.rb
Result: 9999
Result: 10000
Done: 9

real    0m1.232s
user    0m0.783s
sys     0m0.349s
```

As for the Rust version, displaying the results from 1 to 10000 is very fast; the couple of seconds are spent between cycles.

The old binary (built against rust-postgres 0.4) doesn't have this pause between cycles. (Note that the old binary uses unix_socket, but I don't think that's the reason.)

```
$ rustc --version
rustc 1.0.0-nightly (27901849e 2015-03-25) (built 2015-03-26)
```

This is the old version (compiled on Dec 29, 2014).

```
$ time ./target/postgres2
Done: 10000
Cycle: 100

real    0m15.154s
user    0m13.401s
sys     0m1.489s
```

`cycles` is set to 100 there, so it would run for about 1.5 seconds with 10 cycles.

@sfackler
Owner

Interesting, I will look into this.

@sfackler
Owner

This line appears to be the offender:

```rust
try!(buf.take(len as u64).read_to_end(&mut data));
```

Flamegraph (from an unoptimized build): https://sfackler.com/static/flamegraph.svg

@sfackler
Owner

I believe this commit should fix the issue: 9cd413d

I'm seeing ~0.66 second runtimes after that change:

```
Done: 10000
Cycle: 9
0.03user 0.34system 0:00.66elapsed 57%CPU (0avgtext+0avgdata 7824maxresident)k
0inputs+0outputs (0major+2621minor)pagefaults 0swaps
```

Thanks for pointing this out! What a weird place to find such a huge performance issue. I've published a new release to crates.io with this fix. I'll leave this issue open until you can confirm that everything looks good on your end as well.

@PeterMozesMerl
Author

I’ve tested on a different machine, the Ruby version runs for 0.82s (avg) and the updated Rust version runs for 0.43s. This is awesome, thank you for the fix. I look forward to using rust-postgres in production.

@sfackler
Owner

Perfect!

bors added a commit to rust-lang/rust that referenced this issue Mar 29, 2015
with_end_to_cap is enormously expensive now that it's initializing memory, since it involves a 64k allocation + memset on every call. This is most noticeable when calling read_to_end on very small readers, where the new version is **4 orders of magnitude** faster.

BufReader also depended on with_end_to_cap, so I've rewritten it in its original form.

As a bonus, converted the buffered IO struct Debug impls to use the
debug builders.

I first came across this in sfackler/rust-postgres#106 where a user reported a 10x performance regression. A call to read_to_end turned out to be the culprit: sfackler/rust-postgres@9cd413d.

The new version differs from the old in a couple of ways. The buffer size used is now adaptive. It starts at 32 bytes and doubles each time EOF hasn't been reached up to a limit of 64k. In addition, the buffer is only truncated when EOF or an error has been reached, rather than after every call to read as was the case for the old implementation.

I wrote up a benchmark to compare the old version and new version: https://gist.github.com/sfackler/e979711b0ee2f2063462

It tests a couple of different cases: a high bandwidth reader, a low bandwidth reader, and a low bandwidth reader that won't return more than 10k per call to `read`. The high bandwidth reader should be analogous to use cases like reading from e.g. a `BufReader` or `Vec`, and the low bandwidth readers should be analogous to reading from something like a `TcpStream`.

Of special note, reads from a high bandwidth reader containing 4 bytes are now *4,495 times faster*.
```
~/foo ❯ cargo bench
   Compiling foo v0.0.1 (file:///home/sfackler/foo)
     Running target/release/foo-7498d7dd7faecf5c

running 13 tests
test test_new ... ignored
test new_delay_4      ... bench:    230768 ns/iter (+/- 14812)
test new_delay_4_cap  ... bench:    231421 ns/iter (+/- 7211)
test new_delay_5m     ... bench:  14495370 ns/iter (+/- 4008648)
test new_delay_5m_cap ... bench:  73127954 ns/iter (+/- 59908587)
test new_nodelay_4    ... bench:        83 ns/iter (+/- 2)
test new_nodelay_5m   ... bench:  12527237 ns/iter (+/- 335243)
test std_delay_4      ... bench:    373095 ns/iter (+/- 12613)
test std_delay_4_cap  ... bench:    374190 ns/iter (+/- 19611)
test std_delay_5m     ... bench:  17356012 ns/iter (+/- 15906588)
test std_delay_5m_cap ... bench: 883555035 ns/iter (+/- 205559857)
test std_nodelay_4    ... bench:    144937 ns/iter (+/- 2448)
test std_nodelay_5m   ... bench:  16095893 ns/iter (+/- 3315116)

test result: ok. 0 passed; 0 failed; 1 ignored; 12 measured
```

r? @alexcrichton