Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memtrace is very slow due to dr_fprintf not being buffered #1929

Closed
derekbruening opened this issue Apr 24, 2016 · 1 comment
Closed

memtrace is very slow due to dr_fprintf not being buffered #1929

derekbruening opened this issue Apr 24, 2016 · 1 comment

Comments

@derekbruening
Copy link
Contributor

Even though memtrace_x86.c only prints data to the file every 8K memrefs, it still prints them one at time with a separate dr_fprintf call for each one.

Native is instantaneous to gzip this 1.5MB file (created by concatening 10 copies of this build dir's Makefile together):

# /usr/bin/time gzip /tmp/z ; gunzip /tmp/z.gz
0.01user 0.00system 0:00.01elapsed 94%CPU (0avgtext+0avgdata 1684maxresident)k

Here's memtrace_x86 from TOT:

# /usr/bin/time bin64/drrun -c api/bin/libmemtrace_x86.so -- gzip /tmp/z ; gunzip /tmp/z.gz
18.65user 18.61system 0:40.09elapsed 92%CPU (0avgtext+0avgdata 6684maxresident)k

And here's with buffering, basically calling fdopen(data->log, "w") and calling fprintf on the resulting FILE* instead of using dr_fprintf:

# /usr/bin/time bin64/drrun -c api/bin/libmemtrace_x86.so -- gzip /tmp/z ; gunzip /tmp/z.gz
5.81user 0.47system 0:08.66elapsed 72%CPU (0avgtext+0avgdata 7176maxresident)k

That is a big difference, 4.6x. The trace log file created is 1.5GB.

Here's the client with the printing removed, showing all the time spent is there and not in the instrumentation:

# /usr/bin/time bin64/drrun -c api/bin/libmemtrace_x86.so -- gzip /tmp/z ; gunzip /tmp/z.gz          
0.32user 0.00system 0:00.33elapsed 97%CPU (0avgtext+0avgdata 6604maxresident)k

In some cases the perf difference is 20x or more between using dr_fprintf or using fprintf: for a naive implementation of instruction tracing that prints on every single instruction.

We do document that dr_fprintf is not buffered, but then we use it in examples like memtrace. We should either add buffering to memtrace, switch it to use fprintf from libc, or add buffering to dr_fprintf. We should also review all other places where dr_fprintf is used and we may want to beef up warnings in the docs if we decide not to add buffering.

@derekbruening
Copy link
Contributor Author

In memtrace(), using dr_snprintf() to write to a buffer inside the loop and then printing the buffer with dr_write_file() is better than dr_fprintf() on each iteration but still slower than libc fprintf:

# /usr/bin/time bin64/drrun -c api/bin/libmemtrace_x86.so -- gzip /tmp/z ; gunzip /tmp/z.gz
15.93user 0.43system 0:18.06elapsed 90%CPU (0avgtext+0avgdata 7024maxresident)k

Of course, turning off READABLE_TRACE and avoiding any printf (just a raw dr_write_file) results in a much, much faster runtime and a 639MB file instead of 1.5GB:

/usr/bin/time bin64/drrun -c api/bin/libmemtrace_x86.so -- gzip /tmp/z ; gunzip /tmp/z.gz
0.33user 0.18system 0:00.52elapsed 97%CPU (0avgtext+0avgdata 6800maxresident)k

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant