Add CPU time to the results table #7
Comments
Hello again @homm, I think the problem is that user time can be very misleading. If the CPU has hyperthreading (and the library is threaded), it will include the time that paired threads spent stalled on the shared ALU. This means it can overestimate the actual computation done by the library by a factor of two. For example, on this terrible two-core, four-thread laptop I see:
Then if I set the size of the libvips threadpool to 1:
User time roughly halves, even though the same amount of calculation is happening. My idea instead was to include a one-thread time for libvips -- this should give a more accurate measure of actual computation time. It would be interesting to add similar figures for the other libraries, but it would need some work to figure out how to turn off threading for all of them.
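For reference, a minimal sketch of that single-thread comparison (not from the original comment; it assumes the VIPS_CONCURRENCY environment variable, which is libvips' documented way to size the threadpool, and reuses the vips.py / x.tif names that appear later in this thread):

# Default threadpool vs. a single worker thread; with SMT/hyperthreading,
# expect "user" to drop sharply in the second run while "real" barely moves.
$ time ./vips.py x.tif
$ time VIPS_CONCURRENCY=1 ./vips.py x.tif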
I believe this number is still interesting. For example, if I have a 2-core CPU with hyperthreading and I see that CPU time is 3.7x the real time, I can tell that the whole CPU is busy and there is no room to run things faster. On the other hand, you are perfectly right that a CPU time of 2.9s compared to 1.66s looks 1.75x slower, while in practice it could take the same wall-clock time.

Why would we ever want to know the execution time on a single core? Because the other cores would be free for other tasks. So the really interesting metric is: "I have a CPU and a lot of tasks (all identical, for the benchmark's simplicity). What maximum throughput can I get from this library?"
This may be close to the right indication, but it is far from an answer to that question, because there are additional factors:
So, in general, the formula "performance on a single core × number of cores" doesn't work. The only solution I see is to run "number of cores" tasks simultaneously. For example, on a 4-core i5-4430:

# Three sequential runs
$ time ./vips.py x.tif && time ./vips.py x.tif && time ./vips.py x.tif
real 0m0.318s
user 0m0.752s
sys 0m0.076s
# Three parallel runs
$ time sh -c "./vips.py x.tif & ./vips.py x.tif & ./vips.py x.tif & ./vips.py x.tif & wait" &&\
time sh -c "./vips.py x.tif & ./vips.py x.tif & ./vips.py x.tif & ./vips.py x.tif & wait" &&\
time sh -c "./vips.py x.tif & ./vips.py x.tif & ./vips.py x.tif & ./vips.py x.tif & wait"
real 0m0.895s
user 0m3.184s
sys 0m0.300s
# Three sequential runs
$ time ./pillow.py x.tif && time ./pillow.py x.tif && time ./pillow.py x.tif
real 0m0.221s
user 0m0.156s
sys 0m0.060s
# Three parallel runs
$ time sh -c "./pillow.py x.tif & ./pillow.py x.tif & ./pillow.py x.tif & ./pillow.py x.tif & wait" &&\
time sh -c "./pillow.py x.tif & ./pillow.py x.tif & ./pillow.py x.tif & ./pillow.py x.tif & wait" &&\
time sh -c "./pillow.py x.tif & ./pillow.py x.tif & ./pillow.py x.tif & ./pillow.py x.tif & wait"
real 0m0.388s
user 0m1.112s
sys 0m0.404s
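To make the throughput point concrete, here is a small sketch (not part of the original comment) that converts the "real" times above into images per second, counting one image per sequential run and four per parallel run:

# Rough throughput from the wall-clock (real) figures above:
# one image per sequential run, four images per parallel run.
$ awk 'BEGIN {
    printf "vips   sequential: %4.1f images/sec\n", 1 / 0.318
    printf "vips   parallel:   %4.1f images/sec\n", 4 / 0.895
    printf "pillow sequential: %4.1f images/sec\n", 1 / 0.221
    printf "pillow parallel:   %4.1f images/sec\n", 4 / 0.388
}'
vips   sequential:  3.1 images/sec
vips   parallel:    4.5 images/sec
pillow sequential:  4.5 images/sec
pillow parallel:   10.3 images/sec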
On the benchmark results test page there is a "Run time (secs real)" column. The time utility shows not only the real time but also the user and sys time. The sum of user and sys time is the total CPU time spent on the task. You could add this value to the table; I believe it would be useful for many users.
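As a rough sketch of how a combined CPU-time figure could be collected per run (not from the issue; it assumes GNU time is installed as /usr/bin/time and reuses the placeholder vips.py / x.tif names):

# %e = elapsed (real) seconds, %U = user CPU seconds, %S = system CPU seconds.
# The script's own output goes to /dev/null; GNU time writes its report to stderr.
$ /usr/bin/time -f "%e %U %S" ./vips.py x.tif > /dev/null 2> stats.txt
$ awk '{ printf "real %.2fs   cpu (user+sys) %.2fs\n", $1, $2 + $3 }' stats.txt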