[DF][PyROOT] Add AsNumpy benchmarks #195
Conversation
* Benchmark reading scalars and vectors from a NanoAOD file
* Benchmark reading many branches from a flat ntuple (taken from the ATLAS Open Data files)
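For illustration, here is a minimal sketch of what such an AsNumpy benchmark can look like; the file, tree, and branch names are placeholders and not necessarily those used in this PR.

```python
# Minimal AsNumpy benchmark sketch. File, tree, and branch names are
# placeholders; the actual benchmarks in rootbench use their own inputs.
import time

import ROOT


def benchmark_asnumpy(filename, treename, columns):
    """Time a single RDataFrame.AsNumpy call reading the given columns."""
    df = ROOT.RDataFrame(treename, filename)
    start = time.time()
    data = df.AsNumpy(columns=columns)  # dict: column name -> numpy array
    return time.time() - start, data


# Scalar and vector branches from a NanoAOD-like file (hypothetical names).
elapsed, data = benchmark_asnumpy("nanoaod.root", "Events", ["nMuon", "Muon_pt"])
print(f"AsNumpy took {elapsed:.2f} s for {len(data['nMuon'])} events")
```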
Just realized that we don't see any scaling with IMT ... I guess that means we spend the time in jitting the RDF-related code. We could also run with all data from the flat ntuple (400k events instead of the current 100k). Then we get the following runtimes:
What do we prefer? Unfortunately, the overall runtime with all events for the
Do we need such long benchmarks to measure the performance of AsNumpy? The runtime should be pretty much linear w.r.t. the number of datapoints (with an offset given by setup and jitting), and regressions should be visible even if the benchmark only ran for ~30 seconds (if jitting takes 30 seconds, we have a problem :D). Also, this benchmark will probably be hit by #154 (like #186). @vgvassilev's suggestion to move rootbench-datafiles into RB_TEMP_FS might be a solution.
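To make the linearity argument concrete, a rough way to separate the constant setup/jitting offset from the per-event cost is to time a tiny run against a full run. This is only a sketch with placeholder file, tree, and branch names, run single-threaded because Range cannot be combined with IMT.

```python
# Sketch: estimate the constant setup/jitting offset vs. the per-event cost.
# Placeholder file, tree, and branch names; run single-threaded because
# RDataFrame.Range is not allowed when implicit multi-threading is enabled.
import time

import ROOT


def timed_asnumpy(node, columns):
    start = time.time()
    node.AsNumpy(columns=columns)
    return time.time() - start


columns = ["branch1", "branch2"]  # hypothetical branch names
df = ROOT.RDataFrame("ntuple", "flat_ntuple.root")

t_small = timed_asnumpy(df.Range(1000), columns)  # dominated by setup + jitting
t_full = timed_asnumpy(df, columns)               # offset + linear event-loop part
print(f"offset ~ {t_small:.1f} s, full run {t_full:.1f} s")
```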
We have the issue that for many branches the jitting takes up to 12 seconds, and you don't see any scaling with IMT (see the
Actually, we are also restricted by the number of clusters. So we have one test for the scaling (NanoAOD) and one for reading many things and the associated overhead introduced by RDF (the flat ntuple example).
I think I'll slim down the number of benchmarks. It's more useful than restricting the input data!
Can we live with the following set of benchmarks? I would love to keep the one reading vectors from NanoAOD since it's a rather typical case that we are really bad at. The issue is the PyROOT wrapping of the vector objects, which takes at least half of the runtime and, more importantly, does not scale with IMT. Further, the
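As an illustration of the vector case discussed above (placeholder file and branch names): for a vector branch, AsNumpy returns a numpy array of dtype object whose elements are PyROOT-wrapped RVecs, and that per-event wrapping is where the non-scaling cost comes from.

```python
# Illustration of the vector-branch case: the result is an object array whose
# elements are PyROOT proxies of the C++ RVecs. Placeholder file/branch names.
import ROOT

df = ROOT.RDataFrame("Events", "nanoaod.root")
data = df.AsNumpy(columns=["Muon_pt"])  # Muon_pt is a vector branch

muon_pt = data["Muon_pt"]
print(muon_pt.dtype)     # object: one wrapped RVec per event
print(list(muon_pt[0]))  # converting one entry's RVec contents to a Python list
```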
Good for me. Can you explain one last time why we can't slice the input data (and therefore the runtime) of the last three benchmarks in half without losing meaningful information?
I would love to see the scaling for 8 cores, for which we need a good number of clusters. The NanoAOD one has 80 clusters, which ensures that we can scale in principle. I could trim the
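For reference, a quick sketch (with placeholder file and tree names) of how one might count the clusters of a tree, since clusters are the unit of work distribution for IMT:

```python
# Count the clusters of a TTree; more clusters means more potential for IMT
# scaling. File and tree names are placeholders.
import ROOT

f = ROOT.TFile.Open("nanoaod.root")
tree = f.Get("Events")

it = tree.GetClusterIterator(0)
n_clusters = 0
while it() < tree.GetEntries():  # each call returns the next cluster's start entry
    n_clusters += 1
print(f"{n_clusters} clusters for {tree.GetEntries()} entries")
```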
There's now a comprehensive benchmark suite for RDataFrame.AsNumpy. There are two benchmarks running for almost a minute; however, I would love to see huge amounts of data read into memory. That's what we have to optimize for ML applications and friends. Here are the runtimes: