
[DF][PyROOT] Add AsNumpy benchmarks #195

Merged (3 commits) on Sep 30, 2020

Conversation

stwunsch
Collaborator

There's now a comprehensive benchmark suite for RDataFrame.AsNumpy. Two of the benchmarks run for almost a minute each; however, I would love to see huge amounts of data read into memory, since that is what we have to optimize for ML applications and friends. Here are the runtimes:

--------------------------------------------------------------------------------------------------------- benchmark: 13 tests ----------------------------------------------------------------------------------------------------------
12: Name (time in ms)                                                Min                    Max                   Mean            StdDev                 Median               IQR            Outliers      OPS            Rounds  Iterations
12: ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
12: test_rdataframe_asnumpy_simple                               21.6676 (1.0)          23.1801 (1.0)          22.1736 (1.0)      0.6036 (inf)          21.8962 (1.0)      0.6904 (inf)           1;0  45.0987 (1.0)           5           1
12: test_rdataframe_asnumpy_manybranches_booleans_imt         2,698.8600 (124.56)    2,698.8600 (116.43)    2,698.8600 (121.72)   0.0000 (1.0)       2,698.8600 (123.26)   0.0000 (1.0)           0;0   0.3705 (0.01)          1           1
12: test_rdataframe_asnumpy_manybranches_booleans_noimit      3,404.0005 (157.10)    3,404.0005 (146.85)    3,404.0005 (153.52)   0.0000 (1.0)       3,404.0005 (155.46)   0.0000 (1.0)           0;0   0.2938 (0.01)          1           1
12: test_rdataframe_asnumpy_nanoaod_scalar_imt                3,576.4014 (165.06)    3,576.4014 (154.29)    3,576.4014 (161.29)   0.0000 (1.0)       3,576.4014 (163.33)   0.0000 (1.0)           0;0   0.2796 (0.01)          1           1
12: test_rdataframe_asnumpy_nanoaod_scalar_noimt              7,675.9396 (354.26)    7,675.9396 (331.14)    7,675.9396 (346.17)   0.0000 (1.0)       7,675.9396 (350.56)   0.0000 (1.0)           0;0   0.1303 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_scalars_imt          9,195.2222 (424.38)    9,195.2222 (396.69)    9,195.2222 (414.69)   0.0000 (1.0)       9,195.2222 (419.95)   0.0000 (1.0)           0;0   0.1088 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_vectors_noimit       9,340.0237 (431.06)    9,340.0237 (402.93)    9,340.0237 (421.22)   0.0000 (1.0)       9,340.0237 (426.56)   0.0000 (1.0)           0;0   0.1071 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_scalars_noimit       9,744.4360 (449.72)    9,744.4360 (420.38)    9,744.4360 (439.46)   0.0000 (1.0)       9,744.4360 (445.03)   0.0000 (1.0)           0;0   0.1026 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_vectors_imt          9,786.9802 (451.69)    9,786.9802 (422.22)    9,786.9802 (441.38)   0.0000 (1.0)       9,786.9802 (446.97)   0.0000 (1.0)           0;0   0.1022 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_all_noimit          12,603.9666 (581.70)   12,603.9666 (543.74)   12,603.9666 (568.42)   0.0000 (1.0)      12,603.9666 (575.62)   0.0000 (1.0)           0;0   0.0793 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_all_imt             13,550.1253 (625.36)   13,550.1253 (584.56)   13,550.1253 (611.09)   0.0000 (1.0)      13,550.1253 (618.83)   0.0000 (1.0)           0;0   0.0738 (0.00)          1           1
12: test_rdataframe_asnumpy_nanoaod_vector_imt               49,896.4716 (>1000.0)  49,896.4716 (>1000.0)  49,896.4716 (>1000.0)  0.0000 (1.0)      49,896.4716 (>1000.0)  0.0000 (1.0)           0;0   0.0200 (0.00)          1           1
12: test_rdataframe_asnumpy_nanoaod_vector_noimt             67,270.0734 (>1000.0)  67,270.0734 (>1000.0)  67,270.0734 (>1000.0)  0.0000 (1.0)      67,270.0734 (>1000.0)  0.0000 (1.0)           0;0   0.0149 (0.00)          1           1
12: ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

* Benchmark reading scalars and vectors from a NanoAOD file
* Benchmark reading many branches from a flat ntuple (taken from the
  ATLAS Open Data files)
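For context, the suite follows the usual pytest-benchmark pattern of timing a workload over several rounds. A minimal stand-in sketch of that loop is below; the commented-out AsNumpy workload, the tree name, and the file name are assumptions, not the actual benchmark code.

```python
import time

def run_benchmark(workload, rounds=5):
    """Minimal stand-in for the pytest-benchmark loop: run the workload
    several times and report the min/mean round time in milliseconds."""
    times = []
    for _ in range(rounds):
        start = time.perf_counter()
        workload()
        times.append((time.perf_counter() - start) * 1e3)
    return {"min": min(times), "mean": sum(times) / len(times)}

# In the real suite the workload is an AsNumpy event loop, roughly:
#   df = ROOT.RDataFrame("Events", "nanoaod.root")   # hypothetical file
#   df.AsNumpy(["Muon_pt", "Muon_eta"])
stats = run_benchmark(lambda: sum(x * x for x in range(10_000)))
```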
@stwunsch stwunsch self-assigned this Sep 28, 2020
@stwunsch
Collaborator Author

Just realized that we don't see any scaling with IMT ... I guess that means we spend the time jitting the RDF-related code. We can also run with all data from the flat ntuple (400k events instead of the current 100k). Then we get the following runtimes:

---------------------------------------------------------------------------------------------------------- benchmark: 13 tests ----------------------------------------------------------------------------------------------------------
12: Name (time in ms)                                                Min                    Max                   Mean            StdDev                 Median                IQR            Outliers      OPS            Rounds  Iterations
12: -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
12: test_rdataframe_asnumpy_simple                               30.3714 (1.0)          49.9651 (1.0)          40.9204 (1.0)      7.7771 (inf)          42.9754 (1.0)      11.9714 (inf)           2;0  24.4377 (1.0)           5           1
12: test_rdataframe_asnumpy_nanoaod_scalar_imt                3,914.7300 (128.90)    3,914.7300 (78.35)     3,914.7300 (95.67)    0.0000 (1.0)       3,914.7300 (91.09)     0.0000 (1.0)           0;0   0.2554 (0.01)          1           1
12: test_rdataframe_asnumpy_manybranches_booleans_imt         8,623.8617 (283.95)    8,623.8617 (172.60)    8,623.8617 (210.75)   0.0000 (1.0)       8,623.8617 (200.67)    0.0000 (1.0)           0;0   0.1160 (0.00)          1           1
12: test_rdataframe_asnumpy_nanoaod_scalar_noimt              9,677.2970 (318.63)    9,677.2970 (193.68)    9,677.2970 (236.49)   0.0000 (1.0)       9,677.2970 (225.18)    0.0000 (1.0)           0;0   0.1033 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_booleans_noimit     11,550.2162 (380.30)   11,550.2162 (231.17)   11,550.2162 (282.26)   0.0000 (1.0)      11,550.2162 (268.76)    0.0000 (1.0)           0;0   0.0866 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_scalars_imt         28,585.8677 (941.21)   28,585.8677 (572.12)   28,585.8677 (698.57)   0.0000 (1.0)      28,585.8677 (665.17)    0.0000 (1.0)           0;0   0.0350 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_vectors_imt         30,517.0549 (>1000.0)  30,517.0549 (610.77)   30,517.0549 (745.77)   0.0000 (1.0)      30,517.0549 (710.10)    0.0000 (1.0)           0;0   0.0328 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_all_imt             34,743.8215 (>1000.0)  34,743.8215 (695.36)   34,743.8215 (849.06)   0.0000 (1.0)      34,743.8215 (808.46)    0.0000 (1.0)           0;0   0.0288 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_scalars_noimit      43,317.7511 (>1000.0)  43,317.7511 (866.96)   43,317.7511 (>1000.0)  0.0000 (1.0)      43,317.7511 (>1000.0)   0.0000 (1.0)           0;0   0.0231 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_vectors_noimit      46,245.4529 (>1000.0)  46,245.4529 (925.55)   46,245.4529 (>1000.0)  0.0000 (1.0)      46,245.4529 (>1000.0)   0.0000 (1.0)           0;0   0.0216 (0.00)          1           1
12: test_rdataframe_asnumpy_nanoaod_vector_imt               50,331.7447 (>1000.0)  50,331.7447 (>1000.0)  50,331.7447 (>1000.0)  0.0000 (1.0)      50,331.7447 (>1000.0)   0.0000 (1.0)           0;0   0.0199 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_all_noimit          52,286.7003 (>1000.0)  52,286.7003 (>1000.0)  52,286.7003 (>1000.0)  0.0000 (1.0)      52,286.7003 (>1000.0)   0.0000 (1.0)           0;0   0.0191 (0.00)          1           1
12: test_rdataframe_asnumpy_nanoaod_vector_noimt             66,942.8085 (>1000.0)  66,942.8085 (>1000.0)  66,942.8085 (>1000.0)  0.0000 (1.0)      66,942.8085 (>1000.0)   0.0000 (1.0)           0;0   0.0149 (0.00)          1           1
12: -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@stwunsch
Collaborator Author

What do we prefer? Unfortunately, the overall runtime of the manybranches benchmarks with all events is long (about 400 s for all 13 benchmarks).

@eguiraud
Member

Do we need such long benchmarks to measure the performance of AsNumpy? The runtime should be pretty much linear w.r.t. the number of datapoints (with an offset given by setup and jitting), and regressions should be visible even if the benchmark took ~30 seconds (if jitting takes 30 seconds, we have a problem :D )

Also this benchmark will probably be hit by #154 (like #186 ). @vgvassilev 's suggestion to move rootbench-datafiles in RB_TEMP_FS might be a solution.
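The linearity argument above can be sketched numerically. All numbers in this snippet (setup time, per-event cost, event count) are hypothetical and only illustrate that a per-event regression stays visible even at a ~30 s total runtime with a large jitting offset.

```python
def runtime_ms(n_events, setup_ms, per_event_us):
    """Simple linear model: a fixed setup/jitting offset plus a
    per-event processing cost (all parameters hypothetical)."""
    return setup_ms + n_events * per_event_us / 1e3

# 100k events, 12 s of setup/jitting, 180 us per event -> 30 s total
baseline = runtime_ms(100_000, setup_ms=12_000, per_event_us=180)
# a 10% regression in the per-event cost still shifts the total clearly
regressed = runtime_ms(100_000, setup_ms=12_000, per_event_us=198)
```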

@stwunsch
Collaborator Author

We have the issue that for many branches the jitting takes up to 12 seconds and you don't see any scaling with IMT (see the manybranches discussion above). However, let me see how I can speed up the NanoAOD-based example!

@stwunsch
Collaborator Author

stwunsch commented Sep 29, 2020

Actually, we are also restricted by the number of clusters. The data_A.GamGam.root file has just 3 clusters and the NanoAOD-like file has 74 clusters. If we use far fewer than 74 clusters, we cannot reliably judge the scaling on 8 cores.

So we have one test for the scaling (NanoAOD) and one for reading many branches and the associated overhead introduced by RDF (the flat-ntuple example).
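The cluster argument can be made concrete: IMT distributes work in units of whole clusters, so the ideal speedup is capped by the cluster count as well as the core count. A rough upper-bound sketch (assuming equal-sized clusters and no other overhead):

```python
def max_imt_speedup(n_clusters, n_cores):
    """Rough upper bound on IMT speedup: tasks are whole TTree clusters,
    so with fewer clusters than cores some cores stay idle
    (assumes equal-sized clusters and negligible scheduling overhead)."""
    return min(n_clusters, n_cores)

# 3 clusters on 8 cores can speed up at most 3x;
# 74 clusters are enough to keep all 8 cores busy
print(max_imt_speedup(3, 8), max_imt_speedup(74, 8))
```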

@stwunsch
Collaborator Author

I think I'll slim down the number of benchmarks. It's more useful than restricting the input data!

@stwunsch
Collaborator Author

stwunsch commented Sep 30, 2020

Can we live with the following set of benchmarks? I would love to keep the one reading vectors from NanoAOD since it's a rather typical case that we are really bad at. The issue is the PyROOT wrapping of the vector objects, which takes at least half of the runtime and, more importantly, does not scale with IMT. Further, the manybranches_scalars benchmarks show that we have quite some overhead due to RDF jitting and setup times. The boolean one is needed because we cannot memory-adopt booleans, since the layout is different in std::vector<bool> and numpy arrays.

---------------------------------------------------------------------------------------------------------- benchmark: 9 tests ----------------------------------------------------------------------------------------------------------
12: Name (time in ms)                                                Min                    Max                   Mean            StdDev                 Median               IQR            Outliers      OPS            Rounds  Iterations
12: ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
12: test_rdataframe_asnumpy_simple                               21.5091 (1.0)          23.0830 (1.0)          22.0754 (1.0)      0.6081 (inf)          21.9336 (1.0)      0.6900 (inf)           1;0  45.2993 (1.0)           5           1
12: test_rdataframe_asnumpy_nanoaod_scalar_imt                4,974.3151 (231.27)    4,974.3151 (215.50)    4,974.3151 (225.33)   0.0000 (1.0)       4,974.3151 (226.79)   0.0000 (1.0)           0;0   0.2010 (0.00)          1           1
12: test_rdataframe_asnumpy_nanoaod_scalar_noimt              7,971.1868 (370.60)    7,971.1868 (345.33)    7,971.1868 (361.09)   0.0000 (1.0)       7,971.1868 (363.42)   0.0000 (1.0)           0;0   0.1255 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_booleans_imt         8,933.3834 (415.33)    8,933.3834 (387.01)    8,933.3834 (404.68)   0.0000 (1.0)       8,933.3834 (407.29)   0.0000 (1.0)           0;0   0.1119 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_booleans_noimit     13,532.1889 (629.14)   13,532.1889 (586.24)   13,532.1889 (613.00)   0.0000 (1.0)      13,532.1889 (616.96)   0.0000 (1.0)           0;0   0.0739 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_scalars_imt         25,323.1163 (>1000.0)  25,323.1163 (>1000.0)  25,323.1163 (>1000.0)  0.0000 (1.0)      25,323.1163 (>1000.0)  0.0000 (1.0)           0;0   0.0395 (0.00)          1           1
12: test_rdataframe_asnumpy_manybranches_scalars_noimit      41,991.1258 (>1000.0)  41,991.1258 (>1000.0)  41,991.1258 (>1000.0)  0.0000 (1.0)      41,991.1258 (>1000.0)  0.0000 (1.0)           0;0   0.0238 (0.00)          1           1
12: test_rdataframe_asnumpy_nanoaod_vector_imt               60,467.0337 (>1000.0)  60,467.0337 (>1000.0)  60,467.0337 (>1000.0)  0.0000 (1.0)      60,467.0337 (>1000.0)  0.0000 (1.0)           0;0   0.0165 (0.00)          1           1
12: test_rdataframe_asnumpy_nanoaod_vector_noimt             67,533.3687 (>1000.0)  67,533.3687 (>1000.0)  67,533.3687 (>1000.0)  0.0000 (1.0)      67,533.3687 (>1000.0)  0.0000 (1.0)           0;0   0.0148 (0.00)          1           1
12: ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
12: 
12: Legend:
12:   Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
12:   OPS: Operations Per Second, computed as 1 / Mean
12: ================== 9 passed, 3 warnings in 233.84s (0:03:53) ===================
2/2 Test #12: rootbench-pytest-rdataframe-asnumpy ....................   Passed  235.17 sec
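The boolean-layout point above can be illustrated without ROOT: std::vector&lt;bool&gt; is bit-packed (eight elements per byte), while a numpy bool array stores one byte per element, so the buffer cannot be adopted and must be copied. A small stdlib-only sketch of the two layouts:

```python
bools = [True, False, True, True, False, False, True, False]

# numpy-style layout: one byte per element, directly adoptable as a buffer
bytewise = bytes(int(b) for b in bools)          # 8 bytes for 8 booleans

# std::vector<bool>-style layout: the same 8 elements packed into one byte
packed = sum(b << i for i, b in enumerate(bools)).to_bytes(1, "little")
```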

@eguiraud
Member

Good for me. Can you explain one last time why we can't cut the input data (and therefore the runtime) of the last three benchmarks in half without losing meaningful information?

@stwunsch
Collaborator Author

I would love to see the scaling on 8 cores, for which we need a good number of clusters. The NanoAOD one has 80 clusters, which makes sure that we can scale in principle. I could trim test_rdataframe_asnumpy_manybranches_scalars_* in principle, but it's already only 400k events spanning 3 clusters, so it's not really a lot of work but represents the typical scale of conversion I would expect from users.

@stwunsch stwunsch merged commit 4fbdf35 into root-project:master Sep 30, 2020