[DF][PyROOT] Add AsNumpy benchmarks #195
Conversation
* Benchmark reading scalars and vectors from a NanoAOD file
* Benchmark reading many branches from a flat ntuple (taken from the ATLAS Open Data files)
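For illustration, here is a minimal sketch of what such an AsNumpy benchmark can look like; the file, tree, and branch names are placeholders and not necessarily those used in this PR.

```python
# Minimal AsNumpy benchmark sketch. File, tree, and branch names are
# placeholders; the actual benchmarks in rootbench use their own inputs.
import time

import ROOT


def benchmark_asnumpy(filename, treename, columns):
    """Time a single RDataFrame.AsNumpy call reading the given columns."""
    df = ROOT.RDataFrame(treename, filename)
    start = time.time()
    data = df.AsNumpy(columns=columns)  # dict: column name -> numpy array
    return time.time() - start, data


# Scalar and vector branches from a NanoAOD-like file (hypothetical names).
elapsed, data = benchmark_asnumpy("nanoaod.root", "Events", ["nMuon", "Muon_pt"])
print(f"AsNumpy took {elapsed:.2f} s for {len(data['nMuon'])} events")
```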
Just realized that we don't see any scaling with IMT ... I guess that means we spend the time in jitting the RDF-related code. We could also run with all data from the flat ntuple (400k events instead of the current 100k). Then we get the following runtimes:
What do we prefer? Unfortunately, the overall runtime with all events for the
Do we need such long benchmarks to measure the performance of AsNumpy? The runtime should be pretty much linear w.r.t. the number of datapoints (with an offset given by setup and jitting), and regressions should be visible even if the benchmark only ran for ~30 seconds (if jitting takes 30 seconds, we have a problem :D). Also, this benchmark will probably be hit by #154 (like #186). @vgvassilev's suggestion to move rootbench-datafiles into RB_TEMP_FS might be a solution.
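To make the linearity argument concrete, a rough way to separate the constant setup/jitting offset from the per-event cost is to time a tiny run against a full run. This is only a sketch with placeholder file, tree, and branch names, run single-threaded because Range cannot be combined with IMT.

```python
# Sketch: estimate the constant setup/jitting offset vs. the per-event cost.
# Placeholder file, tree, and branch names; run single-threaded because
# RDataFrame.Range is not allowed when implicit multi-threading is enabled.
import time

import ROOT


def timed_asnumpy(node, columns):
    start = time.time()
    node.AsNumpy(columns=columns)
    return time.time() - start


columns = ["branch1", "branch2"]  # hypothetical branch names
df = ROOT.RDataFrame("ntuple", "flat_ntuple.root")

t_small = timed_asnumpy(df.Range(1000), columns)  # dominated by setup + jitting
t_full = timed_asnumpy(df, columns)               # offset + linear event-loop part
print(f"offset ~ {t_small:.1f} s, full run {t_full:.1f} s")
```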
We have the issue that for many branches the jitting takes up to 12 seconds, and you don't see any scaling with IMT (see the
Actually, we are also restricted by the number of clusters. So we have one test for the scaling (NanoAOD) and one for reading many things and the associated overhead introduced by RDF (the flat ntuple example).
I think I'll slim down the number of benchmarks. It's more useful than restricting the input data!
Can we live with the following set of benchmarks? I would love to keep the one reading vectors from NanoAOD since it's a rather typical case that we are really bad at. The issue is the PyROOT wrapping of the vector objects, which takes at least half of the runtime and, more importantly, does not scale with IMT. Further, the
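As an illustration of the vector case discussed above (placeholder file and branch names): for a vector branch, AsNumpy returns a numpy array of dtype object whose elements are PyROOT-wrapped RVecs, and that per-event wrapping is where the non-scaling cost comes from.

```python
# Illustration of the vector-branch case: the result is an object array whose
# elements are PyROOT proxies of the C++ RVecs. Placeholder file/branch names.
import ROOT

df = ROOT.RDataFrame("Events", "nanoaod.root")
data = df.AsNumpy(columns=["Muon_pt"])  # Muon_pt is a vector branch

muon_pt = data["Muon_pt"]
print(muon_pt.dtype)     # object: one wrapped RVec per event
print(list(muon_pt[0]))  # converting one entry's RVec contents to a Python list
```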
Good for me. Can you explain one last time why we can't slice the input data (and therefore the runtime) of the last three benchmarks in half without losing meaningful information?
I would love to see the scaling for 8 cores, for which we need a good number of clusters. The NanoAOD one has 80 clusters, which ensures that we can scale in principle. I could trim the
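For reference, a quick sketch (with placeholder file and tree names) of how one might count the clusters of a tree, since clusters are the unit of work distribution for IMT:

```python
# Count the clusters of a TTree; more clusters means more potential for IMT
# scaling. File and tree names are placeholders.
import ROOT

f = ROOT.TFile.Open("nanoaod.root")
tree = f.Get("Events")

it = tree.GetClusterIterator(0)
n_clusters = 0
while it() < tree.GetEntries():  # each call returns the next cluster's start entry
    n_clusters += 1
print(f"{n_clusters} clusters for {tree.GetEntries()} entries")
```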
There's now a comprehensive benchmark suite for RDataFrame.AsNumpy. There are two benchmarks running for almost a minute; however, I would love to see huge amounts of data read into memory. That's what we have to optimize for ML applications and friends. Here are the runtimes: