reduce ML inference time in b-tag and related jet taggers: focus on ParticleNet #32883
Comments
A new Issue was created by @slava77 Slava Krutelyov. @Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here
assign reconstruction
Analyzing the inputs with LRP/Integrated Gradients and removing roughly the 40% lowest-scoring variables reduced the inference time for DDX V2 by half (not counting model load/initialization).
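The pruning idea above can be sketched briefly. This is a hypothetical illustration, not the actual DDX workflow: it uses a toy linear model, for which the Integrated Gradients path integral collapses to a single closed-form term, then drops the ~40% lowest-scoring inputs.

```python
import numpy as np

# Hypothetical linear "tagger" f(x) = w . x; names and shapes are illustrative.
rng = np.random.default_rng(0)
n_features = 10
w = rng.normal(size=n_features)   # model weights
x = rng.normal(size=n_features)   # one input example
baseline = np.zeros(n_features)   # IG reference point

# Integrated Gradients: attribution_i = (x_i - b_i) * integral of df/dx_i
# along the straight path from baseline to x. For a linear model the
# gradient is constant (= w), so the integral reduces to this product.
attributions = (x - baseline) * w

# Rank features by |attribution| and keep the top 60%
# (i.e. drop the ~40% lowest-scoring inputs, as described above).
order = np.argsort(np.abs(attributions))[::-1]
keep = order[: int(np.ceil(0.6 * n_features))]
print(f"kept {len(keep)} of {n_features} features")
```

For a real network the gradient is not constant, so the path integral is approximated by averaging gradients at several interpolation steps between the baseline and the input.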
Please clarify whether V2 already has this reduction or whether it is a possible improvement.
To rule out improvements from the last few months, I updated the timing values using 11_3_0_pre2 (instead of 11_2_0_pre9). The results did not really change.
V2 already has this. V2 is similar in time to V1 (after the ONNX update) even though it considers more inputs.
ONNXRuntime was updated from 1.3.0 to 1.6.0 yesterday (cms-sw/cmsdist#6649) and should be available with the next IBs.
Timing comparison, in ms/ev: 11_3_0_pre2 → CMSSW_11_3_X_2021-02-17-2300
There is about a 5% reduction, which looks correlated with the use of ONNX rather than with the job generally running faster or with other changes between the releases. I would not consider the 5% reduction a significant enough effect to resolve this issue.
Do we plan to enable AVX/AVX2 support in ONNXRuntime at some point, either explicitly or implicitly via the
What is the range of the "Dynamic"? Is it smart enough to stay with AVX2, or will it push for AVX512 wherever available, regardless of possible frequency-scaling implications? Considering that I recently found out that we are effectively using dynamic dispatch in TF (#33442) and operationally things were OK, I think it's reasonable to try it more widely.
Yes the level of "dynamic" can be controlled:
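The idea of capping a runtime ("dynamic") instruction-set choice can be sketched as follows. This is a hypothetical illustration of the concept discussed above, not the actual ONNXRuntime or cmsdist mechanism; the level names and the `pick_dispatch` helper are made up.

```python
# Ordered from least to most capable instruction set.
LEVELS = ["sse4.2", "avx", "avx2", "avx512"]

def pick_dispatch(cpu_flags, cap="avx512"):
    """Return the highest ISA the CPU supports, not exceeding `cap`."""
    allowed = LEVELS[: LEVELS.index(cap) + 1]
    supported = [lvl for lvl in allowed if lvl in cpu_flags]
    return supported[-1] if supported else "generic"

# On a host with AVX-512, uncapped dispatch would select it; capping at
# AVX2 sidesteps possible frequency-scaling penalties.
flags = {"sse4.2", "avx", "avx2", "avx512"}
print(pick_dispatch(flags))              # avx512
print(pick_dispatch(flags, cap="avx2"))  # avx2
```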
Sure, I can open a PR. Any suggestion on how dynamic we want it to be?
@slava77 OK I made the PR: cms-sw/cmsdist#6855.
Jenkins tests with timing monitored in miniAOD should be enough to confirm the benefits. I guess that there will be small differences between the
@slava77 You are referring to the small numerical differences of the outputs, right?
yes |
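Such small numerical differences (e.g. from different floating-point summation orders in AVX2 kernels) are usually confirmed to be benign with a tolerance check. A minimal sketch, with made-up score arrays:

```python
import numpy as np

# Tagger scores from two builds of the same model; values are illustrative.
scores_ref = np.array([0.912345, 0.001234, 0.543210], dtype=np.float32)
scores_avx2 = scores_ref + np.float32(1e-6)  # tiny numerical shift

# rtol/atol chosen to flag real regressions but accept vectorization noise.
assert np.allclose(scores_ref, scores_avx2, rtol=1e-4, atol=1e-5)
print("outputs agree within tolerance")
```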
@emilbols please take note of this performance issue, and let us know the plans to address it. @cms-sw/btv-pog-l2
I believe that after PR cms-sw/cmsdist#6855 there was a reduction for all the ONNX modules (cms-sw/cmsdist#6855 (comment)). If I'm not mistaken, the table referenced here predates that. A simple thing that might be useful is to make sure the tagger only runs on the phase space that is needed. For instance, I believe DeepJet runs on jets beyond eta 2.5 and below pt 20 GeV, even though it is not used in that phase space. On the actual ML-inference side, we have to investigate further how to improve the situation. I will bring it up with the BTV conveners.
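The phase-space restriction suggested above can be sketched as a simple guard in front of the (expensive) model call. This is a hypothetical illustration, not CMSSW code; `tag_jets`, the jet representation, and the default score are all made up.

```python
DEFAULT_SCORE = -1.0  # placeholder meaning "not evaluated"

def tag_jets(jets, infer, pt_min=20.0, eta_max=2.5):
    """`jets` is a list of (pt, eta) pairs; `infer` is the expensive model call."""
    scores = []
    for pt, eta in jets:
        if pt > pt_min and abs(eta) < eta_max:
            scores.append(infer(pt, eta))   # run the network only where it is used
        else:
            scores.append(DEFAULT_SCORE)    # skip inference outside acceptance
    return scores

# Only the first jet is inside pT > 20 GeV and |eta| < 2.5.
jets = [(30.0, 1.0), (15.0, 0.5), (50.0, 3.0)]
print(tag_jets(jets, infer=lambda pt, eta: 0.9))  # [0.9, -1.0, -1.0]
```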
Thanks for confirming. Comparing 11_3_0 (which I think already includes the AVX2 fix) and 12_4_0_pre2 in Run 3 MINIAOD, 400 events, on the exact same machine:
Additional improvements (e.g. not running inference in unused phase space) would be useful.
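The per-event bookkeeping behind such comparisons is just wall time divided by event count. A worked example with made-up numbers (only the arithmetic is the point):

```python
def ms_per_event(total_seconds, n_events):
    """Convert a module's total wall time over a job into ms/ev."""
    return 1000.0 * total_seconds / n_events

old = ms_per_event(80.0, 400)   # hypothetical module time in the old release
new = ms_per_event(72.0, 400)   # hypothetical module time in the new release
reduction = 100.0 * (old - new) / old
print(f"{old:.1f} -> {new:.1f} ms/ev ({reduction:.0f}% reduction)")
```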
+reconstruction |
@cmsbuild please close |
This issue is fully signed and ready to be closed. |
This is a replacement/refresh of #25230, where the ML jet taggers totaled 20% of the miniAOD time.
In a recent variant of reminiAOD (now the 2018 UL remini wf 136.88811), jet-tagging inference takes 15% of the miniAOD processing time, as measured in CMSSW_11_3_0_pre2.