Use 2D grid for inner loop parallelization #260
Introduce the inner loop parallelization in the doublet finder using the stride pattern already used in the "fishbone", and make use of a 2D grid instead of a hand-made stride.
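To make the pattern concrete, here is a minimal host-side sketch of the index arithmetic, written in plain C++ rather than copied from the PR. It assumes that the `x` dimension of the block strides over the inner loop and the `y` dimension of the grid strides over the outer loop, with a single block along `x`. `Dim2` and `visitCounts` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the x/y part of CUDA's dim3.
struct Dim2 {
  unsigned x;
  unsigned y;
};

// Emulates every thread of a 2D launch on the host and counts how many times
// each (outer, inner) pair is processed. Assumption: gridDim.x == 1, so only
// the number of blocks along y is passed in.
std::vector<int> visitCounts(unsigned gridDimY, Dim2 blockDim,
                             unsigned nOuter, unsigned nInner) {
  assert(gridDimY > 0 && blockDim.x > 0 && blockDim.y > 0);
  std::vector<int> counts(static_cast<std::size_t>(nOuter) * nInner, 0);

  for (unsigned blockIdxY = 0; blockIdxY < gridDimY; ++blockIdxY) {
    for (unsigned threadIdxY = 0; threadIdxY < blockDim.y; ++threadIdxY) {
      for (unsigned threadIdxX = 0; threadIdxX < blockDim.x; ++threadIdxX) {
        // From here on: what a single thread would execute in the kernel.
        unsigned firstOuter = blockIdxY * blockDim.y + threadIdxY;
        unsigned outerStep = blockDim.y * gridDimY;
        for (unsigned o = firstOuter; o < nOuter; o += outerStep) {
          // Inner loop: the threads sharing this y index split the work,
          // each starting at its x index and advancing by blockDim.x.
          for (unsigned i = threadIdxX; i < nInner; i += blockDim.x) {
            ++counts[static_cast<std::size_t>(o) * nInner + i];
          }
        }
      }
    }
  }
  return counts;
}
```

With `blockDim.x == 1` the inner loop degenerates to the serial case, which is what "stride=1" amounts to in this sketch. Every pair should be visited exactly once regardless of the chosen block shape.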
Performance: HEAD vs. #260 (this PR)
|
If we look at nvprof instead
#260 - this PR
what is difficult to understand is
|
The total time spent in the kernels decreases (both with #242 and #260), so something else must be taking longer:
For example, I do not know whether the time spent creating and launching the blocks is accounted for, nor what happens to the overall GPU occupancy. |
Here is a summary of the comparison of the time spent in the kernels, with the modified ones in bold:
It seems that |
We need to run a regression on MC to verify that the MTV results are identical. |
Here are the improvements I see if I apply only the changes to
on a pair of P100
on a pair of V100 |
MTV http://innocent.home.cern.ch/innocent/RelVal/pixOnlyPU50_gpuPR260/plots_summary.html
A: HEAD |
ok with
I get from this PR
and
So it seems that, at least for this workflow on V100, 2D parallelization is not worth it for doublet building: with "stride=1" we reach "1420.1 ± 5.3 ev/s"; with "stride=8" in fishbone it is "1426.3 ± 4.8 ev/s". |
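One way to read two throughput figures like these is to express their difference in units of the combined uncertainty. The sketch below does that arithmetic; it assumes the two quoted uncertainties are independent and roughly Gaussian, which the thread does not state. `Throughput` and `significance` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// A throughput measurement in ev/s with its quoted uncertainty.
struct Throughput {
  double value;
  double error;
};

// Difference (b - a) divided by the combined uncertainty.
// Assumption: the two uncertainties are independent, so they add in quadrature.
double significance(Throughput a, Throughput b) {
  double sigma = std::sqrt(a.error * a.error + b.error * b.error);
  assert(sigma > 0.0);
  return (b.value - a.value) / sigma;
}
```

Under these assumptions, `significance({1420.1, 5.3}, {1426.3, 4.8})` comes out at roughly 0.87, i.e. the two configurations differ by less than one combined standard deviation.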
OK... does #261 look good? |
#261 "looks" good, maybe I should test it... |
#261 gives me "1419.6 ± 3.7 ev/s", equivalent to this PR with stride=1 in the doublet finder. |
This version should be equivalent to #261, with the added ability to modify the 2D grid parameters for both the fishbone and the doublet finder. |
For completeness, the nvprof report for this version
|
@fwyzard, will you have time to update your curves with the latest version of this PR? |
sure |
Validation summary
Reference release CMSSW_10_4_0 at b8365c6
|
|
I'm running the validation on the P100 to avoid |
/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW summary
/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW summary
|
With respect to #242, here we keep the same number of threads per block as in the baseline
(this was already the case for the doublet finder)
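Keeping the number of threads per block fixed while moving to a 2D block can be sketched as follows. This is an illustration under assumptions, not the PR's actual launch code: it supposes the per-block thread budget is divided so that `x` holds the inner-loop stride and `y` holds the rest. `BlockShape`, `splitBlock` and `blocksFor` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical 2D block shape (x: threads sharing one inner loop,
// y: outer-loop items handled per block).
struct BlockShape {
  unsigned x;
  unsigned y;
};

// Split a fixed per-block thread budget so that x * y stays equal to it.
// Assumption: the stride divides the budget exactly.
BlockShape splitBlock(unsigned threadsPerBlock, unsigned stride) {
  assert(stride > 0 && threadsPerBlock % stride == 0);
  return BlockShape{stride, threadsPerBlock / stride};
}

// Number of blocks along y needed to cover nOuter items (rounded up).
unsigned blocksFor(unsigned nOuter, unsigned threadsY) {
  assert(threadsY > 0);
  return (nOuter + threadsY - 1) / threadsY;
}
```

The trade-off in this sketch is that a larger stride leaves fewer outer-loop items per block, so more blocks are needed to cover the same input while the thread count per block stays unchanged.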