Fix for SiPixelRecHitFromCUDA crash during online GPU tests #35229
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35229/25186
A new Pull Request was created by @czangela for master. It involves the following packages:
@jpata, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks. cms-bot commands are listed here
enable gpu
abort test
@cmsbuild please test
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8b6b98/18496/summary.html
type bug-fix
Shouldn't the fix be done upstream? What was the reason that a non-zero adc was computed for this digi while the packed value is 0? Unless it's clear that this behavior is expected by design, if this is somehow time critical and cannot be fixed upstream now, please add a LogWarning message for the
@slava77 this was meant to be a quick fix just to prevent a crash. We are working on the proper (upstream) fix.
4ebe06e to f6d32fa
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35229/25297
@cmsbuild please test
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8b6b98/18653/summary.html
+reconstruction
+heterogeneous
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
+1
PR description:
These changes address the GPU error brought up in #34831 comment. [1]
The error was apparently due to a digi at row 0, col 0 whose adc value was not (correctly) set in the packing, so its packed value was all zeros. When the check in SiPixelDigisClustersFromSoA.cc#137 ran, that one digi was skipped, so the associated cluster never got created, and later one ends up with one more rechit than actual clusters in that one module.
This is the actual digi being lost:
Digi index 754; clusid 8; rawIdArr 344545284; adc 25661, pdigi: 00000000000000000000000000000000
The condition digis.pdigi(i) == 0 was initially meant to check whether the digi was left uninitialized by the RawToDigi_kernel. The problem is that a valid packed digi can be all zeros. For checking uninitialized digis, this condition is changed to digis.rawIdArr(i) == 0.
Other than that, to filter noisy/dead pixel digis, the condition digis.adc(i) == 0 was added as well, based on the treatment of noisy/dead pixels in the calibDigis kernel. [1]
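The difference between the old and new skip conditions can be illustrated with a small standalone sketch. This is not the CMSSW code: the `DigiEntry` struct, the `packDigi` bit layout, and the helper names `skipOld`, `skipNew` and `countKept` are assumptions made up for the example; only the three field names and the values of the lost digi come from the description above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one entry of the digi SoA. The field names mirror
// the accessors mentioned in the PR (pdigi, rawIdArr, adc); the struct itself
// is invented for this sketch.
struct DigiEntry {
  uint32_t pdigi;     // packed row/col/adc word
  uint32_t rawIdArr;  // detector id; 0 if the slot was never filled
  uint16_t adc;       // charge; 0 for pixels flagged as noisy/dead
};

// Assumed bit layout, for illustration only: row in the low bits, then col,
// then adc. With any layout of this kind, row 0, col 0, adc 0 packs to 0.
constexpr uint32_t packDigi(uint32_t row, uint32_t col, uint32_t adc) {
  return row | (col << 11) | (adc << 22);
}

// Old condition: treats an all-zero packed word as "uninitialized".
inline bool skipOld(const DigiEntry& d) { return d.pdigi == 0; }

// New condition: an uninitialized slot has rawIdArr == 0, and a noisy/dead
// pixel has adc == 0; a zero packed word alone is no longer a reason to skip.
inline bool skipNew(const DigiEntry& d) { return d.rawIdArr == 0 || d.adc == 0; }

// Counts the digis that survive a given skip predicate.
inline int countKept(const std::vector<DigiEntry>& digis, bool (*skip)(const DigiEntry&)) {
  int kept = 0;
  for (const auto& d : digis) {
    if (!skip(d)) {
      ++kept;
    }
  }
  return kept;
}
```

With the lost digi from the log (pdigi 0, rawIdArr 344545284, adc 25661), `skipOld` returns true and `skipNew` returns false, so the digi is kept and its cluster can be created.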
PR validation:
So far it has only been checked that the recipe described in #34831 does not crash in 11_3_4.
if this PR is a backport please specify the original PR and why you need to backport that PR: