Improve error handling in GPU callbacks [14.0.x] #44477
backport #44476
enable gpu
please test
A new Pull Request was created by @fwyzard for CMSSW_14_0_X. It involves the following packages:
@fwyzard, @makortel can you please review it and eventually sign? Thanks. cms-bot commands are listed here
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a5758/38271/summary.html
Comparison Summary:
GPU Comparison Summary:
Branch updated from 8c3b153 to 127bf32.
please test
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a5758/38299/summary.html
Comparison Summary:
GPU Comparison Summary:
+heterogeneous
This pull request is fully signed and it will be integrated in one of the next CMSSW_14_0_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_14_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)
+1
PR description:
Update the callbacks used by the CUDA and HIP/ROCm backends to match the original CUDA implementation: when an asynchronous error is reported, throw and immediately catch an exception so that GDB can intercept it, then propagate the exception to the framework.
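The throw-catch-propagate pattern described above can be sketched in plain C++. This is a minimal, hypothetical illustration, not the code in this PR: `Status`, `WaitingTaskHolder` and `streamCallback` are invented names standing in for the backend error code (`cudaError_t` / `hipError_t`), the framework object that is notified when the asynchronous work completes, and the host callback registered on the GPU queue.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <memory>
#include <stdexcept>

// Hypothetical error code standing in for cudaError_t / hipError_t.
enum class Status { Success, Error };

// Hypothetical stand-in for the framework object that is notified when the
// asynchronous work has finished; a null exception_ptr signals success.
struct WaitingTaskHolder {
  std::function<void(std::exception_ptr)> onDone;
  void doneWaiting(std::exception_ptr ptr) { onDone(ptr); }
};

// Hypothetical host callback with a C-style signature (status plus a void*
// user-data pointer), invoked by the GPU runtime once the queued work completes.
// Exceptions must not escape into the runtime, so an error is thrown and caught
// locally (giving GDB's `catch throw` a place to stop) and then handed on.
void streamCallback(Status status, void* data) {
  assert(data != nullptr);
  // Take back ownership of the holder allocated when the callback was enqueued.
  std::unique_ptr<WaitingTaskHolder> holder{static_cast<WaitingTaskHolder*>(data)};
  std::exception_ptr ptr;  // stays null on success
  if (status != Status::Success) {
    try {
      throw std::runtime_error("asynchronous GPU error");
    } catch (...) {
      ptr = std::current_exception();
    }
  }
  // Propagate the (possibly null) exception to the framework side.
  holder->doneWaiting(ptr);
}
```

Throwing and immediately catching looks redundant, but it gives a debugger a real throw point to break on, while `std::exception_ptr` carries the error across the C-style callback boundary. In this sketch the receiving side would call `std::rethrow_exception(ptr)` when `ptr` is non-null, so the error surfaces as an ordinary C++ exception on a thread that can handle it.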
PR validation:
Unit tests ran.
Backport status
Backport of #44476 to 14.0.x for data taking.