Assertion 'nHits >= 3' failed in RelVal 12861.0 #45871
assign heterogeneous |
cms-bot internal usage |
A new Issue was created by @iarspider. @Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
Also in CMSSW_14_2_X_2024-09-03-2300
And in CMSSW_14_2_CLANG_X_2024-09-03-2300, CMSSW_14_2_DEVEL_X_2024-09-03-2300 |
A different RelVal, 12461.402, failed in the GPU_X IB with the same assertion. |
assign reconstruction |
type tracking |
New categories assigned: reconstruction @jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks |
I think the problem arises because, when we have no hits, we store the tracks without setting the number of tracks. The following patch zeroes it before the early return:
diff --git a/RecoTracker/PixelSeeding/plugins/alpaka/CAHitNtupletGenerator.cc b/RecoTracker/PixelSeeding/plugins/alpaka/CAHitNtupletGenerator.cc
index 0abb5d2b1bb..d3101b203b6 100644
--- a/RecoTracker/PixelSeeding/plugins/alpaka/CAHitNtupletGenerator.cc
+++ b/RecoTracker/PixelSeeding/plugins/alpaka/CAHitNtupletGenerator.cc
@@ -301,8 +301,12 @@ namespace ALPAKA_ACCELERATOR_NAMESPACE {
// Don't bother if less than 2 this
if (hits_d.view().metadata().size() < 2)
+ {
+ const auto device = alpaka::getDev(queue);
+ auto ntracks_d = cms::alpakatools::make_device_view(device, tracks.view().nTracks());
+ alpaka::memset(queue,ntracks_d,0);
return tracks;
-
+ }
GPUKernels kernels(m_params, hits_d.view().metadata().size(), hits_d.offsetBPIX2(), queue);
kernels.buildDoublets(hits_d.view(), hits_d.offsetBPIX2(), queue);
I tested that this avoids the crash (and solves the issue). |
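To illustrate why the early return needs to reset the track count, here is a minimal host-only sketch. It does not use alpaka or any CMSSW type: `TrackBuffer`, `makeStaleBuffer`, `buildTracks` and `countInvalidTracks` are hypothetical names, and the stale value 5 is an arbitrary stand-in for whatever a recycled device allocation might contain.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a track collection that lives in recycled memory:
// nothing zeroes nTracks on allocation, so it holds stale data until it is set.
struct TrackBuffer {
  std::int32_t nTracks;
  std::vector<std::int32_t> nHits;  // number of hits per track slot
};

// Simulates a recycled allocation: a stale count of 5 and empty track slots.
inline TrackBuffer makeStaleBuffer() {
  TrackBuffer b;
  b.nTracks = 5;         // arbitrary leftover value (assumption for the demo)
  b.nHits.assign(8, 0);  // no slot has been filled with a real track
  return b;
}

// Mimics the shape of the patched function: with fewer than 2 input hits
// nothing is built, and the count is reset only if zeroOnEarlyReturn is true.
inline void buildTracks(std::size_t nInputHits, TrackBuffer& tracks, bool zeroOnEarlyReturn) {
  if (nInputHits < 2) {
    if (zeroOnEarlyReturn)
      tracks.nTracks = 0;  // plays the role of the memset in the patch
    return;
  }
  // Placeholder for the real pattern recognition: one track with 3 hits.
  tracks.nTracks = 1;
  tracks.nHits[0] = 3;
}

// Counts the tracks a downstream consumer would reject with `nHits >= 3`.
inline int countInvalidTracks(const TrackBuffer& tracks) {
  int bad = 0;
  for (std::int32_t i = 0; i < tracks.nTracks; ++i)
    if (tracks.nHits[i] < 3)
      ++bad;
  return bad;
}
```

Calling `buildTracks(0, buffer, false)` on a stale buffer leaves the old count in place, so a consumer iterates over slots that were never filled; with `true` the consumer sees zero tracks and has nothing to check.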
@AdrianoDee can you open PRs (and backports)? |
On a slightly different angle: given the fix above, the problem seems to be related to #45837, which was merged in CMSSW_14_2_X_2024-09-03-2300. I became curious why the assertion failure occurred in CMSSW_14_2_NOOFAST_X_2024-09-03-1100 but not in the default CMSSW_14_2_X_2024-09-03-1100, while in CMSSW_14_2_X_2024-09-03-2300 the failure seems to be consistent across all IB flavors. I found that
@smuzaffar What happened that CMSSW_14_2_NOOFAST_X_2024-09-03-1100 became different than CMSSW_14_2_X_2024-09-03-1100? |
If this is related to #45837, please prepare also backports of the fix to 14_1 and 14_0. And we will wait for them before building the (patch) release we were planning. |
This looks strange. CMSSW_14_2_NOOFAST_X_2024-09-03-1100 was triggered a bit late in the evening (so #45837 might have been merged by that time), but the IB tagging job should have tagged the latest merge before 2024-09-03-1100. @iarspider is looking into it |
cms-sw/cms-bot#2327 should fix the IB tagging. Previously the bot was using 11h00 UTC, which means anything merged until 13h00 local time was going into the 11h00 IB. As the NOOFAST IB was triggered around 17h00 local time and #45837 was merged at UTC 2024-09-03T09:52:33 (local time: 2024-09-03T11:52:33), the bot tagged the newer commits. |
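The arithmetic behind this can be sketched as follows. This is not the cms-bot code: the helper names are hypothetical, and the UTC+2 offset is an assumption inferred from the two timestamps quoted above. It only shows that the merge time falls before an 11h00 cutoff read as UTC but after the same cutoff read as local time.

```cpp
#include <chrono>

// Time of day expressed as seconds since midnight.
constexpr std::chrono::seconds timeOfDay(int h, int m, int s) {
  return std::chrono::hours{h} + std::chrono::minutes{m} + std::chrono::seconds{s};
}

// Assumption: local time is UTC+2, consistent with 09:52:33 UTC == 11:52:33 local.
constexpr std::chrono::hours kLocalOffset{2};

// Converts a local time of day to the corresponding UTC time of day.
constexpr std::chrono::seconds localToUtc(std::chrono::seconds localTimeOfDay) {
  return localTimeOfDay - kLocalOffset;
}

// A merge is picked up by an IB if it happened at or before the cutoff (both in UTC).
constexpr bool includedInIB(std::chrono::seconds mergeUtc, std::chrono::seconds cutoffUtc) {
  return mergeUtc <= cutoffUtc;
}

// Merge time of #45837 as quoted in the thread.
constexpr std::chrono::seconds kMergeUtc = timeOfDay(9, 52, 33);

// "11h00" read as UTC (the behaviour described as the previous one).
constexpr std::chrono::seconds kCutoffAsUtc = timeOfDay(11, 0, 0);

// "11h00" read as local time, which is 09:00 UTC under the UTC+2 assumption.
constexpr std::chrono::seconds kCutoffAsLocal = localToUtc(timeOfDay(11, 0, 0));

static_assert(includedInIB(kMergeUtc, kCutoffAsUtc), "merge precedes 11h00 UTC");
static_assert(!includedInIB(kMergeUtc, kCutoffAsLocal), "merge follows 11h00 local");
```

Under these assumptions, a tagging job that runs late and uses the UTC reading picks up the 09:52:33 UTC merge for the 11h00 IB, whereas the local-time reading excludes it.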
I think we can close it since we have no failure in the latest GPU IBs |
+1 |
+heterogeneous |
This issue is fully signed and ready to be closed. |
@cmsbuild, please close |
In CMSSW_14_2_NOOFAST_X_2024-09-03-1100 IB, relval 12861.0 failed:
full log
I couldn't reproduce this locally.