[DBG_X] Multiple relvals failing due to "looking for SiStripNoises for a strip out of range: strip 0" #46250

Closed
iarspider opened this issue Oct 4, 2024 · 19 comments


@iarspider
Contributor

In CMSSW_14_2_DBG_X_2024-10-03-2300, multiple RelVals failed with

----- Begin Fatal Exception 04-Oct-2024 06:21:16 CEST-----------------------
An exception of category 'CorruptedData' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
   [1] Running path 'HLTAnalyzerEndpath'
   [2] Prefetching for module L1TRawToDigi/'hltGtStage2Digis'
   [3] Prefetching for module RawDataCollectorByLabel/'rawDataCollector'
   [4] Prefetching for module SiStripDigiToRawModule/'SiStripDigiToRaw'
   [5] Calling method for module MixingModule/'mix'
Exception Message:
[SiStripNoises::getNoise] looking for SiStripNoises for a strip out of range: strip 0
----- End Fatal Exception -------------------------------------------------

Full log: link

@iarspider
Contributor Author

assign reconstruction

@cmsbuild
Contributor

cmsbuild commented Oct 4, 2024

New categories assigned: reconstruction

@jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

cmsbuild commented Oct 4, 2024

cms-bot internal usage

@cmsbuild
Contributor

cmsbuild commented Oct 4, 2024

A new Issue was created by @iarspider.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@mmusich
Contributor

mmusich commented Oct 4, 2024

[SiStripNoises::getNoise] looking for SiStripNoises for a strip out of range: strip 0

this in all likelihood is coming from the GT updates in #46184

@makortel
Contributor

makortel commented Oct 4, 2024

assign simulation, alca

@makortel
Contributor

makortel commented Oct 4, 2024

unassign reconstruction

MixingModule is not part of reconstruction

@cmsbuild
Contributor

cmsbuild commented Oct 4, 2024

New categories assigned: simulation,alca

@atpathak,@civanch,@consuegs,@kpedro88,@mdhildreth,@perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks

@iarspider
Contributor Author

@cms-sw/alca-l2 @cms-sw/simulation-l2 This is still happening - see e.g. 11601.0

----- Begin Fatal Exception 11-Oct-2024 08:39:03 CEST-----------------------
An exception of category 'CorruptedData' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
   [1] Running path 'HLTAnalyzerEndpath'
   [2] Prefetching for module L1TRawToDigi/'hltGtStage2Digis'
   [3] Prefetching for module RawDataCollectorByLabel/'rawDataCollector'
   [4] Prefetching for module SiStripDigiToRawModule/'SiStripDigiToRaw'
   [5] Calling method for module MixingModule/'mix'
Exception Message:
[SiStripNoises::getNoise] looking for SiStripNoises for a strip out of range: strip 0
----- End Fatal Exception -------------------------------------------------

@perrotta
Contributor

[SiStripNoises::getNoise] looking for SiStripNoises for a strip out of range: strip 0

this in all likelihood is coming from the GT updates in #46184

More likely #46124, which included the latest GT updates for 2022MC used in 11601.0 (even though the SiStripNoises tag was untouched in it)

@iarspider, maybe a naive question: couldn't the fact that it only shows up in the DBG_X builds point to faulty memory management in some debug-only part of the code?

@mmusich
Contributor

mmusich commented Oct 11, 2024

couldn't the fact that it only shows up in the DBG_X builds point to faulty memory management in some debug-only part of the code?

Nope:

static float getNoise(uint16_t strip, const Range& range) {
#ifdef EDM_ML_DEBUG
  verify(strip, range);
#endif
  return getNoiseFast(strip, range);
}

void SiStripNoises::verify(uint16_t strip, const Range& range) {
  if (9 * strip >= (range.second - range.first) * 8)
    throw cms::Exception("CorruptedData")
        << "[SiStripNoises::getNoise] looking for SiStripNoises for a strip out of range: strip " << strip;
}

verify, which is what is throwing now, is executed only in debug mode (because it is very time-consuming).
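To make the check above concrete, here is a minimal standalone sketch of the same inequality. It assumes, as the factor 9 suggests, that each strip occupies 9 bits of a byte range. `Container`, `Range`, `stripOutOfRange`, `getNoiseChecked` and `demoRangeCheck` are hypothetical stand-ins for illustration, not the CMSSW API.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the CMSSW types; not the real SiStripNoises API.
using Container = std::vector<unsigned char>;
using Range = std::pair<Container::const_iterator, Container::const_iterator>;

// Same inequality as in verify(): the bit offset of the strip (9 bits per
// strip, an assumption read off the check) must lie inside the byte range.
bool stripOutOfRange(std::uint16_t strip, const Range& range) {
  return 9 * strip >= (range.second - range.first) * 8;
}

// Debug-style accessor: validate first, then return a dummy value.
float getNoiseChecked(std::uint16_t strip, const Range& range) {
  if (stripOutOfRange(strip, range))
    throw std::out_of_range("strip out of range: strip " + std::to_string(strip));
  return 0.0f;  // placeholder; decoding the packed value is out of scope here
}

// Usage sketch: 9 bytes hold 72 bits, i.e. room for 8 strips (0..7).
inline void demoRangeCheck() {
  const Container bytes(9, 0);
  const Range full(bytes.cbegin(), bytes.cend());
  const Range empty(bytes.cend(), bytes.cend());
  assert(!stripOutOfRange(7, full));
  assert(stripOutOfRange(8, full));
  assert(stripOutOfRange(0, empty));  // the "strip 0" case from the log
}
```

In this sketch, strip 0 can only fail the check when the range holds no bytes at all (or is inverted), which is consistent with the "strip 0" in the exception message.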

@mmusich
Contributor

mmusich commented Oct 11, 2024

(even though the SiStripNoises tag was untouched in it)

comparing the two versions of the GTs updated in PR #46124, I see:

conddb diff 140X_mcRun3_2022_realistic_v9 140X_mcRun3_2022_realistic_v3 | grep SiStrip
[2024-10-11 13:15:56,066] INFO: Connecting to pro [frontier://PromptProd/cms_conditions]
SiStripBadChannelRcd                 -               SiStripBadComponents_realisticMC_for2022_pre_EE_v0_mc            SiStripBadComponents_realisticMC_for2022_v2_mc            

If SiStripBadComponents_realisticMC_for2022_pre_EE_v0_mc unmasked a strip that was masked before, it might have uncovered an underlying issue with the existing SiStripNoises tag.
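One way such an unmasking could surface as "strip 0" out of range is sketched below with a toy payload. Everything here is hypothetical (`ToyNoisePayload`, `failsVerify`, `demoMissingModule`, the detIds), and the behaviour that a module absent from the payload yields an empty range is an assumption of the sketch, not something stated in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical toy payload; the names are illustrative, not CMSSW's.
struct ToyNoisePayload {
  using Container = std::vector<unsigned char>;
  using Range = std::pair<Container::const_iterator, Container::const_iterator>;

  Container bytes;  // packed noise values for all modules
  std::map<std::uint32_t, std::pair<std::size_t, std::size_t>> index;  // detId -> [begin, end)

  // Assumption of this sketch: a module missing from the index yields an empty range.
  Range getRange(std::uint32_t detId) const {
    const auto it = index.find(detId);
    if (it == index.end())
      return {bytes.cend(), bytes.cend()};
    return {bytes.cbegin() + it->second.first, bytes.cbegin() + it->second.second};
  }
};

// Same inequality as the debug-only check quoted earlier in the thread.
bool failsVerify(std::uint16_t strip, const ToyNoisePayload::Range& range) {
  return 9 * strip >= (range.second - range.first) * 8;
}

// Usage sketch: module 1001 has noise, module 2002 does not.
inline void demoMissingModule() {
  ToyNoisePayload payload;
  payload.bytes.assign(9, 0);
  payload.index[1001] = {0, 9};
  assert(!failsVerify(0, payload.getRange(1001)));
  assert(failsVerify(0, payload.getRange(2002)));  // newly unmasked, no noise entry
}
```

As long as module 2002 stays masked nobody asks for its noise; once it is unmasked, the very first strip already trips the check.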

@perrotta
Contributor

Thank you @mmusich

Then the message points to a range where range.first is supposedly not smaller than range.second, i.e. an empty or inverted range (because strip is 0 in the log message, the check reduces to 0 >= (range.second - range.first) * 8). That range is computed in CondTools/SiStrip/plugins/SiStripNoisesAndBadCompsChecker.cc

If we want to search for its origin in the GT update, the tag for SiStripBadChannel was updated there (see the diff here). Probably @cms-sw/trk-dpg-l2 can check further whether those updated conditions may have produced the wrong range.

@mmusich
Contributor

mmusich commented Oct 11, 2024

That range is computed in CondTools/SiStrip/plugins/SiStripNoisesAndBadCompsChecker.cc

Again, no. This is just a utility to fix the payloads for exactly these use cases (making sure that the noise and bad components cover the same sets of strips).
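The kind of coverage check described here can be sketched as a set difference. This is not the logic of SiStripNoisesAndBadCompsChecker, only a simplified illustration at module (detId) granularity; `activeWithoutNoise`, `demoCoverage` and the example detIds are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical helper: modules that are active (not masked) but have no noise entry.
std::vector<std::uint32_t> activeWithoutNoise(const std::set<std::uint32_t>& activeDetIds,
                                              const std::set<std::uint32_t>& detIdsWithNoise) {
  std::vector<std::uint32_t> missing;
  // std::set iterates in sorted order, which std::set_difference requires.
  std::set_difference(activeDetIds.begin(), activeDetIds.end(),
                      detIdsWithNoise.begin(), detIdsWithNoise.end(),
                      std::back_inserter(missing));
  return missing;
}

// Usage sketch with made-up detIds.
inline void demoCoverage() {
  const std::set<std::uint32_t> active{1001, 2002, 3003};
  const std::set<std::uint32_t> withNoise{1001, 3003};
  const std::vector<std::uint32_t> missing = activeWithoutNoise(active, withNoise);
  assert(missing.size() == 1 && missing[0] == 2002);
}
```

A non-empty result would flag modules for which a noise lookup in a debug build could fail.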

@mmusich
Contributor

mmusich commented Oct 11, 2024

If we want to search for its origin in the GT update, the tag for SiStripBadChannel was updated there

This is exactly the content of #46250 (comment). I guess our messages crossed.

@mmusich
Contributor

mmusich commented Oct 11, 2024

Something looks very wrong with SiStripBadComponents_realisticMC_for2022_pre_EE_v0_mc.
It's not masking a lot of channels that should be masked.

image

See the difference w.r.t. the previous tag SiStripBadComponents_realisticMC_for2022_v2_mc:

image

@perrotta @cms-sw/alca-l2 why was this payload admitted to the queue?

@mmusich
Contributor

mmusich commented Oct 15, 2024

As far as I understand #46379 will fix this issue.

@mmusich
Contributor

mmusich commented Oct 19, 2024

Looks like #46379 (as expected) fixed the problem, see e.g. log of wf11601.0 in CMSSW_14_2_DBG_X_2024-10-17-2300, thus this issue could be signed and closed.

@iarspider
Contributor Author

@cmsbuild please close
