HLT crashes in run 359297 from module EcalRecHitProducer:hltEcalRecHitWithoutTPs #39568

Closed
trocino opened this issue Oct 1, 2022 · 31 comments

trocino commented Oct 1, 2022

Several HLT jobs crashed during run 359297, all due to module EcalRecHitProducer:hltEcalRecHitWithoutTPs. Before crashing, the following error message appears:
cmsRun: /.../cmssw/el8_amd64_gcc10/cms/cmssw/CMSSW_12_4_9/src/CalibCalorimetry/EcalLaserAnalyzer/src/MEEBGeom.cc:31: static int MEEBGeom::sm(MEEBGeom::EBGlobalCoord, MEEBGeom::EBGlobalCoord): Assertion `ieta > 0 && ieta <= 85' failed.
The full log output for several such cases, including the stack trace, can be found on EOS:
/eos/cms/store/user/trocino/HLT_ECAL_Debug/LogOutput/
ROOT RAW files containing all the offending events can be found at
/eos/cms/store/user/trocino/HLT_ECAL_Debug/EdmRawRoot/

Please note that the error does not seem to be reproducible on lxplus (probably because there it runs on CPUs), while it is reproducible on machines with GPUs, e.g. the Hilton machines.

A recipe to reproduce the errors:

cmsrel CMSSW_12_4_9
cd CMSSW_12_4_9/src
cmsenv
hltGetConfiguration  run:359297  --globaltag 124X_dataRun3_HLT_v4  --process HLT  --data  --unprescale  --input /store/user/trocino/HLT_ECAL_Debug/EdmRawRoot/run359297_ls0232_index000269_fu-c2b05-14-01_pid3023154.root  --output all  > hlt.py
cmsRun hlt.py

cmsbuild commented Oct 1, 2022

A new Issue was created by @trocino Daniele Trocino.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

missirol commented Oct 1, 2022

assign ecal-dpg

I think the crash occurs only when running the ECAL unpacker on GPU. One can check this by appending

del process.hltEcalUncalibRecHit.cuda
del process.hltEcalRecHit.cuda

which, I think, runs the RecHit producers on CPU while still running the unpacker on GPU (also adding del process.hltEcalDigis.cuda makes the crash disappear); a minimal sketch follows below.
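
A sketch of how to apply this check, appended at the end of the hlt.py generated by the recipe in the issue description (module labels as in that menu; the effect is as described in this comment, not independently verified here):

# append at the end of hlt.py:
# run the ECAL RecHit producers on CPU, keep the unpacker on GPU
del process.hltEcalUncalibRecHit.cuda
del process.hltEcalRecHit.cuda
# uncommenting the next line moves the unpacker to CPU as well,
# in which case the crash disappears
# del process.hltEcalDigis.cuda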

Adding some printouts shows that behind the crash there is a RecHit with an invalid detId (the value of the invalid detId is not the same in the different crashes seen during run 359297; the one below is just one example):

DetId::subdetId() = EcalBarrel
DetId::rawId() = 838860888
EBDetId::hashedIndex() = 30687
EBDetId::ieta() = 0
EBDetId::iphi() = 88
EBDetId::zside() = -1
EBDetId::validDetId(ieta, iphi) = false
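
For reference, these fields can be decoded by hand from the raw id. A minimal sketch in Python, assuming the standard EBDetId bit packing (bits 0-8: iphi, bits 9-15: |ieta|, bit 16: zside sign, with the Ecal/EcalBarrel header bits above):

# decode the invalid EBDetId reported above
raw_id = 838860888

ieta_abs = (raw_id >> 9) & 0x7F  # valid barrel crystals have 1 <= |ieta| <= 85
iphi = raw_id & 0x1FF            # valid barrel crystals have 1 <= iphi <= 360
zside = 1 if raw_id & 0x10000 else -1

print(ieta_abs, iphi, zside)                     # -> 0 88 -1
print(1 <= ieta_abs <= 85 and 1 <= iphi <= 360)  # -> False (invalid detId)

An |ieta| of 0 is outside the allowed [1, 85] range, which is exactly the condition rejected by the MEEBGeom::sm() assertion in the crash message.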

cmsbuild commented Oct 1, 2022

New categories assigned: ecal-dpg

@simonepigazzini, @jainshilpi, @thomreis you have been requested to review this Pull request/Issue and eventually sign? Thanks

thomreis commented Oct 3, 2022

This example is indeed an invalid detector id. We will take a look at the digis from the GPU unpacker to see why this happens.

The RecHit producer itself always runs on CPU at the moment, but with different input collections on machines with a GPU. Therefore, an issue in the GPU unpacker can lead to a crash in the RecHit producer.
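
To illustrate the mechanism, a hedged sketch of the SwitchProducer pattern used at HLT for these modules (the producer parameters and the hltEcalDigisFromGPU label below are illustrative placeholders, not the exact menu configuration):

import FWCore.ParameterSet.Config as cms
from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA

process = cms.Process("DEMO")

process.hltEcalDigis = SwitchProducerCUDA(
    # legacy CPU unpacker, chosen when no GPU is available
    cpu = cms.EDProducer("EcalRawToDigi"),
    # alias to the host copy of the GPU unpacker output, chosen
    # automatically on GPU-equipped nodes (label is hypothetical here)
    cuda = cms.EDAlias(hltEcalDigisFromGPU = cms.VPSet(cms.PSet(type = cms.string("*")))),
)

Downstream consumers always read the "hltEcalDigis" label, so the RecHit producers transparently receive either the CPU or the GPU digis; this is also why del process.hltEcalDigis.cuda forces the CPU unpacker everywhere.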

missirol commented Oct 6, 2022

FYI: @cms-sw/hlt-l2 @cms-sw/heterogeneous-l2

(HLT crashes, seemingly specific to reconstruction on GPUs)

perrotta commented Nov 2, 2022

urgent
(marking as urgent the issues affecting online workflows)

cmsbuild added the urgent label Nov 2, 2022
@missirol

assign hlt

(To make sure this remains on HLT's radar.)

@cmsbuild

New categories assigned: hlt

@missirol, @Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks

missirol commented May 21, 2023

@cms-sw/ecal-dpg-l2

Today, during collisions, there was a crash at HLT which looks similar to the one described here, see
http://cmsonline.cern.ch/cms-elog/1183558

@thomreis

Hi @missirol, this last instance is likely caused by a tower in EB-01 that has data integrity problems. The issue is mostly contained in one tower, which could be masked as a short-term solution if needed. See also slide 7 of last week's ECAL PFG shifter report: https://indico.cern.ch/event/1288622/contributions/5414918/attachments/2650937/4590074/PFG_week_20_report_Orlandi.pdf

@thomreis

FYI @grasph

missirol commented May 22, 2023

@thomreis, I reproduced the latest crash on lxplus-gpu with

./test.sh 367771 1

using the script copied in [1].

Like for the first crash described in this issue, it does not occur if the GPU reconstruction is disabled.

[1]

#!/bin/bash

# cmsrel CMSSW_13_0_6
# cd CMSSW_13_0_6/src
# cmsenv
# # save this file as test.sh
# chmod u+x test.sh
# ./test.sh 367771 4 # runNumber nThreads

[ $# -eq 2 ] || exit 1

RUNNUM="${1}"
NUMTHREADS="${2}"

ERRDIR=/eos/cms/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/FOG/error_stream
RUNDIR="${ERRDIR}"/run"${RUNNUM}"

for dirPath in $(ls -d "${RUNDIR}"*); do
  # require at least one non-empty FRD file
  [ $(cd "${dirPath}" ; find -maxdepth 1 -size +0 | grep .raw | wc -l) -gt 0 ] || continue
  runNumber="${dirPath: -6}"
  JOBTAG=test_run"${runNumber}"
  HLTMENU="--runNumber ${runNumber}"
  hltConfigFromDB ${HLTMENU} > "${JOBTAG}".py
  cat <<EOF >> "${JOBTAG}".py
process.options.numberOfThreads = ${NUMTHREADS}
process.options.numberOfStreams = 0
process.hltOnlineBeamSpotESProducer.timeThreshold = int(1e6)
del process.PrescaleService
del process.MessageLogger
process.load('FWCore.MessageService.MessageLogger_cfi')
import os
import glob
process.source.fileListMode = True
process.source.fileNames = sorted([foo for foo in glob.glob("${dirPath}/*raw") if os.path.getsize(foo) > 0])
process.EvFDaqDirector.buBaseDir = "${ERRDIR}"
process.EvFDaqDirector.runNumber = ${runNumber}
process.hltDQMFileSaverPB.runNumber = ${runNumber}
# remove paths containing OutputModules
streamPaths = [pathName for pathName in process.finalpaths_()]
for foo in streamPaths:
    process.__delattr__(foo)
EOF
  rm -rf run"${runNumber}"
  mkdir run"${runNumber}"
  echo "run${runNumber} .."
  cmsRun "${JOBTAG}".py &> "${JOBTAG}".log
  echo "run${runNumber} .. done (exit code: $?)"
  unset runNumber
done
unset dirPath

@hannahbnelson

There was another instance of this crash in run 368547 (1 crash).
f3mon_run368547.txt

thomreis commented Jun 6, 2023

Is this the first new crash in the last two weeks since the one in 367771?

missirol commented Jun 6, 2023

Yes, as far as I know (we monitor the crashes semi-automatically, so it's possible we missed one, but I don't think we did in this case).

@missirol

Reporting another HLT crash of this kind.

  • Run 368547 (pp collisions)
  • Release: CMSSW_13_0_7
  • Full log from DAQ: f3mon_run368547.txt
  • Piece of stack trace:
cmsRun: /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_7-el8_amd64_gcc11/build/CMSSW_13_0_7-build/tmp/BUILDROOT/9019b82ce41695dd3e01c9d81cd67c61/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/src/CalibCalorimetry/EcalLaserAnalyzer/src/MEEBGeom.cc:31: static int MEEBGeom::sm(MEEBGeom::EBGlobalCoord, MEEBGeom::EBGlobalCoord): Assertion `ieta > 0 && ieta <= 85' failed.


A fatal system signal has occurred: abort signal
The following is the call stack containing the origin of the signal.

Tue Jun 6 12:03:13 CEST 2023
Thread 23 (Thread 0x7f3ca17fd700 (LWP 1800032) "cmsRun"):
#0 0x00007f3d7b47ea71 in poll () from /lib64/libc.so.6
#1 0x00007f3d744f046f in full_read.constprop () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#2 0x00007f3d744bbb6c in edm::service::InitRootHandlers::stacktraceFromThread() () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3 0x00007f3d744bc33b in sig_dostack_then_abort () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00007f3d7b3c437f in raise () from /lib64/libc.so.6
#6 0x00007f3d7b3aedb5 in abort () from /lib64/libc.so.6
#7 0x00007f3d7b3aec89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#8 0x00007f3d7b3bca76 in __assert_fail () from /lib64/libc.so.6
#9 0x00007f3d1a4ca313 in MEEBGeom::sm(int, int) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libCalibCalorimetryEcalLaserAnalyzer.so
#10 0x00007f3d1a4ca389 in MEEBGeom::dcc(int, int) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libCalibCalorimetryEcalLaserAnalyzer.so
#11 0x00007f3d1a4cbe12 in MEEBGeom::lmr(int, int) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libCalibCalorimetryEcalLaserAnalyzer.so

@missirol

Reporting another HLT crash of this kind.

  • Run 368724 (pp collisions)
  • Release: CMSSW_13_0_7
  • Full log from DAQ: f3mon_run368724.txt
  • Piece of stack trace:
cmsRun: /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_7-el8_amd64_gcc11/build/CMSSW_13_0_7-build/tmp/BUILDROOT/9019b82ce41695dd3e01c9d81cd67c61/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/src/CalibCalorimetry/EcalLaserAnalyzer/src/MEEBGeom.cc:31: static int MEEBGeom::sm(MEEBGeom::EBGlobalCoord, MEEBGeom::EBGlobalCoord): Assertion `ieta > 0 && ieta <= 85' failed.

A fatal system signal has occurred: abort signal
The following is the call stack containing the origin of the signal.
Sun Jun 11 08:23:27 CEST 2023

(..)

Thread 7 (Thread 0x7f26c2ffe700 (LWP 3227352) "cmsRun"):
#0  0x00007f2739f9aa71 in poll () from /lib64/libc.so.6
#1  0x00007f2730ed846f in full_read.constprop () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#2  0x00007f2730ea3b6c in edm::service::InitRootHandlers::stacktraceFromThread() () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  0x00007f2730ea433b in sig_dostack_then_abort () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007f2739ee037f in raise () from /lib64/libc.so.6
#6  0x00007f2739ecadb5 in abort () from /lib64/libc.so.6
#7  0x00007f2739ecac89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#8  0x00007f2739ed8a76 in __assert_fail () from /lib64/libc.so.6
#9  0x00007f26d8f9d313 in MEEBGeom::sm(int, int) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libCalibCalorimetryEcalLaserAnalyzer.so
#10 0x00007f26d8f9d389 in MEEBGeom::dcc(int, int) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libCalibCalorimetryEcalLaserAnalyzer.so
#11 0x00007f26d8f9ee12 in MEEBGeom::lmr(int, int) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libCalibCalorimetryEcalLaserAnalyzer.so
#12 0x00007f26d909f0f4 in EcalLaserDbService::getLaserCorrection(DetId const&, edm::Timestamp const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libCalibCalorimetryEcalLaserCorrection.so
#13 0x00007f26d9245502 in EcalRecHitWorkerSimple::run(edm::Event const&, EcalUncalibratedRecHit const&, edm::SortedCollection<EcalRecHit, edm::StrictWeakOrdering<EcalRecHit> >&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoLocalCaloEcalRecProducersPlugins.so
#14 0x00007f26d9236050 in EcalRecHitProducer::produce(edm::Event&, edm::EventSetup const&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoLocalCaloEcalRecProducersPlugins.so
#15 0x00007f273c9e795d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so
#16 0x00007f273c9ce072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so
#17 0x00007f273c95a6da in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so
#18 0x00007f273c95ab88 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so
#19 0x00007f273c6aff79 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreConcurrency.so
#20 0x00007f273b12c304 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x7f25e5defe00, waiter=..., this=0x7f2735f93b00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/task_dispatcher.h:322
#21 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7f2735f93b00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/task_dispatcher.h:458
#22 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/arena.cpp:137
#23 tbb::detail::r1::market::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/market.cpp:599
#24 0x00007f273b12e4c6 in tbb::detail::r1::rml::private_worker::run (this=0x7f2735f6fe80) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/private_server.cpp:271
#25 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7f2735f6fe80) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/private_server.cpp:221
#26 0x00007f273a27817a in start_thread () from /lib64/libpthread.so.0
#27 0x00007f2739fa5df3 in clone () from /lib64/libc.so.6

(..)

Current Modules:
Module: EcalRecHitProducer:hltEcalRecHit (crashed)
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidates
Module: SeedGeneratorFromProtoTracksEDProducer:hltIter0IterL3FromL1MuonPixelSeedsFromPixelTracks
Module: SeedCreatorFromRegionConsecutiveHitsEDProducer:hltElePixelSeedsDoublets
Module: CorrectedCaloJetProducer:hltAK4CaloJetsCorrected
Module: CorrectedPFJetProducer:hltAK4PFJetsTightIDVBFCorrected
Module: MultiHitFromChi2EDProducer:hltDisplacedhltIter4PFlowPixelLessHitTripletsForTau
Module: none
Module: PFMultiDepthClusterProducer:hltParticleFlowClusterHCAL
Module: HLTEcalRecHitInAllL1RegionsProducer:hltRechitInRegionsECAL
Module: PFRecHitProducer:hltParticleFlowRecHitHBHE
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidatesOpenMu
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidates
Module: CkfTrackCandidateMaker:hltDisplacedhltIter4PFlowCkfTrackCandidatesForTau
Module: SiPixelDigisClustersFromSoAPhase1:hltSiPixelClustersFromSoA
Module: MuonIdProducer:hltGlbTrkMuonsLowPtIter01Merge
Module: TSGForOIDNN:hltIterL3OISeedsFromL2Muons
Module: HitPairEDProducer:hltElePixelHitDoubletsForTripletsUnseeded
Module: none
Module: TriggerSummaryProducerAOD:hltTriggerSummaryAOD
Module: SeedCreatorFromRegionConsecutiveHitsEDProducer:hltElePixelSeedsDoubletsUnseeded
Module: SeedCreatorFromRegionConsecutiveHitsEDProducer:hltElePixelSeedsDoublets
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidatesNoVtx
Module: MuonIdProducer:hltIterL3MuonsNoVtx
Module: TriggerSummaryProducerRAW:hltTriggerSummaryRAW
Module: CkfTrackCandidateMaker:hltIter0PFlowCkfTrackCandidates
Module: none
Module: HLTL1TSeed:hltL1sMu18erTau26er2p1Jet55
Module: CSCRecHitDProducer:hltCsc2DRecHits
Module: HBHEPhase1Reconstructor:hltHbherecoLegacy
Module: PFMultiDepthClusterProducer:hltParticleFlowClusterHCAL
Module: EcalRecHitProducer:hltEcalRecHit
A fatal system signal has occurred: abort signal

@missirol

@thomreis, these crashes are not frequent, but they continue to happen.
I'm wondering if there is an ETA for a fix. I don't have a sense of how difficult this is.

@thomreis

Hi @missirol, there is no ETA yet. I have just started to look into this today and managed to reproduce the crash with your recipe.

@thomreis

Hi @missirol, are the error_stream files for the crashes in runs 368547 and 368724 available somewhere? I would like to check whether the fix for 367771 also avoids the other two crashes.

@missirol

Hi @thomreis, we have the files for run-368724 [1]. I will request the files for run-368547, and share them here if they are still available.

[1]

/eos/cms/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/FOG/error_stream/run368724

@missirol

Hi @thomreis, below is the path to the error-stream files of run-368547.

/eos/cms/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/FOG/error_stream/run368547

@thomreis

PR #41977 should avoid crashes like these in the future. Backports will follow.

@missirol

Just for the record, the last crash of this kind was seen in run-370293.

The corresponding input files can be found in

/eos/cms/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/FOG/error_stream/run370293

@missirol

+hlt

@thomreis provided a fix for these crashes in #41977, then backported and integrated in CMSSW_13_0_10. Even though not a lot of data has been taken with 13_0_10 at HLT yet, the fix was tested on the reproducers, so I think it'd be okay to close this issue (if needed, it can be re-opened). A follow-up of #41977 is in #42301.

@thomreis

Just for the record, the last crash of this kind was seen in run-370293.

  • Run 370293 (pp collisions)
  • Release: CMSSW_13_0_9
  • Full log from DAQ: f3mon_run370293.txt (https://github.com/cms-sw/cmssw/files/12103083/f3mon_run370293.txt)

The corresponding input files can be found in

/eos/cms/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/FOG/error_stream/run370293

I ran some tests, and this crash would not have happened with the fix already in 13_0_10, nor with the improved fix in #42301.

missirol commented Jul 20, 2023

Thanks for checking, @thomreis. Do you want to sign off this issue for ECAL before I close it?

thomreis commented Jul 20, 2023

+ecal-dpg

@cmsbuild

This issue is fully signed and ready to be closed.

@missirol

please close
