HLT crash in run-367906 (sistrip::FEDBuffer::findChannels()) #41786

Open
missirol opened this issue May 28, 2023 · 41 comments

Comments

@missirol
Contributor

In run-367906 (pp collisions), DAQ reported 1 CMSSW crash at HLT (release: CMSSW_13_0_6) [link to HLT elog].

The stack trace is attached (f3mon_run367906.txt). A possibly relevant excerpt of the stack trace is in [1].

The corresponding error-stream files are available, but first attempts to reproduce the crashes offline failed (tried on "Hilton" HLT node).

The recipe used for those failed attempts is adapted in [2] to be valid for lxplus and lxplus-gpu.

FYI: @cms-sw/hlt-l2 @silviodonato @fwyzard @mzarucki @trtomei

[1]

msgtime:2023-05-24 22:37:12
doc_type:cmsswlog
date:2023-05-24T20:37:12.106Z
run:367906
host:fu-c2b03-18-01
pid:2793118
doctype:stacktrace
severity:FATAL
severityVal:4
instance:global
lexicalId:549852445
message:A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.
Wed May 24 22:36:52 CEST 2023

(..)

Thread 6 (Thread 0x7fe97ea4f700 (LWP 2794125) "cmsRun"):
#0  0x00007fe9f3d60a71 in poll () from /lib64/libc.so.6
#1  0x00007fe9eac9846f in full_read.constprop () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#2  0x00007fe9eac63b6c in edm::service::InitRootHandlers::stacktraceFromThread() () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  0x00007fe9eac6433b in sig_dostack_then_abort () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007fe990ee5092 in sistrip::FEDBuffer::findChannels() () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libEventFilterSiStripRawToDigi.so
#6  0x00007fe990f5a21e in (anonymous namespace)::ClusterFiller::fill(edmNew::DetSetVector<SiStripCluster>::TSFastFiller&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoLocalTrackerSiStripCluste\
rizerPlugins.so
#7  0x00007fe9940a04bd in StMeasurementDetSet::getDetSet(int) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#8  0x00007fe9940a08a6 in TkStripMeasurementDet::empty(MeasurementTrackerEvent const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#9  0x00007fe9940a30f1 in TkGluedMeasurementDet::measurements(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, tracking::TempMeasurements&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw\
/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#10 0x00007fe99400e347 in LayerMeasurements::groupedMeasurements(DetLayer const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_\
amd64_gcc11/libTrackingToolsMeasurementDet.so
#11 0x00007fe8f21a01b1 in GroupedCkfTrajectoryBuilder::advanceOneLayer(TrajectorySeed const&, TempTrajectory&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&, std::vector<T\
empTrajectory, std::allocator<TempTrajectory> >&) const [clone .constprop.0] () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#12 0x00007fe8f219338d in GroupedCkfTrajectoryBuilder::groupedLimitedCandidates(TrajectorySeed const&, TempTrajectory const&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&\
) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#13 0x00007fe8f2196846 in GroupedCkfTrajectoryBuilder::buildTrajectories(TrajectorySeed const&, std::vector<Trajectory, std::allocator<Trajectory> >&, unsigned int&, TrajectoryFilter const*) const () from /opt/offline/el8_amd64_gcc11/cms\
/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#14 0x00007fe8f2150263 in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&)::{lambda(unsigned long)#1}::operator()(unsigned long) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_a\
md64_gcc11/libRecoTrackerCkfPattern.so
#15 0x00007fe8f2151ceb in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#16 0x00007fe9f67ad95d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gc\
c11/libFWCoreFramework.so
#17 0x00007fe9f6794072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libFWCo\
reFramework.so
#18 0x00007fe9f67206da in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm:\
:EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /opt/offline/el8_amd64_gcc11/c\
ms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libFWCoreFramework.so
#19 0x00007fe9f6720b88 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libFWCoreFramework.so
#20 0x00007fe9f6475f79 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libFWCore\
Concurrency.so
#21 0x00007fe9f4ef2304 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x7fe82e94ab00, waiter=..., this=0x7fe9efd53780) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_\
2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/task_dispatcher.h:322
#22 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7fe9efd53780) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-bui\
ld/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/task_dispatcher.h:458
#23 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f\
6c08f7b1/tbb-v2021.8.0/src/tbb/arena.cpp:137
#24 tbb::detail::r1::market::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6\
c08f7b1/tbb-v2021.8.0/src/tbb/market.cpp:599
#25 0x00007fe9f4ef44c6 in tbb::detail::r1::rml::private_worker::run (this=0x7fe9efd30100) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb\
5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/private_server.cpp:271
#26 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fe9efd30100) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6\
d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/private_server.cpp:221
#27 0x00007fe9f403e17a in start_thread () from /lib64/libpthread.so.0
#28 0x00007fe9f3d6bdf3 in clone () from /lib64/libc.so.6

(..)

Current Modules:
Module: CkfTrackCandidateMaker:hltIter0PFlowCkfTrackCandidates (crashed)
Module: CkfTrackCandidateMaker:hltMuCkfTrackCandidates
Module: PFBlockProducer:hltParticleFlowBlockForDisplTaus
Module: PFBlockProducer:hltParticleFlowBlock
Module: CkfTrackCandidateMaker:hltIter0IterL3FromL1MuonCkfTrackCandidates
Module: PFClusterProducer:hltParticleFlowClusterHBHE
Module: CkfTrackCandidateMaker:hltIter0PFlowCkfTrackCandidates
Module: HcalDigisProducerGPU:hltHcalDigisGPU
Module: none
Module: BeamSpotToCUDA:hltOnlineBeamSpotToGPU
Module: TrackProducer:hltIter0PFlowCtfWithMaterialTracks
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidates
Module: none
Module: PFMultiDepthClusterProducer:hltParticleFlowClusterHCAL
Module: none
Module: CkfTrackCandidateMaker:hltIter0PFlowCkfTrackCandidates
Module: HcalCPURecHitsProducer:hltHbherecoFromGPU
Module: CkfTrackCandidateMaker:hltDisplacedhltIter4PFlowCkfTrackCandidatesForTau
Module: PFRecHitProducer:hltParticleFlowRecHitPSUnseeded
Module: PixelTrackProducerFromSoAPhase1:hltPixelTracks
Module: CkfTrackCandidateMaker:hltDisplacedhltIter4PFlowCkfTrackCandidatesForTau
Module: none
Module: none
Module: SiPixelRecHitCUDAPhase1:hltSiPixelRecHitsGPU
Module: SiPixelRecHitFromCUDAPhase1:hltSiPixelRecHitsFromGPU
Module: HBHERecHitProducerGPU:hltHbherecoGPU
Module: EcalUncalibRecHitProducerGPU:hltEcalUncalibRecHitGPU
Module: FastjetJetProducer:hltAK4CaloJets
Module: CAHitNtupletCUDAPhase1:hltPixelTracksGPU
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidatesNoVtx
Module: SiPixelDigisSoAFromCUDA:hltSiPixelDigisSoA
Module: PFBlockProducer:hltParticleFlowBlockCPUOnly
A fatal system signal has occurred: segmentation violation

[2]

#!/bin/bash

# cmsrel CMSSW_13_0_6
# cd CMSSW_13_0_6/src
# cmsenv
# # save this file as test.sh
# chmod u+x test.sh
# ./test.sh 367906 4 # runNumber nThreads

[ $# -eq 2 ] || exit 1

RUNNUM="${1}"
NUMTHREADS="${2}"

ERRDIR=/eos/cms/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/FOG/error_stream
RUNDIR="${ERRDIR}"/run"${RUNNUM}"

for dirPath in $(ls -d "${RUNDIR}"*); do
  # require at least one non-empty FRD file
  [ $(cd "${dirPath}" ; find -maxdepth 1 -size +0 | grep .raw | wc -l) -gt 0 ] || continue
  runNumber="${dirPath: -6}"
  JOBTAG=test_run"${runNumber}"
  HLTMENU="--runNumber ${runNumber}"
  hltConfigFromDB ${HLTMENU} > "${JOBTAG}".py
  cat <<EOF >> "${JOBTAG}".py
process.options.numberOfThreads = ${NUMTHREADS}
process.options.numberOfStreams = 0
process.hltOnlineBeamSpotESProducer.timeThreshold = int(1e6)
del process.PrescaleService
del process.MessageLogger
process.load('FWCore.MessageService.MessageLogger_cfi')
import os
import glob
process.source.fileListMode = True
process.source.fileNames = sorted([foo for foo in glob.glob("${dirPath}/*raw") if os.path.getsize(foo) > 0])
process.EvFDaqDirector.buBaseDir = "${ERRDIR}"
process.EvFDaqDirector.runNumber = ${runNumber}
process.hltDQMFileSaverPB.runNumber = ${runNumber}
# remove paths containing OutputModules
streamPaths = [pathName for pathName in process.finalpaths_()]
for foo in streamPaths:
    process.__delattr__(foo)
EOF
  rm -rf run"${runNumber}"
  mkdir run"${runNumber}"
  echo "run${runNumber} .."
  cmsRun "${JOBTAG}".py &> "${JOBTAG}".log
  echo "run${runNumber} .. done (exit code: $?)"
  unset runNumber
done
unset dirPath
@cmsbuild
Contributor

A new Issue was created by @missirol Marino Missiroli.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@missirol
Contributor Author

assign hlt

(I let others assign to other groups, if needed.)

@cmsbuild
Contributor

New categories assigned: hlt

@missirol,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks

@missirol
Contributor Author

The corresponding error-stream files are available, but first attempts to reproduce the crashes offline failed (tried on Hilton machine).

This is another instance of recent HLT crashes that I can't reproduce offline (see for example #40174, #41741 and #41742).

This time I can also include the full log of the CMSSW job that crashed (see [1]), but I don't know if that helps.

  • The log contains a large number of log-warnings and log-errors which I don't see when running on the 200 events of the error-stream files [2].
  • At the same time, the job processed more than those 200 events, and I guess it's possible that those 200 events didn't issue any log-errors or log-warnings even online.
  • It's also possible that somehow the 200 events in the error-stream files do not contain the event that caused the crash (we have seen this happen already in recent weeks, see comment on run-366469 in CMSLITOPS-411).

@smorovic , is it possible to draw any conclusions comparing the log of the CMSSW job [1] and the content of the error-stream files [2] ?

[1] old_hlt_run367906_pid2793118.log

[2] /eos/cms/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/FOG/error_stream/run367906/

@smorovic
Contributor

Event IDs in two raw files:

run367906_ls0056_index000213_fu-c2b03-18-01_pid2793118.raw
128082587 - 128091658

run367906_ls0056_index000236_fu-c2b03-18-01_pid2793118.raw
128183442 - 128186805

The last message in the log is from an earlier event (in an earlier file):

%MSG-e TrajectoryNotPosDef:   TrackProducer:hltL3NoFiltersTkTracksFromL2IOHitNoVtx 24-May-2023 22:36:51 CEST  Run: 367906 Event:  127979616
Trajectory covariance is not positive-definite
%MSG
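The comparison between the last logged event and the two saved files can be spelled out with a small check. This is only a sketch: it assumes that the two numbers listed for each raw file are the lowest and highest event IDs in that file (a file need not contain every ID in between), and `EventRange`, `inAnyRange` and `savedFiles` are names made up for this illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper type: lowest and highest event ID seen in one raw file.
struct EventRange {
  std::uint64_t first;
  std::uint64_t last;
};

// Returns true if `event` lies inside at least one of the given ranges.
bool inAnyRange(std::uint64_t event, const std::vector<EventRange>& ranges) {
  for (const auto& r : ranges) {
    if (event >= r.first && event <= r.last)
      return true;
  }
  return false;
}

// Ranges quoted above for index000213 and index000236 (run 367906, LS 56).
const std::vector<EventRange> savedFiles = {
    {128082587ULL, 128091658ULL},
    {128183442ULL, 128186805ULL},
};
```

With these numbers, `inAnyRange(127979616, savedFiles)` is false: the event of the last log message is lower than anything in the two saved files, which is consistent with it belonging to a file that had already been processed.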

Timestamps of last few files appearing locally at hltd for that process (last 3).

INFO:2023-05-24 22:36:49 - processIndexFile - RUN:367906 - run367906_ls0056_index000189_pid2793118.jsn

INFO:2023-05-24 22:36:51 - processIndexFile - RUN:367906 - run367906_ls0056_index000213_pid2793118.jsn
INFO:2023-05-24 22:36:52 - processIndexFile - RUN:367906 - run367906_ls0056_index000236_pid2793118.jsn
INFO:2023-05-24 22:37:04 - processCRASHfile - RUN:367906 - 'run367906_ls0000_crash_pid2793118.jsn' with errcode: -11
INFO:2023-05-24 22:37:04 - processCRASHFile - RUN:367906 - inputFileList: run367906_ls0056_index000213_fu-c2b03-18-01_pid2793118.raw,run367906_ls0056_index000236_fu-c2b03-18-01_pid2793118.raw

However, this looks OK. The last two files opened by the process were also saved; older ones were already handled and closed.
The source keeps up to 2 files open and buffered at a time.

For a crash, the event ID is not known (it is only known for an exception).

@makortel
Contributor

assign reconstruction

FYI @cms-sw/tracking-pog-l2

@cmsbuild
Contributor

New categories assigned: reconstruction

@mandrenguyen,@clacaputo you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

Possibly incidental, but there are two other threads in StMeasurementDetSet::getDetSet(int) at the time of the crash:

Thread 36 (Thread 0x7fe8a65ff700 (LWP 2794392) "cmsRun"):
#2  0x00007fe9eac60ed0 in sig_pause_for_stacktrace () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007fe990f58e90 in (anonymous namespace)::ClusterFiller::fill(edmNew::DetSetVector<SiStripCluster>::TSFastFiller&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoLocalTrackerSiStripClusterizerPlugins.so
#5  0x00007fe9940a04bd in StMeasurementDetSet::getDetSet(int) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#6  0x00007fe9940a08a6 in TkStripMeasurementDet::empty(MeasurementTrackerEvent const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#7  0x00007fe9940a30f1 in TkGluedMeasurementDet::measurements(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, tracking::TempMeasurements&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#8  0x00007fe99400e347 in LayerMeasurements::groupedMeasurements(DetLayer const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libTrackingToolsMeasurementDet.so
#9  0x00007fe8f21a01b1 in GroupedCkfTrajectoryBuilder::advanceOneLayer(TrajectorySeed const&, TempTrajectory&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const [clone .constprop.0] () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#10 0x00007fe8f219338d in GroupedCkfTrajectoryBuilder::groupedLimitedCandidates(TrajectorySeed const&, TempTrajectory const&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#11 0x00007fe8f2196846 in GroupedCkfTrajectoryBuilder::buildTrajectories(TrajectorySeed const&, std::vector<Trajectory, std::allocator<Trajectory> >&, unsigned int&, TrajectoryFilter const*) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#12 0x00007fe8f2150263 in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&)::{lambda(unsigned long)#1}::operator()(unsigned long) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#13 0x00007fe8f2151ceb in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#14 0x00007fe9f67ad95d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libFWCoreFramework.so
#15 0x00007fe9f6794072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libFWCoreFramework.so

Thread 21 (Thread 0x7fe91dbfe700 (LWP 2794140) "cmsRun"):
#2  0x00007fe9eac60ed0 in sig_pause_for_stacktrace () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007fe9940a0480 in StMeasurementDetSet::getDetSet(int) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#5  0x00007fe9940a08a6 in TkStripMeasurementDet::empty(MeasurementTrackerEvent const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#6  0x00007fe9940a30f1 in TkGluedMeasurementDet::measurements(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, tracking::TempMeasurements&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#7  0x00007fe99400e347 in LayerMeasurements::groupedMeasurements(DetLayer const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libTrackingToolsMeasurementDet.so
#8  0x00007fe8f21a01b1 in GroupedCkfTrajectoryBuilder::advanceOneLayer(TrajectorySeed const&, TempTrajectory&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const [clone .constprop.0] () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#9  0x00007fe8f219338d in GroupedCkfTrajectoryBuilder::groupedLimitedCandidates(TrajectorySeed const&, TempTrajectory const&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#10 0x00007fe8f2196846 in GroupedCkfTrajectoryBuilder::buildTrajectories(TrajectorySeed const&, std::vector<Trajectory, std::allocator<Trajectory> >&, unsigned int&, TrajectoryFilter const*) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#11 0x00007fe8f2150263 in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&)::{lambda(unsigned long)#1}::operator()(unsigned long) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#12 0x00007fe8f2151ceb in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#13 0x00007fe9f67ad95d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libFWCoreFramework.so
#14 0x00007fe9f6794072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_6/lib/el8_amd64_gcc11/libFWCoreFramework.so

@makortel
Contributor

makortel commented May 30, 2023

So threads 36 and 6 (the crashing one) are operating on the same StMeasurementDetSet object (address 0x00007fe9940a04bd). The code of StMeasurementDetSet::detSet() and StMeasurementDetSet::getDetSet() is technically not thread safe:

const StripDetset& detSet(int i) const {
  if (ready_[i])
    const_cast<StMeasurementDetSet*>(this)->getDetSet(i);
  return detSet_[i];
}

void getDetSet(int i) {
  if (detIndex_[i] >= 0) {
    detSet_[i].set(*handle_, handle_->item(detIndex_[i]));
    empty_[i] = false;  // better be false already
    incAct();
  } else {  // we should not be here
    detSet_[i] = StripDetset();
    empty_[i] = true;
  }
  ready_[i] = false;
  incSet();
}

std::vector<bool> empty_;
std::vector<bool> activeThisEvent_;
// full reco
std::vector<StripDetset> detSet_;
std::vector<int> detIndex_;
std::vector<bool> ready_; // to be cleaned

I'm assuming that detIndex_ does not change during the event processing, but the elements of empty_ and ready_ are accessed and modified without any protection.

On a cursory look the edmNew::DetSet<SiStripCluster>::set() (called from getDetSet() above) looks like it would be thread safe. Both threads end up calling ClusterFiller::fill(), but possibly for different values of i.
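To illustrate the kind of protection that is missing, here is a minimal sketch of a lazily filled per-element cache guarded by one atomic state per element. It is not the code of StMeasurementDetSet and not necessarily what any fix in CMSSW does; `LazySlots`, its members and the "fill" (doubling the index) are invented for the example. One detail that matters for the original code: `std::vector<bool>` packs its elements into shared words, so even writes to different indices from different threads are a data race, which a vector of atomics avoids.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a container whose elements are filled on demand.
class LazySlots {
public:
  explicit LazySlots(std::size_t n) : state_(n), value_(n, 0) {}

  // Returns the value of slot i, filling it on first access.
  // Only the thread that wins the compare-exchange performs the fill.
  int get(std::size_t i) {
    unsigned char expected = kEmpty;
    if (state_[i].compare_exchange_strong(expected, kFilling,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
      value_[i] = static_cast<int>(i) * 2;  // stand-in for the expensive fill
      fills_.fetch_add(1, std::memory_order_relaxed);
      state_[i].store(kReady, std::memory_order_release);  // publish the value
    } else {
      // Another thread is filling or has filled the slot: wait for publication.
      while (state_[i].load(std::memory_order_acquire) != kReady)
        std::this_thread::yield();
    }
    return value_[i];
  }

  // Number of fills actually performed (at most one per slot).
  int fills() const { return fills_.load(std::memory_order_relaxed); }

private:
  static constexpr unsigned char kEmpty = 0, kFilling = 1, kReady = 2;
  std::vector<std::atomic<unsigned char>> state_;  // one flag per slot, no bit packing
  std::vector<int> value_;
  std::atomic<int> fills_{0};
};
```

Calling `get(i)` from several threads fills slot `i` exactly once; the release store paired with the acquire load is what makes the filled value visible to the threads that did not do the fill.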

Another possible thread-safety problem is in edmNew::DetSetVector<T>::update()

template <typename T>
inline void DetSetVector<T>::update(const Item& item) const {
  // no m_getter or already updated
  if (!m_getter) {
    assert(item.isValid());
    return;
  }
  if (item.initialize()) {
    assert(item.initializing());
    {
      TSFastFiller ff(*this, item);
      static_cast<Getter*>(m_getter.get())->fill(ff);
    }
    assert(item.isValid());
  }
}

Here the m_getter is defined as
std::shared_ptr<void> m_getter;

but in practice is used as pointer to Getter which is defined as
typedef dslv::LazyGetter<T> Getter;

and the LazyGetter<T>::fill() is not defined as const!
namespace dslv {
  template <typename T>
  class LazyGetter {
  public:
    virtual ~LazyGetter() {}
    virtual void fill(typename DetSetVector<T>::TSFastFiller&) = 0;
  };
}  // namespace dslv

So if the concrete LazyGetter<T>::fill() is not thread-safe, it could cause problems. In this case the concrete LazyGetter<T> is ClusterFiller
void ClusterFiller::fill(StripClusterizerAlgorithm::output_t::TSFastFiller& record) {

(which I haven't digested yet)
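The reason a non-const fill() is worth flagging can be shown with a simplified sketch. The classes below are hypothetical stand-ins, not the real LazyGetter or DetSetVector: when fill() is declared const, a concrete getter cannot silently modify its own members from inside a const update(), so any shared mutable state has to be marked `mutable` and becomes visible in review. A const qualifier alone does not prove thread safety; it only lets the compiler reject unintended mutation.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a lazy getter interface.
class Getter {
public:
  virtual ~Getter() = default;
  // const: an implementation cannot modify its own non-mutable members here.
  virtual void fill(std::vector<int>& out) const = 0;
};

// Hypothetical concrete getter that writes only to the caller-provided output.
class RangeGetter : public Getter {
public:
  explicit RangeGetter(int n) : n_(n) {}
  void fill(std::vector<int>& out) const override {
    for (int i = 0; i < n_; ++i)
      out.push_back(i);
    // n_ = 0;  // would not compile: fill() is const
  }

private:
  int n_;
};

// Hypothetical container whose const update() delegates to the getter.
class Container {
public:
  explicit Container(std::shared_ptr<const Getter> g) : getter_(std::move(g)) {}
  std::vector<int> update() const {
    std::vector<int> out;
    if (getter_)
      getter_->fill(out);  // only const member functions are reachable here
    return out;
  }

private:
  std::shared_ptr<const Getter> getter_;
};
```

Holding the getter as `std::shared_ptr<const Getter>` instead of `std::shared_ptr<void>` plus a cast is a design choice of this sketch: it keeps the const guarantee checked by the compiler all the way down.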

Note that despite all I wrote above, I can't tell from the stack trace whether the problem is really in thread safety or something else.

@makortel
Contributor

makortel commented Jun 2, 2023

the LazyGetter<T>::fill() is not defined as const!

This part is now addressed in #41853. It helped me reach the conclusion that

void ClusterFiller::fill(StripClusterizerAlgorithm::output_t::TSFastFiller& record) {

looks like it would be thread safe.

@makortel
Contributor

makortel commented Jun 5, 2023

The code of StMeasurementDetSet::detSet() and StMeasurementDetSet::getDetSet() are technically not thread safe

The race condition mentioned above is fixed in #41872. I'm not convinced, though, that it is the full cause of the crash. Ideally the race condition would only cause edmNew::DetSet<SiStripCluster>::set() to be called more often than needed, but strictly speaking a race condition leads to undefined behavior, so who knows.

@missirol
Contributor Author

missirol commented Jun 5, 2023

Thanks for the suggested fix, @makortel !

@makortel
Contributor

makortel commented Jun 5, 2023

Thanks for the suggested fix

@missirol Do you want it backported to 13_0_X? (since it is unclear whether it plays a role in the crash)

@missirol
Contributor Author

missirol commented Jun 5, 2023

If it's clear that it is a fix (even partial), I would be in favor of backporting it, since we will still use 13_0_X online for a while. If it helps, I can prepare the backports.

@makortel
Contributor

makortel commented Jun 5, 2023

If it's clear that it is a fix (even partial), I would be in favor of backporting it, since we will still use 13_0_X online for a while.

Thanks, I'll prepare the backports after the review of #41872 completes (in the current form it is easily cherry-pickable).

@dan131riley

As long as we're looking at DetSetNew: we are seeing DetSetNew assertion failures on aarch64 with some frequency:

/data/cmsbld/jenkins_b/workspace/build-any-ib/w/tmp/BUILDROOT/95e24eec79ed42decc0c70dcac7a0f7d/opt/cmssw/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_2_X_2023-06-01-2300/src/DataFormats/Common/interface/DetSetNew.h:86: const data_type* edmNew::DetSet<T>::data() const [with T = SiStripCluster; edmNew::DetSet<T>::data_type = SiStripCluster]: Assertion `m_data' failed.

from here:

data_type const *data() const {
  if (m_offset | m_size)
    assert(m_data);
  return m_data ? (&((*m_data)[m_offset])) : nullptr;
}

The test at line 85 looks to be wrong--using a bitwise OR instead of logical, and m_offset is initialized to -1. There's probably also a race condition, but I haven't stared at it long enough yet.
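A small worked example of why the -1 initial value matters. This is a sketch under assumptions: `DetSetState` is a made-up struct, and the member types and the default of `size` are guesses based on the snippet and the comment above, not copied from DetSetNew.h. Because -1 has all bits set, `offset | size` is non-zero for an object that was never filled, so the assert on the data pointer is reached even for an empty set. Replacing `|` with `||` alone would not change that; the comparison against the -1 sentinel is what differs.

```cpp
#include <vector>

// Hypothetical stand-in for the three members involved in the check.
struct DetSetState {
  const std::vector<int>* data = nullptr;
  int offset = -1;        // -1 used as a "not set" sentinel (assumption)
  unsigned int size = 0;  // assumed default
};

// Condition as written in the snippet: true if any bit is set in either value.
bool currentConditionHolds(const DetSetState& s) {
  return (s.offset | s.size) != 0;
}

// One possible alternative that treats -1 as "no offset".
bool sentinelAwareConditionHolds(const DetSetState& s) {
  return s.offset != -1 || s.size != 0;
}
```

For a default-constructed `DetSetState`, the first function returns true (the assert on a null `data` would fire) and the second returns false. Whether the sentinel-aware form is the right fix for the real class depends on how m_offset and m_size are updated elsewhere.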

Stack trace:

Thread 3 (Thread 0x400086359260 (LWP 2601823) "cmsRun"):
#8  0x00004000385bfc18 in __assert_fail () from /lib64/libc.so.6
#9  0x0000400063adda58 in edmNew::DetSet<SiStripCluster>::data() const [clone .part.0] [clone .lto_priv.0] () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02787/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_2_X_2023-06-01-2300/lib/el8_aarch64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#10 0x0000400063ae790c in TkStripMeasurementDet::recHits(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > >&, std::vector<float, std::allocator<float> >&) const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02787/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_2_X_2023-06-01-2300/lib/el8_aarch64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#11 0x0000400063ae7c38 in TkStripMeasurementDet::measurements(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, tracking::TempMeasurements&) const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02787/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_2_X_2023-06-01-2300/lib/el8_aarch64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#12 0x0000400063b8723c in LayerMeasurements::measurements(DetLayer const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02787/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-05-28-2300/lib/el8_aarch64_gcc11/libTrackingToolsMeasurementDet.so
#13 0x00004000a5fa57bc in MuonCkfTrajectoryBuilder::collectMeasurement(DetLayer const*, std::vector<DetLayer const*, std::allocator<DetLayer const*> > const&, TrajectoryStateOnSurface const&, std::vector<TrajectoryMeasurement, std::allocator<TrajectoryMeasurement> >&, int&, Propagator const*) const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02787/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_2_X_2023-06-01-2300/lib/el8_aarch64_gcc11/libRecoMuonL3TrackFinder.so
#14 0x00004000a5fa743c in MuonCkfTrajectoryBuilder::findCompatibleMeasurements(TrajectorySeed const&, TempTrajectory const&, std::vector<TrajectoryMeasurement, std::allocator<TrajectoryMeasurement> >&) const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02787/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_2_X_2023-06-01-2300/lib/el8_aarch64_gcc11/libRecoMuonL3TrackFinder.so
#15 0x00004000a5f380cc in CkfTrajectoryBuilder::limitedCandidates(std::shared_ptr<TrajectorySeed const> const&, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&, std::vector<Trajectory, std::allocator<Trajectory> >&) const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02787/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_2_X_2023-06-01-2300/lib/el8_aarch64_gcc11/libRecoTrackerCkfPattern.so
#16 0x00004000a5f392a8 in CkfTrajectoryBuilder::limitedCandidates(TrajectorySeed const&, TempTrajectory&, std::vector<Trajectory, std::allocator<Trajectory> >&) const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02787/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_2_X_2023-06-01-2300/lib/el8_aarch64_gcc11/libRecoTrackerCkfPattern.so
#17 0x00004000a5f394dc in CkfTrajectoryBuilder::buildTrajectories(TrajectorySeed const&, std::vector<Trajectory, std::allocator<Trajectory> >&, unsigned int&, TrajectoryFilter const*) const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02787/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_2_X_2023-06-01-2300/lib/el8_aarch64_gcc11/libRecoTrackerCkfPattern.so
#18 0x00004000a5f2d7fc in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&)::{lambda(unsigned long)#1}::operator()(unsigned long) const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02787/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_2_X_2023-06-01-2300/lib/el8_aarch64_gcc11/libRecoTrackerCkfPattern.so
#19 0x00004000a5f2edc4 in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02787/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_2_X_2023-06-01-2300/lib/el8_aarch64_gcc11/libRecoTrackerCkfPattern.so

@makortel
Contributor

makortel commented Jun 6, 2023

The test at line 85 looks to be wrong--using a bitwise OR instead of logical, and m_offset is initialized to -1

I agree (especially that the m_offset check should be against -1). Could you make a PR?
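To illustrate the two points above (bitwise vs. logical OR, and a sentinel of -1), here is a minimal sketch. The code at "line 85" is not reproduced in this thread, so `Item`, `looksFilledBitwise` and `isFilled` are hypothetical names and the condition is only an assumed shape of such a test, not the actual CMSSW code.

```cpp
#include <cassert>

// Hypothetical item: the offset uses -1 as the "not yet filled" sentinel.
struct Item {
  int offset = -1;
  int size = 0;
};

// Problematic shape: bitwise OR of the raw integers, implicitly tested against zero.
// With offset == -1 (all bits set) the result is -1 for any size, so this is always true.
inline bool looksFilledBitwise(const Item& it) { return (it.offset | it.size) != 0; }

// Assumed intent: explicit comparison against the sentinel, combined with logical OR.
inline bool isFilled(const Item& it) { return it.offset != -1 || it.size != 0; }
```

With a default-constructed `Item`, the bitwise variant reports "filled" while the explicit variant does not, which is the kind of discrepancy such a test could hide.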

There's probably also a race condition

At least the code has

TkStripMeasurementDet::RecHitContainer TkStripMeasurementDet::recHits(const TrajectoryStateOnSurface& ts,
                                                                      const MeasurementTrackerEvent& data) const {
  RecHitContainer result;
  if UNLIKELY ((!isActive(data)) || isEmpty(data.stripData()))
    return result;

bool isEmpty(const StMeasurementDetSet& theDets) const { return theDets.empty(index()); }

which ends up calling
bool empty(int i) const { return empty_[i]; }

which is part of the race condition I'm trying to fix in #41872 (assuming the stack trace is from an HLT job that does the on-demand strip unpacking and clustering; if not, the cause is likely something else)
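As a generic illustration of why an unsynchronized `empty_[i]` read is a problem when the content is filled on demand, here is a minimal sketch of one common way to guard a lazily filled per-detector slot. It is not the change made in #41872; `LazyDetCache`, its members and the fake "unpacking" are all hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-detector cache that is filled on first access.
class LazyDetCache {
public:
  explicit LazyDetCache(std::size_t n) : state_(n), nClusters_(n, 0), fills_{0} {}

  std::size_t size() const { return nClusters_.size(); }

  // Safe to call concurrently: the slot is filled at most once before it is read.
  bool empty(std::size_t i) {
    ensureFilled(i);
    return nClusters_[i] == 0;
  }

  int fillCount() const { return fills_.load(); }

private:
  enum State : int { kUnfilled = 0, kFilling = 1, kReady = 2 };

  void ensureFilled(std::size_t i) {
    int expected = kUnfilled;
    if (state_[i].compare_exchange_strong(expected, kFilling, std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
      // This thread won the claim and is the only writer of slot i.
      nClusters_[i] = static_cast<int>(i % 3);  // pretend unpacking and clustering
      fills_.fetch_add(1, std::memory_order_relaxed);
      state_[i].store(kReady, std::memory_order_release);  // publish the filled slot
    } else {
      // Another thread is filling (or has filled) the slot: wait until it is published.
      while (state_[i].load(std::memory_order_acquire) != kReady)
        std::this_thread::yield();
    }
  }

  std::vector<std::atomic<int>> state_;  // zero-initialized, i.e. kUnfilled
  std::vector<int> nClusters_;
  std::atomic<int> fills_;
};
```

The point of the sketch is that readers never look at `nClusters_[i]` before the release store of `kReady`, so a second thread cannot observe a half-filled slot or trigger a second fill.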

@missirol
Contributor Author

missirol commented Jun 6, 2023

(assuming the stack trace is from an HLT job that does the on-demand strip unpacking and clustering

I think this is the case, as the config had

process.hltSiStripRawToClustersFacility = cms.EDProducer( "SiStripClusterizerFromRaw",
    onDemand = cms.bool( True ),
[..]

@makortel
Contributor

makortel commented Jun 6, 2023

(assuming the stack trace is from an HLT job that does the on-demand strip unpacking and clustering

I think this is the case, as the config had

I meant Dan's stack trace on the assertion failure on aarch64 (sorry for being unclear).

@makortel
Contributor

makortel commented Jun 9, 2023

If it's clear that it is a fix (even partial), I would be in favor of backporting it, since we will still use 13_0_X online for a while.

Thanks, I'll prepare the backports after the review of #41872 completes (in the current form it is easily cherry-pickable).

The backports are in #41909 (13_1_X) and #41910 (13_0_X)

@missirol
Contributor Author

Reporting another HLT crash which may be related to this issue.

  • Run 368566 (pp collisions)
  • Release: CMSSW_13_0_7
  • Full log from DAQ: f3mon_run368566.txt (1st crash in the log)
  • Piece of stack trace:
#3  0x00007f9fd21f133b in sig_dostack_then_abort () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007f9e26dcff20 in ?? ()
#6  0x00007f9f763b6216 in (anonymous namespace)::ClusterFiller::fill(edmNew::DetSetVector<SiStripCluster>::TSFastFiller&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoLocalTrackerSiStripClusterizerPlugins.so
#7  0x00007f9f794fc4bd in StMeasurementDetSet::getDetSet(int) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#8  0x00007f9f7950eb28 in TkStripMeasurementDet::recHits(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > >&, std::vector<float, std::allocator<float> >&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#9  0x00007f9f7950ef0d in TkStripMeasurementDet::measurements(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, tracking::TempMeasurements&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#10 0x00007f9f7946a347 in LayerMeasurements::groupedMeasurements(DetLayer const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libTrackingToolsMeasurementDet.so
#11 0x00007f9ed7df21b1 in GroupedCkfTrajectoryBuilder::advanceOneLayer(TrajectorySeed const&, TempTrajectory&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const [clone .constprop.0] () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#12 0x00007f9ed7de538d in GroupedCkfTrajectoryBuilder::groupedLimitedCandidates(TrajectorySeed const&, TempTrajectory const&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#13 0x00007f9ed7de8846 in GroupedCkfTrajectoryBuilder::buildTrajectories(TrajectorySeed const&, std::vector<Trajectory, std::allocator<Trajectory> >&, unsigned int&, TrajectoryFilter const*) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#14 0x00007f9ed7da2263 in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&)::{lambda(unsigned long)#1}::operator()(unsigned long) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#15 0x00007f9ed7da3ceb in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#16 0x00007f9fdbbd095d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so
#17 0x00007f9fdbbb7072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so

(..)

Current Modules:
Module: CkfTrackCandidateMaker:hltIter0PFlowCkfTrackCandidates (crashed)
Module: CkfTrackCandidateMaker:hltIter2PFlowCkfTrackCandidatesForDisplaced
Module: HcalRawToDigi:hltHcalDigis
Module: RecoTauProducer:hltHpsCombinatoricRecoTaus
Module: CkfTrackCandidateMaker:hltIterL3OIGlbDisplacedTrackCandidates
Module: TrackProducer:hltIter0PFlowCtfWithMaterialTracksForDisplaced
Module: L2MuonProducer:hltL2CosmicMuons
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidates
Module: SeedCreatorFromRegionConsecutiveHitsEDProducer:hltElePixelSeedsDoubletsUnseeded
Module: EcalUncalibRecHitProducer:hltEcalUncalibRecHitCPUOnly
Module: PFClusterProducer:hltParticleFlowClusterPSUnseeded
Module: PFBlockProducer:hltParticleFlowBlockForTaus
Module: TrackProducer:hltIter0PFlowCtfWithMaterialTracks
Module: CkfTrackCandidateMaker:hltMuCkfTrackCandidates
Module: HLTL1TSeed:hltL1sDoubleEGXer1p2dRMaxY
Module: LightPFTrackProducer:hltLightPFTracks
Module: PFClusterProducer:hltParticleFlowClusterPSUnseeded
Module: HitPairEDProducer:hltElePixelHitDoubletsUnseeded
Module: none
Module: FastjetJetProducer:hltAK4CaloJetsPF
Module: PathStatusInserter:HLT_CaloMET350_NotCleaned_v8
Module: PFBlockProducer:hltParticleFlowBlockForDisplTaus
Module: CAHitNtupletCUDAPhase1:hltPixelTracksCPUOnly
Module: CkfTrackCandidateMaker:hltIter0IterL3FromL1MuonCkfTrackCandidates
Module: SeedCreatorFromRegionConsecutiveHitsEDProducer:hltElePixelSeedsDoublets
Module: PFClusterProducer:hltParticleFlowClusterHBHE
Module: CSCRecHitDProducer:hltCsc2DRecHits
Module: PFBlockProducer:hltParticleFlowBlock
Module: RecoTauJetRegionProducer:hltTauPFJets08Region
Module: none
Module: SiPixelClusterProducer:hltSiPixelClustersRegForDisplaced
Module: PFClusterProducer:hltParticleFlowClusterHBHE
A fatal system signal has occurred: segmentation violation
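When triaging several of these logs, the crashed module can be picked out of the "Current Modules" block mechanically. The following is a minimal sketch that assumes the line format shown above (`Module: <type>:<label> (crashed)`); `ModuleEntry` and `findCrashedModule` are hypothetical helper names, not part of CMSSW or the DAQ tooling.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical holder for one "Module: <type>:<label>" entry.
struct ModuleEntry {
  std::string type;
  std::string label;
};

// Returns the first module line tagged " (crashed)", if any.
inline std::optional<ModuleEntry> findCrashedModule(const std::string& log) {
  const std::string prefix = "Module: ";
  const std::string marker = " (crashed)";
  std::istringstream in(log);
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, prefix.size(), prefix) != 0)
      continue;  // not a module line
    if (line.size() < prefix.size() + marker.size())
      continue;  // too short to carry the marker, e.g. "Module: none"
    if (line.compare(line.size() - marker.size(), marker.size(), marker) != 0)
      continue;  // module was running but is not the one that crashed
    const std::string body = line.substr(prefix.size(), line.size() - prefix.size() - marker.size());
    const auto colon = body.find(':');
    if (colon == std::string::npos)
      continue;
    return ModuleEntry{body.substr(0, colon), body.substr(colon + 1)};
  }
  return std::nullopt;
}
```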

@missirol
Contributor Author

Reporting another HLT crash which may be related to this issue.

  • Run 368636 (pp collisions)
  • Release: CMSSW_13_0_7
  • Full log from DAQ: f3mon_run368636.txt
  • Piece of stack trace:
#3  0x00007f3bc7f3133b in sig_dostack_then_abort () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007f3b6c0610f1 in sistrip::FEDBuffer::findChannels() () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libEventFilterSiStripRawToDigi.so
#6  0x00007f3b6c0d621e in (anonymous namespace)::ClusterFiller::fill(edmNew::DetSetVector<SiStripCluster>::TSFastFiller&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoLocalTrackerSiStripClusterizerPlugins.so
#7  0x00007f3b6f21c4bd in StMeasurementDetSet::getDetSet(int) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#8  0x00007f3b6f21c8a6 in TkStripMeasurementDet::empty(MeasurementTrackerEvent const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#9  0x00007f3b6f21f0f1 in TkGluedMeasurementDet::measurements(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, tracking::TempMeasurements&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#10 0x00007f3b6f18a347 in LayerMeasurements::groupedMeasurements(DetLayer const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libTrackingToolsMeasurementDet.so
#11 0x00007f3acd30f1b1 in GroupedCkfTrajectoryBuilder::advanceOneLayer(TrajectorySeed const&, TempTrajectory&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const [clone .constprop.0] () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#12 0x00007f3acd30238d in GroupedCkfTrajectoryBuilder::groupedLimitedCandidates(TrajectorySeed const&, TempTrajectory const&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#13 0x00007f3acd305846 in GroupedCkfTrajectoryBuilder::buildTrajectories(TrajectorySeed const&, std::vector<Trajectory, std::allocator<Trajectory> >&, unsigned int&, TrajectoryFilter const*) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#14 0x00007f3acd2bf263 in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&)::{lambda(unsigned long)#1}::operator()(unsigned long) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#15 0x00007f3acd2c0ceb in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#16 0x00007f3bd192d95d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so
#17 0x00007f3bd1914072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so

(..)

Current Modules:
Module: CkfTrackCandidateMaker:hltIter0PFlowCkfTrackCandidatesCPUOnly (crashed)
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidatesNoVtx
Module: TrackProducer:hltIter0PFlowCtfWithMaterialTracks
Module: CorrectedECALPFClusterProducer:hltParticleFlowClusterECALUnseeded
Module: SeedCreatorFromRegionConsecutiveHitsEDProducer:hltElePixelSeedsDoubletsUnseeded
Module: SeedCreatorFromRegionConsecutiveHitsEDProducer:hltElePixelSeedsDoubletsUnseeded
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidatesNoVtx
Module: SeedCreatorFromRegionConsecutiveHitsEDProducer:hltElePixelSeedsDoublets
Module: SeedCreatorFromRegionConsecutiveHitsEDProducer:hltElePixelSeedsDoublets
Module: CkfTrajectoryMaker:hltL3TrackCandidateFromL2IOHit
Module: MuonIdProducer:hltGlbTrkMuonsLowPtIter01Merge
Module: CkfTrackCandidateMaker:hltIter0IterL3FromL1MuonCkfTrackCandidates
Module: FastjetJetProducer:hltAK4PixelOnlyPFJets
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidatesNoVtx
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidatesNoVtx
Module: TrackProducer:hltIter0IterL3FromL1MuonCtfWithMaterialTracks
Module: HLTL1TSeed:hltL1sTripleMuOpen53p52UpsilonMuon
Module: DeepTauId:hltHpsPFTauDeepTauProducerForVBFIsoTau
Module: HLTL1TSeed:hltL1VBFIsoEG
Module: SeedCombiner:hltElePixelSeedsCombined
Module: CorrectedCaloJetProducer:hltAK4CaloJetsCorrected
Module: MuonIdProducer:hltMuonsForDisplTau
Module: GlobalEvFOutputModule:hltOutputCalibration
Module: CkfTrackCandidateMaker:hltIter0IterL3FromL1MuonCkfTrackCandidates
Module: FastjetJetProducer:hltAK4CaloJets
Module: FastjetJetProducer:hltAK8CaloJets
Module: SeedCreatorFromRegionConsecutiveHitsEDProducer:hltElePixelSeedsDoublets
Module: CaloTowersCreator:hltTowerMakerForAll
Module: none
Module: PFMultiDepthClusterProducer:hltParticleFlowClusterHCAL
Module: CkfTrackCandidateMaker:hltIter0PFlowCkfTrackCandidatesCPUOnly
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidates
A fatal system signal has occurred: segmentation violation

@makortel
Contributor

Extracting more stack trace from #41786 (comment)

Thread 21 (Thread 0x7f9f003fc700 (LWP 1659889) "cmsRun"):
#2  0x00007f9fd21eded0 in sig_pause_for_stacktrace () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007f9f794fc41a in StMeasurementDetSet::getDetSet(int) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#5  0x00007f9f794fc8a6 in TkStripMeasurementDet::empty(MeasurementTrackerEvent const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#6  0x00007f9f794ff0f1 in TkGluedMeasurementDet::measurements(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, tracking::TempMeasurements&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#7  0x00007f9f7946a347 in LayerMeasurements::groupedMeasurements(DetLayer const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libTrackingToolsMeasurementDet.so
#8  0x00007f9ed7df21b1 in GroupedCkfTrajectoryBuilder::advanceOneLayer(TrajectorySeed const&, TempTrajectory&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const [clone .constprop.0] () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#9  0x00007f9ed7de538d in GroupedCkfTrajectoryBuilder::groupedLimitedCandidates(TrajectorySeed const&, TempTrajectory const&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#10 0x00007f9ed7de8846 in GroupedCkfTrajectoryBuilder::buildTrajectories(TrajectorySeed const&, std::vector<Trajectory, std::allocator<Trajectory> >&, unsigned int&, TrajectoryFilter const*) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#11 0x00007f9ed7da2263 in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&)::{lambda(unsigned long)#1}::operator()(unsigned long) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#12 0x00007f9ed7da3ceb in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#13 0x00007f9fdbbd095d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so
#14 0x00007f9fdbbb7072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so

Thread 14 (Thread 0x7f9f03bff700 (LWP 1659882) "cmsRun"):
#3  0x00007f9fd21f133b in sig_dostack_then_abort () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007f9e26dcff20 in ?? ()
#6  0x00007f9f763b6216 in (anonymous namespace)::ClusterFiller::fill(edmNew::DetSetVector<SiStripCluster>::TSFastFiller&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoLocalTrackerSiStripClusterizerPlugins.so
#7  0x00007f9f794fc4bd in StMeasurementDetSet::getDetSet(int) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#8  0x00007f9f7950eb28 in TkStripMeasurementDet::recHits(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > >&, std::vector<float, std::allocator<float> >&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#9  0x00007f9f7950ef0d in TkStripMeasurementDet::measurements(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, tracking::TempMeasurements&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#10 0x00007f9f7946a347 in LayerMeasurements::groupedMeasurements(DetLayer const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libTrackingToolsMeasurementDet.so
#11 0x00007f9ed7df21b1 in GroupedCkfTrajectoryBuilder::advanceOneLayer(TrajectorySeed const&, TempTrajectory&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const [clone .constprop.0] () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#12 0x00007f9ed7de538d in GroupedCkfTrajectoryBuilder::groupedLimitedCandidates(TrajectorySeed const&, TempTrajectory const&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#13 0x00007f9ed7de8846 in GroupedCkfTrajectoryBuilder::buildTrajectories(TrajectorySeed const&, std::vector<Trajectory, std::allocator<Trajectory> >&, unsigned int&, TrajectoryFilter const*) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#14 0x00007f9ed7da2263 in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&)::{lambda(unsigned long)#1}::operator()(unsigned long) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#15 0x00007f9ed7da3ceb in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#16 0x00007f9fdbbd095d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so
#17 0x00007f9fdbbb7072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so

@makortel
Contributor

makortel commented Jun 12, 2023

In #41786 (comment)

only one thread was in StMeasurementDetSet::getDetSet(), making the stack trace different from the earlier ones. Under the "race condition somewhere in call chain" hypothesis the closest match would be

Thread 36 (Thread 0x7f3a813ff700 (LWP 2036162) "cmsRun"):
#2  0x00007f3bc7f2ded0 in sig_pause_for_stacktrace () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007f3b6f22418d in SiStripRecHit2D& std::vector<SiStripRecHit2D, std::allocator<SiStripRecHit2D> >::emplace_back<Point3DBase<float, LocalTag> const&, LocalError const&, GeomDet const&, edm::Ref<edmNew::DetSetVector<SiStripCluster>, SiStripCluster, edmNew::DetSetVector<SiStripCluster>::FindForDetSetVector> const&>(Point3DBase<float, LocalTag> const&, LocalError const&, GeomDet const&, edm::Ref<edmNew::DetSetVector<SiStripCluster>, SiStripCluster, edmNew::DetSetVector<SiStripCluster>::FindForDetSetVector> const&) [clone .isra.0] () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#5  0x00007f3b6f224c74 in bool TkStripMeasurementDet::filteredRecHits<edm::Ref<edmNew::DetSetVector<SiStripCluster>, SiStripCluster, edmNew::DetSetVector<SiStripCluster>::FindForDetSetVector> >(edm::Ref<edmNew::DetSetVector<SiStripCluster>, SiStripCluster, edmNew::DetSetVector<SiStripCluster>::FindForDetSetVector> const&, StripCPE::AlgoParam const&, TrajectoryStateOnSurface const&, MeasurementEstimator const&, std::vector<bool, std::allocator<bool> > const&, std::vector<SiStripRecHit2D, std::allocator<SiStripRecHit2D> >&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#6  0x00007f3b6f22de80 in TkStripMeasurementDet::simpleRecHits(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, std::vector<SiStripRecHit2D, std::allocator<SiStripRecHit2D> >&) const [clone .isra.0] () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#7  0x00007f3b6f21f15d in TkGluedMeasurementDet::measurements(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, tracking::TempMeasurements&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#8  0x00007f3b6f18a347 in LayerMeasurements::groupedMeasurements(DetLayer const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libTrackingToolsMeasurementDet.so
#9  0x00007f3acd30f1b1 in GroupedCkfTrajectoryBuilder::advanceOneLayer(TrajectorySeed const&, TempTrajectory&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const [clone .constprop.0] () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#10 0x00007f3acd30238d in GroupedCkfTrajectoryBuilder::groupedLimitedCandidates(TrajectorySeed const&, TempTrajectory const&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#11 0x00007f3acd305846 in GroupedCkfTrajectoryBuilder::buildTrajectories(TrajectorySeed const&, std::vector<Trajectory, std::allocator<Trajectory> >&, unsigned int&, TrajectoryFilter const*) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#12 0x00007f3acd2bf263 in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&)::{lambda(unsigned long)#1}::operator()(unsigned long) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#13 0x00007f3acd2c0ceb in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#14 0x00007f3bd192d95d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so
#15 0x00007f3bd1914072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so

Thread 16 (Thread 0x7f3af7dfc700 (LWP 2035972) "cmsRun"):
#3  0x00007f3bc7f3133b in sig_dostack_then_abort () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007f3b6c0610f1 in sistrip::FEDBuffer::findChannels() () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libEventFilterSiStripRawToDigi.so
#6  0x00007f3b6c0d621e in (anonymous namespace)::ClusterFiller::fill(edmNew::DetSetVector<SiStripCluster>::TSFastFiller&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoLocalTrackerSiStripClusterizerPlugins.so
#7  0x00007f3b6f21c4bd in StMeasurementDetSet::getDetSet(int) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#8  0x00007f3b6f21c8a6 in TkStripMeasurementDet::empty(MeasurementTrackerEvent const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#9  0x00007f3b6f21f0f1 in TkGluedMeasurementDet::measurements(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, tracking::TempMeasurements&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#10 0x00007f3b6f18a347 in LayerMeasurements::groupedMeasurements(DetLayer const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libTrackingToolsMeasurementDet.so
#11 0x00007f3acd30f1b1 in GroupedCkfTrajectoryBuilder::advanceOneLayer(TrajectorySeed const&, TempTrajectory&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const [clone .constprop.0] () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#12 0x00007f3acd30238d in GroupedCkfTrajectoryBuilder::groupedLimitedCandidates(TrajectorySeed const&, TempTrajectory const&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#13 0x00007f3acd305846 in GroupedCkfTrajectoryBuilder::buildTrajectories(TrajectorySeed const&, std::vector<Trajectory, std::allocator<Trajectory> >&, unsigned int&, TrajectoryFilter const*) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#14 0x00007f3acd2bf263 in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&)::{lambda(unsigned long)#1}::operator()(unsigned long) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#15 0x00007f3acd2c0ceb in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#16 0x00007f3bd192d95d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so
#17 0x00007f3bd1914072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_7/lib/el8_amd64_gcc11/libFWCoreFramework.so

On the other hand, this observation supports my earlier hunch that the race condition in StMeasurementDetSet is not the full cause of the crash in sistrip::FEDBuffer::findChannels() (#41786 (comment)).
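The "how many threads were inside a given function" check used in the comparison above can be done mechanically on the full DAQ log. A minimal sketch follows, assuming the gdb-style layout seen in this thread (a `Thread N (...)` header followed by `#k ...` frame lines); `threadsWithFrame` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Returns the thread numbers whose stack contains at least one frame mentioning `symbol`.
inline std::vector<int> threadsWithFrame(const std::string& dump, const std::string& symbol) {
  std::vector<int> hits;
  std::istringstream in(dump);
  std::string line;
  int current = -1;      // thread number of the stack being read, -1 before the first header
  bool counted = false;  // avoid listing the same thread twice
  while (std::getline(in, line)) {
    if (line.rfind("Thread ", 0) == 0) {
      // Header such as: Thread 21 (Thread 0x7f... (LWP 1659889) "cmsRun"):
      current = std::atoi(line.c_str() + 7);
      counted = false;
    } else if (current >= 0 && !counted && !line.empty() && line[0] == '#' &&
               line.find(symbol) != std::string::npos) {
      hits.push_back(current);
      counted = true;
    }
  }
  return hits;
}
```

For example, calling it with `"StMeasurementDetSet::getDetSet"` on a full log lists the threads that were in that function when the signal arrived.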

@makortel
Contributor

@Dr15Jones pointed out that after #41872 StMeasurementDetSet::getSet() still has a race condition in the assignment

} else { // we should not be here
det.detSet_ = StripDetset();
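As a generic illustration of why such a fallback assignment is a race: if two threads reach that branch for the same detector, both write the same shared object without synchronization, which is a data race even when they write identical values. The sketch below contrasts that pattern with one that leaves shared state untouched. It is not the fix proposed for CMSSW; `SharedDetSets`, `getSetWriting` and `getSetReadOnly` are hypothetical, and `StripDetset` here is a simplified stand-in for the real type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the real StripDetset.
struct StripDetset {
  int begin = 0;
  int end = 0;
  bool empty() const { return begin == end; }
};

// Hypothetical shared per-event state, visible to several threads.
struct SharedDetSets {
  std::vector<StripDetset> sets;
  std::vector<char> ready;  // 1 if sets[i] holds valid content
};

// Racy pattern: the fallback branch overwrites the shared slot.
// Concurrent callers hitting this branch for the same i write the same object.
inline const StripDetset& getSetWriting(SharedDetSets& s, std::size_t i) {
  if (!s.ready[i])
    s.sets[i] = StripDetset();
  return s.sets[i];
}

// Alternative: the fallback returns an empty value and never mutates shared state,
// which the const reference parameter also enforces at compile time.
inline StripDetset getSetReadOnly(const SharedDetSets& s, std::size_t i) {
  if (!s.ready[i])
    return StripDetset();
  return s.sets[i];
}
```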

@makortel
Contributor

StMeasurementDetSet::getSet() still has a race condition in the assignment

Fix proposed in #41936 (to be backported to 13_0_X as well)

@missirol
Contributor Author

missirol commented Jul 1, 2023

The fixes in #41872 and #41936 were integrated and backported, and CMSSW_13_0_9 includes both. (Thanks for that!)

After HLT deployed CMSSW_13_0_9 online, we saw a runtime crash which looks similar to the ones discussed in this issue. We can share the corresponding error-stream file once available, if that helps.

  • Run 369870 (pp collisions)
  • Release: CMSSW_13_0_9
  • Full log from DAQ: f3mon_run369870.txt
  • Extract of log from DAQ:
msgtime:2023-06-30 17:51:19
doc_type:cmsswlog
date:2023-06-30T15:51:19.990Z
run:369870
host:fu-c2b05-13-01
pid:3824147
doctype:stacktrace
severity:FATAL
severityVal:4
instance:global
lexicalId:549852445
message:A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.
Fri Jun 30 17:50:56 CEST 2023

(..)

Thread 10 (Thread 0x7f5b987fe700 (LWP 3825123) "cmsRun"):
#0  0x00007f5c10ae3a71 in poll () from /lib64/libc.so.6
#1  0x00007f5c079d846f in full_read.constprop () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#2  0x00007f5c079a3b6c in edm::service::InitRootHandlers::stacktraceFromThread() () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  0x00007f5c079a433b in sig_dostack_then_abort () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007f5a566562b0 in ?? ()
#6  0x00007f5baee67026 in (anonymous namespace)::ClusterFiller::fill(edmNew::DetSetVector<SiStripCluster>::TSFastFiller&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/pluginRecoLocalTrackerSiStripClusterizerPlugins.so
#7  0x00007f5bb1fae355 in StMeasurementDetSet::detSet(int) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#8  0x00007f5bb1fc024c in TkStripMeasurementDet::recHits(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > >&, std::vector<float, std::allocator<float> >&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#9  0x00007f5bb1fc091d in TkStripMeasurementDet::measurements(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, tracking::TempMeasurements&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/pluginRecoTrackerMeasurementDetPlugins.so
#10 0x00007f5bb1f1c347 in LayerMeasurements::groupedMeasurements(DetLayer const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/libTrackingToolsMeasurementDet.so
#11 0x00007f5b38da01b1 in GroupedCkfTrajectoryBuilder::advanceOneLayer(TrajectorySeed const&, TempTrajectory&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const [clone .constprop.0] () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#12 0x00007f5b38d9338d in GroupedCkfTrajectoryBuilder::groupedLimitedCandidates(TrajectorySeed const&, TempTrajectory const&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#13 0x00007f5b38d96846 in GroupedCkfTrajectoryBuilder::buildTrajectories(TrajectorySeed const&, std::vector<Trajectory, std::allocator<Trajectory> >&, unsigned int&, TrajectoryFilter const*) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/pluginRecoTrackerCkfPatternPlugins.so
#14 0x00007f5b38d50263 in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&)::{lambda(unsigned long)#1}::operator()(unsigned long) const () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#15 0x00007f5b38d51ceb in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/libRecoTrackerCkfPattern.so
#16 0x00007f5c1353095d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/libFWCoreFramework.so
#17 0x00007f5c13517072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/libFWCoreFramework.so
#18 0x00007f5c134a36da in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/libFWCoreFramework.so
#19 0x00007f5c134a3b88 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/libFWCoreFramework.so
#20 0x00007f5c131f8f79 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /opt/offline/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_9/lib/el8_amd64_gcc11/libFWCoreConcurrency.so
#21 0x00007f5c11c75304 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x7f5abcf5af00, waiter=..., this=0x7f5c0b9f3a00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/task_dispatcher.h:322
#22 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7f5c0b9f3a00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/task_dispatcher.h:458
#23 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/arena.cpp:137
#24 tbb::detail::r1::market::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/market.cpp:599
#25 0x00007f5c11c774c6 in tbb::detail::r1::rml::private_worker::run (this=0x7f5c0b9e7d80) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/private_server.cpp:271
#26 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7f5c0b9e7d80) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_2-el8_amd64_gcc11/build/CMSSW_13_0_2-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-bb5e0283c68ca6d69bd8419f6c08f7b1/tbb-v2021.8.0/src/tbb/private_server.cpp:221
#27 0x00007f5c10dc117a in start_thread () from /lib64/libpthread.so.0
#28 0x00007f5c10aeedf3 in clone () from /lib64/libc.so.6

(..)

Current Modules:
Module: CkfTrackCandidateMaker:hltIter0PFlowCkfTrackCandidatesCPUOnly (crashed)
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidates
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidatesNoVtx
Module: PFBlockProducer:hltParticleFlowBlock
Module: TrackProducer:hltIter0PFlowCtfWithMaterialTracksCPUOnly
Module: PFClusterProducer:hltParticleFlowClusterHBHE
Module: TrackProducer:hltIter0PFlowCtfWithMaterialTracks
Module: GlobalEvFOutputModule:hltOutputPhysicsHLTPhysics2
Module: CorrectedECALPFClusterProducer:hltParticleFlowClusterECALUnseeded
Module: ElectronNHitSeedProducer:hltEgammaElectronPixelSeedsUnseeded
Module: CkfTrackCandidateMaker:hltIterL3OITrackCandidatesNoVtx
Module: SeedCreatorFromRegionConsecutiveHitsEDProducer:hltElePixelSeedsDoubletsUnseeded
Module: RecoTauProducer:hltHpsCombinatoricRecoTausDispl
Module: MuonHLTSeedMVAClassifier:hltIter0IterL3FromL1MuonPixelSeedsFromPixelTracksFiltered
Module: TrackProducer:hltIter0IterL3FromL1MuonCtfWithMaterialTracks
Module: PFMultiDepthClusterProducer:hltParticleFlowClusterHCAL
Module: none
Module: PFClusterProducer:hltParticleFlowClusterHBHE
Module: PFRecHitProducer:hltParticleFlowRecHitHF
Module: GlobalEvFOutputModule:hltOutputParkingDoubleMuonLowMass3
Module: SeedCreatorFromRegionConsecutiveHitsEDProducer:hltElePixelSeedsDoublets
Module: PixelTrackProducerFromSoAPhase1:hltPixelTracksFromSoACPUOnly
Module: HLTRegionalEcalResonanceFilter:hltAlCaPi0RecHitsFilterEBonlyRegional
Module: PFBlockProducer:hltParticleFlowBlockForTaus
Module: AlcaPCCEventProducer:hltAlcaPixelClusterCounts
Module: CkfTrackCandidateMaker:hltIter0PFlowCkfTrackCandidates
Module: HitPairEDProducer:hltElePixelHitDoubletsForTripletsUnseeded
Module: TrackProducer:hltIter0PFlowCtfWithMaterialTracks
Module: GsfTrackProducer:hltEgammaGsfTracksUnseeded
Module: CkfTrackCandidateMaker:hltIter0PFlowCkfTrackCandidates
Module: MuonHLTSeedMVAClassifier:hltIter0IterL3MuonPixelSeedsFromPixelTracksFiltered
Module: CkfTrackCandidateMaker:hltIter0PFlowCkfTrackCandidatesCPUOnly
A fatal system signal has occurred: segmentation violation

@slava77

slava77 commented Jul 1, 2023

type tracking

@makortel

makortel commented Jul 3, 2023

Thanks @missirol for reporting the new stack trace. I didn't see any obviously related activity in the other threads. I suppose further investigation should focus on the contents of the fill() function itself (as I also suspected earlier)

void ClusterFiller::fill(StripClusterizerAlgorithm::output_t::TSFastFiller& record) const {

@missirol

missirol commented Jul 6, 2023

@dan131riley , would it be useful to backport #42194 to 13_0_X (and 13_1_X) as part of debugging these online crashes ?

@dan131riley

> @dan131riley , would it be useful to backport #42194 to 13_0_X (and 13_1_X) as part of debugging these online crashes ?

That PR is entirely about reducing false positives, so it wouldn't help with the HLT crashes.

@dan131riley

Naive question: are there circumstances where the FEDRawDataCollection could get released while the event is still in progress? Currently the on-demand getter holds a reference to the FEDRawDataCollection--should it be keeping a Handle to the FEDRawDataCollection instead?

@Dr15Jones

@dan131riley it is possible to tell the framework to delete a data product early. See process.options.canDeleteEarly for the list of data products that a configuration has marked as eligible for early deletion. I would not expect FEDRawDataCollection to be on that list, since it has to remain in the event until the OutputModule.

If FEDRawDataCollection is marked for early deletion, one must also list every data product that references it (say, by holding pointers or even an edm::Ref to it) in the configuration parameter

process.options.holdsReferencesToDeleteEarly
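
The lifetime question behind this exchange can be illustrated with a plain C++ analogy: a consumer that shares ownership of a product keeps it alive even after the original owner lets go. The types below (`RawProduct`, `SharingGetter`) are hypothetical stand-ins built on `std::shared_ptr`. They do not model the framework's actual product ownership, and the sketch makes no claim about whether an edm::Handle would give equivalent protection.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a raw-data product (not the real FEDRawDataCollection).
struct RawProduct {
  std::vector<unsigned char> bytes;
};

// Hypothetical on-demand consumer that shares ownership of the product
// instead of holding a bare reference to it.
class SharingGetter {
public:
  explicit SharingGetter(std::shared_ptr<const RawProduct> product) : product_(std::move(product)) {}
  std::size_t size() const { return product_->bytes.size(); }

private:
  std::shared_ptr<const RawProduct> product_;
};

// The original owner drops the product while the consumer is still alive.
// Returns true if the consumer can still read it.
inline bool survivesEarlyRelease() {
  std::shared_ptr<const RawProduct> owner = std::make_shared<RawProduct>(RawProduct{{1, 2, 3, 4}});
  std::weak_ptr<const RawProduct> observer = owner;  // watches the lifetime without extending it
  SharingGetter getter(owner);
  owner.reset();  // the "early delete"
  return !observer.expired() && getter.size() == 4;
}

// Once the consumer is gone as well, the product is actually destroyed.
inline bool releasedWhenLastOwnerGone() {
  std::shared_ptr<const RawProduct> owner = std::make_shared<RawProduct>(RawProduct{{1, 2, 3, 4}});
  std::weak_ptr<const RawProduct> observer = owner;
  {
    SharingGetter getter(owner);
    owner.reset();
  }  // getter destroyed here, releasing the last owning pointer
  return observer.expired();
}
```

Had the consumer held a bare `const RawProduct&` instead, the `owner.reset()` call would have destroyed the product immediately and any later read through the reference would be undefined behaviour.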

@fwyzard

fwyzard commented Jul 10, 2023

As far as I can see from a recent configuration (attached: hlt.py.gz), HLT does not perform any early deletion.

@dan131riley

> As far as I can see from a recent configuration (attached: hlt.py.gz), HLT does not perform any early deletion.

Thanks, that all makes sense. I'm having trouble constructing scenarios that could account for the crashes in sistrip::FEDBuffer::findChannels(), so there's some clutching at straws in effect trying to eliminate possibilities.

@missirol changed the title from "HLT crash in run-367906" to "HLT crash in run-367906 (sistrip::FEDBuffer::findChannels())" on Jul 24, 2023
@missirol

missirol commented Jul 27, 2023

Adding a belated summary of recent online crashes which might be related to this issue. All the runs below are 2023 pp-collisions runs after run-369870. The CMSSW release used in these runs was CMSSW_13_0_N with N >= 9. So far, these crashes were not reproduced offline. A recipe to try and reproduce is in [*].

Legend: run number, [total number of online crashes] number of crashes possibly related to this issue (based on my naive reading of the attached stack traces).

[*] Recipe tested on lxplus-gpu:
https://gist.github.com/missirol/45e9626c967e415ca39d2e86c7d26a4b

# example to run on files from run-370560 with 32 threads and 24 streams
./rerun_hlt_on_error_stream.sh -t 32 -s 24 \
 -i /eos/cms/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/FOG/error_stream \
 -r 370560 -o tmp

@fwyzard

fwyzard commented Jul 27, 2023

If all the crashes are there since CMSSW_13_0_9, maybe #42033 is related ?

@mmusich

mmusich commented Jul 27, 2023

> If all the crashes are there since CMSSW_13_0_9, maybe #42033 is related ?

I doubt it, since the first report is from May 28th (CMSSW_13_0_6): #41786 (comment)

@fwyzard

fwyzard commented Jul 27, 2023

Ah OK, thanks for pointing this out.

@mmusich

mmusich commented Dec 3, 2024

This type of crash didn't happen at all in 2024. Should we consider closing this issue?

@cmsbuild

cmsbuild commented Dec 3, 2024

cms-bot internal usage
