Creation of Geometry Payloads for DataBase #46290

Closed
bsunanda opened this issue Oct 7, 2024 · 28 comments

Comments

@bsunanda
Contributor

bsunanda commented Oct 7, 2024

The standard way of creating the payload is to follow these steps:
cmsrel CMSSW_14_2_X_2024-10-06-2300
cd CMSSW_14_2_X_2024-10-06-2300/src
cmsenv
git cms-addpkg CondTools/Geometry
scram b -j4
cd CondTools/Geometry/test
/bin/cp writehelpers/* .
./createExtended2024DD4hepPayloads.sh 142DD4hepV1

This creates several .db files: some for the XML files used in simulation, and a number of files needed to load parameters for the reconstruction geometry.

There are several cmsRun steps in createExtended2024DD4hepPayloads.sh.
The second one, cmsRun geometryExtended2024DD4hep_writer.py, does not complete and gets killed.

Consequently, several .db files are not created, including the recommended geometries for HCAL, ZDC, ..., and some parameters for the Tracker.

@cmsbuild
Contributor

cmsbuild commented Oct 7, 2024

cms-bot internal usage

@cmsbuild
Contributor

cmsbuild commented Oct 7, 2024

A new Issue was created by @bsunanda.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor

makortel commented Oct 7, 2024

assign geometry, alca, db

@cmsbuild
Contributor

cmsbuild commented Oct 7, 2024

New categories assigned: geometry,alca,db

@atpathak,@bsunanda,@civanch,@consuegs,@Dr15Jones,@francescobrivio,@kpedro88,@makortel,@mdhildreth,@perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks

@perrotta
Contributor

perrotta commented Oct 7, 2024

assign core
@makortel @cms-sw/core-l2, this GitHub issue was opened because @bsunanda cannot run the second cmsRun step mentioned in the description, due to high memory usage (as he reported at today's AlCaDB meeting). This is becoming extremely urgent: the ppRef run is expected to start 10 days from now, and it needs an updated ZDC geometry. At the meeting we suggested that Sunanda open this issue and ask the O&C core team for help, to speed up the resolution. Would any of you be able to have a look and, given your experience, perhaps pinpoint the origin of the problem? That would be of great help. Thank you!

@cmsbuild
Contributor

cmsbuild commented Oct 7, 2024

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

makortel commented Oct 7, 2024

Thanks @perrotta for explaining the background and urgency. We'll try to take a look. It's unfortunate, though, that the timeline is so tight.

@Dr15Jones
Contributor

I tried the instructions, and the first cmsRun job failed with

----- Begin Fatal Exception 08-Oct-2024 07:51:12 CDT-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named geometryExtended2024DD4hep_xmlwriter.py
Exception Message:
 unknown python problem occurred.
RuntimeError: An exception of category 'FileInPathError' occurred.
Exception Message:
edm::FileInPath unable to find file Geometry/CMSCommonData/data/dd4hep/cmsExtendedGeometry2024FlatPlus10PercentFlatPlus10PercentFlatPlus10Percent.xml anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is:  [cut ]

Looking in $CMSSW_RELEASE/src/Geometry/CMSCommonData/data/dd4hep/, no such file exists. However, the following one does: cmsExtendedGeometry2024FlatPlus10Percent.xml

@bsunanda
Contributor Author

bsunanda commented Oct 8, 2024 via email

@makortel
Contributor

makortel commented Oct 8, 2024

Following @Dr15Jones's suggestion in the private email thread, I limited VSIZE to ~5 GB (ulimit -v 5000000) so that the job throws a std::bad_alloc exception instead of being killed by the OS. Running it in gdb and catching exceptions to see where the std::bad_alloc is thrown shows this stack trace:

(gdb) where
#0  0x00007ffff5b612f1 in __cxxabiv1::__cxa_throw (obj=0x7fffcf972880, tinfo=0x7ffff5cc5e18 <typeinfo for std::bad_alloc>, dest=0x7ffff5b5f6e0 <std::bad_alloc::~bad_alloc()>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
#1  0x00007ffff5b5811b in std::__throw_bad_alloc() ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02858/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib64/libstdc++.so.6
#2  0x00007ffff67bb39b in handleOOM (size=<optimized out>, nothrow=<optimized out>) at src/jemalloc_cpp.cpp:90
#3  0x00007fffcdbf2cef in HcalGeometry::init() ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libGeometryHcalTowerAlgo.so
#4  0x00007fffcdbf303d in HcalGeometry::HcalGeometry(HcalTopology const&) ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libGeometryHcalTowerAlgo.so
#5  0x00007fffcdbf429b in HcalFlexiHardcodeGeometryLoader::load(HcalTopology const&, HcalDDDRecConstants const&) ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libGeometryHcalTowerAlgo.so
#6  0x00007fffcac60803 in HcalHardcodeGeometryEP::produceAligned(HcalGeometryRecord const&) ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#7  0x00007fffcac6478e in void edm::SerialTaskQueueChain::actionToRun<edm::eventsetup::CallbackBase<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::makeProduceTask<edm::eventsetup::Callback<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::prefetchAsync(edm::WaitingTaskHolder, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&, edm::ESParentContext const&)::{lambda(auto:1&&, auto:2&&, auto:3&&, auto:4&&)#1}::operator()<tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&>(tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, 
edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&) const::{lambda(HcalGeometryRecord const&)#1}>(tbb::detail::d1::task_group*, edm::ServiceWeakToken const&, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, bool, tbb::detail::d1::task_group*&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}::operator()(std::__exception_ptr::exception_ptr const*) const::{lambda()#2}&>(tbb::detail::d1::task_group*&) ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#8  0x00007fffcac64911 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::eventsetup::CallbackBase<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::makeProduceTask<edm::eventsetup::Callback<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::prefetchAsync(edm::WaitingTaskHolder, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&, edm::ESParentContext const&)::{lambda(auto:1&&, auto:2&&, auto:3&&, auto:4&&)#1}::operator()<tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&>(tbb::detail::d1::task_group*&, 
edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&) const::{lambda(HcalGeometryRecord const&)#1}>(tbb::detail::d1::task_group*, edm::ServiceWeakToken const&, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, bool, tbb::detail::d1::task_group*&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}::operator()(std::__exception_ptr::exception_ptr const*) const::{lambda()#2}>(tbb::detail::d1::task_group&, tbb::detail::d1::task_group*&)::{lambda()#1}>::execute() ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#9  0x00007ffff79d0b35 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#10 0x00007ffff63c53e1 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff308be00)
    at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#11 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff308be00)
    at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#12 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...)
    at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#13 0x00007ffff7bce1ab in edm::FinalWaitingTask::wait() ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#14 0x00007ffff7bdbc8f in edm::EventProcessor::processRuns() ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#15 0x00007ffff7bdc141 in edm::EventProcessor::runToCompletion() ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#16 0x000000000040840c in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#17 0x00007ffff63b19ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...)
    at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/arena.cpp:688
#18 0x000000000040a0f2 in main::{lambda()#1}::operator()() const ()
#19 0x0000000000405100 in main ()
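
The mechanism relied on above can be illustrated in isolation: when an allocation request cannot be satisfied, operator new reports it by throwing std::bad_alloc, which a debugger can stop on or code can catch. The following is only a minimal sketch, not CMSSW code. `tryAllocate` and `kUnsatisfiable` are hypothetical names, and the oversized request merely stands in for the address-space cap set with ulimit -v.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <new>

// Hypothetical constant: a request so large that typical 64-bit systems
// cannot satisfy it (half of the addressable range).
constexpr std::size_t kUnsatisfiable = std::numeric_limits<std::size_t>::max() / 2;

// Hypothetical helper (not part of CMSSW): returns true if `bytes` could be
// allocated, false if the request ended in std::bad_alloc.
bool tryAllocate(std::size_t bytes) {
  if (bytes == 0) {
    return true;  // nothing to allocate
  }
  try {
    void* p = ::operator new(bytes);
    // The throwing operator new never returns nullptr; failure is an exception.
    assert(p != nullptr);
    // Touch the block through a volatile pointer so the allocation
    // cannot be optimised away.
    static_cast<volatile char*>(p)[0] = 0;
    ::operator delete(p);
    return true;
  } catch (const std::bad_alloc&) {
    return false;
  }
}
```

A small request such as `tryAllocate(1024)` is expected to succeed, while `tryAllocate(kUnsatisfiable)` is expected to fail on typical 64-bit systems. The same exception is what gdb stops on in frame #0 of the trace above.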

@makortel
Contributor

makortel commented Oct 8, 2024

Running again also produces an assertion failure:

cmsRun: src/Geometry/CaloGeometry/interface/EZMgrFL.h:21: EZMgrFL<T>::EZMgrFL(size_type, size_type) [with T = Point3DBase<float, GlobalTag>; size_type = long unsigned int]: Assertion `vecSize > 0' failed.

Thread 1 "cmsRun" received signal SIGABRT, Aborted.
0x00007ffff516852f in raise () from /lib64/libc.so.6
(gdb) where
#0  0x00007ffff516852f in raise () from /lib64/libc.so.6
#1  0x00007ffff513be65 in abort () from /lib64/libc.so.6
#2  0x00007ffff513bd39 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007ffff5160e86 in __assert_fail () from /lib64/libc.so.6
#4  0x00007fffcda3ec95 in EZMgrFL<Point3DBase<float, GlobalTag> >::EZMgrFL (subSize=8, vecSize=0, this=<optimized out>) at src/Geometry/CaloGeometry/interface/EZMgrFL.h:21
#5  CaloSubdetectorGeometry::allocateCorners (this=0x7fffcf929b80, n=0) at src/Geometry/CaloGeometry/src/CaloSubdetectorGeometry.cc:108
#6  0x00007fffcdbdcb8b in HcalFlexiHardcodeGeometryLoader::load (this=0x7fffffff0ef0, fTopology=..., hcons=...)
    at src/Geometry/HcalTowerAlgo/src/HcalFlexiHardcodeGeometryLoader.cc:28
#7  0x00007fffcaba4803 in HcalHardcodeGeometryEP::produceAligned(HcalGeometryRecord const&) ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#8  0x00007fffcaba878e in void edm::SerialTaskQueueChain::actionToRun<edm::eventsetup::CallbackBase<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::makeProduceTask<edm::eventsetup::Callback<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::prefetchAsync(edm::WaitingTaskHolder, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&, edm::ESParentContext const&)::{lambda(auto:1&&, auto:2&&, auto:3&&, auto:4&&)#1}::operator()<tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&>(tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, 
edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&) const::{lambda(HcalGeometryRecord const&)#1}>(tbb::detail::d1::task_group*, edm::ServiceWeakToken const&, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, bool, tbb::detail::d1::task_group*&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}::operator()(std::__exception_ptr::exception_ptr const*) const::{lambda()#2}&>(tbb::detail::d1::task_group*&) ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#9  0x00007fffcaba8911 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::eventsetup::CallbackBase<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::makeProduceTask<edm::eventsetup::Callback<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::prefetchAsync(edm::WaitingTaskHolder, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&, edm::ESParentContext const&)::{lambda(auto:1&&, auto:2&&, auto:3&&, auto:4&&)#1}::operator()<tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&>(tbb::detail::d1::task_group*&, 
edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&) const::{lambda(HcalGeometryRecord const&)#1}>(tbb::detail::d1::task_group*, edm::ServiceWeakToken const&, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, bool, tbb::detail::d1::task_group*&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}::operator()(std::__exception_ptr::exception_ptr const*) const::{lambda()#2}>(tbb::detail::d1::task_group&, tbb::detail::d1::task_group*&)::{lambda()#1}>::execute()
    () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#10 0x00007ffff79d0b35 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#11 0x00007ffff63c53e1 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff308be00)
    at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#12 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff308be00)
    at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#13 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...)
    at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#14 0x00007ffff7bce1ab in edm::FinalWaitingTask::wait() ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#15 0x00007ffff7bdbc8f in edm::EventProcessor::processRuns() ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#16 0x00007ffff7bdc141 in edm::EventProcessor::runToCompletion() ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#17 0x000000000040840c in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#18 0x00007ffff63b19ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...)
    at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/arena.cpp:688
#19 0x000000000040a0f2 in main::{lambda()#1}::operator()() const ()
#20 0x0000000000405100 in main ()

i.e. here

hcalGeometry->allocateCorners(fTopology.ncells() + fTopology.getHFSize());

the value of fTopology.ncells() + fTopology.getHFSize() is 0.

This kind of varying behavior hints at memory corruption.
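
A defensive check in front of the allocation would turn both failure modes (a zero count and an absurdly large one) into a single descriptive error. The sketch below is not CMSSW code: `checkedCellCount` and `kMaxPlausibleCells` are hypothetical names, and the bound is an arbitrary illustrative value, not the real HCAL channel count.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical upper bound, chosen only for illustration.
constexpr std::size_t kMaxPlausibleCells = 1000000;

// Hypothetical helper: validates the count that would be passed to
// allocateCorners() and returns it unchanged when it looks sane.
std::size_t checkedCellCount(std::size_t ncells, std::size_t hfSize) {
  // Check each term first so the sum below cannot overflow.
  if (ncells > kMaxPlausibleCells || hfSize > kMaxPlausibleCells) {
    throw std::out_of_range("implausible cell count: ncells=" + std::to_string(ncells) +
                            " hfSize=" + std::to_string(hfSize));
  }
  const std::size_t total = ncells + hfSize;
  if (total == 0) {
    throw std::invalid_argument("cell count is 0; topology sizes look uninitialised");
  }
  // Post-condition mirroring the `vecSize > 0` assertion in EZMgrFL.
  assert(total > 0);
  return total;
}
```

With such a guard, a call like `checkedCellCount(0, 0)` would raise an exception naming the suspicious input instead of aborting inside the allocator helper.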

@Dr15Jones
Contributor

When I run the job (after fixing the scripts), I see the out-of-memory error. The debugger showed a very large allocation request. I then turned on one of the debug printouts and saw:

HcalGeometry_init(): HBSize 892613681 HESize 808794672 HOSize 1111110454 HFSize 1634365029

Are these sizes what is actually expected?
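
For scale, here is a rough estimate of what such counts would request. The gdb frames show EZMgrFL<Point3DBase<float, GlobalTag>> built with subSize=8. The sketch below assumes that each point holds three floats with no padding and that the corner store scales with the sum of the four printed sizes. Both are assumptions, and `cornerStoreBytes` is a hypothetical helper, not CMSSW code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Taken from the gdb frame (subSize=8): corners stored per cell.
constexpr std::uint64_t kCornersPerCell = 8;
// Assumption: one corner is three unpadded floats (x, y, z).
constexpr std::uint64_t kBytesPerCorner = 3 * sizeof(float);

// Hypothetical helper: bytes needed to store the corners of `cells` cells.
std::uint64_t cornerStoreBytes(std::uint64_t cells) {
  constexpr std::uint64_t bytesPerCell = kCornersPerCell * kBytesPerCorner;
  // Guard against 64-bit overflow in the multiplication.
  assert(cells <= std::numeric_limits<std::uint64_t>::max() / bytesPerCell);
  return cells * bytesPerCell;
}

// Sum of the four sizes from the debug printout above.
constexpr std::uint64_t kPrintedCells =
    892613681ULL + 808794672ULL + 1111110454ULL + 1634365029ULL;
```

Under these assumptions, `cornerStoreBytes(kPrintedCells)` comes to 426,900,848,256 bytes, roughly 400 GiB, which is far beyond the ~5 GB cap set with ulimit -v.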

@makortel
Contributor

makortel commented Oct 8, 2024

I got these numbers from the same printout (now that my test case went back to the std::bad_alloc behavior):

HcalGeometry_init(): HBSize 1985088260 HESize 3964404929 HOSize 1208271767 HFSize 79882717

A smoking gun in both the bad_alloc and assertion failure behaviors is that the numbers come from HcalTopology.

@bsunanda
Contributor Author

bsunanda commented Oct 8, 2024

I understand that the script createExtended2024DD4hepPayloads.sh cannot be run multiple times in the same area because of sed. However, after running the script once, cmsRun can be run multiple times, and that is when I saw memory exhausted and got a system "kill". I am attaching the log file I got the first time. You can see "ERROR" printed multiple times, starting from TKRECO_Geometry.

142DD4hepV3.log

@bsunanda
Contributor Author

bsunanda commented Oct 8, 2024

HcalTopology is not new code. ZDCTopology is new, and so is the ZDC part of calowriters. Maybe I should remove the alignment part for ZDC and see the impact.

@makortel
Contributor

makortel commented Oct 8, 2024

I got the assertion failure behavior again. This time the printout from HcalGeometry_init() is

HcalGeometry_init(): HBSize 0 HESize 0 HOSize 0 HFSize 1

This time the assertion failure stack trace was

#0  0x00007ffff516852f in raise () from /lib64/libc.so.6
#1  0x00007ffff513be65 in abort () from /lib64/libc.so.6
#2  0x00007ffff513bd39 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007ffff5160e86 in __assert_fail () from /lib64/libc.so.6
#4  0x00007fffcd9ded2f in EZMgrFL<float>::EZMgrFL (subSize=5, vecSize=0, this=<optimized out>) at src/Geometry/CaloGeometry/interface/EZMgrFL.h:21
#5  CaloSubdetectorGeometry::allocatePar (this=0x7fffcf8f2b80, n=0, m=5) at src/Geometry/CaloGeometry/src/CaloSubdetectorGeometry.cc:115
#6  0x00007fffcdb9cc68 in HcalFlexiHardcodeGeometryLoader::load (this=0x7fffffff0ef0, fTopology=..., hcons=...) at src/Geometry/HcalTowerAlgo/src/HcalFlexiHardcodeGeometryLoader.cc:33
#7  0x00007fffcab44803 in HcalHardcodeGeometryEP::produceAligned(HcalGeometryRecord const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#8  0x00007fffcab4878e in void edm::SerialTaskQueueChain::actionToRun<edm::eventsetup::CallbackBase<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::makeProduceTask<edm::eventsetup::Callback<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::prefetchAsync(edm::WaitingTaskHolder, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&, edm::ESParentContext const&)::{lambda(auto:1&&, auto:2&&, auto:3&&, auto:4&&)#1}::operator()<tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&>(tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, 
edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&) const::{lambda(HcalGeometryRecord const&)#1}>(tbb::detail::d1::task_group*, edm::ServiceWeakToken const&, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, bool, tbb::detail::d1::task_group*&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}::operator()(std::__exception_ptr::exception_ptr const*) const::{lambda()#2}&>(tbb::detail::d1::task_group*&) ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#9  0x00007fffcab48911 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::eventsetup::CallbackBase<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::makeProduceTask<edm::eventsetup::Callback<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::prefetchAsync(edm::WaitingTaskHolder, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&, edm::ESParentContext const&)::{lambda(auto:1&&, auto:2&&, auto:3&&, auto:4&&)#1}::operator()<tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&>(tbb::detail::d1::task_group*&, 
edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&) const::{lambda(HcalGeometryRecord const&)#1}>(tbb::detail::d1::task_group*, edm::ServiceWeakToken const&, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, bool, tbb::detail::d1::task_group*&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}::operator()(std::__exception_ptr::exception_ptr const*) const::{lambda()#2}>(tbb::detail::d1::task_group&, tbb::detail::d1::task_group*&)::{lambda()#1}>::execute() ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#10 0x00007ffff79d0b35 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#11 0x00007ffff63c53e1 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff308be00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#12 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff308be00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#13 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#14 0x00007ffff7bce1ab in edm::FinalWaitingTask::wait() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#15 0x00007ffff7bdbc8f in edm::EventProcessor::processRuns() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#16 0x00007ffff7bdc141 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#17 0x000000000040840c in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#18 0x00007ffff63b19ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/arena.cpp:688
#19 0x000000000040a0f2 in main::{lambda()#1}::operator()() const ()
#20 0x0000000000405100 in main ()

and the problem is that hcalGeometry->numberOfShapes() is 0 in

hcalGeometry->allocatePar(hcalGeometry->numberOfShapes(), HcalGeometry::k_NumberOfParametersPerShape);

@Dr15Jones
Contributor

@makortel and I think we have found the problem. The value of HcalTopology::HBSize_ seems to be random, as if it were never initialized. It is set in the constructor here (this is the constructor the job calls; the debugger stopped at my breakpoint in it):

if (mode_ == HcalTopologyMode::LHC) {
  topoVersion_ = 0;         //DL
  HBSize_ = kHBSizePreLS1;  // qie-per-fiber * fiber/rm * rm/rbx * rbx/barrel * barrel/hcal
  HESize_ = kHESizePreLS1;  // qie-per-fiber * fiber/rm * rm/rbx * rbx/endcap * endcap/hcal
  HOSize_ = kHOSizePreLS1;  // ieta * iphi * 2
  HFSize_ = kHFSizePreLS1;  // ieta * iphi * depth * 2
  CALIBSize_ = kCALIBSizePreLS1;
  numberOfShapes_ = 87;
} else if (mode_ == HcalTopologyMode::SLHC) {  // need to know more eventually
  topoVersion_ = 10;
  HBSize_ = nEtaHB_ * IPHI_MAX * maxDepthHB_ * 2;
  HESize_ = nEtaHE_ * maxPhiHE_ * maxDepthHE_ * 2;
  HOSize_ = (lastHORing_ - firstHORing_ + 1) * IPHI_MAX * 2;                // ieta * iphi * 2
  HFSize_ = (lastHFRing_ - firstHFRing_ + 1) * IPHI_MAX * maxDepthHF_ * 2;  // ieta * iphi * depth * 2
  CALIBSize_ = kOffCalibHFX_;
  numberOfShapes_ = (maxPhiHE_ > 72) ? 1200 : 500;
}

Notice the if block. If neither of the two conditions is true, HBSize_ (like the other sizes and numberOfShapes_) is never set. Stepping through with the debugger shows that this is the case here. I determined that the value of mode_ is 4, which corresponds to Run3:

enum Mode { LHC = 0, H2 = 1, SLHC = 2, H2HE = 3, Run3 = 4, Run4 = 5 };
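To illustrate the failure mode outside CMSSW, here is a minimal sketch of the same if / else-if pattern with two defensive measures: default member initializers, and a final else that throws for any mode without a branch. ToyTopology, rejects, selfCheck and the numeric sizes are hypothetical placeholders, and treating Run3 like SLHC is only an assumption for the example, not necessarily what the actual fix does.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical mirror of the Mode enum quoted above.
enum class Mode { LHC = 0, H2 = 1, SLHC = 2, H2HE = 3, Run3 = 4, Run4 = 5 };

// Hypothetical, stripped-down stand-in for HcalTopology.
struct ToyTopology {
  // Default member initializers: the fields hold defined values
  // even if no branch in the constructor assigns to them.
  unsigned int hbSize = 0;
  unsigned int numberOfShapes = 0;

  explicit ToyTopology(Mode mode) {
    if (mode == Mode::LHC) {
      hbSize = 100;  // placeholder, not the real kHBSizePreLS1
      numberOfShapes = 87;
    } else if (mode == Mode::SLHC || mode == Mode::Run3) {
      // Assumption for illustration only: Run3 shares the SLHC branch.
      hbSize = 200;  // placeholder
      numberOfShapes = 500;
    } else {
      // A mode without an explicit branch fails loudly instead of
      // leaving the sizes unset.
      throw std::invalid_argument("ToyTopology: unhandled mode");
    }
  }
};

// Returns true if constructing a ToyTopology with `mode` throws.
inline bool rejects(Mode mode) {
  try {
    ToyTopology t(mode);
    (void)t;
    return false;
  } catch (const std::invalid_argument&) {
    return true;
  }
}

// Small self-check exercising the handled and unhandled paths.
inline void selfCheck() {
  assert(ToyTopology(Mode::LHC).numberOfShapes == 87);
  assert(ToyTopology(Mode::Run3).hbSize == 200);
  assert(rejects(Mode::H2));
}
```

With a guard like this, a newly added enum value that is not handled in one of the constructors would surface as an immediate exception rather than as a random size further downstream.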

@makortel
Contributor

makortel commented Oct 8, 2024

From git history I see the Run3 enum value was added in #45511. That PR did modify the HcalTopology constructor taking HcalTopologyMode::Mode to include cases for Run3, but not the constructor taking const HcalDDDRecConstants*.

@Dr15Jones
Contributor

After modifying the if block, I see what appear to be better values:

%MSG-s HCalGeom: HcalGeometryToDBEP:HcalGeometryToDBEP@callESModule 08-Oct-2024 09:51:58 CDT Run: 1
HcalGeometry_init(): HBSize 9216 HESize 14112 HOSize 2160 HFSize 7488

@bsunanda
Contributor Author

bsunanda commented Oct 8, 2024

Thanks - I shall try to cure this

@bsunanda
Contributor Author

bsunanda commented Oct 8, 2024

I think the logic in HcalTopology needs to be modified

@Dr15Jones
Contributor

See #46305

@Dr15Jones
Contributor

With the PR I made, the job still fails with

----- Begin Fatal Exception 08-Oct-2024 09:52:03 CDT-----------------------
An exception of category 'NoProductResolverException' occurred while
   [0] Processing global begin Run run: 1
   [1] Prefetching for module PCaloGeometryBuilder/'CaloGeometryWriter'
   [2] Prefetching for EventSetup module ZdcGeometryToDBEP/''
   [3] Calling method for EventSetup module ZdcHardcodeGeometryEP/''
Exception Message:
Cannot find EventSetup module to produce data of type "ZdcTopology" in
record "HcalRecNumberingRecord" with product label "".
Please add an ESSource or ESProducer to your job which can deliver this data.
----- End Fatal Exception -------------------------------------------------

@bsunanda
Contributor Author

bsunanda commented Oct 8, 2024 via email

@Dr15Jones
Contributor

I was able to get the full script to run by adding

process.load("Geometry.ForwardGeometry.ZdcGeometry_cfi")

to geometryExtended2024DD4hep_writer.py
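The NoProductResolverException and the effect of loading the extra configuration can be pictured with a toy model. This is only a sketch: ToyEventSetup, addProducer, get and demo are hypothetical names, not the CMSSW EventSetup API, and it assumes that the loaded ZdcGeometry_cfi fragment is what supplies the missing ZdcTopology producer.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <tuple>
#include <utility>

// Lookup key: (record name, data type name, product label).
using Key = std::tuple<std::string, std::string, std::string>;

// Hypothetical toy registry; not the CMSSW EventSetup API.
class ToyEventSetup {
public:
  // Registers a producer, loosely analogous to loading a cfi that defines an ESProducer.
  void addProducer(const std::string& record, const std::string& type,
                   const std::string& label, std::function<int()> produce) {
    producers_[Key{record, type, label}] = std::move(produce);
  }

  // Looks up the producer for the key and runs it; throws if none is registered.
  int get(const std::string& record, const std::string& type,
          const std::string& label = "") const {
    auto it = producers_.find(Key{record, type, label});
    if (it == producers_.end()) {
      throw std::runtime_error("Cannot find module to produce data of type \"" + type +
                               "\" in record \"" + record + "\"");
    }
    return it->second();
  }

private:
  std::map<Key, std::function<int()>> producers_;
};

// Demonstrates the failure before registration and the success after it.
inline void demo() {
  ToyEventSetup es;
  bool threw = false;
  try {
    es.get("HcalRecNumberingRecord", "ZdcTopology");
  } catch (const std::runtime_error&) {
    threw = true;  // no producer registered yet
  }
  assert(threw);

  // Placeholder payload value; a real producer would build a topology object.
  es.addProducer("HcalRecNumberingRecord", "ZdcTopology", "", [] { return 1; });
  assert(es.get("HcalRecNumberingRecord", "ZdcTopology") == 1);
}
```

In this picture, the consumer (here the ZDC geometry producer) asks for a (record, type, label) triple, and the job fails at prefetch time if nothing in the configuration has registered a producer for it.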

@perrotta
Contributor

perrotta commented Oct 8, 2024

Thank you @makortel and @Dr15Jones for the big debugging effort!
This preserves the possibility of implementing ZDC geometry updates for this year's HI data taking!

@makortel
Contributor

makortel commented Oct 8, 2024

Just for completeness, running valgrind (without #46305) did not reveal anything new.

@bsunanda
Contributor Author

bsunanda commented Oct 9, 2024

With the proposed corrections to HcalTopology (plus other changes needed in this class) and a corrected scenario description in Configuration/Geometry, the payloads for 2024 have been created. So this issue is resolved.
