[CLANG_X] Segmentation violation in ThePEG::EventGenerator::doinit #45872

Open
iarspider opened this issue Sep 4, 2024 · 31 comments

@iarspider
Contributor

RelVals 535.0, 537.0, 538.0 failed with SIGSEGV in ThePEG::EventGenerator::doinit:

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Wed Sep  4 07:50:25 CEST 2024
Thread 5 (Thread 0x14cb331ff700 (LWP 864038) "cmsRun"):
#0  0x000014cb6a747ac1 in poll () from /lib64/libc.so.6
#1  0x000014cb65f784bd in (anonymous namespace)::full_read(int, char*, unsigned long, int) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x000014cb65f77f54 in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  0x000014cb65f778bf in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x000014cb2f995a15 in (anonymous namespace)::recursionNotNull(ThePEG::Pointer::TransientConstRCPtr<ThePEG::PartonBin>, ThePEG::Pointer::TransientConstRCPtr<ThePEG::Particle>) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#6  0x000014cb2f9a9265 in ThePEG::LesHouchesReader::createPartonBinInstances() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#7  0x000014cb2f9a2c66 in ThePEG::LesHouchesReader::getXComb() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#8  0x000014cb2f9a2ec4 in ThePEG::LesHouchesReader::getSubProcess() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#9  0x000014cb2f9a4307 in ThePEG::LesHouchesReader::readEvent() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#10 0x000014cb2f99d754 in ThePEG::LesHouchesReader::scan() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#11 0x000014cb2f9a1e42 in ThePEG::LesHouchesReader::initialize(ThePEG::LesHouchesEventHandler&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#12 0x000014cb2f9cbd59 in ThePEG::LesHouchesFileReader::initialize(ThePEG::LesHouchesEventHandler&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#13 0x000014cb2f9d8466 in ThePEG::LesHouchesEventHandler::initialize() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#14 0x000014cb2fbf63a5 in ThePEG::EventGenerator::doinit() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30
#15 0x000014cb2fbf9ba5 in ThePEG::EventGenerator::setup(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::set<ThePEG::Pointer::RCPtr<ThePEG::InterfacedBase>, std::less<ThePEG::Pointer::RCPtr<ThePEG::InterfacedBase> >, std::allocator<ThePEG::Pointer::RCPtr<ThePEG::InterfacedBase> > >&, std::map<long, ThePEG::Pointer::RCPtr<ThePEG::ParticleData>, std::less<long>, std::allocator<std::pair<long const, ThePEG::Pointer::RCPtr<ThePEG::ParticleData> > > >&, std::set<ThePEG::Pointer::RCPtr<ThePEG::MatcherBase>, std::less<ThePEG::Pointer::RCPtr<ThePEG::MatcherBase> >, std::allocator<ThePEG::Pointer::RCPtr<ThePEG::MatcherBase> > >&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30
#16 0x000014cb2fc3dd6a in ThePEG::Repository::makeRun(ThePEG::Pointer::TransientRCPtr<ThePEG::EventGenerator>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30
#17 0x000014cb2fc4057c in ThePEG::Repository::exec(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::ostream&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30
#18 0x000014cb2fc40f6f in ThePEG::Repository::execAndCheckReply(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::ostream&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30
#19 0x000014cb2fc41279 in ThePEG::Repository::read(std::istream&, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30
#20 0x000014cb2fc416dd in ThePEG::Repository::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::ostream&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30
#21 0x000014cb31305416 in (anonymous namespace)::HerwigGenericRead(Herwig::HerwigUI const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libHerwigAPI.so.2
#22 0x000014cb318a94e8 in Herwig7Interface::callHerwigGenerator() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libGeneratorInterfaceHerwig7Interface.so
#23 0x000014cb318a7a50 in Herwig7Interface::initRepository(edm::ParameterSet const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libGeneratorInterfaceHerwig7Interface.so
#24 0x000014cb318f8568 in Herwig7Hadronizer::initializeForExternalPartons() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginGeneratorInterfaceHerwig7HadronizerPlugins.so
#25 0x000014cb319068be in edm::HadronizerFilter<Herwig7Hadronizer, gen::ExternalDecayDriver>::beginLuminosityBlockProduce(edm::LuminosityBlock&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginGeneratorInterfaceHerwig7HadronizerPlugins.so
#26 0x000014cb6d2d5fd8 in edm::one::EDFilterBase::doBeginLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#27 0x000014cb6d2bf7dd in edm::WorkerT<edm::one::EDFilterBase>::implDoBegin(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#28 0x000014cb6d194745 in edm::workerhelper::CallImpl<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::call(edm::Worker*, edm::StreamID, edm::LumiTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*, edm::GlobalContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#29 0x000014cb6d19460a in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#30 0x000014cb6d194401 in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#31 0x000014cb6d192fdb in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#32 0x000014cb6d193a0f in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}::operator()() const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#33 0x000014cb6d1937c5 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&>(tbb::detail::d1::task_group&, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#34 0x000014cb6cee13d9 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::$_0>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#35 0x000014cb6b8b5b3b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x14cb68f0a200, waiter=..., this=0x14cb68fc9500) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#36 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x14cb68fc9500) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#37 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/arena.cpp:137
#38 tbb::detail::r1::market::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/market.cpp:599
#39 0x000014cb6b8b7cee in tbb::detail::r1::rml::private_worker::run (this=0x14cb66874000) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#40 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x14cb66874000) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#41 0x000014cb6a9f31ca in start_thread () from /lib64/libpthread.so.0
#42 0x000014cb6a64e8d3 in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x14cb34198700 (LWP 864037) "cmsRun"):
#0  0x000014cb6a71d098 in nanosleep () from /lib64/libc.so.6
#1  0x000014cb6a71cf9e in sleep () from /lib64/libc.so.6
#2  0x000014cb65f77544 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000014cb6a64e41d in syscall () from /lib64/libc.so.6
#5  0x000014cb6b8b7fd2 in tbb::detail::r1::futex_wait (comparand=2, futex=0x14cb66874124) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/semaphore.h:100
#6  tbb::detail::r1::binary_semaphore::P (this=0x14cb66874124) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/semaphore.h:253
#7  tbb::detail::r1::rml::internal::thread_monitor::wait (this=0x14cb66874120) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/rml_thread_monitor.h:235
#8  tbb::detail::r1::rml::private_worker::run (this=0x14cb66874100) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:273
#9  tbb::detail::r1::rml::private_worker::thread_routine (arg=0x14cb66874100) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#10 0x000014cb6a9f31ca in start_thread () from /lib64/libpthread.so.0
#11 0x000014cb6a64e8d3 in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x14cb34b99700 (LWP 864036) "cmsRun"):
#0  0x000014cb6a71d098 in nanosleep () from /lib64/libc.so.6
#1  0x000014cb6a71cf9e in sleep () from /lib64/libc.so.6
#2  0x000014cb65f77544 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000014cb6a64e41d in syscall () from /lib64/libc.so.6
#5  0x000014cb6b8b7fd2 in tbb::detail::r1::futex_wait (comparand=2, futex=0x14cb668740a4) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/semaphore.h:100
#6  tbb::detail::r1::binary_semaphore::P (this=0x14cb668740a4) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/semaphore.h:253
#7  tbb::detail::r1::rml::internal::thread_monitor::wait (this=0x14cb668740a0) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/rml_thread_monitor.h:235
#8  tbb::detail::r1::rml::private_worker::run (this=0x14cb66874080) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:273
#9  tbb::detail::r1::rml::private_worker::thread_routine (arg=0x14cb66874080) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#10 0x000014cb6a9f31ca in start_thread () from /lib64/libpthread.so.0
#11 0x000014cb6a64e8d3 in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x14cb44347700 (LWP 864020) "cmsRun"):
#0  0x000014cb6a9fd6a2 in waitpid () from /lib64/libpthread.so.0
#1  0x000014cb65f781f1 in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x000014cb6b087a73 in std::execute_native_thread_routine (__p=0x14cb45794680) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#3  0x000014cb6a9f31ca in start_thread () from /lib64/libpthread.so.0
#4  0x000014cb6a64e8d3 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x14cb69b6b680 (LWP 863948) "cmsRun"):
#0  0x000014cb6a71d098 in nanosleep () from /lib64/libc.so.6
#1  0x000014cb6a71cf9e in sleep () from /lib64/libc.so.6
#2  0x000014cb65f77544 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000014cb6a64e41d in syscall () from /lib64/libc.so.6
#5  0x000014cb6b8bcca4 in tbb::detail::r1::futex_wait (comparand=2, futex=0x7ffebf455d90) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/semaphore.h:100
#6  tbb::detail::r1::binary_semaphore::P (this=0x7ffebf455d90) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/semaphore.h:253
#7  tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>::wait (this=0x7ffebf455d60) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/concurrent_monitor.h:170
#8  tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::commit_wait (this=<optimized out>, node=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/concurrent_monitor.h:232
#9  tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::commit_wait (node=..., this=0x14cb68fcb598) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/concurrent_monitor.h:228
#10 tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::wait<tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>, tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}&>(tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}&, tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>&&) (node=..., pred=<synthetic pointer>..., this=0x14cb68fcb598) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/concurrent_monitor.h:262
#11 tbb::detail::r1::sleep_waiter::sleep<tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}>(unsigned long, tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}) (this=<optimized out>, wakeup_condition=..., uniq_tag=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/waiters.h:118
#12 tbb::detail::r1::external_waiter::pause (this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/waiters.h:144
#13 tbb::detail::r1::external_waiter::pause (this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/waiters.h:137
#14 tbb::detail::r1::task_dispatcher::receive_or_steal_task<true, tbb::detail::r1::external_waiter> (this=<optimized out>, tls=..., ed=..., waiter=..., isolation=<optimized out>, fifo_allowed=<optimized out>, critical_allowed=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:231
#15 0x000014cb6b8be4e2 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x0, this=0x14cb68fc9380) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:350
#16 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x14cb68fc9380) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#17 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#18 0x000014cb6d17f668 in void tbb::detail::d0::try_call_proxy<tbb::detail::d1::task_group_base::wait()::{lambda()#1}>::on_completion<tbb::detail::d1::task_group_base::wait()::{lambda()#2}>(tbb::detail::d1::task_group_base::wait()::{lambda()#2}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#19 0x000014cb6d17d9c5 in edm::FinalWaitingTask::wait() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#20 0x000014cb6d15baf0 in edm::EventProcessor::processRuns() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#21 0x000014cb6d158deb in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#22 0x0000563b49ef0b6e in tbb::detail::d1::task_arena_function<main::$_0::operator()() const::{lambda()#1}, void>::operator()() const ()
#23 0x000014cb6b8aa9ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/arena.cpp:688
#24 0x0000563b49eefd06 in main::$_0::operator()() const ()
#25 0x0000563b49eed7ff in main ()

Current Modules:

Module: Herwig7HadronizerFilter:generator (crashed)
Module: none
Module: none
Module: none
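
A reading aid for a dump like the one above: the faulting thread can be picked out by looking at the first frame below `<signal handler called>` in each thread. Idle workers show a wait there (`syscall`, `nanosleep`), while the faulting thread shows the function that crashed. The following is a minimal Python sketch, assuming the dump is saved as text in the gdb layout shown. The helper names and the `WAIT_FUNCS` list are illustrative assumptions, not part of CMSSW, and the sample is an abbreviated excerpt with library paths shortened.

```python
import re

THREAD_RE = re.compile(r"^Thread (\d+) \(")
# Matches both "#5  0x... in func (...)" and "#6  func (...) at file:line".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(.*)$")
# Assumption: frames starting with these names are waits, not crashes.
WAIT_FUNCS = ("syscall", "nanosleep", "poll", "waitpid")


def frames_after_signal(dump):
    """Map thread number -> text of the first frame below the signal marker."""
    result = {}
    thread = None
    armed = False  # True right after seeing "<signal handler called>"
    for line in dump.splitlines():
        m = THREAD_RE.match(line)
        if m:
            thread = int(m.group(1))
            armed = False
            continue
        f = FRAME_RE.match(line)
        if not f or thread is None:
            continue
        body = f.group(2)
        if body.startswith("<signal handler called>"):
            armed = True
        elif armed and thread not in result:
            result[thread] = body
            armed = False
    return result


def crashing_threads(dump):
    """Threads whose interrupted frame is not a known wait function."""
    return [t for t, frame in frames_after_signal(dump).items()
            if not frame.startswith(WAIT_FUNCS)]


# Abbreviated excerpt of the dump above (paths shortened for illustration).
SAMPLE = """\
Thread 5 (Thread 0x14cb331ff700 (LWP 864038) "cmsRun"):
#3  0x000014cb65f778bf in sig_dostack_then_abort () from pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x000014cb2f995a15 in (anonymous namespace)::recursionNotNull(...) () from LesHouches.so.30
Thread 4 (Thread 0x14cb34198700 (LWP 864037) "cmsRun"):
#2  0x000014cb65f77544 in sig_pause_for_stacktrace () from pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000014cb6a64e41d in syscall () from /lib64/libc.so.6
"""

print(crashing_threads(SAMPLE))
```

On this excerpt the only thread reported is Thread 5, whose interrupted frame is `recursionNotNull` in LesHouches.so.30.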
@iarspider
Contributor Author

assign generators

@cmsbuild
Contributor

cmsbuild commented Sep 4, 2024

New categories assigned: generators

@bbilin,@mkirsano,@menglu21,@lviliani you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

cmsbuild commented Sep 4, 2024

A new Issue was created by @iarspider.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@lviliani
Contributor

lviliani commented Sep 6, 2024

@Dominic-Stafford @theofil can you please take a look? It seems related to Herwig. Thanks!

@Dominic-Stafford
Contributor

I'll try to have a look next week (unless @theofil has time before then). @iarspider, has the CLANG/C++ version or anything else like this changed which might have triggered the issue?

@iarspider
Contributor Author

> has the CLANG/C++ version or anything else like this changed which might have triggered the issue?

I don't think so.

@theofil
Contributor

theofil commented Sep 6, 2024

Hi, I will also try to look at it next week, starting on Tuesday.

a) Could we get some instructions on how to reproduce the problem in an lxplus session?
b) Was the gridpack changed after the last RelVals that were produced without problems?

best,
K.

@makortel
Contributor

makortel commented Sep 6, 2024

The recipe to reproduce would be along these lines:

cmssw-el8
cmsrel CMSSW_14_2_CLANG_X_2024-09-05-2300
cd CMSSW_14_2_CLANG_X_2024-09-05-2300/src
cmsenv
runTheMatrix.py -l 535.0 -t 4

@makortel
Contributor

makortel commented Sep 6, 2024

> > has the CLANG/C++ version or anything else like this changed which might have triggered the issue?
>
> I don't think so.

In case there is a connection to #45510 (as wondered in #45510 (comment)), we updated the C++ standard from 17 to 20 around Jul 12 (cms-sw/cmsdist#9288), and #45510 was opened on Jul 19. But I guess we have lost the information on whether the problem reported in #45510 started in the 07-18-2300 IB or earlier (and on when the problem reported in this issue first occurred).

@theofil
Contributor

theofil commented Sep 6, 2024

@makortel thanks for the code.

Indeed, the test passes for CMSSW_14_2_X_2024-09-06-1100 but fails for CMSSW_14_2_CLANG_X_2024-09-05-2300. Both IBs have clang version 18.1.6, though; at least, this is what I get from clang --version inside Singularity.

Any ideas?

@makortel
Contributor

makortel commented Sep 6, 2024

The only difference between the default IB and the CLANG IB is that in the default IB the CMSSW code is compiled with gcc, and in CLANG IB with clang. The externals should be compiled with gcc in both cases (and be the same binaries).

Some thoughts

  • Does the problem reproduce with one thread?
  • Is the input to ThePEG exactly the same between the default and CLANG IBs? (The stack trace points to reading an LHE file, so the input is probably the same.)
  • One could try gdb, but in order to be useful it might need ThePEG to be built with debug symbols
  • valgrind might reveal something useful (but will be slow)
    • One could also check if the problem reproduces with cmsRunGlibC or cmsRunTC (these use other allocators; if this is a memory problem, they may behave differently or even give some diagnostics)

@Dominic-Stafford
Contributor

I've had a look at this and found, very weirdly, that I can reproduce it locally with runTheMatrix.py -el 535, but when I go into the run directory and rerun the cfg with cmsRun (for instance, to try valgrind or gdb), I get a different error:

09-Sep-2024 15:10:42 CEST  Initiating request to open LHE file thread0/cmsgrid_final.lhe
09-Sep-2024 15:10:42 CEST  Successfully opened LHE file thread0/cmsgrid_final.lhe
09-Sep-2024 15:10:42 CEST  Initiating request to open LHE file thread0/cmsgrid_final.lhe
09-Sep-2024 15:10:42 CEST  Successfully opened LHE file thread0/cmsgrid_final.lhe
%MSG-w LogicError:  LheWeightValidation:lheWeightValidation@beginRun  09-Sep-2024 15:10:42 CEST Run: 1
::getByLabel: An attempt was made to read a Run product before endRun() was called.
The product is of type 'LHERunInfoProduct'.
The specified ModuleLabel was 'externalLHEProducer'.
The specified productInstanceName was ''.

%MSG
%MSG-w LogicError:  Herwig7HadronizerFilter:generator@beginRun  09-Sep-2024 15:10:42 CEST Run: 1
::getByLabel: An attempt was made to read a Run product before endRun() was called.
The product is of type 'LHERunInfoProduct'.
The specified ModuleLabel was 'externalLHEProducer'.
The specified productInstanceName was ''.

%MSG
* A warning exception occurred in the initialization of EventGenerator: 
No information about the energy of incoming particles were found in LesHouchesReader 'LesHouchesReader'.
* A warning exception occurred in the initialization of EventGenerator: 
No information about the weighting scheme was found. The events produced by LesHouchesReader LesHouchesReader may not be sampled correctly.
* A warning exception occurred in the initialization of EventGenerator: 
LesHouchesReader LesHouchesReader has the IDWTUP flag set to 0, which does not correspond
to the weight option -2 set in the LesHouchesEventHandler LesHouchesHandler.

Use the following handler setting instead:
  set LesHouchesHandler:WeightOption 0
Will try to make intelligent guesses to get correct statistics. In most cases this should be sufficient. Unset <interface>WeightWarnings</interface> to avoid this message
* A warning exception occurred in the initialization of EventGenerator: 
The file associated with 'LesHouchesReader' does not contain a proper formatted Les Houches event file. The events may not be properly sampled.
Error: The sum of the cross sections of the readers in the LesHouchesEventHandler 'LesHouchesHandler' was zero.
Error: The object '/Herwig/Partons/PDFSet_nnlo' was not created as another object with that name already exists.
Error: The object '/Herwig/Partons/PDFSet_lo' was not created as another object with that name already exists.
Error: The object '/Herwig/EventHandlers/LesHouchesHandler' was not created as another object with that name already exists.
Error: The object '/Herwig/Cuts/NoCuts' was not created as another object with that name already exists.
Error: The object '/Herwig/Partons/LHAPDF' was not created as another object with that name already exists.
Error: The object '/Herwig/EventHandlers/LesHouchesReader' was not created as another object with that name already exists.
* A warning exception occurred in the initialization of EventGenerator: 
No information about the energy of incoming particles were found in LesHouchesReader 'LesHouchesReader'.
* A warning exception occurred in the initialization of EventGenerator: 
No information about the weighting scheme was found. The events produced by LesHouchesReader LesHouchesReader may not be sampled correctly.
* A warning exception occurred in the initialization of EventGenerator: 
LesHouchesReader LesHouchesReader has the IDWTUP flag set to 0, which does not correspond
to the weight option -2 set in the LesHouchesEventHandler LesHouchesHandler.

Use the following handler setting instead:
  set LesHouchesHandler:WeightOption 0
Will try to make intelligent guesses to get correct statistics. In most cases this should be sufficient. Unset <interface>WeightWarnings</interface> to avoid this message
* A warning exception occurred in the initialization of EventGenerator: 
The file associated with 'LesHouchesReader' does not contain a proper formatted Les Houches event file. The events may not be properly sampled.
* A warning exception occurred in the initialization of EventGenerator: 
No information about the weighting scheme was found. The events produced by LesHouchesReader LesHouchesReader may not be sampled correctly.
* A warning exception occurred in the initialization of EventGenerator: 
LesHouchesReader LesHouchesReader has the IDWTUP flag set to 0, which does not correspond
to the weight option -2 set in the LesHouchesEventHandler LesHouchesHandler.

Use the following handler setting instead:
  set LesHouchesHandler:WeightOption 0
Will try to make intelligent guesses to get correct statistics. In most cases this should be sufficient. Unset <interface>WeightWarnings</interface> to avoid this message
* A warning exception occurred in the initialization of EventGenerator: 
The file associated with 'LesHouchesReader' does not contain a proper formatted Les Houches event file. The events may not be properly sampled.
Error: the optional weights names for the LesHouchesEventHandler do not match 'LesHouchesHandler'
Herwig: EventGenerator not available.
Check if 'InterfaceMatchboxTest.run' is a valid run file.


A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Mo 9. Sep 15:10:44 CEST 2024
Thread 2 (Thread 0x7f9450f75700 (LWP 1778195) "cmsRun"):
#0  0x00007f94784856a2 in waitpid () from /lib64/libpthread.so.0
#1  0x00007f9472d981f1 in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x00007f9478b0fa73 in std::execute_native_thread_routine (__p=0x7f945290f7a0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#3  0x00007f947847b1ca in start_thread () from /lib64/libpthread.so.0
#4  0x00007f94780d68d3 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f94775f3680 (LWP 1776592) "cmsRun"):
#0  0x00007f94781cfac1 in poll () from /lib64/libc.so.6
#1  0x00007f9472d984bd in (anonymous namespace)::full_read(int, char*, unsigned long, int) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x00007f9472d97f54 in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  0x00007f9472d978bf in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007f9440a6a099 in (anonymous namespace)::HerwigGenericRun(Herwig::HerwigUI const&, bool) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-05-2300/external/el8_amd64_gcc12/lib/libHerwigAPI.so.2
#6  0x00007f9440a6be06 in Herwig::API::prepareRun(Herwig::HerwigUI const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-05-2300/external/el8_amd64_gcc12/lib/libHerwigAPI.so.2
#7  0x00007f9441016533 in Herwig7Interface::callHerwigGenerator() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libGeneratorInterfaceHerwig7Interface.so
#8  0x00007f944101682f in Herwig7Interface::initGenerator() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libGeneratorInterfaceHerwig7Interface.so
#9  0x00007f9441065577 in Herwig7Hadronizer::initializeForExternalPartons() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginGeneratorInterfaceHerwig7HadronizerPlugins.so
#10 0x00007f94410738be in edm::HadronizerFilter<Herwig7Hadronizer, gen::ExternalDecayDriver>::beginLuminosityBlockProduce(edm::LuminosityBlock&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginGeneratorInterfaceHerwig7HadronizerPlugins.so
#11 0x00007f947ad72fd8 in edm::one::EDFilterBase::doBeginLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#12 0x00007f947ad5c7dd in edm::WorkerT<edm::one::EDFilterBase>::implDoBegin(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#13 0x00007f947ac31745 in edm::workerhelper::CallImpl<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::call(edm::Worker*, edm::StreamID, edm::LumiTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*, edm::GlobalContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#14 0x00007f947ac3160a in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#15 0x00007f947ac31401 in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#16 0x00007f947ac2ffdb in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#17 0x00007f947ac30a0f in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}::operator()() const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#18 0x00007f947ac307c5 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&>(tbb::detail::d1::task_group&, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#19 0x00007f947a97e3d9 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::$_0>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#20 0x00007f947935b3e1 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7f9475ed3e00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#21 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7f9475ed3e00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#22 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#23 0x00007f947ac1c668 in void tbb::detail::d0::try_call_proxy<tbb::detail::d1::task_group_base::wait()::{lambda()#1}>::on_completion<tbb::detail::d1::task_group_base::wait()::{lambda()#2}>(tbb::detail::d1::task_group_base::wait()::{lambda()#2}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#24 0x00007f947ac1a9c5 in edm::FinalWaitingTask::wait() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#25 0x00007f947abf8af0 in edm::EventProcessor::processRuns() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#26 0x00007f947abf5deb in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#27 0x000055ecea2a3b6e in tbb::detail::d1::task_arena_function<main::$_0::operator()() const::{lambda()#1}, void>::operator()() const ()
#28 0x00007f94793479ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/arena.cpp:688
#29 0x000055ecea2a2d06 in main::$_0::operator()() const ()
#30 0x000055ecea2a07ff in main ()

Current Modules:

Module: Herwig7HadronizerFilter:generator (crashed)

A fatal system signal has occurred: segmentation violation
Segmentation fault (core dumped)

It seems that Herwig has crashed due to some issue reading the LHE file (though I'm not exactly sure why, as the LHE file is the same as the one that ran without issues in a non-CLANG build), and that we then keep going despite not having produced the run file, causing a seg fault. For this new issue I'd definitely lay the blame on the fact that we skip over any errors from Herwig here. It may or may not be the root cause of the full issue, but it certainly makes it harder to debug. We've kept this block for a long time, as it's supposed to get around an issue with Herwig being called before the externalLHEProducer, but I think we should really get rid of it, since running past errors where Herwig would just exit could be the root of what we're seeing here, and then deal with the issue with the sequence of calls if it's still occurring. I probably won't have time to try this in the next few days, so if you have time @theofil that would be good; otherwise I'll try to by the end of the week.
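
A minimal sketch of the fail-fast behaviour argued for above, assuming a hypothetical `ReadOutcome` status and helper names (`mayStartRun`, `requireRunFile`) that are not part of the CMSSW Herwig7 interface or of Herwig's API:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical outcome of the "read" step; not a real Herwig or CMSSW type.
enum class ReadOutcome { RunFileWritten, ReadFailed };

// Decide whether the "run" step may start.
// constexpr so the rule itself can be checked at compile time.
constexpr bool mayStartRun(ReadOutcome outcome) {
  return outcome == ReadOutcome::RunFileWritten;
}

// Stop immediately instead of running past the error.
inline void requireRunFile(ReadOutcome outcome, const std::string& runFile) {
  if (!mayStartRun(outcome)) {
    throw std::runtime_error("Herwig read step failed; run file '" + runFile +
                             "' was not produced");
  }
}
```

With a check like this, the job would stop with a readable exception at the point where the run file is found to be missing, rather than segfaulting later in the run step.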

@smuzaffar
Contributor

smuzaffar commented Sep 11, 2024

I checked our OpenSearch records and found that workflow 535 ran successfully for the CMSSW_14_1_CLANG_X_2024-07-11-2300 IB. The first failure was in CMSSW_14_1_CLANG_X_2024-07-12-2300, but the error code was 256, and many other workflows also failed with exit code 256 that day. The first IB in which workflow 535 failed with this segmentation error (exit code 62720) was CMSSW_14_1_CLANG_X_2024-07-16-2300. cmssw changes between 2024-07-11-2300 to 2024-07-12-2300 should be while cmssw changes between 2024-07-12-2300 to 2024-07-16-2300

@theofil
Contributor

theofil commented Sep 12, 2024

I haven't yet found the origin of the problem, but I can reply to this question:

  • Does the problem reproduce with one thread?

yes

@smuzaffar thanks a lot for the info. I see that the

91c2ca3

could be relevant to the crash we see. I will check whether this is really where the problem starts. Would replacing the relval_2017.py of the crashing IB with the relval_2017.py from the working IB be a sensible check, or could other things break behind it?

Apart from the software changes we see in the

e232a9b...c103a34

are there other differences between the two IBs as far as their builds are concerned?

@smuzaffar
Contributor

smuzaffar commented Sep 12, 2024

As @makortel mentioned, we also updated the C++ standard (to C++20) for the July 12th IB.

By the way, building herwig7 and sherpa in debug mode, I get this stack trace for workflow 535/step1:

#3  0x00007fa46a6138bf in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007fa436425a15 in std::_Vector_base<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle>, std::allocator<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle> > >::_Vector_impl_data::_Vector_impl_data (this=0x7fa43648f7c0 <ThePEG::Particle::parents() const::null>) at /build/muz/clang/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_vector.h:100
#6  std::_Vector_base<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle>, std::allocator<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle> > >::_Vector_impl::_Vector_impl (this=0x7fa43648f7c0 <ThePEG::Particle::parents() const::null>) at /build/muz/clang/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_vector.h:139
#7  std::_Vector_base<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle>, std::allocator<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle> > >::_Vector_base (this=0x7fa43648f7c0 <ThePEG::Particle::parents() const::null>) at /build/muz/clang/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_vector.h:312
#8  std::vector<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle>, std::allocator<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle> > >::vector (this=0x7fa43648f7c0 <ThePEG::Particle::parents() const::null>) at /build/muz/clang/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_vector.h:526
#9  ThePEG::Particle::parents (this=<optimized out>) at ../include/ThePEG/EventRecord/Particle.h:159
#10 (anonymous namespace)::recursionNotNull (bin=..., p=...) at LesHouchesReader.cc:719
#11 0x00007fa436439265 in ThePEG::LesHouchesReader::createPartonBinInstances (this=0x7fa433865000) at LesHouchesReader.cc:731
#12 0x00007fa436432c66 in ThePEG::LesHouchesReader::getXComb (this=0x7fa433865000) at LesHouchesReader.cc:443
#13 0x00007fa436432ec4 in ThePEG::LesHouchesReader::getSubProcess (this=0x7fa433865000) at LesHouchesReader.cc:458
#14 0x00007fa436434307 in ThePEG::LesHouchesReader::readEvent (this=0x7fa433865000) at LesHouchesReader.cc:576
#15 0x00007fa43642d754 in ThePEG::LesHouchesReader::scan (this=0x7fa433865000) at LesHouchesReader.cc:305
#16 0x00007fa436431e42 in ThePEG::LesHouchesReader::initialize (this=<optimized out>, eh=...) at LesHouchesReader.cc:272
#17 0x00007fa43645bd59 in ThePEG::LesHouchesFileReader::initialize (this=0x7fa433865000, eh=...) at LesHouchesFileReader.cc:462
#18 0x00007fa436468466 in ThePEG::LesHouchesEventHandler::initialize (this=0x7fa40a74c400) at LesHouchesEventHandler.cc:87
#19 0x00007fa436686375 in ThePEG::EventGenerator::doinit (this=0x7fa44bc0ac00) at EventGenerator.cc:262
#20 0x00007fa436689b75 in ThePEG::InterfacedBase::init (this=0x7fa44bc0ac00) at ../include/ThePEG/Interface/InterfacedBase.h:246
#21 ThePEG::EventGenerator::setup (this=this@entry=0x7fa44bc0ac00, newRunName=..., newObjects=..., newParticles=..., newMatchers=...) at EventGenerator.cc:175
#22 0x00007fa4366cdd3a in ThePEG::Repository::makeRun (eg=..., name=...) at Repository.cc:316
#23 0x00007fa4366d054c in ThePEG::Repository::exec (command=..., os=...) at Repository.cc:786
#24 0x00007fa4366d0f3f in ThePEG::Repository::execAndCheckReply (line=..., os=...) at Repository.cc:510
#25 0x00007fa4366d1249 in ThePEG::Repository::read (is=..., os=..., prompt=...) at Repository.cc:566
#26 0x00007fa4366d16ad in ThePEG::Repository::read (filename=..., os=...) at Repository.cc:452
#27 0x00007fa437d953b9 in (anonymous namespace)::HerwigGenericRead (ui=...) at HerwigAPI.cc:146
#28 0x00007fa43833f4e8 in Herwig7Interface::callHerwigGenerator (this=this@entry=0x7fa43feb1190) at src/GeneratorInterface/Herwig7Interface/src/Herwig7Interface.cc:149
#29 0x00007fa43833da50 in Herwig7Interface::initRepository (this=0x7fa43feb1190, pset=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/src/FWCore/MessageLogger/interface/MessageLogger.h:78
#30 0x00007fa43838e568 in Herwig7Hadronizer::initializeForExternalPartons (this=this@entry=0x7fa43feb10a0) at src/GeneratorInterface/Herwig7Interface/plugins/Herwig7Hadronizer.cc:109
#31 0x00007fa43839c8be in edm::HadronizerFilter<Herwig7Hadronizer, gen::ExternalDecayDriver>::beginLuminosityBlockProduce (this=0x7fa43feb1000, lumi=..., es=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/src/GeneratorInterface/Core/interface/HadronizerFilter.h:367
#32 0x00007fa473842fd8 in edm::one::EDFilterBase::doBeginLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#33 0x00007fa47382c7dd in edm::WorkerT<edm::one::EDFilterBase>::implDoBegin(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so

@smuzaffar
Contributor

Note that all these failing workflows (511, 535, 537, 538 and 539) in CLANG IBs are herwig7 workflows.

@smuzaffar
Contributor

One could also check if the problem reproduces with cmsRunGlibC or cmsRunTC (that use other allocators, if this is a memory problem they may behave differently, or even give some diagnostics)

It failed with both cmsRunGlibC and cmsRunTC.
It also failed in single-thread mode.

@theofil
Contributor

theofil commented Sep 13, 2024

I have made very little progress so far, unfortunately.

I compiled two versions of Herwig under

CMSSW_14_2_X_2024-09-06-1100
CMSSW_14_2_CLANG_X_2024-09-05-2300

and ran standalone Herwig MC generation, checking whether we can generate simple processes without reading external LHE files. In CMSSW_14_2_CLANG_X_2024-09-05-2300 we cannot generate any events: we immediately get a segmentation fault while attempting to make the first event. In CMSSW_14_2_X_2024-09-06-1100 everything is OK and MC generation finishes normally. This confirms what Dominic said earlier: although the first error messages complain about reading LHE files, this has nothing to do with the crash we get later on. (Actually, we get these messages even when things work.)

While compiling the code in the two releases, I see many warnings from ThePEG about arithmetic operations that I was not used to seeing before, but they all seem innocent in the CMSSW_14_2_X_2024-09-06-1100 warnings.

However, in the CMSSW_14_2_CLANG_X_2024-09-05-2300.txt warnings we see for the first time a warning regarding the creation of the RCPtr pointer, in particular:
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/external/thepeg/2.2.2-330d679d0765729c295842b54c3a747c/include/ThePEG/Pointer/RCPtr.h:152:15: note: in implicit copy constructor for 'ThePEG::EventInfoBase' first required here
  152 | ptr = new T(t);

which later appears in the crash messages.

To me this confirms that the problem is not related to the CMSSW HerwigInterface, where there is not much we can do, but rather to the external package ThePEG, which is needed by the Herwig generator.

Is there a reason why we build CMSSW with clang while the external packages, like ThePEG, are still built with gcc? Is it sound to use the same ThePEG binary in the two cases? Would it be possible to build ThePEG with clang as well, instead of gcc, when CMSSW is built with clang?

@Dominic-Stafford
Contributor

Hi, I've had a bit more of a look at this. The first thing to note is that it also happens for standalone Herwig (no external LHE), and there it happens consistently, so the block in the interface I mentioned earlier definitely isn't to blame. Running with valgrind gives the same segfault, but before that it gives a warning about Unsupported clone() flags: 0x311; I'm not sure if that's at all enlightening. But I second @theofil's conclusion: this seems most likely to come from building the clang CMSSW against gcc externals. Is it possible to have a consistent build?

@dan131riley

The Unsupported clone() message comes from the invocation of the CMSSW signal handler after the segfault--so it isn't at all enlightening.

@makortel
Contributor

However, in the CMSSW_14_2_CLANG_X_2024-09-05-2300.txt warnings we see for the first time a warning regarding the creation of the RCPtr pointer, in particular:
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/external/thepeg/2.2.2-330d679d0765729c295842b54c3a747c/include/ThePEG/Pointer/RCPtr.h:152:15: note: in implicit copy constructor for 'ThePEG::EventInfoBase' first required here
  152 | ptr = new T(t);

This is a compilation warning about ThePEG::EventInfoBase relying on an implicitly generated copy constructor, which is deprecated (already since C++11) when the class has a user-declared copy assignment operator. I see that EventInfoBase has

  EventInfoBase & operator=(const EventInfoBase &) = delete;

so the warning is effectively asking to add an accompanying

  EventInfoBase (const EventInfoBase &) = default;

I'd suggest defining the move constructor and move assignment operator as well.

I don't see how these warnings could affect the generated binaries though.
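
To make the suggestion above concrete, here is a minimal sketch of the pattern on a stand-in class. `EventInfoLike` and its member are hypothetical and only mirror the shape of the declarations quoted above; this is not the actual ThePEG code. The choice to allow move construction but forbid move assignment is an assumption about intent. The `static_assert`s just record which operations remain available once the special members are spelled out.

```cpp
#include <type_traits>

// Hypothetical stand-in for a class like ThePEG::EventInfoBase (not the real class).
class EventInfoLike {
public:
  EventInfoLike() = default;

  // Copy assignment is deleted, as in the declaration quoted above.
  EventInfoLike& operator=(const EventInfoLike&) = delete;

  // Spelling out the copy constructor avoids relying on the deprecated
  // implicit generation.
  EventInfoLike(const EventInfoLike&) = default;

  // Move operations stated explicitly as well (assumed intent: move
  // construction allowed, move assignment forbidden like copy assignment).
  EventInfoLike(EventInfoLike&&) = default;
  EventInfoLike& operator=(EventInfoLike&&) = delete;

  double weight() const { return weight_; }

private:
  double weight_ = 1.0;  // illustrative payload only
};

static_assert(std::is_copy_constructible_v<EventInfoLike>);
static_assert(!std::is_copy_assignable_v<EventInfoLike>);
static_assert(std::is_move_constructible_v<EventInfoLike>);
static_assert(!std::is_move_assignable_v<EventInfoLike>);
```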

On the other hand, some other questions came to my mind. @smuzaffar Did we rebuild all the relevant externals with C++20? (Or did we inherit those from the CPP20 IB flavor "for free"?)

On a quick look it also seems to me we are not passing the C++ standard to ThePEG. Should we?

@smuzaffar
Contributor

@smuzaffar Did we rebuild all the relevant externals with C++20? (Or did we inherit those from the CPP20 IB flavor "for free"?)

We might have picked them up from the CPP20 IB flavor, but by now most of the externals should have been rebuilt, as we have changed a lot of base-level packages, which should have re-triggered the build of these externals.

we are not passing the C++ standard to ThePEG. Should we?

I can open a cmsdist PR to test it

@smuzaffar
Contributor

The failing CLANG IB relvals passed when ThePEG and Herwig7 were built with C++20. Enabling C++20 for ThePEG alone fixes relvals 511.0, 535.0 and 539.0 (537.0 and 538.0 were still failing with error [a]). Building both ThePEG and Herwig7 with C++20 fixed all the failing relvals.

[a]

#5  0x0000152885494025 in (anonymous namespace)::recursionNotNull(ThePEG::Pointer::TransientConstRCPtr<ThePEG::PartonBin>, ThePEG::Pointer::TransientConstRCPtr<ThePEG::Particle>) () from /cvmfs/cms-ci.cern.ch/week0/PR_1b470bab/el8_amd64_gcc12/external/herwig7/7.2.2-a3bc7fe13a4e477f3a15bc0323bfac9d/lib/Herwig/HwFxFx.so
#6  0x00001528854a78c5 in ThePEG::FxFxReader::createPartonBinInstances() () from /cvmfs/cms-ci.cern.ch/week0/PR_1b470bab/el8_amd64_gcc12/external/herwig7/7.2.2-a3bc7fe13a4e477f3a15bc0323bfac9d/lib/Herwig/HwFxFx.so
#7  0x00001528854a11e6 in ThePEG::FxFxReader::getXComb() () from /cvmfs/cms-ci.cern.ch/week0/PR_1b470bab/el8_amd64_gcc12/external/herwig7/7.2.2-a3bc7fe13a4e477f3a15bc0323bfac9d/lib/Herwig/HwFxFx.so
#8  0x00001528854a1444 in ThePEG::FxFxReader::getSubProcess() () from /cvmfs/cms-ci.cern.ch/week0/PR_1b470bab/el8_amd64_gcc12/external/herwig7/7.2.2-a3bc7fe13a4e477f3a15bc0323bfac9d/lib/Herwig/HwFxFx.so
#9  0x00001528854a2907 in ThePEG::FxFxReader::readEvent() () from /cvmfs/cms-ci.cern.ch/week0/PR_1b470bab/el8_amd64_gcc12/external/herwig7/7.2.2-a3bc7fe13a4e477f3a15bc0323bfac9d/lib/Herwig/HwFxFx.so
#10 0x000015288549bd66 in ThePEG::FxFxReader::scan() () from /cvmfs/cms-ci.cern.ch/week0/PR_1b470bab/el8_amd64_gcc12/external/herwig7/7.2.2-a3bc7fe13a4e477f3a15bc0323bfac9d/lib/Herwig/HwFxFx.so
#11 0x00001528854a03c2 in ThePEG::FxFxReader::initialize(ThePEG::FxFxEventHandler&) () from /cvmfs/cms-ci.cern.ch/week0/PR_1b470bab/el8_amd64_gcc12/external/herwig7/7.2.2-a3bc7fe13a4e477f3a15bc0323bfac9d/lib/Herwig/HwFxFx.so
#12 0x00001528854779c9 in ThePEG::FxFxFileReader::initialize(ThePEG::FxFxEventHandler&) () from /cvmfs/cms-ci.cern.ch/week0/PR_1b470bab/el8_amd64_gcc12/external/herwig7/7.2.2-a3bc7fe13a4e477f3a15bc0323bfac9d/lib/Herwig/HwFxFx.so
#13 0x000015288548614e in ThePEG::FxFxEventHandler::initialize() () from /cvmfs/cms-ci.cern.ch/week0/PR_1b470bab/el8_amd64_gcc12/external/herwig7/7.2.2-a3bc7fe13a4e477f3a15bc0323bfac9d/lib/Herwig/HwFxFx.so
#14 0x000015288c3ff2e6 in ThePEG::EventGenerator::doinit() () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9456/42019/CMSSW_14_2_CLANG_X_2024-10-06-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30

@makortel
Contributor

makortel commented Oct 8, 2024

I have a feeling this discovery suggests we should, in principle, compile all CMSSW and externals with as close compilation flags as possible.

@dan131riley

I have a feeling this discovery suggests we should, in principle, compile all CMSSW and externals with as close compilation flags as possible.

I think it's really weird that c++20 vs c++1x changes this behavior. To me this smells like undefined behavior somewhere in the initialization process.

@makortel
Contributor

makortel commented Oct 8, 2024

I have a feeling this discovery suggests we should, in principle, compile all CMSSW and externals with as close compilation flags as possible.

I think it's really weird that c++20 vs c++1x changes this behavior.

We have some code in headers that does one thing for one C++ version, and a different thing for an earlier C++ version. I wouldn't be surprised if other libraries, including libstdc++, did similar things. This setup can result in different code being generated for the "same" function in different shared objects, leading to an ODR violation.
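
As a hedged illustration of that mechanism, the header-style snippet below defines a type whose layout depends on the C++ standard in use. `Handle`, `kBuiltAsCxx20` and `handleSize` are made-up names, not taken from CMSSW, ThePEG, or libstdc++. If one shared object compiled this header as C++17 and another as C++20, the two would disagree about `sizeof(Handle)` while each believing they share the same type.

```cpp
#include <cstddef>

// Hypothetical header content; the type name and members are illustrative only.
struct Handle {
  const void* data;
#if __cplusplus >= 202002L
  // Extra member that only exists when the including code is built as C++20 or later.
  std::size_t extent;
#endif
};

// True when this translation unit was compiled as C++20 or later.
constexpr bool kBuiltAsCxx20 = (__cplusplus >= 202002L);

// Size of Handle as seen by this translation unit.
constexpr std::size_t handleSize() { return sizeof(Handle); }

// The layout seen here follows the standard selected for this translation unit.
static_assert(kBuiltAsCxx20 ? handleSize() > sizeof(const void*)
                            : handleSize() == sizeof(const void*));
```

Building every library that includes such a header with the same `-std=` option keeps the definitions identical across shared objects.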

@smuzaffar
Contributor

I do see some compilation warnings [a] when we build ThePEG. Maybe these point to some code issue in ThePEG itself.

[a]

AbstractSSSSVertex.cc:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
FastJetFinder.cc:129:24: warning: 'recomb_scheme' may be used uninitialized [-Wmaybe-uninitialized]
LesHouchesReader.h:508:16: warning: 'void* memcpy(void*, const void*, size_t)' writing to an object of type 'class std::__cxx11::basic_string<char>' with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
LesHouchesReader.h:508:16: warning: 'void* memcpy(void*, const void*, size_t)' writing to an object of type 'class std::map<std::__cxx11::basic_string<char>, double>' with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
LesHouchesReader.h:508:16: warning: 'void* memcpy(void*, const void*, size_t)' writing to an object of type 'struct std::pair<double, double>' with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
LesHouchesReader.h:508:16: warning: 'void* memcpy(void*, const void*, size_t)' writing to an object of type 'struct std::pair<int, int>' with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
Matcher.tcc:19:1: warning: pointer may be used after 'void operator delete(void*, std::size_t)' [-Wuse-after-free]
ThePEG/Pointer/RCPtr.h:103:7: warning: pointer used after 'void operator delete(void*, std::size_t)' [-Wuse-after-free]
ThePEG/Pointer/RCPtr.h:302:52: warning: pointer may be used after 'void operator delete(void*, std::size_t)' [-Wuse-after-free]
ThePEG/Pointer/ReferenceCounted.h:88:7: warning: pointer used after 'void operator delete(void*, std::size_t)' [-Wuse-after-free]
ThePEG/Vectors/LorentzVector.h:729:51: warning: arithmetic between enumeration type 'ThePEG::Direction<0>::Dir' and floating-point type 'double' is deprecated [-Wdeprecated-enum-float-conversion]
ThePEG/Vectors/LorentzVector.h:740:61: warning: arithmetic between enumeration type 'ThePEG::Direction<0>::Dir' and floating-point type 'double' is deprecated [-Wdeprecated-enum-float-conversion]

@makortel
Contributor

makortel commented Oct 8, 2024

Most of those warnings are quite scary...
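
For context on why the `-Wclass-memaccess` warnings in particular look worrying: `memcpy` is only well defined for trivially copyable types, and `std::string` and `std::map` are not. The sketch below uses hypothetical structs (`PlainBlock`, `RichBlock`) and a hypothetical helper (`copyBlock`) rather than the actual `LesHouchesReader.h` code, and shows the member-wise copy that the compiler message recommends.

```cpp
#include <map>
#include <string>
#include <type_traits>

// Hypothetical block of plain numbers: byte-wise copying is well defined.
struct PlainBlock {
  int ids[2];
  double energies[2];
};

// Hypothetical block with owning members: byte-wise copying would duplicate
// internal pointers without duplicating what they own.
struct RichBlock {
  std::string name;
  std::map<std::string, double> weights;
};

static_assert(std::is_trivially_copyable_v<PlainBlock>);
static_assert(!std::is_trivially_copyable_v<RichBlock>);

// Copy through the assignment operator, which handles owning members correctly.
template <typename T>
void copyBlock(const T& from, T& to) {
  to = from;
}
```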

@Dominic-Stafford
Contributor

Thanks for getting to the bottom of this @smuzaffar. I'll check whether I can reproduce these errors in a stand-alone install and, if so, contact the Herwig authors about these warnings. I think at least some of them may have been resolved in more recent releases; maybe we can get them to give us a patch to fix them in 7.2. For reference, which gcc version did you use to build cmsdist?

@smuzaffar
Contributor

For reference, which gcc version did you use to build cmsdist?

I used our default gcc, i.e. 12.3.1.
