
SegFaults in TensorFlow usage by L1TMuonEndCapTrackProducer #6618

Closed
Dr15Jones opened this issue Feb 3, 2021 · 8 comments

Comments

@Dr15Jones

In CMSSW_11_3_X_2021-02-03-0800 we are seeing many release validation jobs failing with segmentation faults coming from the use of TensorFlow called from L1TMuonEndCapTrackProducer.

@Dr15Jones
Author

An example stack trace can be seen here: https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc900/CMSSW_11_3_X_2021-02-03-0800/pyRelValMatrixLogs/run/136.779_RunMuOnia2016H+RunMuOnia2016H+HLTDR2_2016+RECODR2_2016reHLT_skimMuOnia_Prompt+HARVESTDR2/step2_RunMuOnia2016H+RunMuOnia2016H+HLTDR2_2016+RECODR2_2016reHLT_skimMuOnia_Prompt+HARVESTDR2.log#/214-214

#3  0x00002b57c070d7d4 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02666/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-03-0800/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002b57fcc3d8bd in tensorflow::Tensor::AllocatedBytes() const () from /cvmfs/cms-ib.cern.ch/nweek-02666/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-03-0800/external/slc7_amd64_gcc900/lib/libtensorflow_framework.so.2
#6  0x00002b57fa9edd81 in tensorflow::DirectSession::Run(tensorflow::RunOptions const&, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::RunMetadata*, tensorflow::thread::ThreadPoolOptions const&) () from /cvmfs/cms-ib.cern.ch/nweek-02666/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-03-0800/external/slc7_amd64_gcc900/lib/libtensorflow_cc.so.2
#7  0x00002b57f1bcba9a in tensorflow::run(tensorflow::Session*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::thread::ThreadPoolOptions const&) () from /cvmfs/cms-ib.cern.ch/nweek-02666/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-03-0800/lib/slc7_amd64_gcc900/libPhysicsToolsTensorFlow.so
#8  0x00002b57f1bcbac9 in tensorflow::run(tensorflow::Session*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::thread::ThreadPoolInterface*) () from /cvmfs/cms-ib.cern.ch/nweek-02666/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-03-0800/lib/slc7_amd64_gcc900/libPhysicsToolsTensorFlow.so
#9  0x00002b57f1bcbb2a in tensorflow::run(tensorflow::Session*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/nweek-02666/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-03-0800/lib/slc7_amd64_gcc900/libPhysicsToolsTensorFlow.so
#10 0x00002b57f1b0ee3d in PtAssignmentEngineDxy::call_tensorflow_dxy(std::array<float, 23ul> const&, std::array<float, 2ul>&) const () from /cvmfs/cms-ib.cern.ch/nweek-02666/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-03-0800/lib/slc7_amd64_gcc900/libL1TriggerL1TMuonEndCap.so
#11 0x00002b57f1b04282 in PtAssignment::process(std::vector<l1t::EMTFTrack, std::allocator<l1t::EMTFTrack> >&) () from /cvmfs/cms-ib.cern.ch/nweek-02666/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-03-0800/lib/slc7_amd64_gcc900/libL1TriggerL1TMuonEndCap.so
#12 0x00002b57f1b114ff in SectorProcessor::process_single_bx(int, std::vector<L1TMuon::TriggerPrimitive, std::allocator<L1TMuon::TriggerPrimitive> > const&, std::vector<l1t::EMTFHit, std::allocator<l1t::EMTFHit> >&, std::vector<l1t::EMTFTrack, std::allocator<l1t::EMTFTrack> >&, std::deque<std::vector<l1t::EMTFHit, std::allocator<l1t::EMTFHit> >, std::allocator<std::vector<l1t::EMTFHit, std::allocator<l1t::EMTFHit> > > >&, std::deque<std::vector<l1t::EMTFTrack, std::allocator<l1t::EMTFTrack> >, std::allocator<std::vector<l1t::EMTFTrack, std::allocator<l1t::EMTFTrack> > > >&, std::map<std::array<int, 3ul>, int, std::less<std::array<int, 3ul> >, std::allocator<std::pair<std::array<int, 3ul> const, int> > >&) const () from /cvmfs/cms-ib.cern.ch/nweek-02666/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-03-0800/lib/slc7_amd64_gcc900/libL1TriggerL1TMuonEndCap.so
#13 0x00002b57f1b11dcd in SectorProcessor::process(edm::EventID const&, std::vector<L1TMuon::TriggerPrimitive, std::allocator<L1TMuon::TriggerPrimitive> > const&, std::vector<l1t::EMTFHit, std::allocator<l1t::EMTFHit> >&, std::vector<l1t::EMTFTrack, std::allocator<l1t::EMTFTrack> >&) const () from /cvmfs/cms-ib.cern.ch/nweek-02666/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-03-0800/lib/slc7_amd64_gcc900/libL1TriggerL1TMuonEndCap.so
#14 0x00002b57f1b26a10 in TrackFinder::process(edm::Event const&, edm::EventSetup const&, std::vector<l1t::EMTFHit, std::allocator<l1t::EMTFHit> >&, std::vector<l1t::EMTFTrack, std::allocator<l1t::EMTFTrack> >&) () from /cvmfs/cms-ib.cern.ch/nweek-02666/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-03-0800/lib/slc7_amd64_gcc900/libL1TriggerL1TMuonEndCap.so
#15 0x00002b5806faed7c in L1TMuonEndCapTrackProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/nweek-02666/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-03-0800/lib/slc7_amd64_gcc900/pluginL1TriggerL1TMuonEndCapPlugins.so

@Dr15Jones
Author

assign l1

@cmsbuild
Contributor

cmsbuild commented Feb 3, 2021

New categories assigned: l1

@rekovic, @jmduarte you have been requested to review this Pull request/Issue and eventually sign. Thanks

@cmsbuild
Contributor

cmsbuild commented Feb 3, 2021

A new Issue was created by @Dr15Jones Chris Jones.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@smuzaffar
Contributor

The integration of cms-sw/cmssw#32641 and cms-data/L1Trigger-L1TMuon#19 might be the cause of these failures.

@mrodozov
Contributor

Was this fixed?

@eyigitba

@mrodozov Yes, this specific instance is fixed, although a discussion is still ongoing in cms-sw/cmssw#32894.

@smuzaffar
Contributor

Closing this in favor of issue cms-sw/cmssw#32894.


5 participants