Random comparison difference in HLT/SiStrip/ControlView #41200

Open
makortel opened this issue Mar 27, 2023 · 21 comments

Comments

@makortel
Contributor

The HLT/SiStrip/ControlView/{ClusterStoNCorr_OnTrack_FECCratevsFECSlot,ClusterStoNCorr_OnTrack_FECSlotVsFECRing_TECP} histograms showed differences in workflow 11634.911 in the PR tests of #41186 (comment). The PR itself is very unlikely to be the cause. The differences have not been visible in other recent PR tests either, so they are likely of random origin. The purpose of this issue is nevertheless to document them, in case they show up in other tests later on.

Workflow 11634.911 is the DD4hep workflow that, IIUC, reads the geometry from the XML file instead of from the CondDB. These differences may be evidence of some rare non-reproducibility in the DD4hep code path (which we have observed, but not really solved, before).

@makortel
Contributor Author

assign geometry

@cmsbuild
Contributor

New categories assigned: geometry

@mdhildreth,@Dr15Jones,@makortel,@bsunanda,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@VinInn
Contributor

VinInn commented Mar 28, 2023

Is architecture dependence excluded (i.e. Intel vs AMD)?

@makortel
Contributor Author

Good point. I checked the PR test and baseline runTheMatrix output of #41186 (comment), and both were run on Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz.
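
The check above was done by reading the runTheMatrix logs. On a Linux node the CPU model can also be read directly. The following is a minimal sketch, assuming the usual `/proc/cpuinfo` layout with a `model name` field; the function name `cpu_models` is made up for illustration and is not part of any CMSSW tool.

```python
def cpu_models(cpuinfo_text):
    """Return the set of distinct 'model name' values found in /proc/cpuinfo text."""
    models = set()
    for line in cpuinfo_text.splitlines():
        # Each entry looks like "key<TAB>: value"; split on the first colon only.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            models.add(value.strip())
    return models
```

Calling `cpu_models(open("/proc/cpuinfo").read())` on the node that ran the PR test and on the node that ran the baseline, and comparing the two sets, would show whether both ran on the same CPU model.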

@VinInn
Contributor

VinInn commented Mar 28, 2023

I'm running valgrind on step1 and see a bunch of these

==1882951== Invalid read of size 8
==1882951==    at 0x40F3AA2E: vecgeom::cxx::CommonUnplacedVolumeImplHelper<vecgeom::cxx::PolyhedronImplementation<(EInnerRadii)0, (EPhiCutout)0>, vecgeom::cxx::VUnplacedVolume>::SafetyToIn(vecgeom::cxx::Vector3D<double> const&) const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951==    by 0x40EF79F6: G4UAdapter<vecgeom::cxx::UnplacedPolyhedron>::DistanceToIn(CLHEP::Hep3Vector const&) const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951==    by 0x40FC659C: G4VoxelNavigation::ComputeStep(CLHEP::Hep3Vector const&, CLHEP::Hep3Vector const&, double, double&, G4NavigationHistory&, bool&, CLHEP::Hep3Vector&, bool&, bool&, G4VPhysicalVolume**, int&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951==    by 0x40CDF04A: G4Navigator::ComputeStep(CLHEP::Hep3Vector const&, CLHEP::Hep3Vector const&, double, double&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951==    by 0x40E52FA6: G4Transportation::AlongStepGetPhysicalInteractionLength(G4Track const&, double, double, double&, G4GPILSelection*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951==    by 0x40E4B87B: G4TrackingManager::ProcessOneTrack(G4Track*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951==    by 0x40C38D19: G4EventManager::DoProcessing(G4Event*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951==    by 0x40989BB9: RunManagerMTWorker::produce(edm::Event const&, edm::EventSetup const&, RunManagerMT&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951==    by 0x40995831: omt::ThreadHandoff::Functor<OscarMTProducer::produce(edm::Event&, edm::EventSetup const&)::{lambda()#1}>::execute() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951==    by 0x4097A919: omt::ThreadHandoff::threadLoop(void*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951==    by 0x70861C9: start_thread (in /usr/lib64/libpthread-2.28.so)
==1882951==    by 0x72D7E72: clone (in /usr/lib64/libc-2.28.so)
==1882951==  Address 0x54664188 is 24 bytes before an unallocated block of size 0 in arena "client"
==1882951==

It needs to be understood whether this is related specifically to DD4hep.

@VinInn
Contributor

VinInn commented Mar 28, 2023

my valgrind command

valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --tool=memcheck \
--suppressions=$ROOTSYS/etc/valgrind-root.supp \
--suppressions=$CMSSW_RELEASE_BASE/src/Utilities/ReleaseScripts/data/cms-valgrind-memcheck.supp cmsRun $1

@VinInn
Contributor

VinInn commented Mar 28, 2023

not sure if the valgrind report is actually still related to this
https://sft.its.cern.ch/jira/projects/VECGEOM/issues/VECGEOM-600?filter=allopenissues

@civanch
Contributor

civanch commented Mar 28, 2023

@VinInn, this issue was understood to be a compiler bug that appears when -O3 optimisation is used. The solution was to use -O2 optimisation for VecGeom. However, I am not sure whether the problem Matti is reporting here is the same.

@VinInn
Contributor

VinInn commented Mar 28, 2023

I got the report above in the latest 13_1_X nightly. Maybe understood, but apparently not solved.
VecGeom not vectorized is a bit incongruous...

@VinInn
Contributor

VinInn commented Mar 28, 2023

one more valgrind message in step2 (most probably for a different issue)

==1891808== Invalid free() / delete / delete[] / realloc()
==1891808==    at 0x403BF6C: free (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/external/valgrind/3.17.0-7ca83817e7379e83453f913e11e14834/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1891808==    by 0x48F90DDB: edm::Wrapper<ZVertexSoAHeterogeneousHost<131072> >::~Wrapper() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libCUDADataFormatsVertex.so)
==1891808==    by 0x48F90DF3: edm::Wrapper<ZVertexSoAHeterogeneousHost<131072> >::~Wrapper() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libCUDADataFormatsVertex.so)
==1891808==    by 0x4DD56FA: edm::productholderindexhelper::getContainedTypeFromWrapper(edm::TypeID const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libDataFormatsProvenance.so)
==1891808==    by 0x4DDB31F: edm::ProductRegistry::initializeLookupTables(std::set<edm::TypeID, std::less<edm::TypeID>, std::allocator<edm::TypeID> > const*, std::set<edm::TypeID, std::less<edm::TypeID>, std::allocator<edm::TypeID> > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libDataFormatsProvenance.so)
==1891808==    by 0x4DD15EF: edm::ProductRegistry::setFrozen(std::set<edm::TypeID, std::less<edm::TypeID>, std::allocator<edm::TypeID> > const&, std::set<edm::TypeID, std::less<edm::TypeID>, std::allocator<edm::TypeID> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libDataFormatsProvenance.so)
==1891808==    by 0x4C161B6: edm::Schedule::finishSetup(edm::ParameterSet&, edm::service::TriggerNamesService const&, edm::ProductRegistry&, edm::BranchIDListHelper&, edm::ProcessBlockHelperBase&, edm::ThinnedAssociationsHelper&, edm::SubProcessParentageHelper const*, std::shared_ptr<edm::ActivityRegistry>, std::shared_ptr<edm::ProcessConfiguration>, bool, edm::PreallocationConfiguration const&, edm::ProcessContext const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libFWCoreFramework.so)
==1891808==    by 0x4C2676D: edm::ScheduleItems::finishSchedule(edm::ScheduleItems::MadeModules, edm::ParameterSet&, edm::service::TriggerNamesService const&, bool, edm::PreallocationConfiguration const&, edm::ProcessContext const*, edm::ProcessBlockHelperBase&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libFWCoreFramework.so)
==1891808==    by 0x4B68977: edm::EventProcessor::init(std::shared_ptr<edm::ProcessDesc>&, edm::ServiceToken const&, edm::serviceregistry::ServiceLegacy) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libFWCoreFramework.so)
==1891808==    by 0x4B6BAD0: edm::EventProcessor::EventProcessor(std::shared_ptr<edm::ProcessDesc>, edm::ServiceToken const&, edm::serviceregistry::ServiceLegacy) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libFWCoreFramework.so)
==1891808==    by 0x40C0AC: tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/bin/el8_amd64_gcc11/cmsRun)
==1891808==    by 0x63D3846: tbb::detail::r1::task_arena_impl::execute(tbb::detail::d1::task_arena_base&, tbb::detail::d1::delegate_base&) (arena.cpp:694)
==1891808==  Address 0xaa644380 is in a rw- anonymous segment
==1891808==

@makortel
Contributor Author

makortel commented Apr 4, 2023

Another occurrence in #41274 (comment), this time in workflow 12434.0

@makortel changed the title from "Random comparison difference in HLT/SiStrip/ControlView in 11634.911" to "Random comparison difference in HLT/SiStrip/ControlView" on Apr 4, 2023
@makortel
Contributor Author

makortel commented Apr 4, 2023

assign dqm

@cmsbuild
Contributor

cmsbuild commented Apr 4, 2023

New categories assigned: dqm

@micsucmed,@rvenditti,@emanueleusai,@syuvivida,@pmandrik you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor Author

makortel commented Apr 4, 2023

FYI @cms-sw/trk-dpg-l2

@makortel
Contributor Author

Another occurrence in #41328 (comment), this time in workflow 12434.0

@makortel
Contributor Author

makortel commented May 2, 2023

Another occurrence in #41460 (comment) in workflow 12434.0

@makortel
Contributor Author

makortel commented Jun 6, 2023

Another occurrence in #41876 (comment) in workflow 12434.0

@makortel
Contributor Author

Another occurrence in cms-sw/cmsdist#8545 (comment) in workflow 12434.0 (although there, because of a compiler update, minor(?) differences in the generated code cannot be excluded)

@makortel
Contributor Author

Another occurrence in #42075 (comment) in workflow 12434.0.

@mmusich
Contributor

mmusich commented Jul 31, 2023

type trk

@cmsbuild added the trk label on Jul 31, 2023