Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MTD reconstruction: speed up TrackExtenderWithMTD #36793

Merged
merged 5 commits into from
Feb 21, 2022

Conversation

fabiocos
Copy link
Contributor

@fabiocos fabiocos commented Jan 25, 2022

PR description:

This PR proposes an update of TrackExtenderWithMTD following the presentation at the TRK POG meeting https://indico.cern.ch/event/1108413/contributions/4662751/attachments/2373693/4054455/TRK_MTD_20220117.pdf and the following discussion.

About 1/3 of the time used by this module is taken by retrieving the Trajectory for a given general track using the TrackTransformer, see https://cms-reco-profiling.web.cern.ch/cms-reco-profiling/cgi-bin/igprof-navigator/releases/CMSSW_12_2_0_pre3/34834.21/step3/cpu_endjob/92 . This can be avoided by reusing the Trajectory optionally stored in a transient form into the event by the iterative tracking, an option by default disabled.
Of course this comes at some cost in memory usage, which should be balanced with the CPU gain benefit.

The activation of the flags to save the trajectories in the event if made only for the selection of modules entering into the definition of the generalTracks output, which at present constitutes the input of the TrackExtenderWithMTD

PR validation:

Tests have been done using the benchmark workflow 34834.21 (D76 scenario TTbar with PU, prod-like), using a home-made MinBias pile-up sample of 10k events. A simple memory and CPU profiling (single stream mode) on 100 events shows a gain in CPU:

TimeReport> Time report complete in 5809.45 seconds
 Time Summary: 
 - Min event:   34.1598
 - Max event:   83.5143
 - Avg event:   55.9624
 - Total loop:  5753.35
 - Total init:  55.8676
 - Total job:   5809.45
 - EventSetup Lock: 0
 - EventSetup Get:  0
 Event Throughput: 0.0173812 ev/s
 CPU Summary: 
 - Total loop:     5633.37
 - Total init:     30.6902
 - Total extra:    0
 - Total children: 0.184089
 - Total job:      5664.15
 Processing Summary: 
 - Number of Events:  100
 - Number of Global Begin Lumi Calls:  1
 - Number of Global Begin Run Calls: 1

vs previous

TimeReport> Time report complete in 6020.16 seconds
 Time Summary: 
 - Min event:   32.6102
 - Max event:   85.8682
 - Avg event:   57.9717
 - Total loop:  5935.13
 - Total init:  84.8239
 - Total job:   6020.16
 - EventSetup Lock: 0
 - EventSetup Get:  0
 Event Throughput: 0.0168488 ev/s
 CPU Summary: 
 - Total loop:     5861.23
 - Total init:     34.564
 - Total extra:    0
 - Total children: 0.168424
 - Total job:      5895.85
 Processing Summary: 
 - Number of Events:  100
 - Number of Global Begin Lumi Calls:  1
 - Number of Global Begin Run Calls: 1

with similar CPU load in both tests, and an extra memory load quantified in about 5%:

MemoryReport> Peak virtual size 6426.03 Mbytes
 Key events increasing vsize: 
[0] run: 0 lumi: 0 event: 0  vsize = 0 deltaVsize = 0 rss = 0 delta = 0
[1] run: 1 lumi: 1 event: 1  vsize = 5458.03 deltaVsize = 0 rss = 3278.55 delta = 0
[2] run: 1 lumi: 1 event: 2  vsize = 5970.03 deltaVsize = 512 rss = 3375.28 delta = 96.7305
[7] run: 1 lumi: 1 event: 7  vsize = 6426.03 deltaVsize = 456 rss = 3375.51 delta = 0.234375
[0] run: 0 lumi: 0 event: 0  vsize = 0 deltaVsize = 0 rss = 0 delta = 0
[9] run: 1 lumi: 1 event: 9  vsize = 6426.03 deltaVsize = 0 rss = 3393.05 delta = 17.5352
[8] run: 1 lumi: 1 event: 8  vsize = 6426.03 deltaVsize = 0 rss = 3178.45 delta = -197.059
[7] run: 1 lumi: 1 event: 7  vsize = 6426.03 deltaVsize = 456 rss = 3375.51 delta = 0.234375

vs previous

MemoryReport> Peak virtual size 6418.28 Mbytes
 Key events increasing vsize: 
[1] run: 1 lumi: 1 event: 1  vsize = 5458.28 deltaVsize = 0 rss = 3093.7 delta = 0
[7] run: 1 lumi: 1 event: 7  vsize = 5906.28 deltaVsize = 448 rss = 3178.19 delta = 84.4922
[19] run: 1 lumi: 1 event: 19  vsize = 6418.28 deltaVsize = 512 rss = 3202.4 delta = 24.207
[0] run: 0 lumi: 0 event: 0  vsize = 0 deltaVsize = 0 rss = 0 delta = 0
[0] run: 0 lumi: 0 event: 0  vsize = 0 deltaVsize = 0 rss = 0 delta = 0
[21] run: 1 lumi: 1 event: 21  vsize = 6418.28 deltaVsize = 0 rss = 2888.15 delta = -314.25
[20] run: 1 lumi: 1 event: 20  vsize = 6418.28 deltaVsize = 0 rss = 3077.96 delta = -124.441
[19] run: 1 lumi: 1 event: 19  vsize = 6418.28 deltaVsize = 512 rss = 3202.4 delta = 24.207

An igprof PERF_TICKS profiling on 10 events shows the expected variation in the internal module CPU use, with the modue weighting now for 2.2% of the total CPU (vs more than 3% previously):

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
            2.2  .........     119.17 / 4'386.72     edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) [16]
[118]       2.2     119.17       1.68 / 117.49     TrackExtenderWithMTDT<std::vector<reco::Track, std::allocator<reco::Track> > >::produce(edm::Event&, edm::EventSetup const&)
            0.8  .........      44.98 / 59.01        TrackTransformer::transform(reco::TransientTrack const&, std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > >&) const [187]
            0.5  .........      29.13 / 29.13        TrackExtenderWithMTDT<std::vector<reco::Track, std::allocator<reco::Track> > >::tryETLLayers(TrajectoryStateOnSurface const&, Trajectory const&, float, float, (anonymous namespace)::TrackSegments const&, edmNew::DetSetVector<MTDTrackingRecHit> const&, MTDDetLayerGeometry const*, MagneticField const*, Propagator const*, reco::BeamSpot const&, float, bool, (anonymous namespace)::MTDHitMatchingInfo&) const [clone .constprop.0] [344]
            0.3  .........      15.16 / 15.16        TrackExtenderWithMTDT<std::vector<reco::Track, std::allocator<reco::Track> > >::buildTrack(reco::Track const&, Trajectory const&, Trajectory const&, reco::BeamSpot const&, MagneticField const*, Propagator const*, bool, float&, float&, float&, float&, float&, float&) const [926]
            0.2  .........       9.94 / 15.63        (anonymous namespace)::trackPathLength(Trajectory const&, TrajectoryStateClosestToBeamLine const&, Propagator const*, float&, (anonymous namespace)::TrackSegments&) [833]
            0.1  .........       7.55 / 16.30        (anonymous namespace)::getTrajectoryStateClosestToBeamLine(Trajectory const&, reco::BeamSpot const&, Propagator const*, TrajectoryStateClosestToBeamLine&) [676]
            0.0  .........       2.27 / 31.25        TrackExtenderWithMTDT<std::vector<reco::Track, std::allocator<reco::Track> > >::fillMatchingHits(DetLayer const*, TrajectoryStateOnSurface const&, Trajectory const&, float, float, (anonymous namespace)::TrackSegments const&, edmNew::DetSetVector<MTDTrackingRecHit> const&, Propagator const*, reco::BeamSpot const&, float const&, bool, std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > >&, (anonymous namespace)::MTDHitMatchingInfo&) const [325]
            0.0  .........       1.91 / 2.11         TrackTransformer::getTransientRecHits(reco::TransientTrack const&) const [2341]
            0.0  .........       0.90 / 58.40        std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() [190]
            0.0  .........       0.88 / 0.91         Traj2TrackHits::operator()(Trajectory const&, edm::OwnVector<TrackingRecHit, edm::ClonePolicy<TrackingRecHit> >&, std::vector<LocalTrajectoryParameters, std::allocator<LocalTrajectoryParameters> >&, std::vector<unsigned char, std::allocator<unsigned char> >&) const [3214]
            0.0  .........       0.54 / 0.54         TrackExtenderWithMTDT<std::vector<reco::Track, std::allocator<reco::Track> > >::buildTrackExtra(Trajectory const&) const [3891]
            0.0  .........       0.52 / 1.04         edm::AssociationMap<edm::OneToOne<std::vector<Trajectory, std::allocator<Trajectory> >, std::vector<reco::Track, std::allocator<reco::Track> >, unsigned short> >::get(unsigned long) const [3050]
            0.0  .........       0.49 / 0.49         void std::vector<reco::TrackExtra, std::allocator<reco::TrackExtra> >::_M_realloc_insert<reco::TrackExtra>(__gnu_cxx::__normal_iterator<reco::TrackExtra*, std::vector<reco::TrackExtra, std::allocator<reco::TrackExtra> > >, reco::TrackExtra&&) [4043]
            0.0  .........       0.40 / 0.95         reco::HitPattern::appendHit(TrackingRecHit const&, TrackerTopology const&) [3147]
            0.0  .........       0.33 / 5.01         reco::TransientTrack::TransientTrack(edm::Ref<std::vector<reco::Track, std::allocator<reco::Track> >, reco::Track, edm::refhelper::FindUsingAdvance<std::vector<reco::Track, std::allocator<reco::Track> >, reco::Track> > const&, MagneticField const*, edm::ESHandle<GlobalTrackingGeometry> const&) [1721]
            0.0  .........       0.33 / 0.34         non-virtual thunk to reco::TrackTransientTrack::outermostMeasurementState() const [4570]
            0.0  .........       0.30 / 0.30         std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > >::~vector() [4781]
            0.0  .........       0.28 / 114.60       operator delete(void*, unsigned long) [121]
            0.0  .........       0.27 / 0.27         edm::refitem::GetRefPtrImpl<std::vector<Trajectory, std::allocator<Trajectory> >, Trajectory, edm::refhelper::FindUsingAdvance<std::vector<Trajectory, std::allocator<Trajectory> >, Trajectory>, unsigned int>::getRefPtr_(edm::RefCore const&, unsigned int) [4916]
            0.0  .........       0.26 / 2.39         TransientTrackBuilder::build(edm::Ref<std::vector<reco::Track, std::allocator<reco::Track> >, reco::Track, edm::refhelper::FindUsingAdvance<std::vector<reco::Track, std::allocator<reco::Track> >, reco::Track> > const&) const [2248]
(...)

The DQM histogram comparison shows a very moderate difference in the output of MTD tracks and vertices tests, and 4D vertices. Since the code should just replace the Trajectory from the original track fit with its reconstruction from the TrackTransformer I would conclude that some possible small difference might be expected there. In any case the overall behaviour of the code looks very well consistent between the two versions, even though not strictly identical.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-36793/27938

  • This PR adds an extra 48KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @fabiocos (Fabio Cossutti) for master.

It involves the following packages:

  • RecoMTD/TrackExtender (upgrade, reconstruction)
  • RecoTracker/FinalTrackSelectors (reconstruction)
  • RecoTracker/IterativeTracking (reconstruction)

@clacaputo, @cmsbuild, @AdrianoDee, @srimanob, @slava77, @jpata can you please review it and eventually sign? Thanks.
@felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @ebrondol, @gpetruc, @mmusich, @mtosi, @dgulhan this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@fabiocos
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-996284/21973/summary.html
COMMIT: 42b9d10
CMSSW: CMSSW_12_3_X_2022-01-24-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/36793/21973/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 104 differences found in the comparisons
  • DQMHistoTests: Total files compared: 43
  • DQMHistoTests: Total histograms compared: 3449324
  • DQMHistoTests: Total failures: 59
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3449243
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 42 files compared)
  • Checked 181 log files, 42 edm output root files, 43 DQM output files
  • TriggerResults: no differences found

@slava77
Copy link
Contributor

slava77 commented Jan 25, 2022

related to the memory use:
It would be better to also check the memory use with MEM_LIVE from igprof.

IIRC, the trajectory data is fully copied down to the generalTracks. This gives an opportunity to use the framework early delete feature (hooks are in place in Configuration/StandardSequences/python/earlyDeleteSettings_cff.py).

@slava77
Copy link
Contributor

slava77 commented Jan 25, 2022

enable profiling

@slava77
Copy link
Contributor

slava77 commented Jan 25, 2022

@cmsbuild please test

@fabiocos
Copy link
Contributor Author

@slava77 I had given a try to MEM_LIVE checks in igprof, but I was getting a ridiculously large output of kind of warning messages...

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-996284/21987/summary.html
COMMIT: 42b9d10
CMSSW: CMSSW_12_3_X_2022-01-25-1100/slc7_amd64_gcc10
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/36793/21987/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 111 differences found in the comparisons
  • DQMHistoTests: Total files compared: 43
  • DQMHistoTests: Total histograms compared: 3449324
  • DQMHistoTests: Total failures: 71
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3449231
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 42 files compared)
  • Checked 181 log files, 42 edm output root files, 43 DQM output files
  • TriggerResults: no differences found

@fabiocos
Copy link
Contributor Author

not a big surprise, the bulk of the difference is in the KF code as far as I can see, as expected
https://cmssdt.cern.ch/SDT/cgi-bin/igprof-navigator/CMSSW_12_3_X_2022-01-25-1100/slc7_amd64_gcc10/profiling/23434.21/PR-996284/21987/igprofMEM_step3.19/103

It is a matter of balance, we may squeeze the overall time but at the price of having some more stuff in memory. I selected only what strictly needed by the TrackExtenderWithMTD , there is a customisation fragment to enable the trajectory in event for every producer

https://github.com/cms-sw/cmssw/blob/master/RecoTracker/FinalTrackSelectors/python/customiseStoreTrajectoryInEvent.py

but I expect it would make the problem worse

@perrotta
Copy link
Contributor

please test
(let refresh the nine day old tests before merging)

@perrotta
Copy link
Contributor

please abort

@perrotta
Copy link
Contributor

enable none
(try to speed up by disabling extra tests: profiling was already inspected)

@perrotta
Copy link
Contributor

please test

@perrotta
Copy link
Contributor

@smuzaffar , it seems that the bot is waiting to start the profiling: bust shouldn't I have prevented it with "enable none" in #36793 (comment) ?
In any case, tests seem stuck since a while. When I launched them lastly,my intention was simply to check that there were no counterindications in merging this PR coming from any of the other PRs merged in the last 9 days (when this one was tested last): by looking at the bot log I would say that everything works, besides that supposed attempt to start the profiling. Therefore; I could even merge this PR anyhow, but let wait for a few hours, just in case someone wants to check what's happening with the bot

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-996284/22528/summary.html
COMMIT: dda4ecb
CMSSW: CMSSW_12_3_X_2022-02-20-1400/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/36793/22528/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 124 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3965143
  • DQMHistoTests: Total failures: 65
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3965056
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 204 log files, 45 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Copy link
Contributor

@perrotta , there is a known issue with bot that if a test was previously run for a commit then bot assumes that it will re-run again and waits for its results. I will fix this once find some time. For now I have manually set mark the profiling status successful

@perrotta
Copy link
Contributor

Thank you @smuzaffar!

@perrotta
Copy link
Contributor

+1

@makortel
Copy link
Contributor

This PR might have caused an assertion failure in workflow 23234.9, see #37018.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants