Igprof analysis of 11_1_0_p3_ROOT618 : trklet::TrackletEngineDisplaced::readTables() #30742

tommasoboccali · 2020-07-16T08:05:31Z

Dear all,
an igprof analysis of L1+RECO here shows a large allocation (~ 1 GB) in

cmssw/L1Trigger/TrackFindingTracklet/src/TrackletEngineDisplaced.cc

Line 393 in 9010c72

void TrackletEngineDisplaced::readTables() {

(direct link HERE)

please have a look at whether this is meaningful.

cheers

cmsbuild · 2020-07-16T08:05:54Z

A new Issue was created by @tommasoboccali Tommaso Boccali.

@Dr15Jones, @silviodonato, @dpiparo, @smuzaffar, @makortel can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

tommasoboccali · 2020-07-16T08:28:20Z

assign l1,upgrade

silviodonato · 2020-07-16T08:55:26Z

assign l1,upgrade

cmsbuild · 2020-07-16T08:55:48Z

New categories assigned: upgrade,l1

@benkrikler,@rekovic,@kpedro88 you have been requested to review this Pull request/Issue and eventually sign? Thanks

davidlange6 · 2020-07-16T11:01:40Z

Can you suggest a workflow to use for testing?

tommasoboccali · 2020-07-16T11:08:38Z

this works for me:

CMSSW_11_1_0_p3_ROOT618 + PR 30684

generate pset with

cmsDriver.py step1 --conditions auto:phase2_realistic_T15 -n 100 --era Phase2C9 --eventcontent FEVTDEBUGHLT --runUnscheduled --filein file:/eos/cms/store/relval/CMSSW_11_0_0/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU25ns_110X_mcRun4_realistic_v3_2026D49PU200-v2/10000/01054EE2-1B51-C449-91A2-5202A60D16A3.root -s RAW2DIGI,L1TrackTrigger,L1 --datatier FEVTDEBUGHLT --customise SLHCUpgradeSimulations/Configuration/aging.customise_aging_1000,L1Trigger/Configuration/customisePhase2TTNoMC.customisePhase2TTNoMC,Configuration/DataProcessing/Utils.addMonitoring --geometry Extended2026D49 --fileout file:step1.root --customise_command "process.SimpleMemoryCheck = cms.Service('SimpleMemoryCheck',ignoreTotal = cms.untracked.int32(1))" --no_exec --nThreads 8 --python step1_L1_ProdLike.py

running on 1 event is enough

PS: I was also looking here ...indeed in most of the lines

cmssw/L1Trigger/TrackFindingTracklet/src/TrackletEngineDisplaced.cc

Line 407 in 9010c72

istringstream iss(line);

"line" is an empty string.

a

if (line =="") continue;

might suffice
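A minimal sketch of the guard suggested above, assuming the table file holds one whitespace-separated row of integers per line and that blank lines carry no entries. The helper name `readNonEmptyRows` and the row format are hypothetical; this is not the actual `readTables()` code. Skipping lines also shifts the row indices, which matters if the line number is used as a lookup key.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: parse one row of integers per line, skipping empty
// lines before any per-row storage is created.
std::vector<std::vector<int>> readNonEmptyRows(std::istream& in) {
  std::vector<std::vector<int>> table;
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty())
      continue;  // the proposed guard: nothing to parse, nothing to allocate
    std::istringstream iss(line);
    std::vector<int> row;
    int value;
    while (iss >> value)
      row.push_back(value);
    table.push_back(std::move(row));
  }
  return table;
}
```

Feeding this an input with blank lines between rows yields only the populated rows, so the container grows with the number of real entries rather than with the number of lines in the file.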

tommasoboccali · 2020-07-16T11:17:31Z

without the continue

++++++ finished: global begin run for module: label = 'TTTracksFromExtendedTrackletEmulation' id = 23
memory usage: 1739.8 MB allocated, 871.6 MB retained, 1739.8 MB peak

with
if (line =="") continue;

before the table resize:

++++++ finished: global begin run for module: label = 'TTTracksFromExtendedTrackletEmulation' id = 23
memory usage: 673.3 MB allocated, 185.1 MB retained, 673.4 MB peak

I did not check for physics results, though

tommasoboccali · 2020-07-16T11:20:57Z

PS: even if it generates a smaller memory consumption, and so is not on the radar here, I suspect the same can be done for

cmssw/L1Trigger/TrackFindingTracklet/src/TripletEngine.cc

Line 433 in 9010c72

void TripletEngine::readTables() {

davidlange6 · 2020-07-16T11:21:16Z

I would guess the empty lines are important as the index is used in the algorithm - but that's easily avoided

davidlange6 · 2020-07-16T13:21:30Z

One idea is to clean something like this up
https://github.com/cms-sw/cmssw/compare/CMSSW_11_1_X...davidlange6:dl_tracklet_200716?expand=1

the printouts show
New: VSIZE 8157.83 0 RSS 5402.94 15.2266
vs
Old: VSIZE 10437.2 0 RSS 6860.28 2.57812

davidlange6 · 2020-07-16T13:22:29Z

of course the strings are really just decoded event ids of sorts

Dr15Jones · 2020-07-16T13:27:34Z

@davidlange6 there is no need to call clear in the destructor. The member data is going to be deleted anyway.

Dr15Jones · 2020-07-16T13:32:03Z

@davidlange6 the result returned by lower_bound is not guaranteed to be an iterator to the value. If the value is not present, the returned iterator is not necessarily end; it points to the first element that is not less than the value, i.e. the position where the value would be inserted. Therefore you still need to test that the value pointed to by the iterator matches index.
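A minimal sketch of the check described here, assuming the table is stored as a vector of (index, payload) pairs sorted by index. The `SparseTable` alias and the `findEntry` helper are hypothetical names for illustration, not the code in the prototype branch.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sparse table: (index, payload) pairs kept sorted by index.
using SparseTable = std::vector<std::pair<int, std::string>>;

// Returns a pointer to the payload stored for `index`, or nullptr if absent.
const std::string* findEntry(const SparseTable& table, int index) {
  auto it = std::lower_bound(
      table.begin(), table.end(), index,
      [](const std::pair<int, std::string>& entry, int key) { return entry.first < key; });
  // lower_bound yields the first element whose index is >= `index`. That may
  // be a different entry, or end(), so both conditions have to be checked.
  if (it == table.end() || it->first != index)
    return nullptr;
  return &it->second;
}
```

Comparing only against `end()` would silently return the neighbouring entry for any index that falls in a gap, which is the bug being pointed out.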

davidlange6 · 2020-07-16T13:32:08Z

Yes - I was just blindly copying what was done before:)

davidlange6 · 2020-07-16T13:33:58Z

@davidlange6 the result returned by lower_bound is not guaranteed to be an iterator to the value. If the value is not present, the returned iterator is not necessarily end; it points to the first element that is not less than the value, i.e. the position where the value would be inserted. Therefore you still need to test that the value pointed to by the iterator matches index.

ah - then that's a bug! let me fix

davidlange6 · 2020-07-16T13:47:53Z

I updated the branch with Chris's comments.. (it seems there should be a better way)

davidlange6 · 2020-07-16T14:06:53Z

new VSIZE 7800.44 56 RSS 4879.73 27.8438
vs
old VSIZE 10437.2 0 RSS 6860.28 2.57812

(modulo new bugs introduced)

kpedro88 · 2020-07-16T14:46:13Z

@skinnari FYI

davidlange6 · 2020-07-16T15:51:46Z

I added some printouts that suggest the changes do not change the output.

The single-threaded RSS saving seems to be 1 GB; if anything, the prototyped version is faster (5.0 seconds instead of 5.1 seconds for 20 events)

tommasoboccali · 2020-07-16T15:56:27Z

in any case that module is a "one" and not a "stream" - so the gain single / multi threads is the same. It used to be a stream but we changed it last week (it was asking for 1x8 GB !!!!)

davidlange6 · 2020-07-16T16:06:33Z

As Tommaso and I just discussed - that should destroy the CPU efficiency in a L1 only job.

Dr15Jones · 2020-07-16T16:08:34Z

If the table_ is now only setup in the constructor, then it could be effectively 'const'. If that is the case, then the code could be switched back to a ::stream module using a GlobalCache to hold the table which is then shared across all the stream copies.
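The sharing idea can be illustrated in plain C++, outside the framework. This is only an analogy under stated assumptions: `LookupTable`, `StreamWorker` and `makeWorkers` are hypothetical names, and the sketch does not use or reproduce the CMSSW `GlobalCache` interface. It shows one immutable table built once and referenced by every per-stream copy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical read-only lookup table, filled once and never modified.
struct LookupTable {
  std::vector<int> values;
};

// Stand-in for a per-stream module copy: it holds a pointer to the shared
// table instead of owning a private copy.
class StreamWorker {
public:
  explicit StreamWorker(std::shared_ptr<const LookupTable> table) : table_(std::move(table)) {}
  int lookup(std::size_t i) const { return table_->values.at(i); }
  const LookupTable* tableAddress() const { return table_.get(); }

private:
  std::shared_ptr<const LookupTable> table_;
};

// Build the table once and hand the same instance to every worker.
std::vector<StreamWorker> makeWorkers(std::size_t nStreams) {
  auto table = std::make_shared<const LookupTable>(LookupTable{{10, 20, 30}});
  std::vector<StreamWorker> workers;
  for (std::size_t i = 0; i < nStreams; ++i)
    workers.emplace_back(table);
  return workers;
}
```

Because the table is `const` after construction, concurrent reads from several workers need no locking, and the memory cost is paid once rather than once per stream.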

tommasoboccali · 2020-07-17T06:34:47Z

I did a fast test about the

"
As Tommaso and I just discussed - that should destroy the CPU efficiency in a L1 only job.
"

running only L1 with 8 threads (100 events) -- with the module in "one" configuration.

It is certainly true we do not get 100% CPU eff, but it is not really "destroyed":
I get during event loop

WALL: Total loop: 1541.18
CPU: Total loop: 10055.5

so eff = 10055/(8*1541) = 82%

tommasoboccali · 2020-07-17T07:29:47Z

I did a test @ 4 threads:

WALL: Total loop: 2625
CPU: Total loop: 9694

eff = 92%.

RES max is 7400 ... so it would be still a possibility if we want to increase cpu eff
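For reference, the efficiency figures quoted in these tests follow from CPU time divided by (threads × wall time). A small sketch of that arithmetic; the function name `cpuEfficiency` is hypothetical and the inputs are the loop totals reported above.

```cpp
#include <cassert>
#include <cmath>

// CPU efficiency = CPU time / (number of threads * wall-clock time).
double cpuEfficiency(double cpuTime, double wallTime, int nThreads) {
  return cpuTime / (nThreads * wallTime);
}
```

With the totals quoted above, `cpuEfficiency(10055.5, 1541.18, 8)` gives about 0.82 and `cpuEfficiency(9694, 2625, 4)` gives about 0.92.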

davidlange6 · 2020-07-17T07:38:16Z

A matter of definition:) 82% in a controlled environment is not good for production. (Anyway it's not a current production setup, so no issue..)

davidlange6 · 2020-07-17T07:51:41Z

It seems the table in question is several layers of classes down from the producer itself. Looks like significant code surgery to me (and probably doesn't work for the module in general?) Refactoring needed..

aehart · 2020-07-18T15:21:23Z

@skinnari, @tomalin, and I discussed this issue and #30744, and decided that the best solution for now was to disable the tables in the TrackletEngineDisplaced and TripletEngine. I have now implemented this in PR #30818, and it seems to have the desired effect on memory usage.

Please let me know if there's anything else I can do to help resolve these issues.

srimanob · 2021-04-11T11:40:12Z

@skinnari @tomalin @aehart @davidlange6
Can this ticket be closed? I understand that the tables are currently disabled by default. Or would you like to keep this open together with #30744?

tomalin · 2021-04-11T16:53:22Z

@srimanob Yes, it can be closed. The tables are indeed disabled by default.

srimanob · 2021-04-12T08:08:32Z

+Upgrade

cecilecaillol · 2022-04-11T15:10:09Z

+l1

cmsbuild · 2022-04-11T15:10:33Z

This issue is fully signed and ready to be closed.

cmsbuild added the pending-assignment label Jul 16, 2020

cmsbuild added l1-pending pending-signatures upgrade-pending and removed pending-assignment labels Jul 16, 2020

makortel mentioned this issue Jul 16, 2020

Igprof analysis of 11_1_0_p3_ROOT618 : CSCMotherboard::CSCMotherboard #30745

Open

aehart mentioned this issue Jul 18, 2020

Disabled the TED and TRE tables by default. #30818

Closed

cmsbuild added upgrade-approved and removed upgrade-pending labels Apr 12, 2021

cmsbuild added fully-signed l1-approved and removed l1-pending pending-signatures labels Apr 11, 2022

qliphy closed this as completed Apr 11, 2022
