Igprof analysis of 11_1_0_p3_ROOT618 : trklet::TrackletEngineDisplaced::readTables() #30742
Comments
A new Issue was created by @tommasoboccali Tommaso Boccali. @Dr15Jones, @silviodonato, @dpiparo, @smuzaffar, @makortel can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign l1,upgrade |
New categories assigned: upgrade,l1 @benkrikler,@rekovic,@kpedro88 you have been requested to review this Pull request/Issue and eventually sign? Thanks |
Can you suggest a workflow to use for testing?
|
this works for me: CMSSW_11_1_0_p3_ROOT618 + PR 30684. Generate the pset with
cmsDriver.py step1 --conditions auto:phase2_realistic_T15 -n 100 --era Phase2C9 --eventcontent FEVTDEBUGHLT --runUnscheduled --filein file:/eos/cms/store/relval/CMSSW_11_0_0/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU25ns_110X_mcRun4_realistic_v3_2026D49PU200-v2/10000/01054EE2-1B51-C449-91A2-5202A60D16A3.root -s RAW2DIGI,L1TrackTrigger,L1 --datatier FEVTDEBUGHLT --customise SLHCUpgradeSimulations/Configuration/aging.customise_aging_1000,L1Trigger/Configuration/customisePhase2TTNoMC.customisePhase2TTNoMC,Configuration/DataProcessing/Utils.addMonitoring --geometry Extended2026D49 --fileout file:step1.root --customise_command "process.SimpleMemoryCheck = cms.Service('SimpleMemoryCheck',ignoreTotal = cms.untracked.int32(1))" --no_exec --nThreads 8 --python step1_L1_ProdLike.py
Running on 1 event is enough.
PS: I was also looking here ... indeed in most of the lines "line" is an empty string. An if (line == "") continue; might suffice |
without the continue:
++++++ finished: global begin run for module: label = 'TTTracksFromExtendedTrackletEmulation' id = 23
memory usage: 1739.8 MB allocated, 871.6 MB retained, 1739.8 MB peak
with if (line == "") continue; before the table resize:
++++++ finished: global begin run for module: label = 'TTTracksFromExtendedTrackletEmulation' id = 23
memory usage: 673.3 MB allocated, 185.1 MB retained, 673.4 MB peak
I did not check for physics results, though |
PS: even if it generates a smaller memory consumption, so not on the radar here, I suspect the same can be done for
|
I would guess the empty lines are important, as the index is used in the algorithm - but that's easily avoided
|
One idea is to clean up something like this, as the printouts show |
of course the strings are really just decoded event ids of sorts |
@davidlange6 there is no need to call clear in the destructor. The member data is going to be deleted anyway. |
@davidlange6 the result returned by |
Yes - I was just blindly copying what was done before:)
|
ah - then that's a bug! let me fix |
I updated the branch with Chris's comments. (it seems there should be a better way) |
new numbers: VSIZE 7800.44 (change 56), RSS 4879.73 (change 27.8438) (modulo new bugs introduced) |
@skinnari FYI |
I added some printouts that suggest the changes do not change the output. Single threaded RSS savings seems to be 1GB - if anything the prototyped version is faster (5.0 seconds instead of 5.1 seconds for 20 events) |
in any case that module is a "one" and not a "stream" - so the gain single / multi threads is the same. It used to be a stream but we changed it last week (it was asking for 1x8 GB !!!!)
|
As Tommaso and I just discussed - that should destroy the CPU efficiency in a L1 only job.
|
If the table_ is now only set up in the constructor, then it could be effectively 'const'. If that is the case, then the code could be switched back to a ::stream module using a |
I did a fast test about the "As Tommaso and I just discussed - that should destroy the CPU efficiency in a L1 only job" claim, running only L1 with 8 threads (100 events) -- with the module in "one" configuration. It is certainly true we do not get 100% CPU eff, but it is not really "destroyed". During the event loop I get:
WALL: Total loop: 1541.18
CPU: Total loop: 10055.5
so eff = 10055/(8*1541) = 82% |
I did a test @ 4 threads:
WALL: Total loop: 2625
CPU: Total loop: 9694
eff = 92%
RES max is 7400 ... so it would be still a possibility if we want to increase CPU eff |
A matter of definition :) 82% in a controlled environment is not good for production. (Anyway it's not a current production setup, so no issue.)
|
It seems the table in question is several layers of classes down from the producer itself. Looks like significant code surgery to me (and probably doesn’t work for the module in general?) Refactoring needed..
|
@skinnari, @tomalin, and I discussed this issue and #30744, and decided that the best solution for now was to disable the tables in the TrackletEngineDisplaced and TripletEngine. I have now implemented this in PR #30818, and it seems to have the desired effect on memory usage. Please let me know if there's anything else I can do to help resolve these issues. |
@skinnari @tomalin @aehart @davidlange6 |
@srimanob Yes, it can be closed. The tables are indeed disabled by default. |
+Upgrade |
+l1 |
This issue is fully signed and ready to be closed. |
Dear all,
an igprof analysis of L1+RECO here (https://dpiparo.web.cern.ch/dpiparo/cgi-bin/igprof-navigator/L1_Reco_tt_evt1.mp) shows a large allocation (~ 1 GB) in
https://github.com/cms-sw/cmssw/blob/9010c72dccae06e78cb0ec045bc1c829cf1afd54/L1Trigger/TrackFindingTracklet/src/TrackletEngineDisplaced.cc#L393
(direct link HERE: https://dpiparo.web.cern.ch/dpiparo/cgi-bin/igprof-navigator/L1_Reco_tt_evt1.mp/52)
please have a look at whether this is meaningful.
cheers