Disabled the TED and TRE tables by default. #30818
Conversation
A new Pull Request was created by @aehart (Andrew Hart) for CMSSW_11_1_X. It involves the following packages: L1Trigger/TrackFindingTracklet. @cmsbuild, @rekovic, @benkrikler can you please review it and eventually sign? Thanks. cms-bot commands are listed here.
@aehart this code will be run on the grid. For this reason, before running any production, we need to know:
1. the ratio of CPU time to real time, and
2. the RSS footprint of the job.
I would propose that this PR removes the python config parameter that can enable the massive data structure via a "simple python change pull request" (TM).
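For readers unfamiliar with the concern, the following is a purely hypothetical sketch of the kind of python-level switch being discussed. The enableTEDTRETables parameter is invented for illustration; only the module label and plugin name appear elsewhere in this thread.

```python
# Hypothetical illustration only: if the producer's cfi exposed a boolean like
# this, any "simple python change pull request" (or user configuration) could
# re-enable the memory-hungry TED/TRE tables on the grid. The parameter name
# below is invented and does not exist in the actual package.
import FWCore.ParameterSet.Config as cms

TTTracksFromExtendedTrackletEmulation = cms.EDProducer("L1FPGATrackProducer",
    # ... real parameters omitted ...
    enableTEDTRETables = cms.bool(False),  # hypothetical switch; not in the actual cfi
)
```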
@dpiparo I tested a single-threaded (because L1FPGATrackProducer is an edm::one::EDProducer) job running only the L1TrackTrigger step over 10 ttbar events with 200 PU:
1. I'm not sure what the best way to measure CPU and real time is, but I used the FastTimerService to write out a summary JSON. With that I see the following efficiencies:
* total: 213794.773163 / 217487.481959 = 98.3%
* TTTracksFromTrackletEmulation: 17710.210555 / 17716.911111 = 100%
* TTTracksFromExtendedTrackletEmulation: 21206.114799 / 21257.489168 = 99.8%
So the extended algorithm is marginally less efficient than the baseline algorithm, which was not touched, but the overall efficiency of this step looks pretty good to me. The efficiency reported by time for the same job is 90%, I guess because it includes additional CMSSW overhead that the FastTimerService is not able to account for. But if there's a better way to measure these times, please let me know.
2. For the RSS footprint, I used igprof and the IgProfService to output the memory profile after 5 of the 10 events, and the total usage at that point is 1.4 GB. This agrees with what I observe with top during running.
If you need any additional information or if you want the tests run again with different parameters, please let me know.
@davidlange6 I'm not sure I understand what you mean. These TED and TRE tables, which were responsible for the huge memory usage before, can now only be enabled by flipping a switch in Settings.h and recompiling.
Reply from @davidlange6 (by email, quoting the comment above): But do your FastTimerService results include the framework stalling while your module runs? Since it's the slowest thing in L1 and an edm::one module, those "CMSSW overheads" are likely partially from L1FPGATrackProducer.
Great, I misunderstood before.
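As a rough illustration of how the profiling quoted above could be set up, here is a minimal cmsRun configuration fragment enabling the two services mentioned (the FastTimerService JSON summary and periodic IgProfService memory dumps). It is a sketch based on the commonly used parameters of these services; the exact parameter names can differ between CMSSW releases, and the source/maxEvents lines are placeholders rather than the actual L1TrackTrigger job configuration.

```python
# Sketch of a profiling setup like the one described above; parameter names
# are the commonly used ones for these services and may differ by release.
import FWCore.ParameterSet.Config as cms

process = cms.Process("PROFILE")

# Placeholders standing in for the real L1TrackTrigger job definition.
process.source = cms.Source("EmptySource")
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(10))

# Per-module CPU and real-time accounting, written to a JSON summary at end of job.
process.FastTimerService = cms.Service("FastTimerService",
    writeJSONSummary = cms.untracked.bool(True),
    jsonFileName     = cms.untracked.string("resources.json"),
)

# Periodic memory snapshots (here after every 5 events, as in the 5/10-event
# check above). These dumps are only meaningful when cmsRun itself is launched
# under igprof's memory profiler.
process.IgProfService = cms.Service("IgProfService",
    reportEventInterval     = cms.untracked.int32(5),
    reportToFileAtPostEvent = cms.untracked.string("| gzip -c > igprof.%I.gz"),
)
```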
please test
The tests are being triggered in jenkins.
I'm honestly not sure. I admit to using it without fully understanding how it works (shame 🔔). But looking at the results that it spits out (which you can find at /afs/cern.ch/user/a/ahart/public/1thread/resources.json on LXPLUS), I see CPU and real times for each module that runs, including PoolSource and PoolOutputModule, as well as an "other" category, which includes some kind of overhead. And then the times for the "total" are just the sums for all the modules plus the "other" category. Is there a better way to measure the timing?
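For what it's worth, a small script along the following lines could pull those per-module efficiencies out of the JSON summary. The layout assumed here (a "total" entry and a "modules" list, each carrying CPU time as "time_thread" and wall-clock time as "time_real") is a guess at the file structure and should be checked against the actual resources.json.

```python
# Sketch: compute CPU/real-time efficiencies from a FastTimerService JSON
# summary. The key names ("total", "modules", "time_thread", "time_real",
# "label") are assumptions about the file layout, not a documented schema.
import json

def efficiency(entry):
    """Fraction of wall-clock time spent on CPU for one summary entry."""
    return entry["time_thread"] / entry["time_real"]

with open("resources.json") as f:
    summary = json.load(f)

print("total: {:.1%}".format(efficiency(summary["total"])))

for module in summary["modules"]:
    if module.get("label") in ("TTTracksFromTrackletEmulation",
                               "TTTracksFromExtendedTrackletEmulation"):
        print("{}: {:.1%}".format(module["label"], efficiency(module)))
```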
+1
Comparison job queued.
Comparison is ready. Comparison Summary:
@aehart please make a PR also for master (i.e. CMSSW_11_2_X)
@silviodonato Done: #30847
Pull request #30818 was updated. @cmsbuild, @rekovic, @benkrikler can you please check and sign again.
please test
The tests are being triggered in jenkins.
+1
Comparison job queued.
Comparison is ready. Comparison Summary:
Pull request #30818 was updated. @cmsbuild, @rekovic, @benkrikler can you please check and sign again.
Since we started the re-reco and re-L1 of the HLT TDR samples, I think we can close this PR.
I leave it to @aehart to comment further. I'm surprised by your decision to reject this PR. I understood that even with single-threaded running, these tables waste about 2 GB of memory, which must make running on the GRID challenging. And Andrew Hart showed that disabling them doesn't adversely affect the displaced L1 tracking efficiency. (For prompt L1 tracking, these tables are irrelevant.)
I'm also surprised by this decision. My understanding was that this was a necessary and urgent change to enable the extended L1 tracking to run for production jobs. If the extended tracking will not be run for production jobs, then it doesn't matter either way. But I thought the goal was to have it run.
@rekovic @silviodonato we are all surprised that this was not integrated, as it seemed to be an urgent and necessary requirement for running production jobs. Can someone please clarify, so that we know how to proceed? Should this at least be opened as a PR to master instead?
@skinnari regarding the last question, there is an open PR to the master branch: #30847. It would still be good, though, to get clarification on the decision made for this PR and on what we need to do to get #30847 merged. @rekovic @silviodonato
PR description:
Addresses issues #30742 and #30744. After discussion with @skinnari and @tomalin, we decided that the tables in the TrackletEngineDisplaced and TripletEngine can be disabled by default. These tables are meant to reduce the occupancy for downstream modules while minimally impacting the ultimate tracking efficiency. This will be necessary for the eventual FPGA design, but given the problems with these tables in terms of memory usage and the effect on tracking efficiency, we think disabling them in the emulation is the best solution for now.
I also increased the maximum number of tracklets in the case of the extended algorithm. This is a temporary change that is necessary to handle the additional tracklets that will eventually be removed by the now-disabled tables.
Finally, there are a few other minor changes meant to reduce the memory usage of these tables. Several other changes are foreseen in the near future. But since the tables are disabled by default now, this is somewhat moot for the time being.
PR validation:
I ran the L1TrackNtupleMaker analyzer over 100 ttbar events with 200 PU. With the baseline, non-extended algorithm, the results are exactly identical, as expected. With the extended algorithm, I see approximately half the memory usage in top, and the memory usage does not appear to grow over time as seen before.
I also ran over a sample of displaced muons with no PU with the extended algorithm. With the tables disabled, as is now the default, we see good efficiency for tracks with d0 up to ~5 cm.
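As a side note on the validation, a generic helper like the one below (not part of the PR) can be used to confirm the "memory does not grow over time" observation by sampling the resident set size of the running cmsRun process from /proc; the PID and sampling interval are placeholders.

```python
# Generic RSS monitor (Linux only): samples VmRSS of a running process so that
# memory growth over time can be spotted. Not part of the PR; purely a
# convenience sketch for reproducing the check described above.
import sys
import time

def rss_mb(pid):
    """Resident set size of `pid` in MB, read from /proc/<pid>/status."""
    with open("/proc/{}/status".format(pid)) as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0  # value is reported in kB
    return 0.0

if __name__ == "__main__":
    pid = int(sys.argv[1])   # PID of the cmsRun job under test
    interval = 30            # seconds between samples
    while True:
        try:
            print("{}  RSS = {:.0f} MB".format(time.strftime("%H:%M:%S"), rss_mb(pid)), flush=True)
        except FileNotFoundError:  # the job has finished
            break
        time.sleep(interval)
```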