This repository has been archived by the owner on Dec 9, 2024. It is now read-only.

Make 0.6 GeV Default pT Cut #397

Closed
wants to merge 14 commits into from

Conversation

GNiendorf
Member

PR for discussion of moving the current pT cut from 0.8 to 0.6 GeV.

@GNiendorf
Member Author

/run standalone
/run CMSSW

@slava77
Contributor

slava77 commented May 3, 2024

I did not remember during the meeting that there was no toggle to go to 0.6 from 0.8.

Perhaps we can first test what @VourMa proposed, to check if 0.6 GeV bin files can be used safely (no physics change and no significant slowdown) with the old default 0.8 GeV.


github-actions bot commented May 3, 2024

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     43.1    324.7    124.0     68.7     93.5    546.3    128.4    156.3    104.3      2.2    1591.5    1002.1+/- 266.5     433.5   explicit_cache[s=4] (master)
   avg     50.6    329.8    373.5    209.2    412.9   1215.4    326.8    747.8    244.0      2.3    3912.3    2646.3+/- 836.5    1020.0   explicit_cache[s=4] (this PR)

@GNiendorf
Member Author

Perhaps we can first test what @VourMa proposed, to check if 0.6 GeV bin files can be used safely (no physics change and no significant slowdown) with the old default 0.8 GeV.

Sure, sounds like a good check. I'll make another commit after the CI finishes running and rerun the CI with only the files changed.

@slava77
Contributor

slava77 commented May 3, 2024

Fake rate vs pT comparison

This confirms that the fakes are not localized to pT < 0.8 GeV. The effect is apparently from the if (pt > 5) passThrough; logic.
I don't think this should block/prevent the low-pT variant from becoming the default.
But this reiterates the need to review the pass-through selection logic.
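(For context, a minimal sketch of the pass-through idea being discussed; the names are illustrative and this is not the actual LST/TrackLooper selection code. Candidates above 5 GeV bypass the quality requirement entirely, so high-pT fakes survive regardless of the low-pT building cut.)

// Illustrative sketch only, not the actual implementation.
// Candidates with pt > 5 GeV skip the quality cut ("pass-through"),
// which is why the fake-rate increase is not confined to pt < 0.8 GeV.
bool acceptTrackCandidate(float pt, float qualityScore, float qualityCut) {
  if (pt > 5.f)
    return true;                     // pass-through above 5 GeV: no quality requirement
  return qualityScore > qualityCut;  // normal selection below 5 GeV
}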

@GNiendorf
Member Author

Yes, that's what I saw before. pT3s contribute most to the fake-rate increase at high pT.

[screenshot: fake-rate breakdown plot]

@slava77
Contributor

slava77 commented May 3, 2024

Yes, that's what I saw before. pT3s contribute most to the fake-rate increase at high pT.

Indeed. The main update since then is that pT is more consistently defined and the step at 5 GeV is now explicit.

@GNiendorf
Member Author

GNiendorf commented May 3, 2024

Breakdown plot from above; I see what you're saying now, @slava77. @ariostas, any chance you could change the CI to also include the breakdown plots for the master branch in the tar file? Right now it seems to contain just the comparison plots and the breakdown plots for the PR.

[screenshot: breakdown plot]


github-actions bot commented May 3, 2024

There was a problem while building and running with CMSSW. The logs can be found here.

@GNiendorf
Member Author

/run standalone
/run CMSSW


github-actions bot commented May 3, 2024

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     43.3    322.7    121.8     68.9     93.4    544.2    128.0    158.7    103.6      2.5    1587.1     999.6+/- 266.4     431.1   explicit_cache[s=4] (master)
   avg     43.6    321.5    154.1     69.8     92.6    541.6    122.5    179.0    101.9      1.8    1628.2    1043.1+/- 282.4     441.1   explicit_cache[s=4] (this PR)

@ariostas
Member

ariostas commented May 3, 2024

There was a problem while building and running with CMSSW. The logs can be found here.

I just updated the CI to use the new version of CMSSW. @GNiendorf I'll restart the cmssw check.

Any chance you could change the CI to also include the breakdown plots for the master branch in the tar file?

I did it this way to keep the size of the archive as small as possible. You could look at the commit history and check the breakdown plots of the previous PR. But if there are strong reasons to keep both the PR and master plots for each PR, I'm happy to implement that.

@GNiendorf
Member Author

I did it this way to keep the size of the archive as small as possible. You could look at the commit history and check the breakdown plots of the previous PR. But if there are strong reasons to keep both the PR and master plots for each PR, I'm happy to implement that.

Oh yeah fair point.


github-actions bot commented May 3, 2024

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@GNiendorf
Member Author

@slava77 Timing with the 0.6 GeV maps:
[screenshot: timing breakdown, 0.6 GeV maps]

Current master:
[screenshot: timing breakdown, master]

@GNiendorf
Member Author

/run CMSSW


github-actions bot commented May 3, 2024

There was a problem while building and running with CMSSW. The logs can be found here.

@slava77
Contributor

slava77 commented May 3, 2024

There was a problem while building and running with CMSSW. The logs can be found here.

From

Module: LSTOutputConverter:highPtTripletStepTrackCandidates (crashed)

This may need some manual debugging.

It looks like the 0.6 GeV maps are mostly OK; the LS kernel in the CPU variant is apparently slower (the effect is less visible/significant in the GPU case).

@slava77
Contributor

slava77 commented May 7, 2024

There was a problem while building and running with CMSSW. The logs can be found here.

From

Module: LSTOutputConverter:highPtTripletStepTrackCandidates (crashed)

This may need some manual debugging.

It looks like the 0.6 GeV maps are mostly OK; the LS kernel in the CPU variant is apparently slower (the effect is less visible/significant in the GPU case).

@GNiendorf perhaps you can reproduce this locally.
@ariostas since you know the CI setup, you may get there faster.
After it crashes locally, just run under GDB.
USER_CXXFLAGS="-g" scram b ... can be used to add the debug symbols.

@GNiendorf
Member Author

GNiendorf commented May 7, 2024

Given that this is probably an issue within the CMSSW setup, maybe @ariostas or @VourMa could take a look?

@ariostas
Member

ariostas commented May 7, 2024

@GNiendorf yeah, I'll take a look

@ariostas
Member

ariostas commented May 8, 2024

I opened a PR to fix the issue in SegmentLinking/cmssw#24.

Also, looking at the logs, you can see that for one of the events you get this warning:

*********************************************************
* Warning: Pixel line segments will be truncated.       *
* You need to increase N_MAX_PIXEL_SEGMENTS_PER_MODULE. *
*********************************************************

So it seems like it's pretty close to the edge and it might be worth increasing that a bit.
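(For readers unfamiliar with the warning, here is a rough sketch of the truncation pattern it refers to; the types and the limit value below are hypothetical, only the constant's name comes from the warning. Segments are written into a fixed-size per-module buffer, and anything past the limit is silently dropped.)

#include <atomic>
#include <vector>

// Hypothetical illustration of the truncation behind the CI warning;
// the value below is made up, only the concept matches.
constexpr unsigned kMaxPixelSegmentsPerModule = 50000;  // stand-in for N_MAX_PIXEL_SEGMENTS_PER_MODULE

struct PixelSegment {
  int innerHitIdx;
  int outerHitIdx;
};

// Returns false when the segment is dropped, i.e. the situation the warning reports.
bool addPixelSegment(std::atomic<unsigned>& nSegments,
                     std::vector<PixelSegment>& buffer,  // pre-sized to kMaxPixelSegmentsPerModule
                     const PixelSegment& seg) {
  const unsigned idx = nSegments.fetch_add(1);
  if (idx >= kMaxPixelSegmentsPerModule)
    return false;  // truncated; raising the limit avoids this
  buffer[idx] = seg;
  return true;
}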

I'll run the CI with the above PR to make sure that it works.

/run cmssw 24


github-actions bot commented May 8, 2024

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@slava77
Contributor

slava77 commented May 8, 2024

The PR was built and ran successfully with CMSSW. Here are some plots.

@ariostas please remind me why the reference is not shown in the TrackLooper+cmssw PR test. Can we show master as-is as a reference?

@ariostas
Member

ariostas commented May 8, 2024

Currently it doesn't do the comparison when using a PR or a different branch, but yeah, I should change it so that it does the comparison as long as the CMSSW version is the same.

GNiendorf added 2 commits May 9, 2024 12:20
@GNiendorf
Member Author

/run standalone
/run CMSSW

SDL/Quintuplet.h (outdated)
  else if (category_number == 0 && eta_number == 1)
-   occupancy = 414;
+   occupancy = 86;
Contributor

Are these numbers lower due to a more restrictive target?
We had 99.99% for T5, 99.9% for T3, 99% for segments, and 99.99% for minidoublets.


The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     45.3    326.0    124.2     50.1     97.2    504.6    133.8    159.1    100.7      2.0    1543.0     993.1+/- 259.4     424.7   explicit_cache[s=4] (master)
   avg     55.5    333.0    381.6    178.7    556.3   1134.7    362.0    793.0    236.9      9.0    4040.7    2850.5+/- 972.8    1077.1   explicit_cache[s=4] (this PR)

@GNiendorf
Member Author

It seems like the T5 occupancies were set incredibly high. @YonsiG do you know what percentile they were set to? Even at 99.99%, I find that the low-pT occupancies are lower than the current occupancies.

@slava77
Contributor

slava77 commented May 28, 2024

@GNiendorf
Is your stats analysis based on non-zero occupancies, or does it also include empty modules?

@GNiendorf
Member Author

GNiendorf commented May 28, 2024

@GNiendorf Is your stats analysis based on non-zero occupancies, or does it also include empty modules?

I changed it to only consider non-zero occupancies.
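(To make the derivation concrete, here is a minimal sketch of how a target like 99.99% can be turned into an occupancy cutoff over the non-zero per-module occupancies; this is assumed logic for illustration, not the compute_occupancies.ipynb notebook itself.)

#include <algorithm>
#include <cmath>
#include <vector>

// Return the occupancy value at the given quantile (e.g. 0.9999 for T5)
// of the non-zero per-module occupancies observed in the input sample.
int occupancyCutoff(std::vector<int> occupancies, double quantile) {
  occupancies.erase(std::remove(occupancies.begin(), occupancies.end(), 0),
                    occupancies.end());  // drop empty modules
  if (occupancies.empty())
    return 0;
  std::sort(occupancies.begin(), occupancies.end());
  std::size_t idx = static_cast<std::size_t>(std::ceil(quantile * occupancies.size()));
  if (idx > 0) --idx;  // convert a count to a 0-based index
  return occupancies[std::min(idx, occupancies.size() - 1)];
}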

SegmentLinking deleted a comment from github-actions bot May 28, 2024
@GNiendorf
Member Author

/run standalone
/run CMSSW


The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     45.3    322.8    123.9     48.6     96.4    502.0    134.7    157.0     99.6      1.6    1532.0     984.7+/- 262.4     421.4   explicit_cache[s=4] (master)
   avg     57.8    333.7    398.5    185.9    576.2   1232.9    365.1    794.2    251.1     16.2    4211.6    2920.9+/- 985.2    1133.8   explicit_cache[s=4] (this PR)


The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@GNiendorf
Member Author

Plots for 1000 events: https://www.classe.cornell.edu/~gsn27/www/www/PRlowpT/

It seems like the efficiency difference is pretty minimal now at high pT, @slava77.

  else if (category_number == 1 && eta_number == 1)
-   occupancy = 128;
+   occupancy = 653;
Contributor

Some updates are going up quite a bit, like here. On the other hand, some others, like in T5 (https://github.com/SegmentLinking/TrackLooper/pull/397/files#diff-d9d3d7c519ab64f27b6e501281a0eb282be964d2802ddca40387348a68bb6cb0R3182), are going down.

Cat1 is layers 4, 5, 6. It looks like something like layer 6 or even 5 already opens up a lot.
On one hand, an additional category may be useful.
On the other, it would be good to see some plots of the segment angles (is it the phi change?); there was a discussion about truncating this so that it doesn't hit pi/2 or even larger.

Contributor

Cat1 is layers 4, 5, 6. It looks like something like layer 6 or even 5 already opens up a lot.

Given this is a segment, the possible layers are 4 and 5.

@slava77
Contributor

slava77 commented May 29, 2024

@GNiendorf
do you have plots showing how the new occupancy numbers were derived?

@GNiendorf
Member Author

@GNiendorf do you have plots showing how the new occupancy numbers were derived?

https://github.com/SegmentLinking/TrackLooper/blob/changept/scripts/compute_occupancies.ipynb

@slava77
Contributor

slava77 commented May 29, 2024

@GNiendorf
Member Author

https://github.com/SegmentLinking/TrackLooper/blob/changept/scripts/compute_occupancies.ipynb

file_path = "occ_1000_p06.root"

is this 1K events?

Yes, from PU200RelVal

@GNiendorf
Member Author

If I increase the occupancies by 10x from where they are in this PR, I get the following:

So it seems like the efficiency at high pT is recovered fully if you put the occupancies high enough.

@slava77
Contributor

slava77 commented May 29, 2024

If I increase the occupancies by 10x from where they are in this PR, I get the following:
So it seems like the efficiency at high pT is recovered fully if you put the occupancies high enough.

If the increase goes far above what we have in master, it could be that what we observe is in part just a recovery of the inefficiency originally introduced by the truncation.

How does master compare with just the increase of the occupancy cutoffs (without the change in the pT cut)?

@GNiendorf
Member Author

If I just increase the occupancies on master by 10x, I get the following:

[plot: TC efficiency vs pT (zoomed)]

@GNiendorf
Member Author

GNiendorf commented May 31, 2024

This PR timing:
[screenshot: timing breakdown, this PR]

Master timing:
[screenshot: timing breakdown, master]

Max Memory Usage, Master (L40, 175 events, 8 streams, caching allocator on)
5621MiB / 46068MiB

Max Memory Usage, This PR (same setup)
9419MiB / 46068MiB

@slava77
Contributor

slava77 commented May 31, 2024

Does 46068 MiB correspond to the total memory of the card minus some reserved amount (the specs appear to say the L40 has 48 GB)?

How well is the caching allocator shared between modules of the same process?

What happens in the case of multiple processes? I think we can expect 8- to 16-thread jobs populating a 256-core node with at best 2 cards, so from 8 to as many as 32 (or more?) processes will talk to the same card.

@dan131riley
Perhaps you know this already, in case memory is more restricted than what we usually see in our single-job-per-node tests. What's a good way to test?

@dan131riley

48 GB for the L40 is correct, but the CUDA device driver allocates some of that. We actually see the same thing in the host system specs: the maximum available memory is always less than the actual physical memory.

The caching allocator works a lot like malloc(). Within a process, memory available to the caching allocator is shared between modules as long as modules free caching-allocator memory in a timely fashion. Memory allocated by one process is held onto by the caching allocator, so it isn't normally available to other processes. I don't remember offhand if there is a call to force releasing caching-allocator memory; if there is, I wouldn't expect it to be very effective because the caching allocator is likely to fragment its address space.

Last I heard I thought the preferred operating mode with GPUs (at least for HLT) was one process per socket with a GPU bound to each socket. I don't know how that's expected to scale up to 96 cores per socket, but I can't imagine it going well. Nvidia does have the CUDA Multi-Process Service (MPS) for sharing GPU resources between processes; I've used it in the past for benchmarking, but not recently. I don't know if it figures in the HLT plans, and Nvidia seems to keep tweaking how it works with each new architecture generation.
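(A rough host-side sketch of the caching-allocator behavior described above, not the actual CMSSW/CUDA implementation: freed blocks are kept in per-size free lists inside the process and reused by later allocations, which is why the memory stays reserved from the point of view of other processes.)

#include <cstddef>
#include <cstdlib>
#include <map>
#include <vector>

// Simplified illustration of a caching allocator; std::malloc stands in for
// the device allocation (cudaMalloc) to keep the example self-contained.
class CachingAllocator {
public:
  void* allocate(std::size_t bytes) {
    auto& pool = freeBlocks_[bytes];
    if (!pool.empty()) {  // reuse a previously freed block of the same size
      void* p = pool.back();
      pool.pop_back();
      return p;
    }
    return std::malloc(bytes);  // otherwise take new memory from the system
  }

  void deallocate(void* p, std::size_t bytes) {
    // Cached for reuse within this process; not returned to the system,
    // so other processes cannot use it until the allocator is destroyed.
    freeBlocks_[bytes].push_back(p);
  }

  ~CachingAllocator() {
    for (auto& entry : freeBlocks_)
      for (void* p : entry.second)
        std::free(p);
  }

private:
  std::map<std::size_t, std::vector<void*>> freeBlocks_;
};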

@slava77
Contributor

slava77 commented May 31, 2024

Last I heard I thought the preferred operating mode with GPUs (at least for HLT) was one process per socket with a GPU bound to each socket.

Ah, indeed, it's not 8 or 16 threads.
I got 32 threads and 24 streams for the current HLT from Andrea in https://mattermost.web.cern.ch/cms-tsg/pl/pfc9i5nt8pf788btfg7y8ybowy in January.

@GNiendorf
Member Author

Closing, see new PR: SegmentLinking/cmssw#39

GNiendorf closed this Jun 10, 2024