Reduce the ECAL and HCAL GPU memory usage #39577

fwyzard · 2022-10-03T12:26:04Z

PR description:

Allocate memory buffers based on the actual number of events, instead of always allocating the maximum size.
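The change in allocation strategy can be illustrated with a minimal host-side sketch. This is not the code from this PR: `kMaxElements`, `allocateMax` and `allocateActual` are hypothetical names, the bound is an arbitrary placeholder, and `std::vector` stands in for the GPU buffers so that the example runs without a device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worst-case buffer size; an illustrative placeholder, not a value from this PR.
constexpr std::size_t kMaxElements = 100000;

// Old strategy: always reserve room for the worst case, whatever is actually needed.
inline std::vector<float> allocateMax() {
  return std::vector<float>(kMaxElements);
}

// New strategy: reserve only what the actual count requires.
// std::vector is a stand-in here for a device-side allocation.
inline std::vector<float> allocateActual(std::size_t actualCount) {
  assert(actualCount <= kMaxElements);  // the worst case remains an upper bound
  return std::vector<float>(actualCount);
}
```

With this sketch, `allocateActual(1234)` holds 1234 elements, while `allocateMax()` always holds `kMaxElements`; the saving grows with the gap between the typical and the worst-case count.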

Update the HLT menus to remove the obsolete parameters, if they are present.

Reduces the total GPU memory used when running the HLT with 4 jobs with 32 threads and 32 streams by about 25%:

release	total	reserved	used	free
CMSSW_12_4_9	15360 MB	449 MB	10678 - 10090 MB	4231 - 4819 MB
with #39580	15360 MB	449 MB	7366 - 8056 MB	7543 - 6853 MB
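As a quick check of the "about 25%" figure, the relative reduction can be computed from the "used" column. The helper below is a hypothetical illustration, and pairing the lower bounds with each other (and likewise the upper bounds) is an assumption about how to read the ranges in the table.

```cpp
#include <cassert>

// Relative reduction, in percent, when usage drops from `before` to `after` (both in MB).
inline double reductionPercent(double before, double after) {
  assert(before > 0.0);
  return 100.0 * (before - after) / before;
}
```

Under that pairing, `reductionPercent(10090, 7366)` gives roughly 27% and `reductionPercent(10678, 8056)` gives roughly 24.6%, which brackets the quoted 25%.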

Thanks to @VinInn for finding the issue and for the changes.

PR validation:

The full HLT menu runs on GPU (with 12.4.9 plus #39580) without issues.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

To be backported to 12.4.x and 12.5.x for data taking.


fwyzard · 2022-10-03T12:31:33Z

@mariadalfonso @thomreis can you double-check these changes?

cmsbuild · 2022-10-03T12:34:42Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39577/32363

This PR adds an extra 28KB to repository

cmsbuild · 2022-10-03T12:35:08Z

A new Pull Request was created by @fwyzard (Andrea Bocci) for master.

It involves the following packages:

RecoLocalCalo/EcalRecProducers (reconstruction)
RecoLocalCalo/HcalRecProducers (reconstruction)

@cmsbuild, @mandrenguyen, @clacaputo can you please review it and eventually sign? Thanks.
@youyingli, @apsallid, @rchatter, @argiro, @missirol, @bsunanda, @thomreis, @simonepigazzini, @mariadalfonso, @abdoulline this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

fwyzard · 2022-10-03T12:38:53Z

enable gpu

fwyzard · 2022-10-03T12:38:58Z

please test

cmsbuild · 2022-10-03T12:47:00Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39577/32364

This PR adds an extra 32KB to repository
There are other open Pull requests which might conflict with changes you have proposed:
- File HLTrigger/Configuration/python/customizeHLTforCMSSW.py modified in PR(s): Tracker Traits and Enabling Phase2 for Inner Tracker Reconstruction on GPU #38761

cmsbuild · 2022-10-03T12:47:25Z

Pull request #39577 was updated. @Martin-Grunewald, @missirol, @mandrenguyen, @clacaputo can you please check and sign again.

thomreis · 2022-10-03T15:11:42Z

Looks good to me for the ECAL part.
In fact, uint16_t would be sufficient for sizeEB and sizeEE themselves, but not for the sum of the two.
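The point about the integer width can be checked with a short sketch. It assumes the relevant upper bounds are the ECAL crystal counts (61200 in the barrel, 14648 in the endcaps); the constants and the helper functions `totalSize` and `narrowedSum` are hypothetical and are not taken from this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed upper bounds: the number of ECAL crystals in the barrel and in the endcaps.
constexpr std::uint32_t kMaxEB = 61200;
constexpr std::uint32_t kMaxEE = 14648;
constexpr std::uint32_t kMaxU16 = std::numeric_limits<std::uint16_t>::max();  // 65535

static_assert(kMaxEB <= kMaxU16, "sizeEB alone fits in uint16_t");
static_assert(kMaxEE <= kMaxU16, "sizeEE alone fits in uint16_t");
static_assert(kMaxEB + kMaxEE > kMaxU16, "the sum does not fit in uint16_t");

// Safe: widen to 32 bits before adding.
inline std::uint32_t totalSize(std::uint16_t sizeEB, std::uint16_t sizeEE) {
  assert(sizeEB <= kMaxEB && sizeEE <= kMaxEE);
  return static_cast<std::uint32_t>(sizeEB) + sizeEE;
}

// Unsafe: storing the sum back into 16 bits wraps modulo 65536.
inline std::uint16_t narrowedSum(std::uint16_t sizeEB, std::uint16_t sizeEE) {
  return static_cast<std::uint16_t>(sizeEB + sizeEE);
}
```

For the maximal sizes, `totalSize(61200, 14648)` returns 75848, whereas `narrowedSum(61200, 14648)` wraps around to 10312, so a 16-bit type is only safe for the individual sizes.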

mariadalfonso · 2022-10-03T17:31:16Z

Looks good from the HCAL point of view! Thanks for making the change.

cmsbuild · 2022-10-03T19:20:48Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8d272e/27931/summary.html
COMMIT: 25d2630
CMSSW: CMSSW_12_6_X_2022-10-02-2300/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/39577/27931/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-8d272e/41834.0_TTbar_14TeV+2026D94+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal

Summary:

No significant changes to the logs found
Reco comparison results: 9 differences found in the comparisons
DQMHistoTests: Total files compared: 49
DQMHistoTests: Total histograms compared: 3432650
DQMHistoTests: Total failures: 9
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3432619
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
Checked 204 log files, 49 edm output root files, 49 DQM output files
TriggerResults: no differences found

GPU Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
Reco comparison had 3 failed jobs
DQMHistoTests: Total files compared: 4
DQMHistoTests: Total histograms compared: 19876
DQMHistoTests: Total failures: 74
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 19802
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
Checked 12 log files, 9 edm output root files, 4 DQM output files
TriggerResults: no differences found

missirol · 2022-10-03T19:59:10Z

+hlt

mandrenguyen · 2022-10-04T06:56:49Z

assign heterogeneous

cmsbuild · 2022-10-04T06:57:14Z

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

mandrenguyen · 2022-10-04T06:57:33Z

+reconstruction
CPU wfs unmodified. GPU wfs look to have "the usual" differences, but I'll let heterogeneous explicitly sign off just in case.

fwyzard · 2022-10-04T09:28:03Z

+heterogeneous

cmsbuild · 2022-10-04T09:28:26Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

perrotta · 2022-10-04T10:18:02Z

+1

VinInn and others added 2 commits October 3, 2022 14:24

Reduce ECAL memory usage

fdceff5

Reduce the ECAL and HCAL GPU memory usage

e8b5797

Allocate memory buffers based on the actual number of events, instead of always allocating the maximum size.

cmsbuild added this to the CMSSW_12_6_X milestone Oct 3, 2022

cmsbuild added code-checks-pending orp-pending pending-signatures reconstruction-pending tests-pending labels Oct 3, 2022

cmsbuild added code-checks-approved and removed code-checks-pending labels Oct 3, 2022

Remove the obsolete ECAL and HCAL rechit parameters from the HLT menu

25d2630

cmsbuild added code-checks-pending hlt-pending and removed code-checks-approved labels Oct 3, 2022

cmsbuild added tests-started and removed tests-pending labels Oct 3, 2022

This was referenced Oct 3, 2022

Reduce the ECAL and HCAL GPU memory usage [12.5.x] #39579

Merged

Reduce the ECAL and HCAL GPU memory usage [12.4.x] #39580

Merged

cmsbuild added code-checks-approved and removed code-checks-pending labels Oct 3, 2022

cmsbuild added tests-approved and removed tests-started labels Oct 3, 2022

cmsbuild added hlt-approved and removed hlt-pending labels Oct 3, 2022

cmsbuild added the heterogeneous-pending label Oct 4, 2022

cmsbuild added reconstruction-approved and removed reconstruction-pending labels Oct 4, 2022

cmsbuild added fully-signed heterogeneous-approved and removed pending-signatures heterogeneous-pending labels Oct 4, 2022

cmsbuild added orp-approved and removed orp-pending labels Oct 4, 2022

cmsbuild merged commit ff07d94 into cms-sw:master Oct 4, 2022

fwyzard deleted the reduce_ECAL_HCAL_GPU_memory_usage branch October 11, 2022 14:09
