Fix PFClusterSoAProducer to read a device collection #46830

fwyzard · 2024-11-30T09:46:11Z

PR description:

Fix PFClusterSoAProducer to read a device collection instead of a host collection, when running on a GPU backend.

Note: this is a quick workaround to let the device code use the device collection, while still being able to access the actual number of PF rechits on the host side. It should be replaced with a better and more general implementation, and the use of the host collection should be removed.
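A minimal sketch of the workaround pattern, using hypothetical stand-in types rather than the real CMSSW/Alpaka `PortableCollection` API. The device-side work reads only the device collection, while the host reads the actual number of rechits from the host collection. `RecHitsHost`, `RecHitsDevice`, `numberOfRecHits` and `sumEnergyOnDevice` are illustrative names, and "device memory" is simulated with a `std::vector`. The zero-capacity guard mirrors the snippet quoted later in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for a host/device pair of SoA collections; here
// "device memory" is just another std::vector.
struct RecHitsHost {
  std::vector<float> energy;  // allocated capacity == energy.size()
  int32_t size = 0;           // actual number of valid rechits
  int32_t capacity() const { return static_cast<int32_t>(energy.size()); }
};

struct RecHitsDevice {
  std::vector<float> energy;  // would live in GPU memory in real code
  int32_t size = 0;           // in real code, not directly readable from the host
};

// Host side: take the number of rechits from the host collection,
// skipping the read when the collection has zero capacity.
int32_t numberOfRecHits(const RecHitsHost& host) {
  int32_t nRH = 0;
  if (host.capacity() != 0)
    nRH = host.size;
  return nRH;
}

// Stand-in for the device-side work: it reads only the device collection,
// while the host-provided nRH bounds how much work is done.
float sumEnergyOnDevice(const RecHitsDevice& device, int32_t nRH) {
  assert(nRH <= static_cast<int32_t>(device.energy.size()));
  float total = 0.f;
  for (int32_t i = 0; i < nRH; ++i)
    total += device.energy[i];
  return total;
}
```

Removing the host collection, as the description suggests, would require obtaining `nRH` some other way.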

PR validation:

Full 2024 HLT menu works with these changes, both on CPU and on GPU.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

May be backported to 14.2.x or earlier if there is interest.

fwyzard · 2024-11-30T09:46:18Z

enable gpu

fwyzard · 2024-11-30T09:46:21Z

please test

cmsbuild · 2024-11-30T09:46:31Z

cms-bot internal usage

cmsbuild · 2024-11-30T09:52:15Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46830/42850

cmsbuild · 2024-11-30T09:52:36Z

A new Pull Request was created by @fwyzard for master.

It involves the following packages:

RecoParticleFlow/PFClusterProducer (reconstruction)

@jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@felicepantaleo, @hatakeyamak, @lgray, @missirol, @mmarionncern, @rovere, @sameasy, @seemasharmafnal this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

fwyzard · 2024-11-30T10:59:21Z

please test

cmsbuild · 2024-11-30T10:59:31Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46830/42851

cmsbuild · 2024-11-30T10:59:45Z

Pull request #46830 was updated. @jfernan2, @mandrenguyen can you please check and sign again.

fwyzard · 2024-11-30T11:08:59Z

please hold

fwyzard · 2024-11-30T11:15:29Z

OK, the proposed fix cannot work, because we need to know the number of rechits on the host:

      if (pfRecHits->metadata().size() != 0)
        nRH = pfRecHits->size();

fwyzard · 2024-11-30T11:16:08Z

I guess this is a common enough pattern that we should find a general solution 🤔

cmsbuild · 2024-11-30T12:37:16Z

-1

Failed Tests: RelVals-GPU
Size: This PR adds an extra 32KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0aff9f/43166/summary.html
COMMIT: c0252e9
CMSSW: CMSSW_15_0_X_2024-11-29-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46830/43166/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-GPU

12834.423: 12834.423_TTbar_14TeV+2024_Patatrack_HCALOnlyGPUandAlpaka_Validation/step3_TTbar_14TeV+2024_Patatrack_HCALOnlyGPUandAlpaka_Validation.log
12834.422: 12834.422_TTbar_14TeV+2024_Patatrack_HCALOnlyAlpaka_Validation/step3_TTbar_14TeV+2024_Patatrack_HCALOnlyAlpaka_Validation.log

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 10 differences found in the comparisons
DQMHistoTests: Total files compared: 46
DQMHistoTests: Total histograms compared: 3484682
DQMHistoTests: Total failures: 521
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3484141
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 45 files compared)
Checked 202 log files, 172 edm output root files, 46 DQM output files
TriggerResults: found differences in 1 / 44 workflows

fwyzard · 2024-11-30T12:47:21Z

please test

fwyzard · 2024-11-30T12:49:23Z

please unhold

fwyzard · 2024-11-30T12:49:34Z

assign heterogeneous

fwyzard · 2024-11-30T12:49:41Z

@makortel what do you think?

cmsbuild · 2024-11-30T12:50:10Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46830/42852

cmsbuild · 2024-11-30T12:50:32Z

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild · 2024-11-30T12:50:34Z

Pull request #46830 was updated. @fwyzard, @jfernan2, @makortel, @mandrenguyen can you please check and sign again.

cmsbuild · 2024-11-30T14:34:57Z

+1

Size: This PR adds an extra 32KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0aff9f/43169/summary.html
COMMIT: 2bba89f
CMSSW: CMSSW_15_0_X_2024-11-29-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46830/43169/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

You potentially added 2 lines to the logs
Reco comparison results: 71 differences found in the comparisons
DQMHistoTests: Total files compared: 46
DQMHistoTests: Total histograms compared: 3484682
DQMHistoTests: Total failures: 1255
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3483407
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 45 files compared)
Checked 202 log files, 172 edm output root files, 46 DQM output files
TriggerResults: found differences in 1 / 44 workflows

GPU Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 7
DQMHistoTests: Total histograms compared: 53058
DQMHistoTests: Total failures: 54
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 53004
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
Checked 24 log files, 30 edm output root files, 7 DQM output files
TriggerResults: no differences found

mmusich · 2024-12-02T12:18:37Z

type bug-fix

makortel · 2024-12-02T17:18:26Z

> @makortel what do you think?

Looks reasonable for a (better) workaround.

> I guess this is a common enough pattern that we should find a general solution 🤔

Agreed. How about opening a new GitHub issue on this topic?

(IIRC, in the pixel code the approach was to allocate memory and launch kernels based on the capacity of the containers rather than on the "number of elements", and the "number of elements" was used only in the device code to terminate the loops early.)
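A minimal sketch of the capacity-based pattern recalled above, using a hypothetical `RecHitsView` and a serial loop standing in for a grid of GPU threads (this is not the actual pixel or Alpaka code). The host sizes the output and the launch by the capacity, which is known at allocation time, and only the device-side code reads the actual number of elements, to return early.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical SoA view: the capacity is known on the host (fixed at
// allocation time), while `size` is only meant to be read by device code.
struct RecHitsView {
  const float* energy;
  const int32_t* size;  // points into "device" memory
  int32_t capacity;
};

// Stand-in for one GPU thread: every index up to the capacity is launched,
// and the thread returns early if it is beyond the actual number of elements.
void kernelBody(const RecHitsView& view, int32_t i, float* out) {
  assert(*view.size <= view.capacity);  // size never exceeds the allocation
  if (i >= *view.size)
    return;  // early termination, using the size read on the device
  out[i] = 2.f * view.energy[i];
}

// Host side: output and launch are sized by the capacity, so the host never
// needs to read `size` back from the device.
std::vector<float> launchByCapacity(const RecHitsView& view) {
  std::vector<float> out(view.capacity, 0.f);  // output allocated by capacity
  for (int32_t i = 0; i < view.capacity; ++i)  // serial loop standing in for a grid of threads
    kernelBody(view, i, out.data());
  return out;
}
```

The trade-off is some idle threads and over-allocated output when the collection is far from full, in exchange for no device-to-host synchronization on the element count.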

jfernan2 · 2024-12-03T14:43:42Z

+1

fwyzard · 2024-12-06T09:44:06Z

See #46887 for an alternative (and hopefully better) approach.

fwyzard · 2024-12-09T07:05:28Z

Here is the impact of this PR on the HCAL+PF reconstruction, measured on a machine with 2× AMD Bergamo CPUs and 4× NVIDIA L4 GPUs.

baseline

Running 4 times over 20500 events with 16 jobs, each with 32 threads, 24 streams, and 1 GPUs
  7717.3 ±   0.2 ev/s (20000 events, 96.5% overlap),   7708.2 ±   0.2 ev/s (⩾ 17570 events, overlap-only)
  7731.3 ±   0.2 ev/s (20000 events, 96.6% overlap),   7726.6 ±   0.2 ev/s (⩾ 17680 events, overlap-only)
  7744.3 ±   0.2 ev/s (20000 events, 97.2% overlap),   7738.5 ±   0.2 ev/s (⩾ 17880 events, overlap-only)
  7737.4 ±   0.2 ev/s (20000 events, 96.9% overlap),   7730.5 ±   0.2 ev/s (⩾ 17690 events, overlap-only)
 --------------------
  7732.6 ±  11.5 ev/s,   7725.9 ±  12.8 ev/s (⩾ 17570 events, overlap-only)

with this PR

Running 4 times over 20500 events with 16 jobs, each with 32 threads, 24 streams, and 1 GPUs
  8543.5 ±   0.2 ev/s (20000 events, 98.6% overlap),   8541.6 ±   0.2 ev/s (⩾ 19300 events, overlap-only)
  8533.4 ±   0.1 ev/s (20000 events, 98.0% overlap),   8532.1 ±   0.1 ev/s (⩾ 19070 events, overlap-only)
  8546.9 ±   0.1 ev/s (20000 events, 98.4% overlap),   8545.7 ±   0.1 ev/s (⩾ 19240 events, overlap-only)
  8538.7 ±   0.1 ev/s (20000 events, 98.5% overlap),   8538.3 ±   0.1 ev/s (⩾ 19250 events, overlap-only)
 --------------------
  8540.6 ±   5.9 ev/s,   8539.4 ±   5.8 ev/s (⩾ 19070 events, overlap-only)

i.e. a speed-up of about 10% (8540.6 vs. 7732.6 ev/s).
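The summary line under each set of runs is consistent with the mean and the sample standard deviation (n - 1 in the denominator) of the four per-run throughputs; this reading is inferred from the numbers, not stated in the benchmark output. A small sketch to reproduce those figures and the quoted speed-up from the per-run values:

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Mean of the per-run throughputs (ev/s).
double mean(const std::vector<double>& xs) {
  assert(!xs.empty());
  return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// Sample standard deviation, with n - 1 in the denominator.
double sampleStdDev(const std::vector<double>& xs) {
  assert(xs.size() > 1);
  const double m = mean(xs);
  double sum2 = 0.0;
  for (double x : xs)
    sum2 += (x - m) * (x - m);
  return std::sqrt(sum2 / (xs.size() - 1));
}
```

With the first column of each table, this gives about 7732.6 ± 11.5 ev/s for the baseline and 8540.6 ± 5.9 ev/s with this PR, a ratio of about 1.104.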

fwyzard · 2024-12-09T07:07:05Z

Closing in favour of #46887.

cmsbuild added this to the CMSSW_15_0_X milestone Nov 30, 2024

cmsbuild added reconstruction-pending pending-signatures orp-pending tests-started code-checks-pending labels Nov 30, 2024

cmsbuild added code-checks-approved and removed code-checks-pending labels Nov 30, 2024

fwyzard force-pushed the fix_PFClusterSoAProducer_input_collection branch from cb06dce to c0252e9 on November 30, 2024 10:58

cmsbuild added tests-pending code-checks-pending and removed tests-started code-checks-approved labels Nov 30, 2024

cmsbuild added tests-started code-checks-approved and removed tests-pending code-checks-pending labels Nov 30, 2024

cmsbuild removed the tests-started label Nov 30, 2024

cmsbuild added tests-started code-checks-pending and removed tests-rejected code-checks-approved labels Nov 30, 2024

cmsbuild added the heterogeneous-pending label Nov 30, 2024

cmsbuild added code-checks-approved and removed code-checks-pending labels Nov 30, 2024

cmsbuild added tests-approved and removed tests-started labels Nov 30, 2024

cmsbuild mentioned this pull request Dec 1, 2024

Extend CMSSW to a distributed application over MPI #32632 (open)

cmsbuild added the bug-fix label Dec 2, 2024

cmsbuild added reconstruction-approved and removed reconstruction-pending labels Dec 3, 2024

fwyzard mentioned this pull request Dec 6, 2024

Fix PFClusterSoAProducer to read a device collection #46887 (merged)

fwyzard closed this Dec 9, 2024

fwyzard deleted the fix_PFClusterSoAProducer_input_collection branch January 21, 2025 16:28
