Allow Runtime Number of Hits for Alpaka Pixel Reconstruction #44773

Merged
merged 1 commit into from
Apr 18, 2024

@AdrianoDee AdrianoDee commented Apr 18, 2024

This PR proposes a solution for #44769. The culprit there is

    constexpr auto MAX_HITS = TrackerTraits::maxNumberOfHits;
    for (uint32_t i : cms::alpakatools::independent_group_elements(acc, numberOfModules + 1)) {
      if (clus_view[i].clusModuleStart() > MAX_HITS)
        clus_view[i].clusModuleStart() = MAX_HITS;
    }

where we cut the number of hits to "only":

static constexpr uint32_t maxNumberOfHits = 48 * 1024;

And the event that triggers the crash has more clusters than that:

SiPixelClusterizerAlpaka results:
 > no. of digis: 301020
 > no. of active modules: 1794
 > no. of clusters: 50684
 > bpix2 offset: 19595
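
To make the mismatch concrete, here is a minimal host-side sketch in plain C++ (no Alpaka) of the same cap applied to cumulative module offsets. The helper names `clampModuleStart` and `droppedHits` and the toy offsets are assumptions for illustration only; the 48 * 1024 limit and the 50684 cluster count are the values quoted above. It also assumes that the last `clusModuleStart` entry holds the total number of clusters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Limit quoted above from TrackerTraits: 48 * 1024 = 49152.
constexpr uint32_t kMaxNumberOfHits = 48 * 1024;

// Hypothetical serial stand-in for the kernel loop: caps every cumulative
// module offset at maxHits and returns the last (total) offset after the cap.
inline uint32_t clampModuleStart(std::vector<uint32_t>& moduleStart, uint32_t maxHits) {
  assert(!moduleStart.empty());
  for (auto& start : moduleStart)
    start = std::min(start, maxHits);
  return moduleStart.back();
}

// Number of hits that no longer fit once the total is capped at maxHits.
inline uint32_t droppedHits(uint32_t totalClusters, uint32_t maxHits) {
  return totalClusters > maxHits ? totalClusters - maxHits : 0;
}
```

With toy offsets `{0, 19595, 40000, 50684}`, `clampModuleStart` returns 49152, and `droppedHits(50684, kMaxNumberOfHits)` gives 1532: the event exceeds the compile-time limit by 1532 clusters.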

The poor man's solution would be to raise the maximum to something safer, such as static constexpr uint32_t maxNumberOfHits = 96 * 1024;. But that seems to me a waste, also because this procedure is in any case a remnant of the intermediate phase of the Alpaka port when we had no runtime-sized histograms (before #43064).

So this PR involves a slightly bigger code refactoring in order to drop the fixed number of hits and use the runtime-sized histograms where needed (namely for the hit-to-tuple map).
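
As a rough illustration of what "runtime-sized" means for a hit-to-tuple map, the sketch below builds a one-to-many map whose storage is sized from the number of hits seen in the event rather than from a compile-time maximum. This is not the code in this PR and does not use the Alpaka histogram classes; `HitToTupleMap` and `buildHitToTuple` are hypothetical names, and the serial count / prefix-sum / fill scheme is just one common way to build such a map.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CSR-style one-to-many map: for each hit, the tuples that use it.
struct HitToTupleMap {
  std::vector<uint32_t> offsets;  // size nHits + 1, sized at runtime
  std::vector<uint32_t> content;  // tuple indices, grouped by hit
};

// tuples[t] lists the hit indices used by tuple t; nHits is only known at runtime.
inline HitToTupleMap buildHitToTuple(const std::vector<std::vector<uint32_t>>& tuples,
                                     uint32_t nHits) {
  HitToTupleMap map;
  map.offsets.assign(nHits + 1, 0);

  // 1) Count how many tuples reference each hit.
  for (const auto& hits : tuples)
    for (uint32_t h : hits) {
      assert(h < nHits);
      ++map.offsets[h + 1];
    }

  // 2) Prefix sum turns the counts into start offsets.
  for (uint32_t i = 0; i < nHits; ++i)
    map.offsets[i + 1] += map.offsets[i];

  // 3) Fill the content array, using a per-hit write cursor.
  map.content.resize(map.offsets[nHits]);
  std::vector<uint32_t> cursor(map.offsets.begin(), map.offsets.end() - 1);
  for (uint32_t t = 0; t < tuples.size(); ++t)
    for (uint32_t h : tuples[t])
      map.content[cursor[h]++] = t;

  return map;
}
```

Because both arrays are allocated from `nHits` and the actual number of associations, no hit is dropped when an event has more clusters than a fixed limit would allow; the cost is an allocation whose size depends on the event.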

PR validation:

The setup in #44769 with this PR (on top of master or 14_0_X) doesn't crash anymore.

Backported to 14_0_X in #44774.

solves #44769.

@cmsbuild
Contributor

cmsbuild commented Apr 18, 2024

cms-bot internal usage

@mmusich
Contributor

mmusich commented Apr 18, 2024

tagging @malbouis (ORM)

@malbouis
Contributor

urgent

@malbouis
Contributor

malbouis commented Apr 18, 2024

type hlt-int

@AdrianoDee
Contributor Author

enable gpu

@rappoccio
Contributor

I think as soon as the checks finish we can merge in the interest of time.

@AdrianoDee
Contributor Author

test parameters:

  • enable = gpu
  • workflows = 12434.402,12434.403,12434.404
  • workflows_gpu = 12434.402,12434.403,12434.404
  • workflow_opts = -w upgrade
  • workflow_opts_gpu = -w upgrade

@cmsbuild
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44773/40004

Code check has found code style and quality issues which could be resolved by applying following patch(s)

- update for HitToTuple map in CA
@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44773/40005

@cmsbuild
Contributor

A new Pull Request was created by @AdrianoDee for master.

It involves the following packages:

  • Geometry/CommonTopologies (geometry)
  • RecoLocalTracker/SiPixelClusterizer (reconstruction)
  • RecoLocalTracker/SiPixelRecHits (reconstruction)
  • RecoTracker/PixelSeeding (reconstruction)

@civanch, @Dr15Jones, @jfernan2, @makortel, @mdhildreth, @mandrenguyen, @cmsbuild, @bsunanda can you please review it and eventually sign? Thanks.
@threus, @mtosi, @dkotlins, @VourMa, @gpetruc, @ferencek, @felicepantaleo, @mmusich, @rovere, @dgulhan, @missirol, @JanFSchulte, @tsusa, @mroguljic, @tvami, @fabiocos, @VinInn, @GiacomoSguazzoni, @bsunanda this is something you requested to watch as well.
@rappoccio, @sextonkennedy, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

@AdrianoDee
Contributor Author

please test

@AdrianoDee
Contributor Author

assign heterogeneous

@cmsbuild
Contributor

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bfffbf/38935/summary.html
COMMIT: 29d1688
CMSSW: CMSSW_14_1_X_2024-04-18-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/44773/38935/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 24 differences found in the comparisons
  • DQMHistoTests: Total files compared: 5
  • DQMHistoTests: Total histograms compared: 71813
  • DQMHistoTests: Total failures: 1281
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 70532
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 4 files compared)
  • Checked 19 log files, 22 edm output root files, 5 DQM output files
  • TriggerResults: no differences found

@antoniovilela
Contributor

+1
- Requested by ORM as urgent for data taking.

@antoniovilela
Contributor

merge
- Requested by ORM as urgent for data taking.

@cmsbuild cmsbuild merged commit 8642310 into cms-sw:master Apr 18, 2024
15 checks passed
@mandrenguyen
Contributor

+1

@fwyzard
Contributor

fwyzard commented Apr 21, 2024

+heterogeneous

@AdrianoDee AdrianoDee deleted the runtime_nhits_141X branch April 30, 2024 10:35
8 participants