Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Follow up to the Alpaka integration in CMSSW #43853

Merged
merged 25 commits into from
Feb 12, 2024

Conversation

fwyzard
Copy link
Contributor

@fwyzard fwyzard commented Feb 3, 2024

PR description:

HeterogeneousCore/AlpakaInterface

Rewrite the work division utilities:

  • rewrite the uniform element kernel loops.
  • rewrite the independent element kernel loops.
  • remove the obsolete alpakatools utilities.

Rewrite the zeroAndInit kernel using the alpakatools utilities.
Rename elements_with_stride to uniform_elements in user code.

RecoLocalTracker/SiPixelClusterizer

Implement fixes that were lost during the CUDA-to-Alpaka migration.

Collect the updates from #39711, #42618, #42977 and #43052, that were missing from the Alpaka implementation:

  • skip invalid or corrupted ROCs: make the Alpaka implementation of the pixel unpacker skip spurious ROCs, similar to the legacy and CUDA versions of the unpacker, and store the
    invalid ROC number error (errorType=36);
  • fix the decoding of channels with timeout errors (errorType=27);
  • synchronise the treatment of pixel errorType=26, 27, 30 with the legacy code;
  • disable printf statements at compile time.
RecoTracker/PixelSeeding

Do not call the fishbone for events with pixel hits only in BPIX1 (reimplement #35638).

RecoTracker/MeasurementDet

Add a check on the pixel ROC range, to avoid overflows in the (unsigned) range differences.

DataFormats/TrackingRecHitSoA

Extend TrackingRecHits collections for testing and their unit tests.

CondFormats/SiPixelObjects

Clean up includes and dependencies, and avoid including framework headers in the device compilation.

Geometry/CommonTopologies

Update comments to pixel topologies.

Other changes

Fix include guards after files were moved to different packages.
Clean up include files.
Simplify nested namespace declarations.

PR validation:

Unit tests pass.

Validated running a recent HLT menu over 20k events without issues, both on GPU and on CPU, on data and MC: the alpaka implementation running on CPU or GPU gives the same results as the CUDA implementation, within the same non-reproducibility margins (1 event or up to 1% for high-rate triggers).

Backport status

The backport to CMSSW_14_0_X for 2024 data taking is in #43879.

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 3, 2024

cms-bot internal usage

@fwyzard
Copy link
Contributor Author

fwyzard commented Feb 3, 2024

assign heterogeneous

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 3, 2024

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43853/38691

ERROR: Build errors found during clang-tidy run.

RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/SiPixelRawToClusterKernel.dev.cc:636:41: error: expected '(' for function-style cast or type construction [clang-diagnostic-error]
  636 |                               CalibDigis<true>{},
      |                               ~~~~~~~~~~^
RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/SiPixelRawToClusterKernel.dev.cc:636:46: error: initializer list cannot be used on the right hand side of operator '>' [clang-diagnostic-error]
  636 |                               CalibDigis<true>{},
      |                                              ^~~
RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/SiPixelRawToClusterKernel.dev.cc:645:41: error: expected '(' for function-style cast or type construction [clang-diagnostic-error]
  645 |                               CalibDigis<false>{},
      |                               ~~~~~~~~~~^
RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/SiPixelRawToClusterKernel.dev.cc:645:47: error: initializer list cannot be used on the right hand side of operator '>' [clang-diagnostic-error]
  645 |                               CalibDigis<false>{},
      |                                               ^~~
Suppressed 571 warnings (571 in non-user code).
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:129: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

@fwyzard
Copy link
Contributor Author

fwyzard commented Feb 3, 2024

test parameters:

  • enable = gpu
  • workflows = 12434.402,12434.403,12434.404
  • workflows_gpu = 12434.402,12434.403,12434.404
  • workflow_opts = -w upgrade
  • workflow_opts_gpu = -w upgrade

@fwyzard
Copy link
Contributor Author

fwyzard commented Feb 3, 2024

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 3, 2024

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43853/38692

  • This PR adds an extra 36KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 3, 2024

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Copy link
Contributor

Pull request #43853 was updated. @jfernan2, @francescobrivio, @subirsarkar, @consuegs, @perrotta, @srimanob, @saumyaphor4252, @mandrenguyen can you please check and sign again.

@srimanob
Copy link
Contributor

+Upgrade

@perrotta
Copy link
Contributor

+alca

@perrotta
Copy link
Contributor

+db

@mandrenguyen
Copy link
Contributor

are the tests stuck?

@smuzaffar
Copy link
Contributor

are the tests stuck?

No @mandrenguyen , tests are running. Jenkins is very busy today that is why tests are waiting for free nodes

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-536c0d/37363/summary.html
COMMIT: c7d3641
CMSSW: CMSSW_14_1_X_2024-02-12-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43853/37363/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 70 differences found in the comparisons
  • DQMHistoTests: Total files compared: 5
  • DQMHistoTests: Total histograms compared: 71813
  • DQMHistoTests: Total failures: 2870
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 68943
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 4 files compared)
  • Checked 19 log files, 22 edm output root files, 5 DQM output files
  • TriggerResults: no differences found

@mandrenguyen
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @rappoccio, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@antoniovilela
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 6e8031b into cms-sw:master Feb 12, 2024
25 checks passed
@Martin-Grunewald
Copy link
Contributor

Martin-Grunewald commented Feb 13, 2024

@fwyzard
Is there a backport to 14_0 for this PR? Is it #43879 ?

@fwyzard
Copy link
Contributor Author

fwyzard commented Feb 13, 2024

Yes, #43879 .
I've updated the description with a link to it.

@fwyzard fwyzard deleted the alpaka_integration_branch branch March 2, 2024 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.