Porting Pixel Tracks to Alpaka [Not to Merge] #41117
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41117/34753 ERROR: Build errors found during clang-tidy run.
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41117/34770 ERROR: Build errors found during clang-tidy run.
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41117/34807 ERROR: Build errors found during clang-tidy run.
|
@AdrianoDee in case you didn't notice: you'll need to run the code checks |
4fb1dd5 to d83a814
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41117/37986
|
Pull request #41117 was updated. @Martin-Grunewald, @sunilUIET, @fwyzard, @makortel, @mandrenguyen, @consuegs, @nothingface0, @mmusich, @mdhildreth, @jfernan2, @perrotta, @fabiocos, @francescobrivio, @AdrianoDee, @civanch, @srimanob, @syuvivida, @davidlange6, @saumyaphor4252, @rvenditti, @antoniovagnerini, @antoniovilela, @miquork, @cmsbuild, @rappoccio, @tjavaid can you please check and sign again. |
PR description:
Common work with @borzari and @nothingface0.
This PR makes it possible to run the Pixel Tracks Reconstruction in Alpaka. It's still a work in progress and needs to be properly tested. We are opening it so that it is (more) public and may be reviewed by experts.
We will update the description accordingly as the PR is updated.
This includes #40932 with the latest comments received addressed.
This is not to be merged and is here for testing purposes. It has been split into 8 smaller PRs, to be merged in sequence, to ease the review:
(@ericcano)
21st November
Tested with #43064, everything is fine. Some general clean-up renaming:
- DataFormats now are in the form DataFormats/XYXSoA/;
- XYZHost, XYZDevice, XYZsSoACollection;
- *GPU objects in Alpaka code either with *Device or nothing (e.g. GPUAlgo -> Algo);
- CopyToHost methods to avoid useless specialization for Host to Host copy;
- ASSERT_DEVICE_MATCHES_HOST_COLLECTION everywhere;
- SET_PORTABLEHOSTCOLLECTION_READ_RULES;
- std::conditional_t for collection Host/Device definition.
The resolution problem was solved by @borzari spotting this (a great catch!):
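Separately from that fix, the std::conditional_t item in the clean-up list can be illustrated with a minimal sketch. The device tags and collection types below (CpuDevice, GpuDevice, TracksHost, TracksDevice) are stand-ins invented for this example, not the actual types in this PR. The only point is how a single alias can resolve to the host collection on a CPU backend and to the device collection otherwise.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Stand-in device tags (hypothetical; the real code uses alpaka device types).
struct CpuDevice {};
struct GpuDevice {};

// Stand-in collections (hypothetical names following the XYZHost / XYZDevice pattern).
struct TracksHost {
  static std::string where() { return "host"; }
};

template <typename TDev>
struct TracksDevice {
  static std::string where() { return "device"; }
};

// One alias per backend: when the backend device is the CPU, reuse the host
// collection directly; otherwise use the device collection.
template <typename TDev>
using TracksSoACollection =
    std::conditional_t<std::is_same_v<TDev, CpuDevice>, TracksHost, TracksDevice<TDev>>;

// Compile-time checks that the alias picks the expected type (requires C++17).
static_assert(std::is_same_v<TracksSoACollection<CpuDevice>, TracksHost>);
static_assert(std::is_same_v<TracksSoACollection<GpuDevice>, TracksDevice<GpuDevice>>);
```

With an alias of this kind, code written against TracksSoACollection needs no separate specialization for the CPU backend, since there the alias is simply the host collection.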
15th November
This now includes #43064 up to 5f9c2e6.
19th October
We will use this PR as a proxy for the full development in order to be able to run the integration tests. Changing the status to "Ready to review" to be able to run the bot commands and checks.
Module Naming
For the moment we applied the following rule for the naming:
1. where a module name has the CUDA suffix, we simply drop the CUDA suffix;
2. otherwise, we add Alpaka to the module name.
Rule 2 usually applies to the SoA-to-legacy converters.
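As a rough illustration of the naming rule, here is a small sketch. The helper alpakaModuleName and the module names used with it are hypothetical and not part of this PR. It also assumes that the second rule is the fallback for names without a CUDA suffix and that Alpaka is appended at the end of the name, neither of which the description spells out.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, only meant to illustrate the naming rule:
//   1. names ending in "CUDA" lose that suffix;
//   2. any other name gets "Alpaka" appended (assumed fallback).
std::string alpakaModuleName(const std::string& cudaName) {
  const std::string suffix = "CUDA";
  const bool hasSuffix =
      cudaName.size() >= suffix.size() &&
      cudaName.compare(cudaName.size() - suffix.size(), suffix.size(), suffix) == 0;
  if (hasSuffix) {
    // Rule 1: drop the trailing "CUDA".
    return cudaName.substr(0, cudaName.size() - suffix.size());
  }
  // Rule 2: no CUDA suffix to drop, so mark the module as the Alpaka variant.
  return cudaName + "Alpaka";
}
```

Under these assumptions, alpakaModuleName("exampleProducerCUDA") gives "exampleProducer", while alpakaModuleName("exampleFromSoA") gives "exampleFromSoAAlpaka".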
Additional workflows
An alpaka process modifier is added together with a set of new workflows:
- *.55 running Pixel only in Alpaka;
- *.554 running Pixel only in Alpaka for profiling;
- *.557 running Pixel only in Alpaka for CPU vs GPU validation.
A note: in order to coexist with the CUDA workflows, for the modules providing the conversion to legacy formats, we had to live with the SwitchProducerCUDA logic. For example, for the local reco configurations, siPixelRecHitsPreSplitting is defined as:
and in order to be able to modify or replace it with toModify or toReplaceWith, the alpaka modifier acts on the cpu branch of the SwitchProducerCUDA.
This was the only way we found to keep the same naming for the final AoS products.
Run3 Physics Results
Find here all the validation plots from MTV for Run3 ttbar.
The results overlap almost perfectly, with the exception of the $d_{xy}$ resolution, which is degraded (see e.g. here). We are investigating this and believe we have spotted the culprit.
Run3 Throughput
Running a profiling workflow on Run3 data (Run 370293) on fu-c2a02-37-02 we see a degradation in performance (around 20% in throughput). Note that when running a single EDM stream the CUDA and Alpaka throughputs are the same.
20th October
The tests are fixed with 66f48f9 (thanks to @ericcano). For the moment the testOneHistoContainer tests are commented out, since the issue is solved in #43064. RecoTracker/PixelTrackFitting/testEigenGPUNoFit_t also fails in a clean CMSSW_13_3_X_2023-10-18-1100.