Update to `relval_gpu.py` for Alpaka and Pixel Triplets Alpaka Workflows #45694

AdrianoDee · 2024-08-13T14:25:23Z

PR description:

This PR proposes to:

- add *.406, *.407, *.408 Alpaka PixelOnly triplets wfs (standard, profiling, validation);
- add *.486, the triplet equivalent of .482 (full reco + pixel-only);
- remove the CUDA wfs from `relval_gpu.py`;
- add 2024 and D110 Alpaka wfs to `relval_gpu.py` (with D110 fixed after Alpaka Pixel: Reading layerStart at Runtime and Variable thePitch* #45421);
- keep the 2023 wfs for the moment, as they are needed by the bot (12434.402, 12434.403, 12434.412, 12434.422, 12434.423). Once this is merged, Update GPU RelVals to 2024 wfs cms-bot#2310 can go in and those can be removed;
- fix the DQM pixel GPUvsCPU for Phase2 workflows (needed to run Phase2 Alpaka wfs).
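
For readers less familiar with the numbering, the workflow IDs quoted above (e.g. 12434.402) can be read as a base workflow number plus a fractional offset that selects the variant. The snippet below is only an illustrative sketch of that convention: the helper names `split_workflow` and `compose_workflow` are hypothetical and are not part of `relval_gpu.py` or the matrix code, and the sketch does not check whether a given combination is actually defined.

```python
from decimal import Decimal

# The 2023 workflows quoted in the PR description (kept because the bot needs them).
KEPT_2023_WORKFLOWS = ["12434.402", "12434.403", "12434.412", "12434.422", "12434.423"]


def split_workflow(wf: str) -> tuple[int, str]:
    """Split an ID such as '12434.402' into its base number and its offset ('0.402').

    Hypothetical helper, for illustration only. Decimal is used instead of float
    so that the three-digit offset is preserved exactly.
    """
    number = Decimal(wf)
    base = int(number)
    return base, str(number - base)


def compose_workflow(base: int, offset: str) -> str:
    """Combine a base number and an offset such as '0.406' into a workflow ID string."""
    return str(Decimal(base) + Decimal(offset))


if __name__ == "__main__":
    for wf in KEPT_2023_WORKFLOWS:
        base, offset = split_workflow(wf)
        print(f"{wf}: base={base}, offset={offset}")
```

Keeping the IDs as strings (or `Decimal`) rather than floats avoids rounding surprises when comparing or formatting offsets.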

AdrianoDee · 2024-08-13T14:25:34Z

enable gpu

cmsbuild · 2024-08-13T14:25:52Z

cms-bot internal usage

AdrianoDee · 2024-08-13T14:26:42Z

@fwyzard let me know if this makes sense to you.

cmsbuild · 2024-08-13T14:27:40Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45694/41280

There are other open Pull requests which might conflict with changes you have proposed:
- File Configuration/PyReleaseValidation/python/upgradeWorkflowComponents.py modified in PR(s): Add run3 tracking low pu era #33532, CMSSW Integration of LST #45117, TICLv5 : Superclustering DNN #45333, Clean up Phase-2 Geometry D86, D88, D91, D92, D93, D94, D97 #45370, Allowing Jumping BPix2->FPix2 For Pixel Doublets (and Alpaka Pixel Triplets *.406 wf) #45478, Update Pixel GPU DQM online client #45666, Add Phase2 PbPb reco process modifier #45683
- File RecoTracker/Configuration/python/customizePixelTracksForTriplets.py modified in PR(s): Allowing Jumping BPix2->FPix2 For Pixel Doublets (and Alpaka Pixel Triplets *.406 wf) #45478

cmsbuild · 2024-08-13T14:28:05Z

A new Pull Request was created by @AdrianoDee for master.

It involves the following packages:

Configuration/PyReleaseValidation (upgrade, pdmv)
RecoTracker/Configuration (reconstruction)

@AdrianoDee, @cmsbuild, @jfernan2, @kskovpen, @mandrenguyen, @miquork, @srimanob, @subirsarkar, @sunilUIET can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @JanFSchulte, @Martin-Grunewald, @VinInn, @VourMa, @dgulhan, @fabiocos, @felicepantaleo, @gpetruc, @makortel, @missirol, @mmusich, @mtosi, @rovere, @slomeo this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

AdrianoDee · 2024-08-13T14:36:08Z

please test

cmsbuild · 2024-08-13T14:47:20Z

-1

Failed Tests: Build
Size: This PR adds an extra 36KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-853b3c/40898/summary.html
COMMIT: 2d7742a
CMSSW: CMSSW_14_1_X_2024-08-13-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45694/40898/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation error when building:

SyntaxError: invalid syntax

gmake[1]: *** [config/SCRAM/GMake/Makefile.rules:2004: CompilePython] Error 1
gmake[1]: Target 'PostBuild' not remade because of errors.
gmake[1]: Leaving directory '/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_1_X_2024-08-13-1100'
gmake: *** [config/SCRAM/GMake/Makefile.rules:1890: src] Error 2
gmake: Target 'all' not remade because of errors.
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2
+ eval scram build outputlog '&&' '(python3' /data/cmsbld/jenkins/workspace/ib-run-pr-tests/cms-bot/buildLogAnalyzer.py --logDir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_1_X_2024-08-13-1100/tmp/el8_amd64_gcc12/cache/log/src '||' 'true)'
++ scram build outputlog
>> Entering Package Configuration/PyReleaseValidation

cmsbuild · 2024-08-13T15:01:39Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45694/41282

There are other open Pull requests which might conflict with changes you have proposed:
- File Configuration/PyReleaseValidation/python/upgradeWorkflowComponents.py modified in PR(s): Add run3 tracking low pu era #33532, CMSSW Integration of LST #45117, TICLv5 : Superclustering DNN #45333, Clean up Phase-2 Geometry D86, D88, D91, D92, D93, D94, D97 #45370, Allowing Jumping BPix2->FPix2 For Pixel Doublets (and Alpaka Pixel Triplets *.406 wf) #45478, Update Pixel GPU DQM online client #45666, Add Phase2 PbPb reco process modifier #45683
- File RecoTracker/Configuration/python/customizePixelTracksForTriplets.py modified in PR(s): Allowing Jumping BPix2->FPix2 For Pixel Doublets (and Alpaka Pixel Triplets *.406 wf) #45478

fwyzard · 2024-09-03T09:07:34Z

+heterogeneous

AdrianoDee · 2024-09-03T09:13:07Z

@cms-sw/dqm-l2 @cms-sw/upgrade-l2 any comment on this?

tjavaid · 2024-09-03T09:45:59Z

+1

Spurious differences seen from the unstable WF 12834.7 (open issue: spurious differences in outputs of Run-3 wfs *.7 #39803).

AdrianoDee · 2024-09-04T13:23:05Z

@cms-sw/upgrade-l2 sorry to bother but this would be useful for a series of things (removing IB failures, monitoring Alpaka wfs and CUDA removal). Thanks ( :

srimanob · 2024-09-04T16:51:36Z

+Upgrade

cmsbuild · 2024-09-04T16:52:00Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @mandrenguyen, @antoniovilela, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

fwyzard · 2024-09-04T21:29:05Z

> @fwyzard
>
> Indeed the memory usage has a non-negligible jump (4-5%) when running:
> * `14_0_14` GRun menu;
> * in `14_0_14_MULTIARCHS`;
> * on TTBar PU with 2024 conditions (`140X_mcRun3_2024_realistic_v14`);
> * 4 concurrent jobs with 16 threads, 16 streams.
>
> My proposal here would be to define anyway the triplets wfs. Then I'm not sure if we want to have or not Phase2 PU triplet wf in the relval_gpu.py matrix given it will show up as a failure in the IBs. For the moment I've removed it. I've left an (hopefully) explicative comment in ZVertexDefinitions.h.

@AdrianoDee coming back to this: given the very visible impact on the memory, I guess we should resurrect #43952 even before figuring out a less cumbersome interface.

I'll try to rebase it to the latest master.

fwyzard · 2024-09-05T12:33:01Z

> I'll try to rebase it to the latest master.

#45887 is a work-in-progress reimplementation that also makes the ZVertexSoA collections run-time sized.

AdrianoDee · 2024-09-06T08:00:03Z

@cms-sw/orp-l2 gentle ping ( :

mandrenguyen · 2024-09-06T08:08:57Z

+1

srimanob · 2024-09-16T17:34:31Z

Configuration/PyReleaseValidation/README.md

@@ -33,11 +33,15 @@ The offsets currently in use are:
 * 0.402: Alpaka, pixel only quadruplets, portable
 * 0.403: Alpaka, pixel only quadruplets, portable vs. CPU validation
 * 0.404: Alpaka, pixel only quadruplets, portable profiling
+* 0.406: Alpaka, pixel only triplets, portable
+* 0.407: Alpaka, pixel only triplets, portable vs. CPU validation
+* 0.407: Alpaka, pixel only triplets, portable profiling


AdrianoDee marked this pull request as ready for review August 13, 2024 14:25

cmsbuild added this to the CMSSW_14_1_X milestone Aug 13, 2024

cmsbuild added reconstruction-pending pending-signatures tests-pending orp-pending pdmv-pending upgrade-pending code-checks-pending tracking labels Aug 13, 2024

cmsbuild added code-checks-approved and removed code-checks-pending labels Aug 13, 2024

cmsbuild added tests-started and removed tests-pending labels Aug 13, 2024

cmsbuild added tests-rejected and removed tests-started labels Aug 13, 2024

AdrianoDee force-pushed the relval_gpu_alpaka branch from 2d7742a to 4f4ed59 Compare August 13, 2024 14:58

cmsbuild added tests-pending code-checks-pending and removed tests-rejected code-checks-approved labels Aug 13, 2024

cmsbuild removed the code-checks-pending label Aug 13, 2024

AdrianoDee mentioned this pull request Aug 30, 2024

[14.1.X] Update Pixel GPU DQM online client #45806

Merged

cmsbuild added heterogeneous-approved and removed heterogeneous-pending labels Sep 3, 2024

cmsbuild added dqm-approved and removed dqm-pending labels Sep 3, 2024

cmsbuild added fully-signed upgrade-approved and removed pending-signatures upgrade-pending labels Sep 4, 2024

This was referenced Sep 5, 2024

Split ZVertexSoA in a run-time sized PortableMultiCollection #45887

Merged

Remove legacy CUDA modules for pixel track and vertex reconstruction #45853

Draft

cmsbuild added orp-approved and removed orp-pending labels Sep 6, 2024

cmsbuild merged commit 51e9ab7 into cms-sw:master Sep 6, 2024
15 checks passed

AdrianoDee deleted the relval_gpu_alpaka branch September 6, 2024 10:50

cmsbuild mentioned this pull request Sep 7, 2024

Tier0Handler: add maxTime, improve cert warning #45943

Merged

This was referenced Sep 9, 2024

[14_1_X] Update to relval_gpu.pyfor Alpaka and Pixel Triplets Alpaka Workflows #45958

Merged

[14_0_X] Update to relval_gpu.pyfor Alpaka and Pixel Triplets Alpaka Workflows #45959

Merged

makortel mentioned this pull request Sep 11, 2024

Assertion failure in PixelCPEFast<TrackerTraits>::fillParamsForGpu #45332

Open

srimanob reviewed Sep 16, 2024
