Update to relval_gpu.py for Alpaka and Pixel Triplets Alpaka Workflows #45694

Merged
merged 4 commits into from
Sep 6, 2024

AdrianoDee
Contributor

@AdrianoDee AdrianoDee commented Aug 13, 2024

PR description:

This PR proposes to:

  • add *.406, *.407, *.408 Alpaka PixelOnly triplets wf (standard, profiling, validation);
  • add *.486 triplet equivalent of .482 (full reco + pixel-only);
  • remove the CUDA wfs from relval_gpu.py;
  • add 2024 and D110 Alpaka wfs to relval_gpu.py (with D110 fixed after Alpaka Pixel: Reading layerStart at Runtime and Variable thePitch*  #45421);
  • keep the 2023 wfs for the moment, as they are needed by the bot (12434.402, 12434.403, 12434.412, 12434.422, 12434.423). Once this is merged, Update GPU RelVals to 2024 wfs cms-bot#2310 can go in and those can be removed;
  • fix the DQM pixel GPUvsCPU for Phase2 workflows (needed to run Phase2 Alpaka wfs).
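
For readers less familiar with the numbering, a workflow id such as 12434.402 reads as a base workflow number plus a three-digit offset that selects the variant. A minimal sketch of that composition follows, assuming the ids are handled as strings; the helper `workflow_id` and the constant names are hypothetical and are not taken from relval_gpu.py.

```python
# Hypothetical helper for illustration only; not taken from relval_gpu.py.

def workflow_id(base: int, offset: int) -> str:
    """Join a base workflow number and a three-digit offset (e.g. 12434 and 402)."""
    if not 0 <= offset <= 999:
        raise ValueError(f"offset must fit in three digits, got {offset}")
    # Build the id as text so that no floating-point rounding is involved.
    return f"{base}.{offset:03d}"


# Base number and offsets as quoted in the PR description above.
BASE = 12434
KEPT_OFFSETS = (402, 403, 412, 422, 423)

kept_2023_wfs = [workflow_id(BASE, off) for off in KEPT_OFFSETS]
print(", ".join(kept_2023_wfs))
```

Keeping the ids as strings is a choice made for this sketch: it avoids comparing floats such as 12434.402 for equality.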

@AdrianoDee AdrianoDee marked this pull request as ready for review August 13, 2024 14:25
@AdrianoDee
Contributor Author

enable gpu

@cmsbuild
Contributor

cmsbuild commented Aug 13, 2024

cms-bot internal usage

@AdrianoDee
Contributor Author

@fwyzard let me know if this makes sense to you.

@cmsbuild
Contributor

A new Pull Request was created by @AdrianoDee for master.

It involves the following packages:

  • Configuration/PyReleaseValidation (upgrade, pdmv)
  • RecoTracker/Configuration (reconstruction)

@AdrianoDee, @cmsbuild, @jfernan2, @kskovpen, @mandrenguyen, @miquork, @srimanob, @subirsarkar, @sunilUIET can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @JanFSchulte, @Martin-Grunewald, @VinInn, @VourMa, @dgulhan, @fabiocos, @felicepantaleo, @gpetruc, @makortel, @missirol, @mmusich, @mtosi, @rovere, @slomeo this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@AdrianoDee
Contributor Author

please test

@cmsbuild
Contributor

-1

Failed Tests: Build
Size: This PR adds an extra 36KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-853b3c/40898/summary.html
COMMIT: 2d7742a
CMSSW: CMSSW_14_1_X_2024-08-13-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45694/40898/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation error when building:

SyntaxError: invalid syntax

gmake[1]: *** [config/SCRAM/GMake/Makefile.rules:2004: CompilePython] Error 1
gmake[1]: Target 'PostBuild' not remade because of errors.
gmake[1]: Leaving directory '/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_1_X_2024-08-13-1100'
gmake: *** [config/SCRAM/GMake/Makefile.rules:1890: src] Error 2
gmake: Target 'all' not remade because of errors.
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2
+ eval scram build outputlog '&&' '(python3' /data/cmsbld/jenkins/workspace/ib-run-pr-tests/cms-bot/buildLogAnalyzer.py --logDir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_1_X_2024-08-13-1100/tmp/el8_amd64_gcc12/cache/log/src '||' 'true)'
++ scram build outputlog
>> Entering Package Configuration/PyReleaseValidation
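
The log above reports only `SyntaxError: invalid syntax` from the CompilePython step and does not show the offending file or line. A minimal sketch of a local pre-check follows, assuming one simply wants to byte-compile a Python source before pushing. The helper `find_syntax_error` is hypothetical and is not part of scram or cms-bot.

```python
# Hypothetical pre-push check; not part of scram or cms-bot.

def find_syntax_error(source: str, filename: str = "<string>"):
    """Return (line number, message) for the first syntax error, or None if the source compiles."""
    try:
        # The built-in compile() parses the source without executing it.
        compile(source, filename, "exec")
    except SyntaxError as err:
        return err.lineno, err.msg
    return None


good = "workflows = {}\n"
bad = "workflows = {}\nnumber = = 402\n"  # deliberately broken second line

print(find_syntax_error(good))
print(find_syntax_error(bad))
```

For a file on disk, the same function can be fed `open(path).read()` with `path` passed as `filename`.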


@fwyzard
Contributor

fwyzard commented Sep 3, 2024

+heterogeneous

@AdrianoDee
Contributor Author

@cms-sw/dqm-l2 @cms-sw/upgrade-l2 any comment on this?

@tjavaid

tjavaid commented Sep 3, 2024

+1

@AdrianoDee
Contributor Author

@cms-sw/upgrade-l2 sorry to bother but this would be useful for a series of things (removing IB failures, monitoring Alpaka wfs and CUDA removal). Thanks ( :

@srimanob
Contributor

srimanob commented Sep 4, 2024

+Upgrade

@cmsbuild
Contributor

cmsbuild commented Sep 4, 2024

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @mandrenguyen, @antoniovilela, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@fwyzard
Contributor

fwyzard commented Sep 4, 2024

@fwyzard

Indeed the memory usage has a non-negligible jump (4-5%) when running:

* `14_0_14` GRun menu;
* in `14_0_14_MULTIARCHS`;
* on TTBar PU with 2024 conditions (`140X_mcRun3_2024_realistic_v14`);
* 4 concurrent jobs with 16 threads, 16 streams.

My proposal here would be to define the triplet wfs anyway. I'm not sure whether we want the Phase2 PU triplet wf in the relval_gpu.py matrix, given that it will show up as a failure in the IBs. For the moment I've removed it. I've left a (hopefully) explanatory comment in ZVertexDefinitions.h.

@AdrianoDee coming back to this: given the very visible impact on the memory, I guess we should resurrect #43952 even before figuring out a less cumbersome interface.

I'll try to rebase it to the latest master.

@fwyzard
Contributor

fwyzard commented Sep 5, 2024

I'll try to rebase it to the latest master.

#45887 is a work-in-progress reimplementation that also makes the ZVertexSoA collections run-time sized.

@AdrianoDee
Contributor Author

@cms-sw/orp-l2 gentle ping ( :

@mandrenguyen
Contributor

+1

@@ -33,11 +33,15 @@ The offsets currently in use are:
* 0.402: Alpaka, pixel only quadruplets, portable
* 0.403: Alpaka, pixel only quadruplets, portable vs. CPU validation
* 0.404: Alpaka, pixel only quadruplets, portable profiling
* 0.406: Alpaka, pixel only triplets, portable
* 0.407: Alpaka, pixel only triplets, portable vs. CPU validation
* 0.407: Alpaka, pixel only triplets, portable profiling

Typo.
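
The review comment does not say which token is wrong. One reading, offered here as an assumption, is that it points at the offset 0.407 appearing twice in the hunk, while the PR description lists *.406, *.407 and *.408. A minimal sketch of a check that flags repeated offsets in such a list follows. The function `duplicated_offsets` and the regular expression are hypothetical and not part of the repository.

```python
import re
from collections import Counter

# Lines copied from the README hunk quoted above.
README_LINES = """\
* 0.402: Alpaka, pixel only quadruplets, portable
* 0.403: Alpaka, pixel only quadruplets, portable vs. CPU validation
* 0.404: Alpaka, pixel only quadruplets, portable profiling
* 0.406: Alpaka, pixel only triplets, portable
* 0.407: Alpaka, pixel only triplets, portable vs. CPU validation
* 0.407: Alpaka, pixel only triplets, portable profiling
""".splitlines()

# Matches a bullet of the form "* 0.NNN: description" and captures the offset.
OFFSET_RE = re.compile(r"^\*\s+(\d+\.\d+):")


def duplicated_offsets(lines):
    """Return the sorted offsets that occur more than once in the bullet list."""
    counts = Counter(
        match.group(1) for line in lines if (match := OFFSET_RE.match(line))
    )
    return sorted(offset for offset, n in counts.items() if n > 1)


print(duplicated_offsets(README_LINES))  # prints ['0.407']
```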
