Fix Alpaka Harvesting for Phase2 #45963

Merged on Sep 11, 2024 (1 commit)

AdrianoDee
Contributor

PR description:

In the latest IBs we have a couple of failures for Phase2 *.403 workflows because the heterogeneous harvesting step tries to run the RAW data harvester (serial and device). This PR proposes a small change to fix that.
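
The diff itself is not part of this conversation, so the following is only a plain-Python illustration of the general idea described above: leave the RAW data harvesters out of the harvesting sequence when running a Phase2 workflow. It is not CMSSW configuration code. Only the label `siPixelPhase1RawDataHarvesterDevice` comes from the error log in this thread; the serial label, the other module labels, and the function name are hypothetical.

```python
# Stand-in for a harvesting sequence: just an ordered list of module labels.
FULL_HARVESTING = [
    "siPixelPhase1RawDataHarvesterSerial",  # hypothetical label (by analogy)
    "siPixelPhase1RawDataHarvesterDevice",  # label seen in the exception message
    "exampleTrackHarvester",                # hypothetical label
    "exampleVertexHarvester",               # hypothetical label
]

# Modules that need RAW data conditions and should not run for Phase2.
RAW_DATA_HARVESTERS = {
    "siPixelPhase1RawDataHarvesterSerial",
    "siPixelPhase1RawDataHarvesterDevice",
}


def harvesting_sequence(is_phase2):
    """Return the module labels to run, dropping RAW data harvesters for Phase2."""
    if not is_phase2:
        return list(FULL_HARVESTING)
    return [label for label in FULL_HARVESTING if label not in RAW_DATA_HARVESTERS]


if __name__ == "__main__":
    print(harvesting_sequence(is_phase2=True))
```

Filtering by an explicit set of labels keeps the order of the remaining modules unchanged, which matters for any sequence whose steps depend on each other.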

PR validation:

runTheMatrix.py -w upgrade -l 29634.403 runs successfully.

@AdrianoDee
Contributor Author

type bug-fix

@cmsbuild
Contributor

cmsbuild commented Sep 9, 2024

cms-bot internal usage

@cmsbuild
Contributor

cmsbuild commented Sep 9, 2024

A new Pull Request was created by @AdrianoDee for master.

It involves the following packages:

  • DQM/SiPixelHeterogeneous (dqm)

@antoniovagnerini, @cmsbuild, @nothingface0, @rvenditti, @syuvivida, @tjavaid can you please review it and eventually sign? Thanks.
@fioriNTU, @idebruyn, @jandrea, @mmusich, @threus this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@AdrianoDee
Contributor Author

test parameters:

  • workflows_gpu = 12834.403, 29834.403, 29634.496
  • relvals_opt_gpu = --what upgrade

@AdrianoDee marked this pull request as ready for review on September 9, 2024, 16:14

@AdrianoDee
Contributor Author

enable gpu

@AdrianoDee
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Sep 9, 2024

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4a0d98/41416/summary.html
COMMIT: b9c2214
CMSSW: CMSSW_14_2_X_2024-09-09-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45963/41416/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

GPU Comparison Summary

There are some workflows for which there are errors in the baseline:
29834.403 step 4
The results for the comparisons for these workflows could be incomplete
This most likely means that the IB is having errors in the relvals. The error does NOT come from this pull request.

@mmusich
Contributor

mmusich commented Sep 10, 2024

urgent

  • fixes the IB failures

@mmusich
Contributor

mmusich commented Sep 10, 2024

----- Begin Fatal Exception 09-Sep-2024 10:52:41 CEST-----------------------
An exception of category 'NoRecord' occurred while
   [0] Processing global end LuminosityBlock run: 1 luminosityBlock: 59
   [1] Calling method for module SiPixelPhase1Harvester/'siPixelPhase1RawDataHarvesterDevice'
Exception Message:
No "SiPixelFedCablingMapRcd" record found in the EventSetup.

 Please add an ESSource or ESProducer that delivers such a record.
----- End Fatal Exception -------------------------------------------------

Wondering whether the harvester should also be made fool-proof against missing input conditions (generally we do let harvesting run even when inputs are missing). @cms-sw/trk-dpg-l2
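
As a rough illustration of what "fool-proof against missing input conditions" could mean, here is a plain-Python sketch in which a harvesting step skips its work and records a warning when the record it needs is absent, instead of raising a fatal exception. This is not CMSSW code: `ToyEventSetup`, `NoRecord`, and `harvest_raw_data` are hypothetical stand-ins, and only the record name `SiPixelFedCablingMapRcd` and the message wording are taken from the log above.

```python
class NoRecord(LookupError):
    """Stand-in for the 'NoRecord' exception category shown in the log."""


class ToyEventSetup:
    """Hypothetical, dictionary-backed stand-in for an event setup."""

    def __init__(self, records):
        self._records = dict(records)

    def get(self, record_name):
        try:
            return self._records[record_name]
        except KeyError:
            message = f'No "{record_name}" record found in the EventSetup.'
            raise NoRecord(message) from None


def harvest_raw_data(event_setup, warnings):
    """Run the harvesting step if its conditions exist; otherwise skip it.

    Returns True when harvesting ran and False when it was skipped.
    """
    try:
        cabling_map = event_setup.get("SiPixelFedCablingMapRcd")
    except NoRecord as err:
        # Missing conditions are reported but do not stop the job.
        warnings.append(f"skipping RAW data harvesting: {err}")
        return False
    # A real harvester would use cabling_map here; the sketch only checks it exists.
    return cabling_map is not None


if __name__ == "__main__":
    collected = []
    print(harvest_raw_data(ToyEventSetup({}), collected), collected)
```

Whether skipping silently, warning, or failing is the right behaviour is a policy decision for the maintainers; the sketch only shows the warning variant.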

@AdrianoDee
Contributor Author

@cms-sw/dqm-l2 gentle ping

@antoniovagnerini

Hi @AdrianoDee, we observe some small differences in the bin-by-bin GPU comparison for WF 29834.403, e.g. in the number of tracks (see for instance https://tinyurl.com/28wcaf43). Is this expected?

@AdrianoDee
Contributor Author

Hi @antoniovagnerini, thanks for the feedback. Yes, these can be ascribed to the "usual" fluctuations we'd expect for this kind of GPU workflow.

@antoniovagnerini

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @mandrenguyen, @sextonkennedy, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen
Contributor

+1
