Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crashing PixelOnlyGPU workflows (add missing CUDAServices) #31333

Merged
merged 1 commit into from
Sep 3, 2020

Conversation

silviodonato
Copy link
Contributor

PR description:

This PR fixes the crash of workflows 136.885502,136.888502,10824.502,10842.502,11634.502,11650.502 as reported in #31130 (comment) .
This fix loads HeterogeneousCore.CUDAServices.CUDAService_cfi in RecoVertex/BeamSpotProducer/python/BeamSpot_cff.py through the gpu Modifier.

PR validation:

runTheMatrix.py -l 136.885502 works

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2020

The code-checks are being triggered in jenkins.

@silviodonato
Copy link
Contributor Author

assign heterogeneous

@silviodonato
Copy link
Contributor Author

please test workflows 136.885502,136.888502,10824.502,10842.502,11634.502,11650.502

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2020

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31333/18093

  • This PR adds an extra 12KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2020

New categories assigned: heterogeneous

@makortel,@fwyzard you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2020

The tests are being triggered in jenkins.
Test Parameters:

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2020

A new Pull Request was created by @silviodonato (Silvio Donato) for master.

It involves the following packages:

RecoVertex/BeamSpotProducer

@perrotta, @makortel, @slava77, @christopheralanwest, @tocheng, @cmsbuild, @tlampen, @jpata, @fwyzard, @pohsun can you please review it and eventually sign? Thanks.
@makortel, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @ebrondol, @tocheng, @mmusich, @mtosi, @dgulhan this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

def _addCUDAServices(theProcess):
theProcess.load("HeterogeneousCore.CUDAServices.CUDAService_cfi")

modifyRecoVertexBeamSpotProducerBeamSpotAddCUDAService_ = gpu.makeProcessModifier( _addCUDAServices )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was sort of agreed in #28575 to place this in Configuration/StandardSequences/Services_cff. Unless we want to place this in every future cfi/cff that introduces a CUDA module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed a central place would be better.

By the way, how do we deal with this for the GCC 10 builds ?
The CUDAService does not build there (though maybe in principle it could), so there is no CUDAService_cfi.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, how do we deal with this for the GCC 10 builds ?
The CUDAService does not build there (though maybe in principle it could), so there is no CUDAService_cfi.

It will indeed continue to fail in GCC 10 on workflows enabling gpu modifier (better than all workflows but not fully satisfactory). Once we agree with the immediate-term solution in #31261, the same pattern could be applied to CUDAService as well.

On the other hand, I'd think the gpu workflows would not make much sense on GCC 10 anyway, so should we look into disabling them in GCC 10 IBs? (in which case we could leave CUDAService untouched for now)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better solution for the CUDAService could be to try and build it also on GCC 10.

With

diff --git a/HeterogeneousCore/CUDAServices/BuildFile.xml b/HeterogeneousCore/CUDAServices/BuildFile.xml
index e063a0dd4e7..945d272ba85 100644
--- a/HeterogeneousCore/CUDAServices/BuildFile.xml
+++ b/HeterogeneousCore/CUDAServices/BuildFile.xml
@@ -1,4 +1,4 @@
-<iftool name="cuda-gcc-support">
+<iftool name="cuda">
   <use name="FWCore/Framework"/>
   <use name="FWCore/ServiceRegistry"/>
   <use name="FWCore/ParameterSet"/>
diff --git a/HeterogeneousCore/CUDAServices/plugins/BuildFile.xml b/HeterogeneousCore/CUDAServices/plugins/BuildFile.xml
index 73a760fa117..0928247e400 100644
--- a/HeterogeneousCore/CUDAServices/plugins/BuildFile.xml
+++ b/HeterogeneousCore/CUDAServices/plugins/BuildFile.xml
@@ -1,4 +1,4 @@
-<iftool name="cuda-gcc-support">
+<iftool name="cuda">
   <use name="cuda"/>
   <use name="DataFormats/Common"/>
   <use name="DataFormats/Provenance"/>
@@ -11,6 +11,7 @@
   <use name="FWCore/ServiceRegistry"/>
   <use name="FWCore/Utilities"/>
   <use name="HeterogeneousCore/CUDAServices"/>
+
   <library file="*.cc" name="HeterogeneousCoreCUDAServicesPlugins">
     <flags EDM_PLUGIN="1"/>
   </library>
diff --git a/HeterogeneousCore/CUDAUtilities/BuildFile.xml b/HeterogeneousCore/CUDAUtilities/BuildFile.xml
index fc2752ff845..e7193e1ccfd 100644
--- a/HeterogeneousCore/CUDAUtilities/BuildFile.xml
+++ b/HeterogeneousCore/CUDAUtilities/BuildFile.xml
@@ -1,4 +1,4 @@
-<iftool name="cuda-gcc-support">
+<iftool name="cuda">
   <use name="cuda"/>
   <use name="eigen"/>
   <use name="FWCore/Utilities"/>

the CUDAService can be built and loaded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that <iftool name="cuda"> tells if we have the CUDA libraries, while <iftool name="cuda-gcc-support"> tells if we can compile the device code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #31334 .

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2020

+1
Tested at: c44a78a
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eeb108/9065/summary.html
CMSSW: CMSSW_11_2_X_2020-09-01-2300
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2020

Comparison job queued.

move modifier to Configuration/StandardSequences/python/Services_cff.py
@silviodonato
Copy link
Contributor Author

please test workflows 136.885502,136.888502,10824.502,10842.502,11634.502,11650.502

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2020

The tests are being triggered in jenkins.
Test Parameters:

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2020

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eeb108/9065/summary.html

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-eeb108/10824.502_TTbar_13+2018_Patatrack_PixelOnlyGPU+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-eeb108/10842.502_ZMM_13+2018_Patatrack_PixelOnlyGPU+ZMM_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-eeb108/11650.502_ZMM_14+2021_Patatrack_PixelOnlyGPU+ZMM_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-eeb108/136.885502_RunHLTPhy2018D+RunHLTPhy2018D+HLTDR2_2018+RECODR2_2018reHLT_Patatrack_PixelOnlyGPU+HARVEST2018_pixelTrackingOnly
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-eeb108/136.888502_RunJetHT2018D+RunJetHT2018D+HLTDR2_2018+RECODR2_2018reHLT_Patatrack_PixelOnlyGPU+HARVEST2018_pixelTrackingOnly

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 2609667
  • DQMHistoTests: Total failures: 5
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2609639
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 34 files compared)
  • DQMHistoSizes: changed ( 10224.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 149 log files, 22 edm output root files, 35 DQM output files

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2020

+1
Tested at: b282750
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eeb108/9068/summary.html
CMSSW: CMSSW_11_2_X_2020-09-01-2300
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2020

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2020

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eeb108/9068/summary.html

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-eeb108/10824.502_TTbar_13+2018_Patatrack_PixelOnlyGPU+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-eeb108/10842.502_ZMM_13+2018_Patatrack_PixelOnlyGPU+ZMM_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-eeb108/11650.502_ZMM_14+2021_Patatrack_PixelOnlyGPU+ZMM_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-eeb108/136.885502_RunHLTPhy2018D+RunHLTPhy2018D+HLTDR2_2018+RECODR2_2018reHLT_Patatrack_PixelOnlyGPU+HARVEST2018_pixelTrackingOnly
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-eeb108/136.888502_RunJetHT2018D+RunJetHT2018D+HLTDR2_2018+RECODR2_2018reHLT_Patatrack_PixelOnlyGPU+HARVEST2018_pixelTrackingOnly

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 2609667
  • DQMHistoTests: Total failures: 5
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2609639
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 34 files compared)
  • DQMHistoSizes: changed ( 10224.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 149 log files, 22 edm output root files, 35 DQM output files

@silviodonato
Copy link
Contributor Author

+operations

@fwyzard
Copy link
Contributor

fwyzard commented Sep 2, 2020

+1

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2020

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@qliphy
Copy link
Contributor

qliphy commented Sep 3, 2020

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants