Load CUDAService from Services_cff, and only if gpu modifier is active #432

Merged

makortel

PR description:

Addresses cms-sw#28575.

PR validation:

The unit tests run, and the profiling workflow runs without an explicit load of CUDAService.

@AdrianoDee

Validation summary

Reference release CMSSW_11_0_0_pre13 at 91be707
Development branch cms-patatrack/CMSSW_11_0_X_Patatrack at e41560b
Testing PRs:

Validation plots

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.501
  • tracking validation plots and summary for workflow 10824.502

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.501
  • tracking validation plots and summary for workflow 10824.502

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.501
  • tracking validation plots and summary for workflow 10824.502

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

scan-136.885502.png
zoom-136.885502.png

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.501
  • development release, workflow 10824.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885502
  • testing release, workflow 10824.5
  • testing release, workflow 10824.501
  • testing release, workflow 10824.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885502

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.501
  • development release, workflow 10824.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885502
  • testing release, workflow 10824.5
  • testing release, workflow 10824.501
  • testing release, workflow 10824.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885502

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.501
  • development release, workflow 10824.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885502
  • testing release, workflow 10824.5
  • testing release, workflow 10824.501
  • testing release, workflow 10824.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885502

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/9a81de42a2a27335a8b70048031c655b2b7ea4b9/log .

@fwyzard

fwyzard commented Jan 15, 2020

This breaks the .5 and .501 workflows.
It looks like the reason is that the BeamSpotToCUDA module is included in those, even though they do not enable running on the GPU.

@makortel
Author

Thanks. So this

offlineBeamSpotTask = cms.Task(
    offlineBeamSpot,
    offlineBeamSpotCUDA
)

should be changed to something along the lines of

offlineBeamSpotTask = cms.Task(offlineBeamSpot)
_offlineBeamSpotTask_gpu = offlineBeamSpotTask.clone()
_offlineBeamSpotTask_gpu.add(offlineBeamSpotCUDA)
gpu.toReplaceWith(offlineBeamSpotTask, _offlineBeamSpotTask_gpu)

I'll include that in this PR once I finish updating #429.
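
To illustrate why this pattern keeps the CUDA module out of the CPU-only workflows, here is a minimal, self-contained sketch in plain Python. `ToyTask`, `ToyModifier`, and `build_beamspot_task` are hypothetical stand-ins written for this illustration; they only loosely imitate the behaviour described above and are not the FWCore classes.

```python
class ToyTask:
    """Hypothetical stand-in for a task: an ordered list of module names."""

    def __init__(self, *modules):
        self.modules = list(modules)

    def clone(self):
        # Independent copy, so that adding to it leaves the original untouched.
        return ToyTask(*self.modules)

    def add(self, *modules):
        for module in modules:
            if module not in self.modules:
                self.modules.append(module)


class ToyModifier:
    """Hypothetical stand-in for a process modifier such as `gpu`."""

    def __init__(self, chosen):
        self.chosen = chosen

    def toReplaceWith(self, target, replacement):
        # Replace the contents of `target` in place, but only when active.
        if self.chosen:
            target.modules = list(replacement.modules)


def build_beamspot_task(gpu):
    # Same four steps as the suggested configuration fragment.
    task = ToyTask("offlineBeamSpot")
    task_gpu = task.clone()
    task_gpu.add("offlineBeamSpotCUDA")
    gpu.toReplaceWith(task, task_gpu)
    return task


cpu_task = build_beamspot_task(ToyModifier(chosen=False))
gpu_task = build_beamspot_task(ToyModifier(chosen=True))
print(cpu_task.modules)  # ['offlineBeamSpot']
print(gpu_task.modules)  # ['offlineBeamSpot', 'offlineBeamSpotCUDA']
```

In this toy model the default task never contains the CUDA module, and the replacement only takes effect when the modifier is active, which is the behaviour the .5 and .501 workflows need.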

@makortel
Author

> I'll include that in this PR once I finish updating #429.

Done.

@AdrianoDee

AdrianoDee commented Jan 15, 2020 via email

@AdrianoDee

Validation summary

Reference release CMSSW_11_0_0_pre13 at 91be707
Development branch cms-patatrack/CMSSW_11_0_X_Patatrack at e41560b
Testing PRs:

Validation plots

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.501
  • tracking validation plots and summary for workflow 10824.502

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.501
  • tracking validation plots and summary for workflow 10824.502

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.501
  • tracking validation plots and summary for workflow 10824.502

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

scan-136.885502.png
zoom-136.885502.png

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.501
  • development release, workflow 10824.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885502
  • testing release, workflow 10824.5
  • testing release, workflow 10824.501
  • testing release, workflow 10824.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885502

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.501
  • development release, workflow 10824.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885502
  • testing release, workflow 10824.5
  • testing release, workflow 10824.501
  • testing release, workflow 10824.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885502

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.501
  • development release, workflow 10824.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885502
  • testing release, workflow 10824.5
  • testing release, workflow 10824.501
  • testing release, workflow 10824.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885502

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/6706d025121ec7ef85a53d727ab75afa038c34c2/log .

@makortel
Author

Thanks, the .5 and .501 workflows work now, and there is no impact on performance (as expected).

offlineBeamSpotCUDA
)
from Configuration.ProcessModifiers.gpu_cff import gpu
offlineBeamSpotCUDA = _beamSpotToCUDA.clone()

I'm not suggesting changing this, just trying to understand: would it be equivalent to replace

from RecoVertex.BeamSpotProducer.beamSpotToCUDA_cfi import beamSpotToCUDA as _beamSpotToCUDA
offlineBeamSpotCUDA = _beamSpotToCUDA.clone()

with

from RecoVertex.BeamSpotProducer.beamSpotToCUDA_cfi import beamSpotToCUDA as offlineBeamSpotCUDA

?

Author

The two are not equivalent. The first one creates a copy/clone of RecoVertex.BeamSpotProducer.beamSpotToCUDA_cfi.beamSpotToCUDA, so if BeamSpot_cff.py were to do something along the lines of

offlineBeamSpotCUDA.src = "foo"

that change does not propagate to other configurations making use of RecoVertex.BeamSpotProducer.beamSpotToCUDA_cfi.beamSpotToCUDA.

The second one uses the very same object as RecoVertex.BeamSpotProducer.beamSpotToCUDA_cfi.beamSpotToCUDA, and any changes to offlineBeamSpotCUDA do propagate to other configurations making use of RecoVertex.BeamSpotProducer.beamSpotToCUDA_cfi.beamSpotToCUDA, which could be perceived as unexpected.

In this specific case there is little practical difference, so the choice of cloning is more a matter of following the recommended general pattern (it also protects against the case where someone else uses the second approach and modifies a parameter).
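
The difference between the two import styles can be reproduced in plain Python without CMSSW. The sketch below is only an illustration: `ToyModule` is a hypothetical stand-in for a configurable module, and the `src` value `"offlineBeamSpot"` is an assumed default chosen for the example.

```python
import copy


class ToyModule:
    """Hypothetical stand-in for a configurable module; not the FWCore class."""

    def __init__(self, **params):
        self.params = dict(params)

    def clone(self, **overrides):
        # Return an independent deep copy, optionally overriding parameters.
        new = copy.deepcopy(self)
        new.params.update(overrides)
        return new


# Plays the role of the object defined in the shared _cfi file.
beamSpotToCUDA = ToyModule(src="offlineBeamSpot")

# First approach: import under a private name, then clone.
cloned = beamSpotToCUDA.clone()
cloned.params["src"] = "foo"
src_after_clone_edit = beamSpotToCUDA.params["src"]

# Second approach: "import ... as offlineBeamSpotCUDA" only binds a second
# name to the very same object.
aliased = beamSpotToCUDA
aliased.params["src"] = "bar"
src_after_alias_edit = beamSpotToCUDA.params["src"]

print(src_after_clone_edit)  # offlineBeamSpot
print(src_after_alias_edit)  # bar
```

Editing the clone leaves the shared object untouched, while editing through the alias changes it for every other configuration that imports it.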
