Pilot PR for the GPU attributes in workflow injected by runTheMatrix #33057

Closed

Conversation

srimanob commented Mar 3, 2021

PR description:

Backport of #33538
(But this is the original PR, with the discussion of how the workflow will look.)

This PR is to converge on what the workflow with GPU attributes will look like. This follows https://docs.google.com/document/d/150k_VBbja1EK9HlxhXs544T0uhenbad8dnMadyNKlpg/edit?usp=sharing
https://docs.google.com/document/d/1shJAEaPDIWF0S3odHm3SSMERhvlTozyTKP8cgfFcOto/edit?usp=sharing
and on WM side:
dmwm/WMCore#10388

Default attributes when GPU is required are:

'GPUParams': {'CUDACapabilities': ['7.5'],
              'CUDADriverVersion': '',
              'CUDARuntime': '11.2',
              'CUDARuntimeVersion': '',
              'GPUMemory': '8000',
              'GPUName': ''},
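
A minimal sketch of how these defaults could end up attached to a task (illustrative only; the helper name buildGPUParams is hypothetical, while the option and attribute names are the ones used in this PR):

def buildGPUParams(opt):
    # Hypothetical helper: return the GPUParams block only when a GPU is required.
    if getattr(opt, 'RequiresGPU', None) != 'required':
        return None
    return {'CUDACapabilities': opt.CUDACapabilities.split(','),  # e.g. ['7.5']
            'CUDADriverVersion': '',
            'CUDARuntime': opt.CUDARuntime,                       # e.g. '11.2'
            'CUDARuntimeVersion': '',
            'GPUMemory': '8000',
            'GPUName': ''}

Each task in the workflow then carries 'RequiresGPU' and 'GPUParams' keys, as shown in the dump under PR validation below.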

PR validation:

Please ignore the workflow name I use; we can use anything we want. This is only to test the output.

runTheMatrix.py --what upgrade -l 11650.502 --RequiresGPU required --wm init

gives me the following workflow (*). However, uploading does not succeed yet, as we need to update WMCore to accept the new attributes.

(*)

Only viewing request 11650.502
{'AcquisitionEra': 'CMSSW_11_3_X_2021-04-26-2300',
 'CMSSWVersion': 'CMSSW_11_3_X_2021-04-26-2300',
 'Campaign': 'CMSSW_11_3_X_2021-04-26-2300',
 'ConfigCacheUrl': 'https://cmsweb.cern.ch/couchdb',
 'DQMConfigCacheID': 1041,
 'DQMUploadUrl': 'https://cmsweb.cern.ch/dqm/relval',
 'DbsUrl': 'https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader',
 'EnableHarvesting': 'True',
 'GlobalTag': u'113X_mcRun3_2021_realistic_v10',
 'Group': 'ppd',
 'Memory': 3000,
 'Multicore': 1,
 'PrepID': 'CMSSW_11_3_X_2021-04-26-2300__1619513550-ZMM_14',
 'ProcessingString': u'113X_mcRun3_2021_realistic_v10',
 'ProcessingVersion': 1,
 'RequestPriority': 500000,
 'RequestString': 'RVCMSSW_11_3_X_2021-04-26-2300ZMM_14',
 'RequestType': 'TaskChain',
 'Requestor': 'srimanob',
 'ScramArch': 'slc7_amd64_gcc900',
 'SizePerEvent': 1234,
 'SubRequestType': 'RelVal',
 'Task1': {'AcquisitionEra': 'CMSSW_11_3_X_2021-04-26-2300',
           'ConfigCacheID': 1043,
           'EventStreams': 0,
           'EventsPerJob': 100,
           'EventsPerLumi': 100,
           'GPUParams': None,
           'GlobalTag': u'113X_mcRun3_2021_realistic_v10',
           'KeepOutput': True,
           'Memory': 3000,
           'Multicore': 1,
           'PrimaryDataset': 'RelValZMM_14',
           'ProcessingString': u'113X_mcRun3_2021_realistic_v10',
           'RequestNumEvents': 18000,
           'RequiresGPU': None,
           'Seeding': 'AutomaticSeeding',
           'SplittingAlgo': 'EventBased',
           'TaskName': 'ZMM_14TeV_TuneCP5_2021_GenSim'},
 'Task2': {'AcquisitionEra': 'CMSSW_11_3_X_2021-04-26-2300',
           'ConfigCacheID': 1044,
           'EventStreams': 0,
           'GPUParams': None,
           'GlobalTag': u'113X_mcRun3_2021_realistic_v10',
           'InputFromOutputModule': u'FEVTDEBUGoutput',
           'InputTask': 'ZMM_14TeV_TuneCP5_2021_GenSim',
           'KeepOutput': True,
           'LumisPerJob': 10,
           'Memory': 3000,
           'Multicore': 1,
           'ProcessingString': u'113X_mcRun3_2021_realistic_v10',
           'RequiresGPU': None,
           'SplittingAlgo': 'LumiBased',
           'TaskName': 'Digi_2021'},
 'Task3': {'AcquisitionEra': 'CMSSW_11_3_X_2021-04-26-2300',
           'ConfigCacheID': 1042,
           'EventStreams': 0,
           'GPUParams': {'CUDACapabilities': ['7.5'],
                         'CUDADriverVersion': '',
                         'CUDARuntime': '11.2',
                         'CUDARuntimeVersion': '',
                         'GPUMemory': '8000',
                         'GPUName': ''},
           'GlobalTag': u'113X_mcRun3_2021_realistic_v10',
           'InputFromOutputModule': u'FEVTDEBUGHLToutput',
           'InputTask': 'Digi_2021',
           'KeepOutput': True,
           'LumisPerJob': 10,
           'Memory': 3000,
           'Multicore': 1,
           'ProcessingString': u'113X_mcRun3_2021_realistic_v10',
           'RequiresGPU': 'required',
           'SplittingAlgo': 'LumiBased',
           'TaskName': 'Reco_Patatrack_PixelOnlyGPU_2021'},
 'TaskChain': 3,
 'TimePerEvent': 10}

If this PR is a backport, please specify the original PR and why you need to backport that PR:

This is a backport of #33538

cmsbuild commented Mar 3, 2021

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-33057/21363

  • This PR adds an extra 24KB to the repository

cmsbuild commented Mar 3, 2021

A new Pull Request was created by @srimanob (Phat Srimanobhas) for master.

It involves the following packages:

Configuration/PyReleaseValidation

@jordan-martins, @chayanit, @wajidalikhan, @kpedro88, @cmsbuild, @srimanob can you please review it and eventually sign? Thanks.
@makortel, @Martin-Grunewald, @fabiocos, @slomeo this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

srimanob commented Mar 3, 2021

hold

cmsbuild commented Mar 3, 2021

Pull request has been put on hold by @srimanob
They need to issue an unhold command to remove the hold state or L1 can unhold it for all

cmsbuild added the hold label Mar 3, 2021
srimanob commented Mar 4, 2021

Please test

cmsbuild commented Mar 4, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bcae87/13260/summary.html
COMMIT: 613e547
CMSSW: CMSSW_11_3_X_2021-03-03-1500/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/33057/13260/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 7 differences found in the comparisons
  • DQMHistoTests: Total files compared: 37
  • DQMHistoTests: Total histograms compared: 2750983
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2750948
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 36 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 156 log files, 37 edm output root files, 37 DQM output files

fwyzard commented Mar 4, 2021

Wouldn't it be more flexible to have a generic option to pass requirements to WM?

Something like

runTheMatrix.py --what upgrade -l 23234.0 -b 'HelloGPU' --label 'HelloGPU' --wm force --wmAttributes 'gpuClass = server, gpuRuntime = cuda, gpuRuntimeVersion >= 11.2, gpuDriverVersion >= 460.32.03, gpuMemory >= 8'

or

runTheMatrix.py --what upgrade -l 23234.0 -b 'HelloGPU' --label 'HelloGPU' --wm force --wmAttributes 'gpuClass = server AND gpuRuntime = cuda AND gpuRuntimeVersion >= 11.2 AND gpuDriverVersion >= 460.32.03 AND gpuMemory >= 8'

?

This would avoid having to hard-code the same list of attributes in WM and in the runTheMatrix command line syntax, and would allow the latter to use any attributes known to WM.

If the syntax is supported in WM, it would also allow a request like

runTheMatrix.py --what upgrade -l 23234.0 -b 'HelloGPU' --label 'HelloGPU' --wm force --wmAttributes '(gpuRuntime = cuda) AND ((gpuRuntimeVersion >= 11.2) OR (gpuDriverVersion >= 450.80.02) OR (gpuClass == server and gpuDriverVersion > 418.40.04))'

which could hardly be specified using command line options.
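
As a rough sketch, such a free-form option could be parsed on the runTheMatrix.py side along these lines, assuming the simple comma-separated "key op value" form of the first example (the --wmAttributes flag and the parseWMAttributes helper are only proposals here, not existing code):

import re

def parseWMAttributes(spec):
    # Split e.g. "gpuClass = server, gpuRuntime = cuda, gpuMemory >= 8" into
    # (key, operator, value) constraints to be forwarded to WM as-is.
    constraints = []
    for clause in spec.split(','):
        match = re.match(r'\s*(\w+)\s*(>=|<=|==|=|>|<)\s*(\S+)\s*$', clause)
        if match:
            constraints.append(match.groups())
    return constraints

print(parseWMAttributes('gpuClass = server, gpuRuntime = cuda, gpuRuntimeVersion >= 11.2'))
# [('gpuClass', '=', 'server'), ('gpuRuntime', '=', 'cuda'), ('gpuRuntimeVersion', '>=', '11.2')]

The boolean AND/OR form of the later examples would instead need a small expression parser, or could simply be passed through verbatim if WM accepts such a syntax.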

@davidlange6

Can one not derive a number of these requirements from the software environment itself?

srimanob commented Mar 4, 2021

I think if we run in production mode without resource constraints, then it should be derived from the software environment. However, if we would like to run on a very specific resource, e.g. for validation purposes, we should have a way to specify it. Otherwise we need to communicate and assign manually every time we would like something specific.

Regarding the software environment, I assume that if the job lands on a machine with CPU+GPU, we should allow a CPU-only workflow to run. I am not sure whether that can be controlled, since the cmsDriver configuration is the same. Or perhaps we don't need this option.

davidlange6 commented Mar 4, 2021 via email

fwyzard commented Mar 4, 2021

Can one not derive a number of these requirements from the software environment itself?

Not currently (the CUDA scram tool does not have all this information), but we could add it there (for example) or somewhere else that is relevant.

fwyzard commented Mar 4, 2021

But it's a valid point, and it made me think of something else: instead of trying to specify all possible combinations of CUDA runtime version, driver version, GPU type, etc., can we make the "server" side advertise something like a "CUDA supported version"?

This does assume that some of the information is CMS-specific, so it could be either advertised by the site, or interpreted by our middleware.

For example, let's say we have sites A, B and C, with this hardware and software:

  • site A: Tesla P100 cards, CUDA 9.2, drivers 396.26
  • site B: GeForce 2080 cards, CUDA 10.2, drivers 440.33.01
  • site C: Tesla V100 cards, CUDA 11.0, drivers 450.51.05

From what I understand of the CUDA compatibility guide (https://docs.nvidia.com/deploy/cuda-compatibility/):

  • site A supports CUDA 9.2 (and older) runtime out of the box; the drivers are too old to support the compatibility drivers, so that's all one can use; it could support all recent CUDA versions if it were updated to at least CUDA 10.1 and the 418.39 drivers;
  • site B supports CUDA 10.2 (and older) runtime; the gaming cards do not support the compatibility drivers, so that's all one can use;
  • site C supports CUDA 11.0 (and older) runtime out of the box, and newer (up to and including 11.2.x) via the compatibility drivers.

CMSSW 11.x bundles the current version of the CUDA runtime and compatibility drivers, so on a datacenter class GPU it should be able to run as long as the system drivers are >= 418.39 (but, according to the documentation, not on a GeForce card).

So those three sites would support

site   max CUDA version   with compatibility drivers
A      9.2                9.2
B      10.2               10.2
C      11.0               11.2

If the values that are used to match the jobs to the sites are CMS-specific, the easiest would be for the advertisement to take into account that CMSSW does ship with compatibility drivers, and advertise the last column.

If the values that are used to match the jobs to the sites are generic to all experiments, then the sites should advertise only the second column; it could be the CMS middleware that takes into account the GPU type and drivers version and builds the information in the last column.
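
As a rough illustration of that last step (a sketch only: the thresholds come from the compatibility-guide discussion above, while the function name and tuple encoding are hypothetical):

def maxSupportedCUDA(gpu_class, system_cuda, driver_version):
    # The system runtime is always usable; the compatibility drivers bundled with
    # CMSSW 11.x (up to CUDA 11.2.x) only help on datacenter-class GPUs with
    # system drivers >= 418.39, per the compatibility guide cited above.
    if gpu_class == 'server' and driver_version >= (418, 39):
        return max(system_cuda, (11, 2))
    return system_cuda

print(maxSupportedCUDA('server', (9, 2), (396, 26)))    # site A -> (9, 2)
print(maxSupportedCUDA('gaming', (10, 2), (440, 33)))   # site B -> (10, 2)
print(maxSupportedCUDA('server', (11, 0), (450, 51)))   # site C -> (11, 2)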

Either way, it probably makes more sense to put that information on the WM side rather than in runTheMatrix.py and in the definition of every job.

What do people think?

davidlange6 commented Mar 4, 2021 via email

rappoccio commented Mar 4, 2021

Hi, Folks,

(BTW this conversation is happening both in this email thread, and on the PR, and there are some people not in common, so this is getting complicated to track. I guess I will respond in both places.)

From the PPD side, it will be abundantly easier if we have sensible defaults (i.e. we set everything to "None") unless something else is explicitly requested by expert users, as is currently done in Phat's PR. If we have MC request managers, etc., putting in extremely complicated workflow definitions, it will be a recipe for disaster. Highly nontrivial "magic incantations" like this (*) are a guarantee that it will be broken ;).

Can we find a solution such that there is some set of defaults coded somewhere? Maybe we can specify configurations that actually exist somewhere? Like

MyFavoriteSite:

 'GPUClass': 'server',
 'GPUDriverVersion': '460.32.03',
 'GPUMemory': '8',
 'GPURuntime': 'cuda',
 'GPURuntimeVersion': '11.2',

MyLeastFavoriteSite:

 'GPUClass': 'server',
 'GPUDriverVersion': '456.32.03',
 'GPUMemory': '8',
 'GPURuntime': 'vidia',
 'GPURuntimeVersion': '11.46',

Then we have a single option like --gpuConfig MyLeastFavoriteSite

etc?

Cheers,
Sal

(*)

 '(gpuRuntime = cuda) AND ((gpuRuntimeVersion >= 11.2) OR (gpuDriverVersion >= 450.80.02) OR (gpuClass == server and gpuDriverVersion > 418.40.04))'
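
A quick sketch of what such a named-preset lookup could look like in runTheMatrix.py (the preset names, the GPU_CONFIGS table and the --gpuConfig option are all illustrative; none of them exist in this PR):

from optparse import OptionParser

# Named GPU configurations an expert could maintain centrally.
GPU_CONFIGS = {
    'MyFavoriteSite': {'GPUClass': 'server',
                       'GPUDriverVersion': '460.32.03',
                       'GPUMemory': '8',
                       'GPURuntime': 'cuda',
                       'GPURuntimeVersion': '11.2'},
}

parser = OptionParser()
parser.add_option('--gpuConfig',
                  help='name of a predefined GPU configuration: ' + ', '.join(sorted(GPU_CONFIGS)),
                  dest='gpuConfig',
                  default=None)

# later, when building the workflow dictionary:
# if opt.gpuConfig:
#     task['GPUParams'] = GPU_CONFIGS[opt.gpuConfig]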

davidlange6 commented Mar 4, 2021 via email

@cmsbuild

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bcae87/14267/summary.html
COMMIT: 54954c4
CMSSW: CMSSW_11_3_X_2021-04-15-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/33057/14267/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 38
  • DQMHistoTests: Total histograms compared: 2864426
  • DQMHistoTests: Total failures: 7
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2864397
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 37 files compared)
  • Checked 160 log files, 37 edm output root files, 38 DQM output files
  • TriggerResults: no differences found

@cmsbuild

Pull request #33057 was updated. @jordan-martins, @chayanit, @wajidalikhan, @kpedro88, @cmsbuild, @srimanob can you please check and sign again.

@cmsbuild

Pull request #33057 was updated. @jordan-martins, @chayanit, @wajidalikhan, @kpedro88, @cmsbuild, @srimanob can you please check and sign again.

@srimanob

Please test

@cmsbuild

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bcae87/14608/summary.html
COMMIT: bac14f9
CMSSW: CMSSW_11_3_X_2021-04-26-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/33057/14608/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 38
  • DQMHistoTests: Total histograms compared: 2877046
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2877011
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 37 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 160 log files, 37 edm output root files, 38 DQM output files
  • TriggerResults: no differences found

                  dest='CUDADriverVersion',
                  default='')

parser.add_option('--CUDARuntimeVersion',
Contributor

What's the difference between this and --CUDARuntime above?

Contributor Author

CUDARuntimeVersion is the version of the runtime installed on the machine.
CUDARuntime is matched against the node's CUDACompatibleRuntimes. I follow what is described in dmwm/WMCore#10393. Should we change it to match the node's CUDACompatibleRuntimes? @amaltaro

Contributor Author

I added the discussion in dmwm/WMCore#10388 (comment).

Contributor

I see now that the difference is explained pretty well in #33057 (comment). Maybe a clarification of the difference in the help would be sufficient.
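
For instance, the two help strings could spell the distinction out along these lines (the wording is only a suggestion, not taken from the PR):

parser.add_option('--CUDARuntimeVersion',
                  help='CUDA runtime version installed on the worker node.',
                  dest='CUDARuntimeVersion',
                  default='')

parser.add_option('--CUDARuntime',
                  help='CUDA runtime the job requires; matched against the CUDACompatibleRuntimes advertised by the node. Default = 11.2 (for RequiresGPU = required).',
                  dest='CUDARuntime',
                  default='11.2')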

parser.add_option('--CUDACapabilities',
                  help='to specify CUDA capabilities. Default = 7.5 (for RequiresGPU = required).',
                  dest='CUDACapabilities',
                  default='7.5')
Contributor

Why default only to 7.5? I would think to default to all compute capabilities supported by the release. Which then raises two questions (that go somewhat beyond this PR though):

  • At this point we'd really need one source for the supported compute capabilities, because it is needed also in cudaIsEnabled. An environment variable in cuda-toolfile? (not really my favorite but would be easy)

  • How to deal with different SCRAM_ARCHs supporting different sets of CUDA compute capabilities? E.g. our ARM build does not seem to support Pascal (6.x) while x86 and PPC do (cuda-flags.file).

Maybe also mention in the help that the value can be comma-separated.

Contributor Author

I will edit the help. However, for the default value, please suggest one (or we can pick them up from somewhere automatically).

Contributor

The 7.5 might be good enough to get started; I suppose it depends mostly on what kind of hardware we are going to run on in the very near future. For a longer-term solution I opened issue #33542.

Contributor

The current default should be 6.0,6.1,6.2,7.0,7.2,7.5.
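
Concretely, something along these lines (only a sketch of the suggested change; the option itself already exists in this PR, while the wider default and the list handling are the suggestion):

parser.add_option('--CUDACapabilities',
                  help='comma-separated list of CUDA capabilities. Default = 6.0,6.1,6.2,7.0,7.2,7.5 (for RequiresGPU = required).',
                  dest='CUDACapabilities',
                  default='6.0,6.1,6.2,7.0,7.2,7.5')

# and later, when filling GPUParams:
# 'CUDACapabilities': opt.CUDACapabilities.split(','),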

help='Coma separated list of workflow to be shown or ran. Possible keys are also '+str(predefinedSet.keys())+'. and wild card like muon, or mc',
dest='testList',
default=None
help='Coma separated list of workflow to be shown or ran. Possible keys are also '+str(predefinedSet.keys())+'. and wild card like muon, or mc',
Contributor

While you're at it

Suggested change
help='Coma separated list of workflow to be shown or ran. Possible keys are also '+str(predefinedSet.keys())+'. and wild card like muon, or mc',
help='Comma separated list of workflow to be shown or ran. Possible keys are also '+str(predefinedSet.keys())+'. and wild card like muon, or mc',

Contributor Author

Fixed, there were a few of them. We should not be in a coma anymore :)

parser.add_option('--CUDARuntime',
                  help='to specify CUDA runtime. Default = 11.2 (for RequiresGPU = required).',
                  dest='CUDARuntime',
                  default='11.2')
Contributor

Should this also default to whatever the release uses?

Contributor Author

Yeah, that can be done. I just put the default one here to make sure that it will not be an empty field when GPU is required.

Contributor

Something like

scram tool info cuda | sed -n -e's/^Version *: *\([[:digit:]]\+\.[[:digit:]]\+\)\.[[:digit:]]\+/\1/p'

Though maybe we should add an environment variable in cuda.spec for this?
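
A possible sketch of picking the default up programmatically from the release (assuming scram is available in the environment and prints a "Version : x.y.z" line, as the sed command above relies on; the helper name is hypothetical):

import re
import subprocess

def defaultCUDARuntime(fallback='11.2'):
    # Query the scram CUDA tool and keep only the major.minor part of its version.
    try:
        out = subprocess.check_output(['scram', 'tool', 'info', 'cuda']).decode()
    except (OSError, subprocess.CalledProcessError):
        return fallback
    match = re.search(r'^Version\s*:\s*(\d+\.\d+)', out, re.MULTILINE)
    return match.group(1) if match else fallback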

@cmsbuild

Pull request #33057 was updated. @jordan-martins, @chayanit, @wajidalikhan, @kpedro88, @cmsbuild, @srimanob can you please check and sign again.

@srimanob

Closing; I will remake this in master after converging on the workflow.
