Add GPU workflow to runTheMatrix #35263

Merged (12 commits, Sep 21, 2021)
Changes from 7 commits
38 changes: 37 additions & 1 deletion Configuration/PyReleaseValidation/python/MatrixInjector.py
@@ -65,6 +65,26 @@ def __init__(self,opt,mode='init',options=''):
if(opt.batchName):
    self.batchName = '__'+opt.batchName+'-'+self.batchTime

####################################
# Checking and setting up GPU attributes
####################################
# Mandatory
self.RequiresGPU='forbidden'
if opt.gpu: self.RequiresGPU=opt.gpu
if self.RequiresGPU not in ('forbidden','optional','required'):
    print("The '--gpu' option must be one of 'forbidden', 'optional', 'required'; falling back to 'forbidden'.")
    self.RequiresGPU = 'forbidden'
#if self.RequiresGPU == 'optional':
#print("Optional GPU is turned off for RelVals. Now, changing it to forbidden")
#self.RequiresGPU = 'forbidden'
self.GPUMemoryMB = opt.GPUMemoryMB
self.CUDACapabilities = opt.CUDACapabilities.split(',')
self.CUDARuntime = opt.CUDARuntime
# optional
self.GPUName = opt.GPUName
self.CUDADriverVersion = opt.CUDADriverVersion
self.CUDARuntimeVersion = opt.CUDARuntimeVersion
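The validation above can be sketched as a small standalone helper. This is not the PR's code; the function name `normalize_gpu_option` is made up for illustration, and only the option name and the three allowed values come from the diff.

```python
# Hedged sketch of the '--gpu' value check: anything outside the three
# allowed values falls back to 'forbidden', with a warning printed.
def normalize_gpu_option(value):
    allowed = ('forbidden', 'optional', 'required')
    if value not in allowed:
        print("The '--gpu' option must be one of %s; falling back to 'forbidden'."
              % ', '.join(repr(a) for a in allowed))
        return 'forbidden'
    return value

print(normalize_gpu_option('required'))  # required
print(normalize_gpu_option('maybe'))     # warning, then forbidden
```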

# WMagent url
if not self.wmagent:
# Overwrite with env variable
@@ -180,8 +200,18 @@ def __init__(self,opt,mode='init',options=''):
"nowmIO": {},
"Multicore" : opt.nThreads, # this is the per-taskchain Multicore; it's the default assigned to a task if it has no value specified
"EventStreams": self.numberOfStreams,
"KeepOutput" : False,
"RequiresGPU" : None,
"GPUParams": None
}
self.defaultGPUParams = {
    "GPUMemoryMB": self.GPUMemoryMB,
    "CUDACapabilities": self.CUDACapabilities,
    "CUDARuntime": self.CUDARuntime
}
if self.GPUName: self.defaultGPUParams.update({"GPUName": self.GPUName})
if self.CUDADriverVersion: self.defaultGPUParams.update({"CUDADriverVersion": self.CUDADriverVersion})
if self.CUDARuntimeVersion: self.defaultGPUParams.update({"CUDARuntimeVersion": self.CUDARuntimeVersion})
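The assembly of `defaultGPUParams` can be illustrated in isolation: mandatory keys are always present, optional ones only when set. The key names mirror the diff; the sample values below are invented for illustration.

```python
import json

# Sketch: mandatory GPU parameters, with optional ones added only if provided.
params = {
    "GPUMemoryMB": 8000,                        # sample value, not a CMS default
    "CUDACapabilities": "7.5,8.0".split(','),   # comma-separated CLI input -> list
    "CUDARuntime": "11.2",
}
gpu_name = ""  # optional; empty string means "not provided"
if gpu_name:
    params["GPUName"] = gpu_name

# The task dictionary stores the parameters as a JSON string, as in the diff.
gpu_params_json = json.dumps(params)
print(gpu_params_json)
```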

self.chainDicts={}

@@ -408,6 +438,9 @@ def prepare(self, mReader, directories, mode='init'):
if setPrimaryDs:
    chainDict['nowmTasklist'][-1]['PrimaryDataset']=setPrimaryDs
nextHasDSInput=None
if 'GPU' in step and self.RequiresGPU == 'required':
    chainDict['nowmTasklist'][-1]['RequiresGPU'] = self.RequiresGPU
    chainDict['nowmTasklist'][-1]['GPUParams']=json.dumps(self.defaultGPUParams)
Contributor:
I don't understand this part.
Why not simply

Suggested change:
-    if 'GPU' in step and self.RequiresGPU == 'required':
+    if self.RequiresGPU != 'forbidden':
         chainDict['nowmTasklist'][-1]['RequiresGPU'] = self.RequiresGPU
         chainDict['nowmTasklist'][-1]['GPUParams']=json.dumps(self.defaultGPUParams)

(and similarly below) ?
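The two gating conditions in the suggestion above can be compared with a small sketch. The function names and step names here are made up; only the two boolean expressions come from the thread. With `--gpu optional`, only the suggested condition attaches GPU parameters.

```python
# Original gate from the PR: depends on the step *name* containing 'GPU'
# and on the option being exactly 'required'.
def original_gate(step, requires_gpu):
    return 'GPU' in step and requires_gpu == 'required'

# Reviewer's suggested gate: the option alone decides, for every step.
def suggested_gate(step, requires_gpu):
    return requires_gpu != 'forbidden'

print(original_gate('HLTRun3', 'optional'))   # False: name lacks 'GPU'
print(suggested_gate('HLTRun3', 'optional'))  # True: option alone decides
```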

Contributor:
Also, why is this so nested inside all the checks, instead of simply being done for all steps?

Contributor (author):
I coded it this way following the discussion in dmwm/WMCore#10393 (comment), to be flexible at the task/step level. In a TaskChain, for example, one can run GEN-SIM in a non-GPU environment while running HLT in a GPU environment.

Contributor:
IMHO that is not going to be maintainable: we cannot add "GPU" to the name of every step that we might want to run on a GPU-equipped node.

For example, soon enough the HLT step of any Run-3 workflow will be able to run on GPUs, so it could make sense to submit jobs with --gpu optional, but I doubt we want to rename everything to add "GPU" to its name.

Contributor (author):
Here's the proposed update to the default values and help messages.

I didn't check if it runs!

Thanks very much @fwyzard. I've implemented all the suggestions, but left the check on 'GPU' in the step name for now. We can open a discussion with @amaltaro.

else:
    #not first step and no inputDS
    chainDict['nowmTasklist'].append(copy.deepcopy(self.defaultTask))
@@ -420,6 +453,9 @@
chainDict['nowmTasklist'][-1]['LumisPerJob']=splitForThisWf
if step in wmsplit:
    chainDict['nowmTasklist'][-1]['LumisPerJob']=wmsplit[step]
if 'GPU' in step and self.RequiresGPU == 'required':
    chainDict['nowmTasklist'][-1]['RequiresGPU'] = self.RequiresGPU
    chainDict['nowmTasklist'][-1]['GPUParams']=json.dumps(self.defaultGPUParams)

# change LumisPerJob for Hadronizer steps.
if 'Hadronizer' in step: