How to provide supported CUDA compute capabilities and runtime version where needed? #33542
assign core,heterogeneous,pdmv
New categories assigned: heterogeneous,core,pdmv. @Dr15Jones, @smuzaffar, @jordan-martins, @fwyzard, @chayanit, @wajidalikhan, @makortel you have been requested to review this Pull request/Issue and eventually sign. Thanks.
A new Issue was created by @makortel Matti Kortelainen. @Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here
I can think of adding environment variables for these in the CUDA toolfile, but is that the best solution?
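To make the toolfile idea concrete, here is a minimal sketch of how a consumer could read such environment variables. The variable names `CUDA_COMPUTE_CAPABILITIES` and `CUDA_RUNTIME_VERSION` are hypothetical, chosen only for illustration:

```python
import os

# Hypothetical variables that a CUDA toolfile could export; the names and
# values here are illustrative, not the actual CMSDIST settings.
os.environ["CUDA_COMPUTE_CAPABILITIES"] = "6.0,6.1,7.0,7.5"
os.environ["CUDA_RUNTIME_VERSION"] = "11.2"

def supported_compute_capabilities():
    """Parse the comma-separated capability list from the environment."""
    raw = os.environ.get("CUDA_COMPUTE_CAPABILITIES", "")
    return [cap for cap in raw.split(",") if cap]

print(supported_compute_capabilities())  # ['6.0', '6.1', '7.0', '7.5']
```

The downside, as the thread goes on to discuss, is that every consumer has to agree on the variable names and the serialization format.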
One other way could be that cuda.spec generates a Python script which one can import to get the CUDA version and supported compute capability information.
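A sketch of what such a generated fragment could look like; the module name and the concrete values are assumptions, not the actual cuda.spec output:

```python
# Sketch of a module that cuda.spec could generate (say "cuda_info.py").
# The values below are illustrative, not the actual CMSDIST build settings.
cuda_version = "11.2"                      # two leading parts of the runtime version
compute_capabilities = ["6.0", "6.1", "7.0", "7.5"]
```

A consumer such as runTheMatrix could then simply do `from cuda_info import cuda_version, compute_capabilities`, with no parsing needed on the consumer side.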
(To repeat also here) one question: how to have the correct CUDA compute capability list for ARM jobs? Would the job submission specify the target
JSON is also possible, but then we still need an environment variable to point to it. Can we convert
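The JSON variant might look like the sketch below. The file layout and the `CUDA_INFO_JSON` variable name are hypothetical:

```python
import json
import os
import tempfile

# Hypothetical JSON layout with illustrative values.
info = {"cuda_version": "11.2", "compute_capabilities": ["6.0", "7.0", "7.5"]}

# The build system would write this file; the toolfile would export its path.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(info, f)
    os.environ["CUDA_INFO_JSON"] = f.name

# Any consumer then locates the file through the environment variable.
with open(os.environ["CUDA_INFO_JSON"]) as f:
    loaded = json.load(f)

print(loaded["compute_capabilities"])  # ['6.0', '7.0', '7.5']
```

Compared to the importable Python fragment, this keeps the data language-neutral at the cost of one extra indirection.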
I suppose it could be reimplemented in Python by interpreting the output of
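As a sketch of the parsing side, assuming the probed tool prints one compute capability per line (the exact output format is an assumption and would need adapting to the real tool):

```python
def parse_capabilities(output: str):
    """Parse one "major.minor" compute capability per line of tool output.

    The one-capability-per-line format is an assumption for illustration.
    """
    return [line.strip() for line in output.splitlines() if line.strip()]

# Example with a made-up two-device output.
print(parse_capabilities("6.1\n7.5\n"))  # ['6.1', '7.5']
```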
Or we remove
Note that the compute capabilities that we have in CMSDIST's
Currently (on x86 and power) we build for
I'm confused about what you mean by using
Right, so we'd need to provide an explicit list of the compute capabilities on which the software can run (or that's what I understood #33057 would need).
In a Python program (be it a separate script or part of the configuration system), execute the program, parse the compute capabilities of the devices from its output, and compare that to the contents of the Python fragment generated in (I think this is still heavily on the "what we could do" side rather than "should do")
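The comparison step described above could be as simple as the following sketch, with made-up capability values:

```python
def usable_devices(device_caps, supported_caps):
    """Return indices of devices whose compute capability appears in the
    supported list (e.g. the list shipped by the build system)."""
    supported = set(supported_caps)
    return [i for i, cap in enumerate(device_caps) if cap in supported]

# Illustrative values: device 0 is too old, device 1 is supported.
print(usable_devices(["3.5", "7.0"], ["6.0", "7.0", "7.5"]))  # [1]
```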
Ah, I see, you mean to decide whether "CUDA is available" locally, independently from the job description/matching. |
Right, this was for
OK. Then I'd suggest we autodetect which GPUs are usable based on whether we are actually able to use them: #33561.
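The "try to use it" probe idea could be sketched as below. This is not the actual #33561 implementation; it is a minimal illustration via `ctypes`, and it deliberately returns `False` when the CUDA runtime library is absent or the calls fail:

```python
import ctypes

def cuda_device_usable(index: int) -> bool:
    """Probe a device by actually touching it through the CUDA runtime.

    A sketch of autodetection-by-use: if the runtime is missing or any
    call fails, the device is treated as unusable.
    """
    try:
        libcudart = ctypes.CDLL("libcudart.so")
    except OSError:
        return False  # no CUDA runtime on this machine
    if libcudart.cudaSetDevice(index) != 0:   # 0 == cudaSuccess
        return False
    # cudaFree(NULL) is a cheap way to force context creation on the device.
    return libcudart.cudaFree(None) == 0

print(cuda_device_usable(0))
```

The advantage over list-based matching is that it also catches runtime problems (driver mismatch, exhausted device memory at startup) that no static capability list can express.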
I agree, #33561 provides a much better way for
If we merge cms-sw/cmsdist#6851, then we will only need to give a "minimum compute capability". Maybe we can add that value to
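With a single minimum value, the check collapses to one comparison. One subtlety worth encoding: "major.minor" capabilities must be compared numerically, not as strings. A sketch with illustrative values:

```python
def meets_minimum(device_cap: str, minimum_cap: str) -> bool:
    """Compare "major.minor" compute capabilities numerically."""
    dev = tuple(int(part) for part in device_cap.split("."))
    low = tuple(int(part) for part in minimum_cap.split("."))
    return dev >= low

print(meets_minimum("7.5", "6.0"))   # True
print(meets_minimum("3.5", "6.0"))   # False
print(meets_minimum("10.0", "9.0"))  # True; a string comparison would say False
```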
Currently the `cudaIsEnabled` uses a hardcoded condition for the CUDA compute capability check. #33057 adds a need to have the list of supported compute capabilities in `runTheMatrix`, as well as the CUDA runtime version (two leading parts if I understood correctly, e.g. `11.2`). At this point we should look into having a single source of this information that would scale to adding other pieces of information, and also to other technologies beyond CUDA.