-
Notifications
You must be signed in to change notification settings - Fork 662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In Google Batch, install GPU drivers for GPU VMs #5406
Conversation
Also clean up some old logic for container options when using GPUs. These are now automatically handled by Google Cloud. Fixes nextflow-io#5372. Signed-off-by: Siddhartha Bagaria <[email protected]>
✅ Deploy Preview for nextflow-docs-staging canceled.
|
plugins/nf-google/src/main/nextflow/cloud/google/batch/GoogleBatchTaskHandler.groovy
Show resolved
Hide resolved
plugins/nf-google/src/main/nextflow/cloud/google/batch/GoogleBatchMachineTypeSelector.groovy
Outdated
Show resolved
Hide resolved
final accel = task.config.getAccelerator() | ||
// add nvidia specific driver paths | ||
// see https://cloud.google.com/batch/docs/create-run-job#create-job-gpu | ||
if( accel && accel.type.toLowerCase().startsWith('nvidia-') ) { | ||
container | ||
.addVolumes('/var/lib/nvidia/lib64:/usr/local/nvidia/lib64') | ||
.addVolumes('/var/lib/nvidia/bin:/usr/local/nvidia/bin') | ||
} | ||
|
||
def containerOptions = task.config.getContainerOptions() ?: '' | ||
// accelerator requires privileged option | ||
// https://cloud.google.com/batch/docs/create-run-job#create-job-gpu | ||
if( task.config.getAccelerator() || fusionEnabled() ) { | ||
if( fusionEnabled() ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this logic removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not seem to be needed anymore since Google does this for you automatically if installGpuDrivers
bit is set. It is not mentioned in the official Google documentation as well.
The only thing is that people might still need to add /usr/local/nvidia/bin
to their PATH and /usr/local/nvidia/lib64
to their LD_LIBRARY_PATH depending on their app (some apps automatically look there anyway), but it is not happening right now as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the library path should be handled by the container
Co-authored-by: Ben Sherman <[email protected]> Signed-off-by: Siddhartha Bagaria <[email protected]>
@siddharthab overall the changes make sense. I think we'll need to do some testing on our end to make sure the machine type selection still works with Fusion + SSD Can you confirm that it works for you with some GPU-enabled tasks? |
Signed-off-by: Siddhartha Bagaria <[email protected]>
Yes, I confirm that I tested the following configurations:
† Needed new logic in ‡ It looks like
|
@bentsherman The test failures seem unrelated to this change. Please let me know if you would like to see anything more in this PR. |
@bentsherman Any thoughts? |
@pditommaso Bringing it to your attention in case you missed this PR. It is now waiting for your final approval. |
5a93547
to
27345a6
Compare
plugins/nf-google/src/main/nextflow/cloud/google/batch/GoogleBatchMachineTypeSelector.groovy
Show resolved
Hide resolved
Signed-off-by: Paolo Di Tommaso <[email protected]>
Testing against platform showcase pipelines |
ok, some tests fails because credentials are not added into foreign branches. let's merge anyway |
Also clean up some old logic for container options when using GPUs.
These are now automatically handled by Google Cloud.
Fixes #5372.
Signed-off-by: Siddhartha Bagaria [email protected]