[BUG] nvidia-smi wrapper script ignores ENABLE_NON_MIG_GPUS=1 on a heterogeneous multi-GPU machine #4635

Closed
gerashegalov opened this issue Jan 26, 2022 · 0 comments · Fixed by NVIDIA/spark-rapids-examples#95
Assignees
Labels
bug Something isn't working

Comments

@gerashegalov
Collaborator

Describe the bug
On a machine with multiple physical GPUs where only a subset is partitioned into MIG devices, the wrapper script nvidia-smi-wrapper.sh fails to produce heterogeneous output that includes both GPU elements backed by MIG devices and GPU elements for non-MIG-enabled GPUs.

Steps/Code to reproduce bug

MIG_AS_GPU_ENABLED=1 ENABLE_NON_MIG_GPUS=1 ./examples/MIG-Support/yarn-unpatched/scripts/nvidia-smi-wrapper.sh -q -x

Expected behavior
With ENABLE_NON_MIG_GPUS=1, the wrapper should produce a mix of physical and MIG devices.
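
As a rough illustration of the expected behavior, the per-GPU decision could be sketched as below. This is not the logic in nvidia-smi-wrapper.sh: `classify_gpus` is a hypothetical helper, and the sketch assumes its input has the "index, mig mode" shape printed by `nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader`. The sample data is made up so that the snippet runs without a GPU.

```shell
# Hypothetical helper for illustration only; not part of nvidia-smi-wrapper.sh.
# Reads "index, mig_mode" lines on stdin and prints the decision for each GPU.
classify_gpus() {
  # IFS=', ' splits on the comma and swallows the space that follows it.
  while IFS=', ' read -r index mode; do
    [ -n "$index" ] || continue            # ignore blank lines
    if [ "$mode" = "Enabled" ]; then
      # MIG-enabled GPU: represented by its MIG devices.
      echo "gpu $index: expand into MIG devices"
    elif [ "${ENABLE_NON_MIG_GPUS:-0}" = "1" ]; then
      # Any other mode (e.g. Disabled or [N/A]) is treated as a non-MIG GPU.
      echo "gpu $index: keep as physical GPU"
    else
      echo "gpu $index: skip"
    fi
  done
}

# Made-up sample: GPU 0 has MIG enabled, GPU 1 does not.
printf '0, Enabled\n1, Disabled\n' | ENABLE_NON_MIG_GPUS=1 classify_gpus
```

With the sample input, both GPUs appear in the result: GPU 0 through its MIG devices and GPU 1 as a physical device. This is the mix the issue asks for.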

Environment details (please complete the following information)

  • Environment location: Standalone
  • Spark configuration settings related to the issue: N/A

Additional context
transparent support for MIG on YARN
