Support bare-metal Kata GPU containers #1133
Merged
katexochen reviewed on January 13, 2025 (threads since marked outdated and resolved) in:
packages/by-name/kata/kata-runtime/0019-Revert-kata-agent-Add-CDI-support.patch
...ame/microsoft/cloud-hypervisor/0001-snp-fix-panic-when-rejecting-extended-guest-report.patch
msanft force-pushed the msanft/gpu-image branch 3 times, most recently from 98169fc to df4e08e (January 13, 2025 07:39).
msanft force-pushed the msanft/gpu-image branch 2 times, most recently from 74a8665 to 93683c8 (January 17, 2025 15:14).
Enabling CDI support in the Kata runtime breaks the legacy-mode setup we're using, because both try to provide the container with the GPU device and its auxiliaries. Disable CDI support for the time being; the long-term goal is to get native CDI support working.
This is necessary for GPU-enabled containers, whose images can take much longer to pull because they sometimes include model weights, which take up a lot of storage.
msanft force-pushed the msanft/gpu-image branch from 93683c8 to 56b29b3 (January 17, 2025 16:37).
burgerdev approved these changes on January 22, 2025.
This adds the components needed for GPU support in bare-metal Kata deployments to our NixOS image build.
This pulls in an upstream PR that fixes the output paths of libnvidia-container so they no longer contain dangling symlinks, which would otherwise confuse nvidia-container-toolkit.
msanft force-pushed the msanft/gpu-image branch from c87ddf9 to 2771f21 (January 22, 2025 09:34).
katexochen approved these changes on January 22, 2025.
This adds a runtime class corresponding to the GPU-enabled runtime for Contrast on bare-metal platforms, both for the local just-based deployments and for the release artifacts.
This adds an E2E test for GPU use on Contrast, which currently runs on the GPU-enabled bare-metal SNP runner. For now, the test only checks that the GPU is available via nvidia-smi, which also confirms that the driver and CUDA work correctly.
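To illustrate the kind of availability check described here, the following is a minimal, hypothetical Python sketch; it is not Contrast's actual E2E test, whose code is not part of this conversation. The helper names `parse_gpu_names` and `gpu_available` are made up for illustration, and the sketch assumes it runs inside the GPU-enabled container with `nvidia-smi` on the `PATH`. It uses the `nvidia-smi --query-gpu=name --format=csv,noheader` query and treats a zero exit status plus at least one reported GPU as success.

```python
import subprocess


def parse_gpu_names(csv_output: str) -> list[str]:
    """Parse output of `nvidia-smi --query-gpu=name --format=csv,noheader`.

    Each non-empty line holds the name of one GPU visible to the driver.
    """
    return [line.strip() for line in csv_output.splitlines() if line.strip()]


def gpu_available(timeout_s: float = 60.0) -> bool:
    """Hypothetical helper: True if nvidia-smi succeeds and lists >= 1 GPU.

    A zero exit status means nvidia-smi could communicate with the driver.
    Missing binaries, permission errors and timeouts are reported as False
    instead of raising, so callers can assert on the result directly.
    """
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers FileNotFoundError (nvidia-smi not installed).
        return False
    return result.returncode == 0 and len(parse_gpu_names(result.stdout)) > 0
```

Keeping the parsing separate from the subprocess call makes it testable on machines without a GPU; only `gpu_available()` needs the real driver.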
This adds support for running bare-metal Kata containers with GPUs on Contrast.
Please refer to the commit messages for reasoning about the individual code changes.
Requires a nixpkgs revision that includes NixOS/nixpkgs#372320.
An E2E test based on this will follow.