Support bare-metal Kata GPU containers #1133

Merged
merged 6 commits into from
Jan 23, 2025

Conversation

msanft
Contributor

@msanft msanft commented Jan 9, 2025

This adds support for running bare-metal Kata containers with GPUs on Contrast.
Please refer to the commit messages for reasoning about the individual code changes.

Requires a nixpkgs containing NixOS/nixpkgs#372320.

An E2E test based on this will follow.

@msanft msanft added the no changelog PRs not listed in the release notes label Jan 9, 2025
@msanft msanft added this to the v1.4.0 milestone Jan 9, 2025
@msanft msanft requested a review from burgerdev January 9, 2025 15:28
@msanft msanft marked this pull request as ready for review January 10, 2025 10:44
@msanft msanft requested a review from katexochen as a code owner January 10, 2025 10:44
@msanft msanft force-pushed the msanft/gpu-image branch 3 times, most recently from 98169fc to df4e08e Compare January 13, 2025 07:39
@msanft msanft requested a review from katexochen January 13, 2025 10:46
@msanft msanft force-pushed the msanft/gpu-image branch 2 times, most recently from 74a8665 to 93683c8 Compare January 17, 2025 15:14
Enabling CDI support in the Kata runtime breaks the legacy mode setup we're using, because both try to set up the container with the GPU device and its auxiliaries, so disable it for the time being. The long-term goal is to get native CDI support working.
This is necessary for GPU-enabled containers, whose image pulls may take much longer because the images sometimes include model weights, which are large.
This adds the necessary bits to facilitate GPU support in bare-metal Kata deployments to our NixOS image build.
This pulls in an upstream PR that fixes the output paths of
libnvidia-container so that they no longer include dangling symlinks,
which otherwise confuse nvidia-container-toolkit.
This adds a runtime class for the local just-based deployments as well
as the release artifacts that corresponds to the GPU-enabled runtime for
Contrast on bare-metal platforms.
This adds an E2E test for GPU use on Contrast.
It currently runs on the GPU-enabled bare-metal SNP runner.

The test currently only verifies that the GPU is available via
nvidia-smi, which also verifies that driver and CUDA work correctly.
@msanft msanft merged commit a9a12ec into main Jan 23, 2025
13 checks passed
@msanft msanft deleted the msanft/gpu-image branch January 23, 2025 12:14
3 participants