nixos/cdi.dynamic.nvidia: expose driverLink #291828
{
  hostPath = addDriverRunpath.driverLink;
  containerPath = addDriverRunpath.driverLink;
}
This only contains the symlinks, naturally. We need closureInfo to enumerate all their targets and their dependencies. Right now they are mounted "accidentally" as ${nvidia-driver}/lib.
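One way the enumeration could look is sketched below. This is not what the PR implements: `nvidia-driver` stands in for whichever package(s) the driverLink symlinks actually resolve to, and the derivation name `driver-closure-mounts.json` is made up for illustration. The sketch uses `pkgs.closureInfo`, whose output contains a `store-paths` file with one store path per line, and converts that list into `hostPath`/`containerPath` pairs at build time, which avoids import-from-derivation.

```nix
# Sketch only. `nvidia-driver` is a placeholder for the real root paths,
# and the output name is hypothetical.
{ pkgs, nvidia-driver }:

let
  # closureInfo describes the runtime closure of rootPaths; its
  # `store-paths` file lists one store path per line.
  driverClosure = pkgs.closureInfo { rootPaths = [ nvidia-driver ]; };
in
pkgs.runCommand "driver-closure-mounts.json" { nativeBuildInputs = [ pkgs.jq ]; } ''
  # Read the whole file as one string, split it into lines, drop empty
  # lines, and emit one mount entry per store path.
  jq --raw-input --slurp '
    split("\n")
    | map(select(length > 0))
    | map({ hostPath: ., containerPath: . })
  ' ${driverClosure}/store-paths > $out
''
```

How such a list would be handed to the CDI generator is left open here; the point is only that the closure can be computed explicitly rather than picked up by accident.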
{ hostPath = lib.getExe' nvidia-driver "nvidia-cuda-mps-control";
  containerPath = "/usr/bin/nvidia-cuda-mps-control"; }
I only now realized that this isn't going to be preserved by any of the formatters we've got, but I wanted to keep the diff small(er)
Let me have a look later today. I can try to help with those FIXME comments in a separate PR. This is my first NixOS module contribution and I am not fully aware of all the details, but I can try to move that forward.
I just tried this PR, I got the following error:
Okay, after a reboot it works fine. I guess the driver did update and I had to reboot to activate it. LGTM.
Sanity check (using this image: https://gist.github.com/SomeoneSerge/eda63ed8b51b795ab732e678da7d0e11):
Loaded image: localhost/docker-pytorch:ijlfl22js4p2lqr8lm1bj2a1ynv381wn
~/Unsorted/docker-pytorch took 2m28s
❯ podman run --rm -it --device=nvidia.com/gpu=all docker-pytorch:ijlfl22js4p2lqr8lm1bj2a1ynv381wn python
Python 3.11.7 (main, Dec 4 2023, 18:10:11) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
>>>
❯ sudo nixos-rebuild switch
...
❯ podman run --rm -it --device=nvidia.com/gpu=all docker-pytorch:ijlfl22js4p2lqr8lm1bj2a1ynv381wn python
Python 3.11.7 (main, Dec 4 2023, 18:10:11) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
❯ systemctl status nvidia-container-toolkit-cdi-generator.service
● nvidia-container-toolkit-cdi-generator.service - Container Device Interface (CDI) for Nvidia generator
Loaded: loaded (/etc/systemd/system/nvidia-container-toolkit-cdi-generator.service; enabled; preset: enabled)
Active: active (exited) since Mon 2024-03-04 14:09:46 UTC; 1h 40min ago
Main PID: 3304726 (code=exited, status=0/SUCCESS)
CPU: 39ms
Mar 04 14:09:46 cs-338 systemd[1]: Starting Container Device Interface (CDI) for Nvidia generator...
Mar 04 14:09:46 cs-338 nvidia-cdi-generator[3304742]: time="2024-03-04T14:09:46Z" level=info msg="Auto-detected mode as \"nvml\""
Mar 04 14:09:46 cs-338 nvidia-cdi-generator[3304742]: time="2024-03-04T14:09:46Z" level=error msg="failed to generate CDI spec: failed to create device CDI specs: failed to initialize NVML: ERROR_LIB_RM_VERSION_MISMATCH"
Mar 04 14:09:46 cs-338 systemd[1]: Finished Container Device Interface (CDI) for Nvidia generator.
The service doesn't correctly report the exit status: the generator logged an error, so the unit should have been reported as failed.
Description of changes
A quick follow-up to #284507. I'd hate to merge something with that many FIXME comments, but I'm not going to have the time to implement these for a while yet.