-
-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nvidia-container-toolkit: do not shadow docker executable #330197
nvidia-container-toolkit: do not shadow docker executable #330197
Conversation
At this time, the nvidia-container-toolkit derivation installs a docker executable that shadows the main one, and that is not thought to forward commands to the original docker command, causing issues to users when the `nvidia-container-toolkit` is in scope and they try to call to `docker`.
subPackages = [ | ||
"cmd/nvidia-ctk" | ||
"cmd/nvidia-container-runtime" | ||
"cmd/nvidia-container-runtime-hook" | ||
]; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm stupid, what does this do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvidia-container-toolkit
has a lot of binaries in its tree. Some of them are under https://github.com/NVIDIA/nvidia-container-toolkit/tree/a470818ba7d9166be282cd0039dd2fc9b0a34d73/cmd. Others are under https://github.com/NVIDIA/nvidia-container-toolkit/tree/a470818ba7d9166be282cd0039dd2fc9b0a34d73/tools/container.
The last one is the one causing the shadowing for docker, but in reality the main issue is that some of those despite being named docker
or crio
are not meant to shadow the original binaries, but just named like that to set up the corresponding runtime configuration.
I believe we don't need to expose that kind of tooling to the user, and that NixOS will properly configure the given runtime with regular NixOS options. This is an implementation detail in my opinion.
So with subPackages
what we are telling buildGoModule
is: instead of building all the binaries that you can find in this tree structure, just build cmd/nvidia-ctk
, cmd/nvidia-container-runtime
and cmd/nvidia-container-runtime-hook
.
To be honest, I would even just try with cmd/nvidia-ctk
and then start adding more binaries as we see are missing, but this was a way to start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the list of binaries produced by nix-build -A nvidia-docker
before this change:
❯ tree result/bin
result/bin
├── containerd -> /nix/store/9933i2f5wnj6f994isn242vdnbzdry62-container-toolkit-container-toolkit-1.15.0-rc.3/bin/containerd
├── crio -> /nix/store/9933i2f5wnj6f994isn242vdnbzdry62-container-toolkit-container-toolkit-1.15.0-rc.3/bin/crio
├── docker -> /nix/store/9933i2f5wnj6f994isn242vdnbzdry62-container-toolkit-container-toolkit-1.15.0-rc.3/bin/docker
├── nvidia-container-cli -> /nix/store/570j23jms28cjb603a7hy695nmibfi0n-libnvidia-container-1.9.0/bin/nvidia-container-cli
├── nvidia-container-runtime -> /nix/store/9933i2f5wnj6f994isn242vdnbzdry62-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-container-runtime
├── nvidia-container-runtime.cdi -> /nix/store/9933i2f5wnj6f994isn242vdnbzdry62-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-container-runtime.cdi
├── nvidia-container-runtime-hook -> /nix/store/9933i2f5wnj6f994isn242vdnbzdry62-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-container-runtime-hook
├── nvidia-container-runtime.legacy -> /nix/store/9933i2f5wnj6f994isn242vdnbzdry62-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-container-runtime.legacy
├── nvidia-container-toolkit -> /nix/store/9933i2f5wnj6f994isn242vdnbzdry62-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-container-toolkit
├── nvidia-ctk -> /nix/store/9933i2f5wnj6f994isn242vdnbzdry62-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-ctk
├── nvidia-docker -> /nix/store/lc86iqnadxhc54n34n4g9rd94li39ls8-nvidia-docker-2.5.0/bin/nvidia-docker
├── nvidia-toolkit -> /nix/store/9933i2f5wnj6f994isn242vdnbzdry62-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-toolkit
└── toolkit -> /nix/store/9933i2f5wnj6f994isn242vdnbzdry62-container-toolkit-container-toolkit-1.15.0-rc.3/bin/toolkit
And after this change:
❯ tree result/bin
result/bin
├── nvidia-container-cli -> /nix/store/570j23jms28cjb603a7hy695nmibfi0n-libnvidia-container-1.9.0/bin/nvidia-container-cli
├── nvidia-container-runtime -> /nix/store/iw49hbrnnxg2pf1ynha35gn96mgpg4xx-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-container-runtime
├── nvidia-container-runtime-hook -> /nix/store/iw49hbrnnxg2pf1ynha35gn96mgpg4xx-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-container-runtime-hook
├── nvidia-container-toolkit -> /nix/store/iw49hbrnnxg2pf1ynha35gn96mgpg4xx-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-container-toolkit
├── nvidia-ctk -> /nix/store/iw49hbrnnxg2pf1ynha35gn96mgpg4xx-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-ctk
└── nvidia-docker -> /nix/store/lc86iqnadxhc54n34n4g9rd94li39ls8-nvidia-docker-2.5.0/bin/nvidia-docker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So instead of moving runtime wrappers to a separate output or prefix, this removes the wrappers entirely. I've no idea if the wrappers still have any users, I think we should go with the moving instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, note that the wrapper is nvidia-docker
. The rest (containerd
, crio
, docker
) are just helper binaries that configure the runtime after they are named: tools that perform imperative changes on the host to set up the runtime wrapper, but not the wrappers themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marking this PR as draft until I have double checked everything I want.
Although CDI should be used in order to not require container runtime wrappers anymore, fix the nvidia-container-runtime integration with Docker for cases when Docker < 25.
Ouch, deleted the branch by mistake. Opening another PR since I cannot reopen this one. |
Description of changes
At this time, the nvidia-container-toolkit derivation installs a docker executable that shadows the main one, and that is not thought to forward commands to the original docker command, causing issues to users when the
nvidia-container-toolkit
is in scope and they try to call todocker
.Fixes: #293857
Things done
nix.conf
? (See Nix manual)sandbox = relaxed
sandbox = true
nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"
. Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/
)Add a 👍 reaction to pull requests you find important.