nvidia-podman: contaminates PATH with a fake "docker" executable; breaks docker-compose #293857
Comments
My understanding is that the nvidia "runtime wrappers" have been broken for a few months now. Note also that at least nvidia-docker is deprecated by nvidia. A few PRs were recently merged that updated and moved around a bunch of stuff (libnvidia-container, nvidia-container-toolkit, the integration for apptainer, etc). The runtime wrappers stayed pretty much as they were, i.e. broken, to the best of my knowledge. They're also scheduled for removal. Recently #284507 was merged, which offers a better alternative that works and is also recommended by upstream. It's only available in nixos-unstable (but the next release is also only 2 months away). Basically, I doubt it'd be cost-efficient to start fixing the wrappers.
This is something generated by |
I don't know enough about the space to propose a solution. If the package is broken, maybe mark it as broken. NB: Thanks for all you do for nixpkgs. I've been looking into LLM + nvidia and your PRs keep coming up. |
docker
I'm not sure if what they do is intentional |
I am also not super familiar with this specific bit, but from looking at it, it seems to be a helper binary for setting up Docker support for the nvidia-container-toolkit (there are also helpers for other runtimes, like containerd and crio). I believe we can get rid of all of them on NixOS, given that they perform changes at the system level, and in a NixOS system I'd expect all this to be handled by the declarative configuration. |
Note that |
I agree in splitting the outputs and keeping |
same issue here, unable to run nvidia-docker
|
Which version of nixpkgs/NixOS are you on? What do you want to achieve by running nvidia-docker manually? In the nixpkgs tree we still have Docker 24, but once we can assume Docker 25 as the minimum version, we can get rid of all these hooks in favor of the CDI implementation. |
nixos-unstable (24.11)
run gpu-powered container for deep learning related stuff
Docker version 24.0.9, build v24.0.9 |
In this case you can set the NixOS option Then you will be able to do |
Hi, first of all, thank you all for the help. I tried the proposed solution, but it doesn't work for me even after including your suggested changes. I have this config:
When I try to run an nvidia docker with the following command:
If I have package nvidia-container-toolkit installed I get:
and if I don't have it installed I get:
The problem seems to be that when you install nvidia-container-toolkit, those additional binaries overwrite the docker command as the original post mentioned... |
Please note that it's Also, I recommend you to set Try to use the binary from Docker directly, not the wrapper that podman installs. In any case, we should not install a |
here is what i'm getting:
|
@sophronesis I can reproduce this issue if you don't set Please, let's keep this issue focused on the docker CLI contamination, and open a new issue if you face a different problem. |
Unfortunately no luck yet: Config:
With nvidia-container-toolkit:
Without nvidia-container-toolkit:
Not sure what can be the issue... |
@jl1990 I'm facing a very similar issue with a very similar config. What's your output of this?
|
OK, during the 24 upgrade I removed something that's still needed for the kernel to load the modules:
|
The issue here is that you are calling the wrapper docker (the docker that is not docker). Your configuration is likely correct but you have to call the real docker CLI by using its full path on /nix until we fix this issue. |
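A minimal sketch, not taken from the thread, of one way to check which `docker` executables are visible on PATH and in what order, so the shadowing wrapper and the real CLI can be told apart. Python is used only for illustration, and the helper name `find_all_on_path` is hypothetical.

```python
import os


def find_all_on_path(name, path=None):
    """Return (candidate, resolved) pairs for every executable called
    `name` found on PATH, in lookup order.

    `find_all_on_path` is a hypothetical helper written for this example;
    it is not part of nixpkgs, Docker, or the nvidia tooling.
    """
    path = os.environ.get("PATH", "") if path is None else path
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Only regular, executable files count as lookup candidates.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            # Follow symlinks so the final target (for example a store
            # path) is visible next to the PATH entry.
            hits.append((candidate, os.path.realpath(candidate)))
    return hits


if __name__ == "__main__":
    for candidate, resolved in find_all_on_path("docker"):
        print(f"{candidate} -> {resolved}")
```

The first entry printed is the one a shell lookup would pick. If more than one entry appears, the resolved paths should help indicate which package each `docker` comes from, and the real CLI can then be invoked by its full path as suggested above.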
Yes, I think this is what I posted in my last response, right?
It doesn't work for me either. |
@jl1990 also depending on your version of nixpkgs your error could be fixed by #305312 (comment) |
@jl1990 @ahirner Please use NixOS Discourse or Matrix for support and questions. |
I might be missing something, because the history is not completely clear to me, but I think it was never intended to replace the Docker CLI, but rather to be installed as a runtime wrapper, configured in |
Result is quite large, I uploaded it to pastebin: https://pastebin.com/4fNeETeR
Thanks for the help. I still get the same result after including CDI with the nixos-unstable branch. Edit: running with sudo and the full nix store path worked.
I only tried to provide feedback about the solutions that were suggested here (as they didn't work in my case). This link for example shows that nvidia container toolkit should modify And this one shows the expected behaviour and how docker commands should be executed... If this binary overwrite was intentional, the binary that is replacing the docker binary should be able to process the same parameters, but it does not... it complains about not understanding the "run" parameter:
|
@jl1990 we are mixing different problems here, please open a new issue or as @SomeoneSerge mentioned use Discourse or Matrix. Thank you! |
Sounds wrong to me too, I've no idea what upstream's intentions were. The offending binary is packaged in @ereslibre we can try moving it to a different output or a deeper prefix; then we can see what references we break |
I have created #330197 to fix this shadowing. We can follow up on that PR. Also, please give a heads-up if you find that anything is missing or wrong. |
Describe the bug
I've had errors such as
At first I thought it was this: https://stackoverflow.com/questions/66514436/difference-between-docker-compose-and-docker-compose, but that was a red herring. Some nixpkgs spelunking showed that we have been using the Go extension for a while now.
Investigating further, I realized that it was because I had the nvidia-podman program installed, which contains a docker executable that is nothing like pkgs.docker. This is confusing and, I suppose, a mistake. Can we rename it to something else? cc @SomeoneSerge
On nixos-unstable.