This repository was archived by the owner on Oct 27, 2023. It is now read-only.

Trying to get nvidia-container-runtime on other distribution to work #101

Closed
ich777 opened this issue May 21, 2020 · 12 comments

Comments

@ich777

ich777 commented May 21, 2020

I'm trying to get the nvidia-container-runtime running on Slackware. When I start a container with the required startup commands and runtime, I can see nvidia-smi in /usr/bin/ from inside the container, but its file size is shown as zero.
All library files also show up with a file size of zero from inside the container.
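
To make the symptom above easier to reproduce, here is a minimal Python sketch that lists zero-byte files in a directory. The helper name `find_empty_files` is hypothetical and not part of any NVIDIA tool; it only assumes a Python interpreter is available where you run it.

```python
from pathlib import Path


def find_empty_files(directory):
    """Return the names of regular files in `directory` whose size is 0 bytes."""
    empty = []
    for entry in sorted(Path(directory).iterdir()):
        # is_file() follows symlinks, so a link to an empty file is reported too;
        # broken symlinks and directories are skipped.
        if entry.is_file() and entry.stat().st_size == 0:
            empty.append(entry.name)
    return empty
```

Calling `find_empty_files("/usr/bin")` inside the container and again on the host shows whether the empty files exist only on the container side.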

[Image: usr-bin]

Can somebody help?

What I've done so far:

  1. installed the latest drivers from the runfile
  2. compiled the libraries for the container runtime
  3. created the /etc/docker/daemon.json file
  4. compiled the nvidia-container-toolkit
  5. compiled runc and renamed it to nvidia-container-runtime
  6. made a symlink from the toolkit to nvidia-container-runtime-hook
  7. compiled seccomp and made sure that it is enabled in the kernel

I'm a little lost right now and don't know what I might have done wrong.

@klueska
Contributor

klueska commented May 22, 2020

I'm not sure why your file sizes would be 0, but I don't see you listing libnvidia-container as one of the libraries you compiled either. That is the library that does all of the heavy lifting of setting up GPU support for your containers. The nvidia-container-toolkit and nvidia-container-runtime are just thin wrappers that set things up so you can invoke libnvidia-container.
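
As a quick way to see which pieces of the stack are actually installed, the following Python sketch looks each binary up on the PATH. The binary names are the ones used in this thread, plus `nvidia-container-cli`, the command-line tool built by libnvidia-container. The function name `locate_components` is hypothetical.

```python
import shutil

# Binary names as used in this thread; nvidia-container-cli is the
# command-line tool that libnvidia-container builds.
COMPONENTS = [
    "nvidia-container-cli",
    "nvidia-container-toolkit",
    "nvidia-container-runtime-hook",
    "nvidia-container-runtime",
]


def locate_components(names=COMPONENTS, path=None):
    """Map each binary name to its full path, or None if it is not found.

    `path` is an optional search path string; None means the PATH of the
    current environment.
    """
    return {name: shutil.which(name, path=path) for name in names}
```

A value of `None` for `nvidia-container-cli` would suggest that libnvidia-container was not built or not installed into a directory on the PATH.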

@klueska
Contributor

klueska commented May 22, 2020

More info on how all of these components interrelate: NVIDIA/nvidia-docker#1268 (comment)

@ich777
Author

ich777 commented May 22, 2020

@klueska Thank you for the quick response! ;)

Oh sorry, I forgot to mention that the order in which I did this is:

  1. installed the drivers
  2. compiled libnvidia-container (https://github.com/NVIDIA/libnvidia-container.git)
  3. created the /etc/docker/daemon.json file
  4. compiled nvidia-container-runtime (https://github.com/NVIDIA/nvidia-container-runtime.git)
  5. linked nvidia-container-toolkit to nvidia-container-runtime-hook
  6. compiled runc (https://github.com/opencontainers/runc)
  7. copied the runc binary to nvidia-container-runtime
  8. compiled seccomp and made sure that it is enabled in the kernel
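
For the daemon.json step, a minimal Python sketch of the file's content is shown below. It registers a runtime named `nvidia` with Docker. The install path `/usr/bin/nvidia-container-runtime` is an assumption and has to match wherever the binary was placed on your system. The function name is hypothetical.

```python
import json


def nvidia_daemon_config(runtime_path="/usr/bin/nvidia-container-runtime"):
    """Build the daemon.json content that registers an 'nvidia' runtime.

    runtime_path is an assumed install location; adjust it to your system.
    """
    return {
        "runtimes": {
            "nvidia": {
                "path": runtime_path,
                "runtimeArgs": [],
            }
        }
    }


# Print the JSON; redirect it to /etc/docker/daemon.json after reviewing it.
print(json.dumps(nvidia_daemon_config(), indent=4))
```

Docker has to be restarted after the file changes, and the container is then started with `--runtime=nvidia`.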

@klueska
Contributor

klueska commented May 22, 2020

I'm not too familiar with Slackware or why things might differ, but all of the files you mention are mounted into the container by this same line:

https://github.com/NVIDIA/libnvidia-container/blob/master/src/nvc_mount.c#L77

Is it possible that Slackware requires something more when performing the bind-mount than what we are doing on other platforms?
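
One way to check whether the bind-mounts happened at all is to look at the mount table from inside the container. The Python sketch below parses text in the format of /proc/self/mountinfo and returns the mount points under a given prefix. The function name `mounts_under` is hypothetical, and this is a diagnostic aid under those assumptions, not something libnvidia-container provides.

```python
def mounts_under(mountinfo_text, prefix):
    """Return mount points from mountinfo-formatted text that start with `prefix`."""
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # The fifth field (index 4) of a mountinfo line is the mount point.
        # Spaces inside paths are escaped as \040, so split() is safe here.
        if len(fields) > 4 and fields[4].startswith(prefix):
            points.append(fields[4])
    return points
```

Inside the container, `mounts_under(open("/proc/self/mountinfo").read(), "/usr/bin/nvidia")` should list an entry for nvidia-smi if the file was bind-mounted from the host.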

@ich777
Author

ich777 commented May 22, 2020

I will look into that; possibly it is a permission issue.

Also, someone said you have to apply a patch to runc for this to work (the thread is pretty old and the patch link doesn't work anymore):
#9 (comment)

@klueska
Contributor

klueska commented May 22, 2020

Looking back at your steps, by the way, this one seems odd to me:

compiled runc and renamed it to nvidia-container-runtime

That should not be necessary. The nvidia-container-runtime is a small wrapper script that takes the runC spec as input, injects nvidia-container-toolkit as a runtime hook inside of it, and then execs into the native runC of your machine, passing it the modified runC spec.

I'm not sure what you would accomplish by renaming the native runC to nvidia-container-runtime, other than removing this injection step.
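
The injection step described above can be sketched roughly as follows. This is an illustration in Python, not the wrapper's actual code: it takes an OCI runtime spec (the parsed config.json) as a dict and adds a prestart hook entry. The hook path and the `prestart` argument are assumptions, and the function name is hypothetical.

```python
import copy


def inject_prestart_hook(spec, hook_path="/usr/bin/nvidia-container-runtime-hook"):
    """Return a copy of an OCI runtime spec with a prestart hook added.

    hook_path and the "prestart" argument are assumptions for illustration.
    """
    patched = copy.deepcopy(spec)
    # Tolerate a missing or null "hooks" / "prestart" entry in the input spec.
    hooks = patched.get("hooks") or {}
    prestart = hooks.get("prestart") or []
    # Avoid adding the same hook twice if the spec was already patched.
    if not any(hook.get("path") == hook_path for hook in prestart):
        prestart.append({"path": hook_path, "args": [hook_path, "prestart"]})
    hooks["prestart"] = prestart
    patched["hooks"] = hooks
    return patched
```

A plain runc binary copied over nvidia-container-runtime never performs this step, so the hook never runs and nothing GPU-related is set up in the container.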

@ich777
Author

ich777 commented May 22, 2020

I did this because of this post:
#9 (comment)

The last sentence says that you should rename the generated runc to nvidia-container-runtime.

By the way, @klueska, thank you very much for these speedy answers. I'm hoping to get everything to work.
I will try that after the weekend and report back.

@klueska
Contributor

klueska commented May 22, 2020

That comment is old, and the stack has been rearchitected since then. This is the most up-to-date summary of the architecture: NVIDIA/nvidia-docker#1268 (comment)

@ich777
Author

ich777 commented May 24, 2020

@klueska
Finally got it to work!
What I did was the following:

  1. installed the drivers
  2. downloaded libnvidia-container
  3. deleted these two lines, otherwise I got a nasty ldconfig error:
  1. compiled 'libnvidia-container'
  2. created the 'daemon.json'
  3. compiled 'nvidia-container-toolkit'
  4. created a symlink from the 'nvidia-container-toolkit' to 'nvidia-container-runtime-hook'
  5. created the 'config.toml' and patched it to my config
  6. compiled 'nvidia-container-runtime'
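
The symlink step in the list above could look like this in Python. The function name `link_hook` and the paths in the usage note are hypothetical; use the directory where the toolkit binary was actually installed.

```python
import os


def link_hook(toolkit_path, hook_path):
    """Create hook_path as a symlink pointing at toolkit_path.

    An existing symlink at hook_path is replaced. A regular file at
    hook_path is left alone and os.symlink raises FileExistsError.
    """
    if os.path.islink(hook_path):
        os.remove(hook_path)
    os.symlink(toolkit_path, hook_path)
    return os.readlink(hook_path)
```

For example, `link_hook("/usr/bin/nvidia-container-toolkit", "/usr/bin/nvidia-container-runtime-hook")`, run as root, assuming both names live in /usr/bin.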

I have only one question left: is there a new/other way to compile 'nvidia-container-toolkit' & 'nvidia-container-runtime' from source? I used branch v3.1.4, because otherwise I couldn't find a way to compile them.

@klueska
Contributor

klueska commented May 25, 2020

For nvidia-container-toolkit you can run make binary to build it natively on the newest 1.1.1 release.

For nvidia-container-runtime the ability to (easily) build natively from the top-level Makefile seems to have been removed (it is only buildable inside docker now).

That said, all we do in docker is:

  1. install all the prerequisite dependencies
  2. cd into the src directory
  3. run make build
  4. Package the resulting scripts / binaries up for the platform

You should be able to do this manually -- take a look at this for the steps:
https://github.com/NVIDIA/nvidia-container-runtime/blob/master/docker/Dockerfile.ubuntu
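
The manual build described above (cd into the src directory, run make build) could be scripted roughly like this. It is a Python sketch that assumes the prerequisite dependencies are already installed and that `repo_dir` points at a local checkout of nvidia-container-runtime. The function name and the `dry_run` flag are hypothetical.

```python
import os
import subprocess


def build_runtime(repo_dir, dry_run=True):
    """Run `make build` in the src directory of a local checkout.

    With dry_run=True nothing is executed; the working directory and the
    command are only returned so they can be inspected first.
    """
    src_dir = os.path.join(repo_dir, "src")
    command = ["make", "build"]
    if not dry_run:
        # check=True raises CalledProcessError if make exits with an error.
        subprocess.run(command, cwd=src_dir, check=True)
    return src_dir, command
```

Packaging the resulting binaries for the platform, the last step in the list, is left out here because it depends on the distribution.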

@ich777
Author

ich777 commented May 26, 2020

@klueska thank you very much!
Everything is now working perfectly fine!
Huge shout out for your work and support, thank you so much!

Greetings from Austria ;)
Vielen Dank (many thanks)! =)

ich777 closed this as completed May 26, 2020

@klueska
Contributor

klueska commented May 26, 2020

Great to hear. Gern geschehen (you're welcome).
