
Calling podman binary as rootless user from C++ running as a separate user? w/ --group-add keep-groups #10212

Closed
KCSesh opened this issue May 4, 2021 · 9 comments

KCSesh commented May 4, 2021

I am trying to understand what is required to properly call podman from a non-terminal environment using execve.

My understanding from this rootless container login doc is that the best way to use rootless container runtimes is to be logged in as the user on a terminal, using ssh or machinectl shell.

However, I would like to use podman from a C++ program, but this program is not started as the user that I want podman to run as.

So I am wondering if anyone can point me to what I need to set?

I am using fork() + execve:

Essentially the C++ program is running as root (for the time being).

But I want to spin up podman with a rootless user, user1:
user1 - uid == 1000
user1 - gid == 1001

Essentially I am doing:

const pid_t pid = fork();
if (pid == 0) { // child process
    // Change the gid before the uid: once the process is no longer uid 0,
    // setresgid() would fail with EPERM.
    setresgid(1001, 1001, 1001);
    setresuid(1000, 1000, 1000);

    char *const envp[] =
    {
        (char *)"HOME=/data/runtime/home/user1",
        (char *)"XDG_RUNTIME_DIR=/run/user/1000",
        (char *)"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin",
        (char *)"LOGNAME=user1",
        (char *)"USER=user1",
        (char *)"USERNAME=user1",
        nullptr
    };

    execve("podman", **podman args**, envp);
}
 //other stuff
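
For reference, here is a rough sketch of how the argv for that execve call could be built. The run subcommand, image name, and binary path below are placeholders/assumptions rather than my exact command; the flags are the ones discussed further down:

// Sketch only: a possible argv for the execve() call above. Note that execve()
// does not search PATH, so the binary is given by full path (adjust as needed).
char *const argv[] =
{
    (char *)"podman",
    (char *)"run",
    (char *)"--group-add", (char *)"keep-groups",
    (char *)"--hooks-dir=/data/hooks/",
    (char *)"-e", (char *)"NVIDIA_VISIBLE_DEVICES=all",
    (char *)"my-cuda-image:latest",   // placeholder image name
    nullptr
};

execve("/usr/bin/podman", argv, envp);   // path is an assumption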

This does seem to correctly spin up containers as the correct user, and from a terminal I can verify that the container is running under that user.

The problem comes from the following, and is related to #10166 and NVIDIA/nvidia-container-runtime#85.

Part of my podman arguments use --group-add keep-groups, and additionally --hooks-dir=/data/hooks/ -e NVIDIA_VISIBLE_DEVICES=all.

These arguments help map the video group that user1 is a part of into the rootless container, which allows the NVIDIA hook to detect my GPUs.

When I start it from inside my C++ program, it seems that the user which starts the C++ program also needs to be in the video group for me to properly access the GPU.

If I remove user1 from the video group, my container starts, but my GPU is not detected when I try to run CUDA code.
If I remove root from the video group, my container does not start, because the NVIDIA hook that detects the GPU fails with the error:
Error: OCI runtime error: error executing hook 'usr/bin/nvidia-container-toolkit' (exit code: 1)
And from experience, I see this error when the video group is not correctly being used/mapped.

When both users, user1 and root, are in the video group, the container successfully starts and I have access to the GPU.

I am wondering if there is any path forward for me here.
Should I be using something other than setresuid/setresgid?

Why do both users need to be in the video group? Is it related to how hooks are invoked in podman?


rhatdan commented May 5, 2021

I have a feeling this has nothing to do with Podman, and that something else is happening. I would check whether the process being exec'ed out of the C++ program actually ends up with the video group when you execute it from the root account without the video group. All that Podman is doing is allowing the container to inherit the groups of the process that executed it. If that process does not have the video group, then the container will not have the video group.
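
One way to check (just an untested sketch on my part) would be to print the child's supplementary groups right before the execve, for example:

// Untested sketch: dump the supplementary groups of the current process, so you
// can see exactly what the exec'ed podman (and therefore keep-groups) inherits.
#include <unistd.h>
#include <cstdio>

static void dump_groups(void)
{
    gid_t groups[64];
    int n = getgroups(64, groups);                 // supplementary groups only
    std::fprintf(stderr, "uid=%d gid=%d groups:", (int)getuid(), (int)getgid());
    for (int i = 0; i < n; i++)
        std::fprintf(stderr, " %d", (int)groups[i]);
    std::fprintf(stderr, "\n");
}

If 44 (video) is not in that list right after your setresuid/setresgid calls, the container started with --group-add keep-groups will not have it either.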


KCSesh commented May 5, 2021

Right, this could be the incorrect place for this issue.
I figured I would start here to see if I was doing something incorrect from podman's perspective.

But running your test is actually why I created this ticket.

ssh-user@ubuntu: sudo ./some-daemon
05-05 15:40:32.046 main daemon started
Execv: id $(whoami): 
uid=1005(user1) gid=1007(user1) groups=1007(user1),44(video)
Execv: podman with args:  --group-add keep-groups --hooks-dir=/data/hooks/ -e NVIDIA_VISIBLE_DEVICES=all
6527cad9e2bf313c57908debf5e92050a6b4f7bfd028acb9b1b912d090346c58

What this shows is me starting my daemon, which uses the same function to execv different processes.
Process 1 is the output of id $(whoami), which shows uid=1005(user1) gid=1007(user1) groups=1007(user1),44(video).
Process 2 is the output of podman, which shows: 6527cad9e2bf313c57908debf5e92050a6b4f7bfd028acb9b1b912d090346c58

This shows my podman container starts successfully with --group-add keep-groups --hooks-dir=/data/hooks/ -e NVIDIA_VISIBLE_DEVICES=all. I am able to access my GPU.

Now if all I change is this:

sudo gpasswd -d root video
exit
re-login with ssh

(remove root user from video group)

When I re-run the program:

ssh-user@ubuntu: sudo ./some-daemon
05-05 15:40:32.046 main daemon started
Execv: id $(whoami): 
uid=1005(user1) gid=1007(user1) groups=1007(user1),44(video)
Execv: podman with args:  --group-add keep-groups --hooks-dir=/data/hooks/ -e NVIDIA_VISIBLE_DEVICES=all
Error: OCI runtime error: error executing hook `/usr/bin/nvidia-container-runtime-hook` (exit code: 1)

Adding the root user back into the video group fixes the issue.


rhatdan commented May 5, 2021

On the second one, could you just execute
podman run ... id

Without the hooks?


KCSesh commented May 5, 2021

Yes, definitely.
To do this, as you said, I removed the hooks.
But I also removed the --detach flag I had added, so that:

podman run ... id

would output to the terminal.

Lastly, I removed the first execv process, so now the only thing that gets started is podman.

ssh-user@ubuntu: sudo ./some-daemon
05-05 15:40:32.046 main daemon started
Execv: podman with args:  --group-add keep-groups 
uid=0(root) gid=0(root) groups=0(root),65534(nogroup)

This looks correct to me; it correctly adds the video group to the container. However, of course, since the NVIDIA script did not run, the GPU devices are not under /dev, so I can't access them.
And thus far the NVIDIA script has been the only way I have been able to access the GPU.

As a sanity test, I added --detach back into my command and had it run a long-lasting process.
Then I exec'ed into the container using podman exec, and I verified this way as well that:

root@3359484e34d6:/# id
uid=0(root) gid=0(root) groups=0(root),65534(nogroup)

The video group looks to be added.

Side note: The user that is starting the container is not an ssh-able user, so to exec into it I used:
sudo HOME=/data/runtime/home/... XDG_RUNTIME_DIR=/run/user/... -u user1 ... exec ...


rhatdan commented May 5, 2021

So it looks like Podman is working as expected.


KCSesh commented May 5, 2021

This actually got me digging a bit more, and I was able to access my GPU without invoking the script at all.

Adding the proper --device mounts along with --group-add keep-groups allowed my container to have access to the GPU.
This works with the root user not being in the video group.
That said, I would prefer to use the NVIDIA script.
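
For context, by --device mounts I mean passing the NVIDIA device nodes to the container explicitly. The exact nodes below are an assumption based on a typical NVIDIA setup, not necessarily what every system exposes:

// Sketch: extra podman arguments for the --device approach (typical NVIDIA
// device nodes; which ones exist depends on the driver and hardware).
const char *device_args[] =
{
    "--device", "/dev/nvidiactl",
    "--device", "/dev/nvidia0",
    "--device", "/dev/nvidia-uvm",
};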

So something seems to be happening in the hook portion of the execution.
I don't have the best understanding of hooks and where they get executed.
I would like to know which user executes the hook.

Is it possible that, because it is a prestart hook, the keep-groups flag hasn't been applied yet, which means the hook fails because it doesn't have access to the video group?
This makes sense to me.

Update: Maybe this actually doesn't make sense, because it works fine from a terminal. Which leads back to why I opened this ticket :)


rhatdan commented May 5, 2021

@giuseppe Is it possible that crun/runc are dropping groups before executing the hooks?


KCSesh commented May 5, 2021

So I set up a test for this:

{
  "version": "1.0.0",
  "hook": {
    "path": "/data/hooks/test.sh",
    "args": ["test.sh"],
    "env": []
  },
  "when": {
    "always": true
  },
  "stages": ["prestart"]
}

test.sh

#!/bin/bash
id 2>&1 | tee /tmp/hook.txt

I ran my C++ program with sudo, added the hook flags back, mounted /tmp into the container, and it wrote:

cat /tmp/hook.txt
uid=0(root) gid=0(root) groups=0(root),65534(nogroup)

So this is correct: the user that is executing the NVIDIA script should be aware of the video group.
But the hook doesn't work from C++, and I guess I am unsure of where to go from here.

giuseppe commented:


> So this is correct: the user that is executing the NVIDIA script should be aware of the video group.
>
> But the hook doesn't work from C++, and I guess I am unsure of where to go from here.

Does it work if you don't do any setresuid or setresgid?
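
For what it's worth, setresuid()/setresgid() do not touch the supplementary group list, so after those calls the child still carries the groups of the user that started the daemon (root, in your case). If that turns out to be the problem, an untested sketch of initializing the groups explicitly before dropping root could look like this:

// Untested sketch (for the child branch after fork(); needs <grp.h> and <unistd.h>):
// load user1's supplementary groups, including video, before giving up root;
// setresuid()/setresgid() alone leave the old supplementary group list in place.
if (initgroups("user1", 1001) != 0)    // requires privileges; reads /etc/group
    _exit(1);
if (setresgid(1001, 1001, 1001) != 0)  // primary gid, while still uid 0
    _exit(1);
if (setresuid(1000, 1000, 1000) != 0)  // drop the uid last
    _exit(1);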

Anyway, it seems there is nothing we can do from Podman. I am closing this issue, but feel free to comment more (or reopen if you disagree).
