
Group mapping in rootless #13090

Open
quentin9696 opened this issue Jan 31, 2022 · 37 comments

Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@quentin9696

/kind bug

Description

User group mappings are not kept when using --annotation run.oci.keep_original_groups=1.

On the host:

$ id
uid=2001(test) gid=2001(test) groups=2001(test),1001(group1),2000(group2) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

When I run the container:

$ podman run -it --rm --userns=keep-id --annotation run.oci.keep_original_groups=1 docker.io/library/bash
bash-5.1$ id
uid=2001(test) gid=2001(test) groups=65534(nobody),65534(nobody),2001(test)

I don't understand why my group1 and group2 are mapped to nobody.

Steps to reproduce the issue:

  1. Create a user

  2. Create two groups and add them to the user

  3. Run a container with --userns=keep-id and the annotation run.oci.keep_original_groups=1, then check your groups. They should be mapped as they are on the host.

Describe the results you received:

$ id
uid=2001(test) gid=2001(test) groups=65534(nobody),65534(nobody),2001(test)

Describe the results you expected:

$ id
uid=2001(test) gid=2001(test) groups=1001(group1),2000(group2),2001(test)

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Version:      3.4.4
API Version:  3.4.4
Go Version:   go1.16.8
Built:        Wed Dec  8 21:45:07 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - memory
  - hugetlb
  - pids
  - misc
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.30-2.fc35.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.30, commit: '
  cpus: 2
  distribution:
    distribution: fedora
    variant: coreos
    version: "35"
  eventLogger: file
  hostname: ip-10-124-2-41
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 2001
      size: 1
    - container_id: 1
      host_id: 493216
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 2001
      size: 1
    - container_id: 1
      host_id: 493216
      size: 65536
  kernel: 5.15.7-200.fc35.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 4070068224
  memTotal: 8241754112
  ociRuntime:
    name: crun
    package: crun-1.4-1.fc35.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4
      commit: 3daded072ef008ef0840e8eccb0b52a7efbd165d
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/user/2001/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc35.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 1h 10m 56.9s (Approximately 0.04 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/mnt/home/test/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/tmp/podman/user/2001/containers
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 3
  runRoot: /run/user/2001
  volumePath: /var/tmp/podman/user/2001/containers/volumes
version:
  APIVersion: 3.4.4
  Built: 1638999907
  BuiltTime: Wed Dec  8 21:45:07 2021
  GitCommit: ""
  GoVersion: go1.16.8
  OsArch: linux/amd64
  Version: 3.4.4

Package info (e.g. output of rpm -q podman or apt list podman):

I use the Fedora CoreOS AWS AMI.

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

Run on the official Fedora CoreOS AWS image.

@openshift-ci bot added the kind/bug label Jan 31, 2022
@rhatdan
Member

rhatdan commented Jan 31, 2022

Because those group IDs are not mapped inside the container's user namespace.
If you run podman top CID hgroups you will see the GIDs leaked into the container.

The user namespace maps all IDs that are not mapped into it to 65534(nobody).
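For reference, the kernel's overflow IDs can be checked on the host (65534 is the usual default):

$ cat /proc/sys/kernel/overflowuid /proc/sys/kernel/overflowgid
65534
65534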

rhatdan closed this as completed Jan 31, 2022
@quentin9696
Author

Hi @rhatdan

I'm a bit confused. When I run podman top CID hgroups, I get:

HGROUPS
558749,558749,2001

Do I need to create group1, group2 inside the container?

Thanks

@rhatdan
Member

rhatdan commented Jan 31, 2022

No. Did you run your container with --group-add keep-groups?

@rhatdan
Member

rhatdan commented Jan 31, 2022

$ podman run -it --rm --userns=keep-id --annotation run.oci.keep_original_groups=1 docker.io/library/bash
bash-5.1$ id
uid=2001(test) gid=2001(test) groups=65534(nobody),65534(nobody),2001(test)

Now in a different terminal run podman top CID hgroups
And it should show all 5 groups.

@quentin9696
Author

quentin9696 commented Jan 31, 2022

Yes, that's what I did:

$ podman run -it --rm --userns=keep-id --annotation run.oci.keep_original_groups=1 docker.io/library/bash
bash-5.1$ id
uid=2001(test) gid=2001(xxxxxxx) groups=65534(nobody),65534(nobody),2001(test)
$ podman ps
CONTAINER ID  IMAGE                          COMMAND     CREATED        STATUS            PORTS       NAMES
070992ab2903  docker.io/library/bash:latest  bash        5 seconds ago  Up 5 seconds ago              quirky_nas

$ podman top 070992ab2903 hgroups
HGROUPS
427677,427677,2001

@rhatdan
Member

rhatdan commented Feb 1, 2022

That looks correct, although podman top might have a bug here, since it printed out the first leaked group twice. @vrothberg PTAL, it looks like we might have a bug in podman top.

@rhatdan
Member

rhatdan commented Feb 1, 2022

@giuseppe PTAL I am not sure we are leaking groups in podman 4.0

rhatdan reopened this Feb 1, 2022
@rhatdan
Member

rhatdan commented Feb 1, 2022

$ podman -v
podman version 4.0.0-dev
$ groups
dwalsh wheel users
$ podman run -d --group-add keep-groups  alpine top
16fe1fbbdd9ebc0c49760b54c62ef81e5ad480e694492d05223e6f43ccb84a34
$ podman top -l hgroups
HGROUPS
165533,165533,3267
$ podman top -l groups
GROUPS
nobody,nobody,root

@giuseppe
Member

giuseppe commented Feb 1, 2022

it seems to work for me.

What groups do you have on the host?

Can you check grep ^Groups /proc/$CONTAINER_PID/status?

@quentin9696
Author

Just to make sure I understand what's happening:

If I run podman rootless and add the --group-add keep-groups flag, I should have the same groups in the container as on the host. In my case, I should see my two other group IDs?

@giuseppe
Member

giuseppe commented Feb 1, 2022

The Linux kernel maps GIDs that are not part of the user namespace mapping to the overflow GID.
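For reference, the active mapping can be inspected from inside the namespace; the values below mirror the idMappings from this report:

$ podman unshare cat /proc/self/gid_map
         0       2001          1
         1     493216      65536

Any host GID outside those ranges (e.g. 1001 and 2000 here) shows up as the overflow GID, 65534.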

@vrothberg
Member

Yes, an example would be:

~ $ groups; podman unshare groups
vrothberg wheel
root nobody

@quentin9696
Author

What solutions are there to also map GIDs that are not part of the user namespace?

@quentin9696
Author

In my case:

grep ^Groups /proc/2410/status
Groups:	1001 2000 2001 

@vrothberg
Member

podman top needs to be inside Podman's user NS in order to join the container's PID NS.

So I think we'd have to find a way to "leak" the host process's groups (e.g., export HOSTS_GROUPS=$(groups)) into Podman's user namespace. @giuseppe WDYT?

@rhatdan
Member

rhatdan commented Feb 1, 2022

Is the HOSTS_GROUPS available inside of the container, or just to podman top?

@vrothberg
Member

vrothberg commented Feb 2, 2022

Is the HOSTS_GROUPS available inside of the container, or just to podman top?

It does not exist yet, but I would leak it before re-execing into Podman's user NS. groups(1) would not be sufficient though, since we'd need both the ID and the name. I don't think we should leak it into the container for security reasons; any info about the host could theoretically be exploited.
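A hypothetical sketch of gathering both names and IDs on the host before the re-exec (the HOSTS_GROUPS variable and its format are made up here; nothing like this is implemented):

$ export HOSTS_GROUPS=$(for g in $(id -G); do getent group "$g" | cut -d: -f1,3; done | paste -sd' ')
$ echo "$HOSTS_GROUPS"
test:2001 group1:1001 group2:2000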

@rhatdan
Member

rhatdan commented Feb 2, 2022

Right, I think you could set this in the user namespace by default so that top could find it. I think the GIDs are all you need, since the user namespace still has access to the host's /etc/group.

$ grep Group /proc/self/status
Groups:	10 100 3267 
$ podman unshare grep Group /proc/self/status
Groups:	65534 65534 0 

@giuseppe should we always leak this into the user namespace, or only when running top? We could force this to happen in rootless.c.

@giuseppe
Member

giuseppe commented Feb 2, 2022

would that work though?

We are injecting the groups of the current process, but we should read the /proc/$CONTAINER_PID/status file instead since, in theory, they could differ (e.g., the user was added to a new group and then ran newgrp).

@rhatdan
Member

rhatdan commented Feb 2, 2022

Yes, this kind of sucks. Is there a way to first look at the process outside of the user namespace and then enter the user namespace to continue into the PID namespace?

@giuseppe
Member

giuseppe commented Feb 2, 2022

Yes, this kind of sucks. Is there a way to first look at the process outside of the user namespace and then enter the user namespace to continue into the PID namespace?

I am still looking into whether we can leak /proc somehow, but the IDs are always converted depending on the reader:

$ podman run --rm -v /proc:/proc-host --uidmap 0:1000:10000 alpine grep ^[UG]id /proc-host/1/status

The only way so far seems to be to do it in two steps: do not join the user namespace directly, read this information from the host, then re-exec a helper process to read everything else.

It looks like a corner case though; is it even worth supporting in podman top? Could we just mark these IDs so that it is clear they are injected from the host?
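For illustration, the host-side read in that first step could look like this (a sketch; the PID lookup uses podman inspect, and the Groups values are the ones from this report):

$ CONTAINER_PID=$(podman inspect -l --format '{{.State.Pid}}')
$ grep ^Groups /proc/$CONTAINER_PID/status
Groups:	1001 2000 2001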

@vrothberg
Member

It looks like a corner case though; is it even worth supporting in podman top?

I agree. It looks like a substantial massaging of the code for a corner case.

Could we just mark these IDs so that it is clear they are injected from the host?

Can you elaborate on what you mean by "marking"?

@giuseppe
Member

giuseppe commented Feb 2, 2022

Just convert the overflow ID to something clearer, like "Not Mapped", that people can understand more easily.

@rhatdan
Member

rhatdan commented Feb 2, 2022

Well, that is the issue; everyone who has hit this is already complaining about seeing the

$ podman run --group-add=keep-groups alpine groups
root nobody nobody

Couldn't we just leak in a list of groups via an environment variable for podman top, and then substitute the nobodys with the IDs on the list other than the primary group?
If there are no nobody matches then we just drop it, assuming there is no leak.
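A rough sketch of that substitution in shell (purely illustrative, not Podman code; sed replaces one nobody per leaked GID):

$ leaked="1001 2000"; out="nobody,nobody,root"
$ for g in $leaked; do out=$(echo "$out" | sed "s/nobody/$g/"); done
$ echo "$out"
1001,2000,root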

@rhatdan
Member

rhatdan commented Mar 7, 2022

@vrothberg @giuseppe Let's talk about this at Watercooler tomorrow.

@vrothberg
Member

Couldn't we just leak in a list of groups via an environment variable for podman top, and then substitute the nobodys with the IDs on the list other than the primary group?
If there are no nobody matches then we just drop it, assuming there is no leak.

@rhatdan what would that env variable look like? Wouldn't we need to inject the entire mapping? That would make me nervous for security reasons.

@giuseppe
Member

It could be the output of grep ^Groups /proc/self/status.

The problem I see is that this information may differ from what the container process is actually using. It is rarely changed, but if it does change, it is going to be difficult to find out what happened and why podman top returns the wrong information.

@rhatdan
Member

rhatdan commented Aug 26, 2022

Well, podman top returns the wrong information now.

The issue is that we cannot get the actual GIDs of the leaked groups. If we just leaked the GIDs in as the current list and found a matching list of nobodys, we would be 99% sure that they are the leaked GIDs.

@rhatdan
Member

rhatdan commented Aug 26, 2022

Actually, I think we would need to record the output of grep ^Groups /proc/self/status into the container info, so we could record that these groups were leaked. Then podman top could look this information up when it sees multiple nobody groups in /etc/group.

@zeehio
Contributor

zeehio commented Apr 22, 2023

Update

Apparently what I want is called rootless idmapped mounts, and it is not yet supported in the kernel due to security concerns in the design.

My "solution" here is a proposal for (1) a permission system for rootless id mounts and (2) an idea of not only mapping "container uids to high uids at the host" (/etc/subgid) but also the opposite, mapping "low uids at the host to high uids inside the container". With both the permission system and the gid inversion (low->high & high-> low) rootless mapping of secondary groups should not be a problem.

However I guess the following applies:

If it was that easy it would have been done already.

Thanks anyway for reading. And apologies for probably wasting your time, I'm learning.

Context

When using rootless containers, for instance with podman, podman creates a user namespace following the settings defined in /etc/subuid and /etc/subgid.

These settings allow mapping users and groups in the user namespace (inside the container) to a reserved range (if done correctly, a range unique to each user) in the host/parent namespace.

This correspondence is used so we can create files with different user/group ownership inside the namespace that do not collide with any other user in the host namespace. Specifically, UID 0 and GID 0 in the user namespace are mapped to the user's default UID and GID, so it is easy for processes in the user namespace to make files owned by the parent user: just assign them to root inside the user namespace.
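For example, entries like these (illustrative values matching the idMappings earlier in this issue) reserve 65536 subordinate IDs for the test user:

$ grep ^test: /etc/subuid /etc/subgid
/etc/subuid:test:493216:65536
/etc/subgid:test:493216:65536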

Problem

I do not know of an easy way to configure the opposite: I would like to map groups on the host to a reserved range inside the namespace (you have called this "group leaking"). For instance, if I have an "engineering" group on my host system, e.g. with GID 1000, as a system administrator I would like the default user namespaces in rootless podman to see mounted host files belonging to the "engineering" group (and ideally not other random files in the container) as belonging to the "engineering" group inside the user namespace as well.

Solution?

I believe it would make sense to have a /etc/revsubgid file specifying a list of groups that should leak into the user namespaces by default.

This list could be given in the following format:

<gid_host>:<uids_filter>:<gids_filter>
  • The first field is the group name or group ID on the host that should be leaked into the namespace.

  • The second and third fields are a comma-separated list of user names or UIDs, and a comma-separated list of group names or GIDs, respectively; if empty, they mean "everyone". When creating a user namespace for a given user, gid_host is leaked only if that user is among those users or groups.

For instance:

engineering::engineering

Would automatically map the engineering group (the first field) for all users in the engineering group (as given by the last field).

This would be convenient for rootless containers that are expected to access directories mounted as volumes owned by secondary groups.

podman (via crun) can now use --group-add keep-groups to preserve group access. However (correct me otherwise), I understand the kernel maps those groups to overflow IDs. Seeing all those nobody entries is unintuitive to me.

Besides leaking the groups into the namespace, podman could additionally append the leaked groups to the container's /etc/group file and add the container's root user to them, so the root user in the container would have transparent access to the leaked groups and the group names would appear with the same names as on the host.
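For example, a leaked entry in the container's /etc/group might then look like this (hypothetical name and GID):

engineering:x:1000:root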

Final words

If that's already doable with some setting and I have missed it, I apologize.

I would appreciate your feedback. I am not sure if I can contribute to this, since this is far from my field of knowledge, but for sure I'd love to use this feature.

Thank you for your time reading this and your work in podman.

@codonell

Adding my +1 here as an upstream glibc developer.

Developers are using distrobox and toolbox to develop glibc, and one of the limitations they run into is that the glibc test suite uses secondary groups for testing the POSIX identity management APIs. Often we require just one additional supplementary group, and we need to be able to find the group via getgrouplist and then use fchown.

Having a straightforward way to map at least some host groups into the container would be useful.

We've worked around this for now by marking a subset of tests as unsupported in container configurations that lack the requisite setup. This isn't new; there are some tests we can't run in containers at all (like tests that themselves use namespace isolation).

@giuseppe
Member

Developers are using distrobox and toolbox to develop glibc, and one of the limitations they run into is that the glibc test suite uses secondary groups for testing the POSIX identity management APIs. Often we require just one additional supplementary group, and we need to be able to find the group via getgrouplist and then use fchown.

this won't work even if we solve the issue above. A group will show as the overflow ID inside the user namespace; the kernel controls that, and we have no way to change this behavior. I think that for your use case you need a correct mapping for the groups, so that setgroups works fine inside the container without the keep-groups workaround.

For a rootless user, you need to make sure these additional GIDs are added through /etc/subgid, and then run podman system migrate to recreate the user namespace.
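A sketch of that, using the group1/group2 GIDs from the original report (the extra single-GID entries and their ordering are illustrative):

$ grep ^test: /etc/subgid
test:1001:1
test:2000:1
test:493216:65536
$ podman system migrate

After the migration, the new user namespace can map host GIDs 1001 and 2000, so they no longer show up as nobody.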

@rhatdan
Member

rhatdan commented Jun 12, 2023

It would be great if user groups could be added to the new user namespace via newgidmap, but I guess the risk is that DAC_OVERRIDE might allow users to modify group files.
