Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgroup is not displayed as shared namespace in pod inspection #12765

Closed
hshiina opened this issue Jan 6, 2022 · 24 comments · Fixed by #12930
Closed

cgroup is not displayed as shared namespace in pod inspection #12765

hshiina opened this issue Jan 6, 2022 · 24 comments · Fixed by #12930
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@hshiina
Copy link
Contributor

hshiina commented Jan 6, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

podman help pod create displays cgroup as one of default values of --share flag, which specifies shared namespaces in a pod. However, cgroup is not listed in SharedNameSpaces of a pod with default values. Even if cgroup is explicitly specified in --share flag, cgroup is not listed in a result of podman pod inspect and actually containers in a pod don't share a cgroup namespace. Regarding other options for --share such as ipc and net, specified values to --share are displayed as SharedNameSpaces in podman pod inspect.

In the source code, there are two similar parameters for cgroups in a pod. When cgroup is specified in --share, PodConfig.UsePodCgroup is set to true. For SharedNameSpaces in pod inspection, PodConfig.UsePodCgroupNS is referred to. There are two resources shared in a pod regarding cgroup, a cgroup parent and a cgroup namespace:

  • When PodConfig.UsePodCgroup is true, a cgroup parent is shared in a pod, so that all containers in the pod have the same cgroup parent.
  • If PodConfig.UsePodCgroupNS is true, a cgroup namespace is shared in a pod, so that all containers in the pod join the same cgroup namespace though this flag is currently never set.

There are some options for the issue:

  • Just add a note to documentation.
  • Share a cgroup parent as the current behavior and fix to display cgroup in SharedNameSpaces in podman pod inspect based on PodConfig.UsePodCgroup. Another change is required if it is necessary to share a cgroup namespace, which is currently not shared.
  • Fix to share a cgroup namespace by setting PodConfig.UsePodCgroupNS. Another fix is required for sharing a cgroup parent.

Steps to reproduce the issue:

  1. Confirm the default shared namespaces in a pod:
$ podman pod create --help

Create a new empty pod

Description:
  After creating the pod, the pod ID is printed to stdout.
<snip>
     --share string                  A comma delimited list of kernel namespaces the pod will share (default "cgroup,ipc,net,uts")
<snip>
  1. Create a pod:
$ podman pod create --name testpod
  1. Inspect shared namespaces of the created pod:
$ podman pod inspect --format '{{.SharedNamespaces}}' testpod

Describe the results you received:

"cgroup" is not included in the result:

$ podman pod inspect --format '{{.SharedNamespaces}}' testpod
[net uts ipc]

Describe the results you expected:

"cgroup" is included in the result:

$ podman pod inspect --format '{{.SharedNamespaces}}' testpod
[cgroup net uts ipc]

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Version:      3.4.2
API Version:  3.4.2
Go Version:   go1.16.8
Built:        Fri Nov 12 15:25:37 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.30-2.fc35.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.30, commit: '
  cpus: 8
  distribution:
    distribution: fedora
    variant: workstation
    version: "35"
  eventLogger: journald
  hostname: laptop
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.15.7-200.fc35.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 2182692864
  memTotal: 16295391232
  ociRuntime:
    name: crun
    package: crun-1.3-1.fc35.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.3
      commit: 8e5757a4e68590326dafe8a8b1b4a584b10a1370
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc35.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 8587571200
  swapTotal: 8589930496
  uptime: 13h 58m 30.75s (Approximately 0.54 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 0
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.7.1-2.fc35.x86_64
      Version: |-
        fusermount3 version: 3.10.5
        fuse-overlayfs: version 1.7.1
        FUSE library version 3.10.5
        using FUSE kernel interface version 7.31
  graphRoot: /home/user/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 12
  runRoot: /run/user/1000/containers
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.2
  Built: 1636748737
  BuiltTime: Fri Nov 12 15:25:37 2021
  GitCommit: ""
  GoVersion: go1.16.8
  OsArch: linux/amd64
  Version: 3.4.2

Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.4.2-1.fc35.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Jan 6, 2022
@mheon
Copy link
Member

mheon commented Jan 10, 2022

@cdoern Interested in this one?

@cdoern
Copy link
Contributor

cdoern commented Jan 10, 2022

sure @mheon I was dealing with the above PR which this issue seems to be related to.

@cdoern cdoern self-assigned this Jan 10, 2022
@cdoern
Copy link
Contributor

cdoern commented Jan 10, 2022

I think we need to decide what we want to happen when using --share @mheon @rhatdan personally I think the current usage is incorrect since --share is meant to put things in the same cgroup NS not assign them the same parent. I see where this is happening: https://github.com/containers/podman/blob/main/libpod/runtime_ctr.go#L354-L388 and can easily change this so that we aren't assigning the pod to be the parent, I am just unsure of what behavior we are expecting here.

@rhatdan
Copy link
Member

rhatdan commented Jan 10, 2022

The expected behaviour, as I understand it, is that all of the containers within the Pod should share the same cgroup namespace as the the Pod. Nothing about the parent cgroup namespace.

@vrothberg @giuseppe do you agree? @haircommander @umohnani8 Is this how Kubernetes works?

@giuseppe
Copy link
Member

Kubernetes doesn't use the cgroup namespaces on cgroup v1.

The cgroup namespace is used on cgroup v2 (and it defaults to cgroupns=private) unless a privileged pod/container is created, in this case it will use cgroupns=host.

Thinking more of it, I am not sure a cgroupns makes sense for a pod. I'd drop it from the default list.

@cdoern
Copy link
Contributor

cdoern commented Jan 10, 2022

so @giuseppe what behavior would you expect from --share=cgroup? would the pod (infra ctr) be assigned as the parent or would every container enter the same cgroupNS as the infra ctr? I think the second option makes more sense.

@giuseppe
Copy link
Member

I don't think we should set the same cgroup for different containers. if the --share=cgroup is explicit, then it does what the user intended to do and we share just the cgroupns. I am suggesting just to change the default to not include it

@cdoern
Copy link
Contributor

cdoern commented Jan 11, 2022

but currently all --share=cgroup does is set the pod as the cgroup parent. The other namespaces function differently like the PID namespace for example, when --share=pid is passed all containers in the pod default the the first container's (infras) pid namespace.

Either way we need to establish the difference between pod.config.UsePodCgroupsvs pod.config.UsePodCgroupNS why they both exist and when to use each one. because currently UsePodCgroupNS is dead code.

@rhatdan
Copy link
Member

rhatdan commented Jan 11, 2022

Do me --share=cgroup would mean that containers within the POD all share the same cgroups. --share without cgroup means they don't, IE Each container gets its own cgroup. Default should be --share=cgroup.

Lets talk about this as the Water Cooler today.

@mheon
Copy link
Member

mheon commented Jan 11, 2022

That doesn't sound correct. --share=cgroup should share the actual cgroup namespace. The shared pod cgroup underneath should be a separate matter entirely. I was 90% sure it had a separate CLI command to enable/disable it but I've looked at the manpages and I can't find said command, so I could be wrong?

@cdoern
Copy link
Contributor

cdoern commented Jan 11, 2022

I cannot attend watercooler today because of travel but let me know what you decide @rhatdan @mheon

I think as it currently stands we share the cgroup parent and not the actual cgroup NS. What does having the same cgroup parent mean for the containers in the pod as compared to just being in the same cgroup NS? I am not sure what the expected behavior is or if I am correct in my assumption here.

@giuseppe
Copy link
Member

I think as it currently stands we share the cgroup parent and not the actual cgroup NS. What does having the same cgroup parent mean for the containers in the pod as compared to just being in the same cgroup NS? I am not sure what the expected behavior is or if I am correct in my assumption here.

sharing the same parent means all the containers end up under the same cgroup, e.g.:

  • /sys/fs/cgroup/path/to/parent/container_1
  • /sys/fs/cgroup/path/to/parent/container_2
  • /sys/fs/cgroup/path/to/parent/container_3

not sharing the cgroup namespace, means each container sees something like:

$ podman run --rm fedora cat /proc/self/cgroup 
0::/

I am not sure sharing the cgroupns makes any sense. Why would a container sees its cgroup paths as resolved from another container context like the following?

$ podman run --rm --name foo -d fedora sleep 60
c557bd64c08d6b17abd2d308fa1198de7bce3a35b451b8406822efcf75df40de

$ podman run --cgroupns=container:foo fedora cat /proc/self/cgroup
0::/../../libpod-783359c4a2d53739de8f0b3e99e26510cefac04b7657af77744028801edd344f.scope/container

@cdoern
Copy link
Contributor

cdoern commented Jan 12, 2022

ok @giuseppe, I see some possible solutions here:

  1. we just amend the pod inspect code to show that cgroup is shared when the containers are using the pod as cgroup parent
  2. we add some other flag for if people want all containers in the pod to be in the same cgroupns rather than share a cgroup parent (I agree this does not make much sense but it might be useful for some...).

@mheon
Copy link
Member

mheon commented Jan 12, 2022

--share=cgroup needs to force sharing of the actual cgroup namespace; having it not do this is very confusing for all involved, I think. Changing the CLI might be a breaking change, but fortunately we have a few more weeks of that left.

Keeping the same defaults as current (cgroup namespace unshared, pod cgroup shared) seems reasonable, let's just refine the CLI experience so that --share only affects namespaces.

@rhatdan
Copy link
Member

rhatdan commented Jan 12, 2022

SGTM

@cdoern
Copy link
Contributor

cdoern commented Jan 13, 2022

so to be clear, this bit: https://github.com/containers/podman/blob/main/libpod/runtime_ctr.go#L354-L388 needs to be modified to utilize the pod.config.UsePodCgroupNS as well as pod.config.UsePodCgroup and pod.config.UsePodCgroup should continue to function how it does but pod.config.UsePodCgroupNS (currently dead code) puts the container in the same cgroupNS as infra?

@giuseppe
Copy link
Member

I don't think https://github.com/containers/podman/blob/main/libpod/runtime_ctr.go#L354-L388) should be changed. It sets the cgroup and it should not look at pod.config.UsePodCgroupNS at all.

Correct, pod.config.UsePodCgroupNS should put the container in the same cgroupNS as infra.

@cdoern
Copy link
Contributor

cdoern commented Jan 18, 2022

@mheon debating making --share=cgroupns share the NS and adding a new case to https://github.com/containers/podman/blob/main/pkg/specgen/generate/namespaces.go#L482-L507 such as case "cgroupns": does that or something like that sound good so we can keep the defaults? Otherwise the default will be overridden since I need to switch the case "cgroup" to sharing the actual cgroupNS

@mheon
Copy link
Member

mheon commented Jan 18, 2022

I don't feel like mixing namespaces and shared cgroups in the same flag is a good idea, but it is names --share not --shared-namespaces so it's not completely out of the question.

@cdoern
Copy link
Contributor

cdoern commented Jan 18, 2022

okay, so then should I take the current case cgroup: (setting the cgroup parent to the pod) out of that file and replace it with the actual cgroupNS? and I am assuming these are mutually exclusive so if we are not sharing the cgroupNS then we do the default? @mheon

@rhatdan
Copy link
Member

rhatdan commented Jan 18, 2022

I don't believe users would see the difference that we do. So I would go for --share cgroups.

@giuseppe
Copy link
Member

we already have cgroup-parent for run.

If we go the --share path, I think it should be --share=cgroup-parent.

As Matt pointed out though, I also find it confusing. Perhaps it should be podman run --cgroup-parent=pod-shared?

@cdoern
Copy link
Contributor

cdoern commented Jan 19, 2022

This has to be a pod create option not a container run option... right? Yesterday at watercooler we discussed that we are going to have --share=cgroup put all of the containers in infra's cgroupNS and we are making a new flag called something like --share-parent (bool) to determine if we want the pod to be the cgroup parent of all containers in the pod, this will default to true.

@mheon
Copy link
Member

mheon commented Jan 19, 2022

Yeah, pod create option.

cdoern added a commit to cdoern/podman that referenced this issue Feb 3, 2022
separated cgroupNS sharing from setting the pod as the cgroup parent,
made a new flag --share-parent which sets the pod as the cgroup parent for all
containers entering the pod

remove cgroup from the default kernel namespaces since we want the same default behavior as before which is just the cgroup parent.

resolves containers#12765

Signed-off-by: cdoern <[email protected]>
Signed-off-by: cdoern <[email protected]>
Signed-off-by: cdoern <[email protected]>
mheon pushed a commit to mheon/libpod that referenced this issue Feb 10, 2022
separated cgroupNS sharing from setting the pod as the cgroup parent,
made a new flag --share-parent which sets the pod as the cgroup parent for all
containers entering the pod

remove cgroup from the default kernel namespaces since we want the same default behavior as before which is just the cgroup parent.

resolves containers#12765

Signed-off-by: cdoern <[email protected]>
Signed-off-by: cdoern <[email protected]>
Signed-off-by: cdoern <[email protected]>
patrycja-guzik pushed a commit to patrycja-guzik/podman that referenced this issue Feb 15, 2022
separated cgroupNS sharing from setting the pod as the cgroup parent,
made a new flag --share-parent which sets the pod as the cgroup parent for all
containers entering the pod

remove cgroup from the default kernel namespaces since we want the same default behavior as before which is just the cgroup parent.

resolves containers#12765

Signed-off-by: cdoern <[email protected]>
Signed-off-by: cdoern <[email protected]>
Signed-off-by: cdoern <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants