Support cgroup v2 in runsc #3481

Closed
fvoznika opened this issue Aug 3, 2020 · 14 comments · Fixed by #6821
Labels
area: compatibility Issue related to (Linux) kernel compatibility area: container runtime Issue related to docker, kubernetes, OCI runtime area: integration Issue related to third party integrations type: enhancement New feature or request

Comments

@fvoznika
Member

fvoznika commented Aug 3, 2020

runsc uses cgroups v1 to set pod limits. Kubernetes is switching over to cgroups v2: support is alpha in 1.19 and will possibly reach beta in 1.20.

Relevant links:
SIG-node cgroups KEP
containerd issue
runc issue

@fvoznika fvoznika added type: enhancement New feature or request area: container runtime Issue related to docker, kubernetes, OCI runtime labels Aug 3, 2020
@ianlewis ianlewis added area: integration Issue related to third party integrations area: compatibility Issue related to (Linux) kernel compatibility labels Aug 14, 2020
@majek
Contributor

majek commented Oct 27, 2020

Hi, we are slowly starting to look at cgroups v2. It would be nice to know whether this is on the roadmap.

@fvoznika
Member Author

This work is not staffed right now. We're planning to pick this up early next year.

@dqminh
Contributor

dqminh commented Jan 19, 2021

@fvoznika has there been any progress on this issue? I'm planning to spend some time working on this if possible (we are planning to migrate to cgroupv2 very soon), so I wonder whether we should wait or start a collaborative effort on this.

@fvoznika
Member Author

No progress yet. It would be great if you could get started on it.

@dqminh
Contributor

dqminh commented Jan 20, 2021

Looking at this, I think my plan is roughly:

  • Replace the runsc/cgroup internals with the existing cgroup handling mechanism used in other runtimes (such as runc's libcontainer library), as it provides a reasonably sufficient interface for switching between v1 and v2. Some notes here:

    • runc allows us to provide a map of cgroup paths to join, which takes precedence over the provided cgroupsPath string. This is similar to the owned-cgroup concept in the current runsc/cgroup.
  • We will add fs first, but we should also add systemd cgroup support, at least for cgroupv2. AFAIR systemd support is required if we ever want runsc to run rootless in a unified hierarchy. I'm not sure whether any other restrictions require systemd interop when running in a delegated cgroupv2 hierarchy.

  • Testing: do we have any integration tests on CI? For example, in containerd/runc we use nested virtualization to spin up a Fedora Vagrant host with cgroupv2 support (both GitHub workflows and Travis CI support this, with some caveats). It would be ideal to have the same thing for gVisor.

@fvoznika
Member Author

Thanks for spelling out your plan. We try to avoid adding dependencies as much as possible to have tight control over the code that is included in runsc. See Security principles for more details.

So instead of replacing runsc/cgroup, it could be extended to support cgroups v2. The exported functions in cgroup.Cgroup can move to an interface that has distinct implementations for v1 and v2. Something like this:

type Cgroup interface {
  Install(res *specs.LinuxResources) error
  Uninstall() error
  Join() (func(), error)
  CPUQuota() (float64, error)
  NumCPU() (int, error)
  MemoryLimit() (uint64, error)
}
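
As a rough illustration of the "one interface, two implementations" idea, here is a minimal sketch. It is not runsc code: the `Limits` interface is a hypothetical, trimmed-down subset of the interface above, the struct and helper names are made up, and the conventions for "unlimited" (-1 for CPU, `math.MaxUint64` for memory) are assumptions. Only the control file names (`cpu.cfs_quota_us`, `cpu.cfs_period_us`, `memory.limit_in_bytes` for v1; `cpu.max`, `memory.max` for v2) come from the kernel interface.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// Limits is a hypothetical subset of the Cgroup interface discussed above.
type Limits interface {
	CPUQuota() (float64, error)   // number of CPUs, -1 if unlimited (assumed convention)
	MemoryLimit() (uint64, error) // bytes, math.MaxUint64 if unlimited (assumed convention)
}

// readTrimmed returns the contents of dir/name without surrounding whitespace.
func readTrimmed(dir, name string) (string, error) {
	b, err := os.ReadFile(filepath.Join(dir, name))
	return strings.TrimSpace(string(b)), err
}

// parseQuota turns a quota/period pair (in microseconds) into a CPU count.
// v1 marks "no limit" with a quota of "-1", v2 with "max".
func parseQuota(quota, period string) (float64, error) {
	if quota == "max" || quota == "-1" {
		return -1, nil
	}
	q, err := strconv.ParseFloat(quota, 64)
	if err != nil {
		return 0, err
	}
	p, err := strconv.ParseFloat(period, 64)
	if err != nil {
		return 0, err
	}
	if p <= 0 {
		return 0, fmt.Errorf("invalid period %q", period)
	}
	return q / p, nil
}

// parseBytes parses a byte limit; v2 writes "max" for unlimited.
func parseBytes(s string) (uint64, error) {
	if s == "max" {
		return math.MaxUint64, nil
	}
	return strconv.ParseUint(s, 10, 64)
}

// cgroupV1 reads limits from per-controller directories.
type cgroupV1 struct{ cpuDir, memDir string }

func (c cgroupV1) CPUQuota() (float64, error) {
	q, err := readTrimmed(c.cpuDir, "cpu.cfs_quota_us")
	if err != nil {
		return 0, err
	}
	p, err := readTrimmed(c.cpuDir, "cpu.cfs_period_us")
	if err != nil {
		return 0, err
	}
	return parseQuota(q, p)
}

func (c cgroupV1) MemoryLimit() (uint64, error) {
	s, err := readTrimmed(c.memDir, "memory.limit_in_bytes")
	if err != nil {
		return 0, err
	}
	return parseBytes(s)
}

// cgroupV2 reads limits from a single directory in the unified hierarchy.
type cgroupV2 struct{ dir string }

func (c cgroupV2) CPUQuota() (float64, error) {
	s, err := readTrimmed(c.dir, "cpu.max") // "<quota|max> <period>"
	if err != nil {
		return 0, err
	}
	f := strings.Fields(s)
	if len(f) != 2 {
		return 0, fmt.Errorf("unexpected cpu.max contents %q", s)
	}
	return parseQuota(f[0], f[1])
}

func (c cgroupV2) MemoryLimit() (uint64, error) {
	s, err := readTrimmed(c.dir, "memory.max")
	if err != nil {
		return 0, err
	}
	return parseBytes(s)
}

func main() {
	// The path is an example; the root cgroup has no cpu.max, so errors are expected there.
	var l Limits = cgroupV2{dir: "/sys/fs/cgroup"}
	fmt.Println(l.CPUQuota())
	fmt.Println(l.MemoryLimit())
}
```

Callers only see `Limits`, so the choice between the two structs can be made once at startup.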

Re: testing, that's a good question. We have a cgroups integration test in root/cgroup_test.go. We can make sure that the images used to run this test support cgroups v2; otherwise, nested virtualization is also an option.

@dqminh
Contributor

dqminh commented Feb 1, 2021

@fvoznika some updates. First of all, it's working:

[vagrant@localhost vagrant]$ mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)
[vagrant@localhost vagrant]$ docker run --cpu-shares 4096 --memory 128m -it --runtime runsc hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

[vagrant@localhost vagrant]$ docker run --cpu-shares 4096 --memory 128m -it --runtime runsc debian bash
root@fdf96bc8e8dc:/#

I did abandon the plan to change the cgroupv1 code to reuse libcontainer's cgroup interface, since it is quite complicated to match the feature set 1:1 and still preserve backward compatibility. We use libcontainer's cgroup interface only for v2 and choose between the two implementations based on v2 detection. The current interface is:

type Cgroup interface {
  Install(name string, res *specs.LinuxResources) error
  Uninstall(name string) error
  Join(name string) (func(), error)
  CPUQuota(name string) (float64, error)
  NumCPU(name string) (int, error)
  MemoryLimit(name string) (uint64, error)
}

type cgroupV2Manager struct {
	manager libcontainercgroups.Manager
}

I'm passing name in to reconstruct the libcontainercgroups.Manager object with each call if necessary. Then in the cgroup code we do

if libcontainercgroups.IsCgroup2UnifiedMode() {
  // do v2
} else {
  // do v1 
}
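
For readers who want to see what such a check amounts to without pulling in libcontainer, here is a minimal, Linux-only sketch: it calls statfs on the cgroup mount point and compares the filesystem type with `CGROUP2_SUPER_MAGIC` from `linux/magic.h`. The function name `isCgroup2` and the choice of the standard `syscall` package are assumptions made for illustration; the patch in this thread uses `libcontainercgroups.IsCgroup2UnifiedMode()`.

```go
package main

import (
	"fmt"
	"syscall"
)

// cgroup2SuperMagic is CGROUP2_SUPER_MAGIC from linux/magic.h.
const cgroup2SuperMagic = 0x63677270

// isCgroup2 reports whether path is on a cgroup2 filesystem.
// Hypothetical helper, not part of runsc or libcontainer.
func isCgroup2(path string) (bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return false, err
	}
	// The width of st.Type differs between architectures, so widen it first.
	return int64(st.Type) == cgroup2SuperMagic, nil
}

func main() {
	unified, err := isCgroup2("/sys/fs/cgroup")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Println("cgroup v2 unified mode:", unified)
}
```

On a host like the Vagrant box above, where `/sys/fs/cgroup` is mounted as `cgroup2`, the check should report unified mode; on a v1 or hybrid host that path is typically a tmpfs and the check reports false.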

Now we are at the stage of figuring out how to pass most of the integration tests. I don't think the images will need any additional support; the integration tests will need to be adjusted, though, because not all v1 values map to v2. It looks like the CRI setup will need some changes too. I'm testing this inside a Vagrant VM, similar to how containerd/runc does it, so it can be moved to any CI that supports nested virtualization.
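
One concrete case of a v1 value that has to be translated rather than copied is CPU shares: v1 `cpu.shares` ranges from 2 to 262144, while v2 `cpu.weight` ranges from 1 to 10000. The sketch below uses a simple linear mapping between the two ranges, which is the approach runc's libcontainer took around the time of this thread. Whether the runsc patch uses exactly this formula is an assumption, and the function name and the clamping of out-of-range inputs are illustrative.

```go
package main

import "fmt"

// sharesToWeight maps a cgroup v1 cpu.shares value (2..262144) onto the
// cgroup v2 cpu.weight range (1..10000). A value of 0 means "not set" and
// stays 0. Assumption: linear mapping, with out-of-range inputs clamped.
func sharesToWeight(shares uint64) uint64 {
	if shares == 0 {
		return 0
	}
	if shares < 2 {
		shares = 2
	}
	if shares > 262144 {
		shares = 262144
	}
	return 1 + ((shares-2)*9999)/262142
}

func main() {
	for _, s := range []uint64{2, 1024, 4096, 262144} {
		fmt.Printf("cpu.shares=%d -> cpu.weight=%d\n", s, sharesToWeight(s))
	}
}
```

With this mapping, the `--cpu-shares 4096` used in the docker commands above becomes a `cpu.weight` of 157, so a test that reads back the v1 value verbatim cannot pass on a v2 host.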

@avagin
Collaborator

avagin commented Jun 15, 2021

I've created the feature branch https://github.com/google/gvisor/tree/feature/cgroupv2.

Let's continue the cgroupv2 development there. Once it is ready, we will merge it into the master branch.

TODO list:

  • Run cgroup tests.
  • Remove external dependencies.
  • Bumping up containerd to 1.4 breaks compatibility with 1.3.

This list is based on @fvoznika's comments on #5453 that have not been addressed yet.

@dqminh
Contributor

dqminh commented Jul 28, 2021

Hi again! Sorry for the period of inactivity; I was busy with some other projects.

@avagin https://github.com/google/gvisor/tree/feature/cgroupv2 is good. What's the development process here? I think we could split the patchset into two parts: the first bumps dependencies and creates a cgroup interface that lets v1 and v2 coexist, and the second adds v2 support.

> Run cgroup tests.

The PR uses Vagrant to set up a v2 environment. It would be great if someone with CI access could set that up, either with Vagrant or with a build agent that runs cgroupv2. I don't have CI access, so the feedback loop is terrible here.

> Remove external dependencies.

I think ideally we want some shared libraries here that different cgroup consumers can use. Currently it uses runc's cgroupv2 implementation, but there's also a desire to unify the cgroup implementation with containerd/cgroups (see opencontainers/runc#3007). Is that acceptable?

> Bumping up containerd to 1.4 breaks compatibility with 1.3.

I will need to take another look at this to see whether we can keep 1.3 compatibility (it may be possible, but we would have to reimplement a bunch of things, IIRC). The simplest option is of course to bump the required version of containerd to 1.4. Is there any plan to do that?

@avagin
Collaborator

avagin commented Jul 29, 2021

> Hi again! Sorry for the period of inactivity; I was busy with some other projects.
>
> @avagin https://github.com/google/gvisor/tree/feature/cgroupv2 is good. What's the development process here?

All new PRs for cgroupv2 should be opened against this branch.

> I think we could split the patchset into two parts: the first bumps dependencies and creates a cgroup interface that lets v1 and v2 coexist, and the second adds v2 support.

It is up to you, but keep in mind that we want to avoid any new external dependencies without a real reason. We can consider copying some code from runc; I think the license allows us to do this.

> Run cgroup tests.
>
> The PR uses Vagrant to set up a v2 environment. It would be great if someone with CI access could set that up, either with Vagrant or with a build agent that runs cgroupv2. I don't have CI access, so the feedback loop is terrible here.

I will help with this, but let's solve the other TODOs first.

> Remove external dependencies.
>
> I think ideally we want some shared libraries here that different cgroup consumers can use. Currently it uses runc's cgroupv2 implementation, but there's also a desire to unify the cgroup implementation with containerd/cgroups (see opencontainers/runc#3007). Is that acceptable?

It depends on a few things. The main idea is that we want to be able to review all the code that we use. That means a new library should bring in a limited number of new external dependencies, and it has to be relatively small (it should contain as little as possible that we will not use).

@dqminh
Contributor

dqminh commented Aug 25, 2021

I'm repackaging the patchset to make reviewing and testing simpler:

  1. Update containerd dependency to v1.4.9 #6485 bumps the containerd dependencies to 1.4 without any other changes. I think this still satisfies our requirements, i.e. it should still work with the containerd 1.3 runtime. It should also reduce some of the code that we need for the shim.
  2. Add common Cgroup interface #6499 ports the existing v1 code to the common cgroup interface.
  3. The next step is to write the cgroupv2 patch based on our past work. I'm rewriting the patch a little to remove the external dependency on libcontainer. Once 2) is merged, we can base feature/cgroupv2 on it, and it would be great if @avagin could help add the test environment for cgroupv2.

@dqminh
Contributor

dqminh commented Nov 3, 2021

@fvoznika @avagin I have repackaged the work into 2 PRs. We don't need to bump any extra dependencies now.

#6499 adds the common cgroup interface for v1 and v2
#6821 adds the cgroupv2 implementation

We need a cgroupv2 environment to run the tests. Can you help with that ?

@avagin
Collaborator

avagin commented Nov 3, 2021

> We need a cgroupv2 environment to run the tests. Can you help with that ?

I will help with that. I am going to add cgroup2 workers in buildkite.

@avagin
Collaborator

avagin commented Nov 22, 2021

#6884
