Support Rootless Docker and Rootless Podman, without patching Kubernetes #1935
Hi @AkihiroSuda. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Force-pushed: 3304c25 to 46de4ac
Discussed in kubernetes-sigs#1935 (comment) Signed-off-by: Akihiro Suda <[email protected]>
Now this PR needs to wait for cgroup v2 fix: #2013 (EDIT: merged) |
Force-pushed: 46de4ac to 20063ac
Rebased. Tested with Kubernetes v1.20.2. |
runc update upstream should be picked up in #2057 |
Force-pushed: 20063ac to e4ab738
Rebased. Tested with Kubernetes v1.20.4. |
Podman CI is failing (#2085), unrelated to this PR. |
Force-pushed: d45dd3e to 99077d5
Updated PR to remove prerequisite of "net.netfilter.nf_conntrack_max" , by setting I confirmed this version works with Ubuntu 20.04 (kernel 5.4.0-66-generic, Docker 20.10.5). Image: |
Force-pushed: f4eef8b to fd99e3c
Great, it works for me now. I had never used rootless before; it has some networking rough edges, e.g. you can't access containers from the host. I think this approach is better because it reduces the amount of bash and is easier to maintain in the long term. However, we need @BenTheElder to check the new changes introduced in the provider interface.
Updated again
Image: |
Tested with vanilla Kubernetes v1.20.4 Signed-off-by: Akihiro Suda <[email protected]>
Force-pushed: 11c96b0 to 85d51d8
Rebased and squashed commits. Image: |
/test pull-kind-e2e-kubernetes |
/test pull-kind-e2e-kubernetes-1-19 |
/test pull-kind-e2e-kubernetes-1-20 |
identifier: "rootless" | ||
weight: 3 | ||
--- | ||
Starting with kind 0.11.0, [Rootless Docker](https://docs.docker.com/go/rootless/) and [Rootless Podman](https://github.com/containers/podman/blob/master/docs/tutorials/rootless_tutorial.md) can be used as the node provider of kind. |
nit:
Starting with kind 0.11.0, [Rootless Docker](https://docs.docker.com/go/rootless/) and [Rootless Podman](https://github.com/containers/podman/blob/master/docs/tutorials/rootless_tutorial.md) can be used as the node provider of kind. | |
Starting with kind v0.11.0, [Rootless Docker](https://docs.docker.com/go/rootless/) and [Rootless Podman](https://github.com/containers/podman/blob/master/docs/tutorials/rootless_tutorial.md) can be used as the node provider of kind. |
(we should try to be consistent about this)
(non-blocking nit, can easily handle this sort of thing in a follow-up)
@@ -281,3 +283,33 @@ func (p *provider) CollectLogs(dir string, nodes []nodes.Node) error { | |||
errs = append(errs, errors.AggregateConcurrent(fns)) | |||
return errors.NewAggregate(errs) | |||
} | |||
|
|||
// Info returns the provider info. | |||
func (p *provider) Info() (*providers.ProviderInfo, error) { |
I'm not sure I love this API, but the point of node providers being internal is precisely so we can iterate on stuff like this in isolated implementation packages (still exporting) without worrying about users depending on it, versus the public cluster provider. (comment directed at @aojea)
yeah, but that makes a big assumption about provider compatibility. Right now we have 2 providers that try to be completely compatible, and I can't see how this will go with kata or windows native, ...
we can pretty trivially rework this method later though, this PR has already seen a lot of back and forth
# If /proc/self/uid_map has 4294967295 mappings, we are in the initial user namespace, i.e. the host. | ||
# Otherwise we are in a non-initial user namespace. | ||
# https://github.com/opencontainers/runc/blob/v1.0.0-rc92/libcontainer/system/linux.go#L109-L118 | ||
userns="" |
now that we have to detect rootless provider on the host anyhow, is there a reason to do this inside the container?
do we expect this to vary within rootless?
or should we just start passing a KIND_ROOTLESS_NODE_PROVIDER=true ?
Checking userns here is beneficial for potential support of dockerd --userns-remap and an LXD driver.
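As a rough illustration of the check under discussion, the sketch below tests a uid_map file for the single full-range mapping that marks the initial user namespace. The function name in_userns and its optional path argument are hypothetical, chosen so the logic can be exercised against a sample file; the actual entrypoint script may be written differently.

```shell
# Sketch only: succeed (exit 0) when the given uid_map describes a
# non-initial user namespace. in_userns is a hypothetical helper name.
# The initial namespace has exactly one mapping line: "0 0 4294967295".
in_userns() {
  in_userns_map="${1:-/proc/self/uid_map}"
  # grep -v selects lines that are NOT the full-range mapping; -q keeps it quiet.
  # Finding any such line means we are not in the initial user namespace.
  grep -Eqv '^[[:space:]]*0[[:space:]]+0[[:space:]]+4294967295[[:space:]]*$' "$in_userns_map"
}
```

Called without an argument, the helper reads /proc/self/uid_map, so something like `in_userns && echo "running in a user namespace"` could be used inside a node container. Because it inspects the namespace itself rather than a flag passed from the host, the same check would also fire under setups such as dockerd --userns-remap.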
thank you for keeping after this. |
@AkihiroSuda: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: AkihiroSuda, BenTheElder The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This PR adds support for running kind with Rootless Docker provider.
Requires Docker 20.10 / Podman 3.0, with cgroup v2.
Unlike the previous PR (#1727), this version works without patching Kubernetes (1.20.4).
However, this version has dirty hacks such as faking sysctl keys by bind-mounting regular files under /proc/sys.
So I still want the Kubernetes PR to be merged: kubernetes/kubernetes#92863
Restrictions
The restrictions of Rootless Docker apply to kind clusters as well.
e.g.
To work around the OverlayFS issue, we could use fuse-overlayfs on kernel >= 4.18: https://github.com/AkihiroSuda/containerd-fuse-overlayfs
However, to keep this PR simple, support for fuse-overlayfs is not included here and will be introduced in a separate PR after this one gets merged.
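The kernel >= 4.18 condition mentioned above could be checked with a small helper like the following. This is only a sketch: kernel_at_least is a hypothetical name, and the comparison relies on sort -V from GNU coreutils.

```shell
# Sketch only: succeed when a kernel release string is at least a minimum version.
# kernel_at_least is a hypothetical helper; it assumes GNU sort with -V support.
kernel_at_least() {
  # Drop distro suffixes such as "-66-generic" from e.g. "5.4.0-66-generic".
  kal_have="${1%%-*}"
  kal_want="$2"
  # With version sort, the minimum comes first only if have >= want.
  [ "$(printf '%s\n%s\n' "$kal_want" "$kal_have" | sort -V | head -n 1)" = "$kal_want" ]
}
```

For example, `kernel_at_least "$(uname -r)" 4.18 && echo "fuse-overlayfs is an option"` would report whether the running host meets the requirement.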
How this PR works
When the entrypoint script detects that it is running inside a user namespace (i.e. rootless), it modifies /etc/containerd/config.toml to set restrict_oom_score_adj to true, to adjust the oomScoreAdj value.
The entrypoint script also fakes sysctl keys by bind-mounting regular files under /proc/sys to make them writable. This is a workaround until "kubelet & kube-proxy: ignore sysctl errors and rlimit errors when running in UserNS (for rootless)" (kubernetes/kubernetes#92863) gets merged.
In addition, kubeProxyConfiguration.conntrack.maxPerCore is set to 0, to avoid an error while setting the sysctl value net.netfilter.nf_conntrack_max.
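The two configuration changes described above can be sketched roughly as follows. Both function names are hypothetical, and the sed pattern assumes the containerd config already contains a line of the form restrict_oom_score_adj = false, which may not match the real node image; sed -i is used in its GNU form. The kube-proxy fragment uses the kubeproxy.config.k8s.io/v1alpha1 schema, where a maxPerCore of 0 leaves the conntrack limit as-is.

```shell
# Sketch only: flip restrict_oom_score_adj from false to true in a containerd
# config file. Assumes the key is already present with value "false".
enable_restrict_oom_score_adj() {
  eroa_config="$1"
  sed -i 's/^\([[:space:]]*restrict_oom_score_adj[[:space:]]*=[[:space:]]*\)false/\1true/' "$eroa_config"
}

# Sketch only: print a kube-proxy configuration fragment with
# conntrack.maxPerCore set to 0, so kube-proxy does not try to write
# net.netfilter.nf_conntrack_max.
print_kube_proxy_conntrack_patch() {
  cat <<'EOF'
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
conntrack:
  maxPerCore: 0
EOF
}
```

Taking the config path as an argument keeps the first helper easy to try on a copy of the file before touching /etc/containerd/config.toml.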
How to test
Step 1: Prepare host
Install Ubuntu 20.10 host.
Add GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1" to /etc/default/grub.
Create /etc/systemd/system/user@.service.d/delegate.conf with the following content:
Run sudo update-grub and reboot.
Install Docker 20.10 or Podman 3.0.
Run docker info with DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock, and make sure it shows "rootless" as a Security Option:
Step 2: Prepare the node image
$ (cd $GOPATH/src/k8s.io/kubernetes && git checkout v1.20.4)
The image is registered as kindest/node:latest in Rootless Docker's image store.
NOTE: --type=bazel is required for running kind build node-image with Rootless Docker.
Step 3: Start kind with Rootless Docker
$ kind create cluster --image kindest/node:latest
Run ps auxw on the host, and make sure the kind processes are running as unprivileged users.
Make sure kubectl get pods -A shows all pods as Running.
A pre-built image is temporarily available on my Docker Hub: https://hub.docker.com/r/akihirosuda/tmp-kind-node/tags
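The host prerequisites from Step 1 (cgroup v2 and a rootless daemon) could be sanity-checked before creating the cluster with helpers along these lines. This is a sketch under stated assumptions: is_cgroup_v2 and shows_rootless are hypothetical names, the cgroup check relies on the cgroup.controllers file that the unified hierarchy exposes at its root, and the rootless check simply looks for the word "rootless" in docker info output piped to it.

```shell
# Sketch only: succeed when the given cgroup root looks like cgroup v2,
# which exposes a cgroup.controllers file at the top of the hierarchy.
is_cgroup_v2() {
  icv_root="${1:-/sys/fs/cgroup}"
  [ -f "$icv_root/cgroup.controllers" ]
}

# Sketch only: read `docker info` output on stdin and succeed when
# "rootless" appears as a whole word (e.g. under Security Options).
shows_rootless() {
  grep -qw 'rootless'
}
```

A possible use is `is_cgroup_v2 && DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock docker info | shows_rootless`, which succeeds only when both conditions hold.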