Support Rootless Docker #1727
Signed-off-by: Akihiro Suda <[email protected]>
`mount -o remount,ro /sys` fails with `permission denied` on rootless Docker and on rootless Podman, but the error is negligible. Signed-off-by: Akihiro Suda <[email protected]>
`/etc/containerd/config-rootless.toml` is the config for running kind in rootless Docker/Podman.
* `ociwrapper` script is used to remove `.linux.resources.devices` from `config.json`, because `.linux.resources.devices` is meaningless on rootless and yet produces errors. Workaround until we get proper fixes in containerd and runc.
* `restrict_oom_score_adj` is set to true to ignore oom_score_adj errors.
The entrypoint overrides `/etc/containerd/config.toml` with `config-rootless.toml` when running in rootless Docker/Podman.
The rootless-ness is detected by comparing `/proc/1/uid_map` with `0 0 4294967295`.
Note that Kubernetes needs to be patched as well (see the PR description text).
Signed-off-by: Akihiro Suda <[email protected]>
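The detection step described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the script from the PR: the function name `is_rootless_uid_map` is hypothetical, and the check simply normalizes whitespace before comparing against the identity mapping `0 0 4294967295`.

```shell
# Hypothetical helper (name is an assumption, not taken from the PR).
# Returns success (0) when the given uid_map text is NOT the identity
# mapping "0 0 4294967295", i.e. when we appear to be in a user namespace.
is_rootless_uid_map() {
  local map normalized
  map="$1"
  # /proc/<pid>/uid_map pads its columns with spaces; squeeze all
  # whitespace to single spaces and trim both ends before comparing.
  normalized="$(printf '%s\n' "$map" | tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//')"
  [ "$normalized" != "0 0 4294967295" ]
}

# Possible use in an entrypoint, with the paths named in the commit message:
#   if is_rootless_uid_map "$(cat /proc/1/uid_map)"; then
#     cp /etc/containerd/config-rootless.toml /etc/containerd/config.toml
#   fi
```

Passing the map contents as an argument, rather than reading `/proc/1/uid_map` inside the function, keeps the comparison easy to exercise outside a container.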
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: AkihiroSuda The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Welcome @AkihiroSuda!
Hi @AkihiroSuda. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
cc @giuseppe
great achievement. I'll look at the issue with Podman
/ok-to-test
if os.Geteuid() != 0 {
	p.logger.Errorf("podman provider does not work properly in rootless mode")
I don't think we should stop failing until this works. Actually, this was the previous state, but it was confusing for users.
@AkihiroSuda: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
@giuseppe I think that podman is failing because it is using the slirp4netns network
@@ -70,6 +70,7 @@ RUN echo "Ensuring scripts are executable ..." \
libseccomp2 pigz \
bash ca-certificates curl rsync \
nfs-common \
jq \
how big is this?
about 1MB including deps
https://packages.ubuntu.com/focal/jq
thanks, paying 1MB for rootless seems worthwhile :-)
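For context, `jq` is added here because the PR's `ociwrapper` script removes `.linux.resources.devices` from the OCI `config.json`. A minimal sketch of that filtering step might look like the following; the function name `strip_devices` is hypothetical and the actual wrapper in the PR may be structured differently.

```shell
# Hypothetical sketch of the filtering step only (not the PR's ociwrapper).
# Reads an OCI runtime config.json on stdin and writes it to stdout with
# .linux.resources.devices removed. Other fields are left untouched, and a
# config without that key passes through unchanged.
strip_devices() {
  jq -c 'del(.linux.resources.devices)'
}

# Possible use, assuming a bundle directory in $bundle:
#   strip_devices < "$bundle/config.json" > "$bundle/config.json.tmp" \
#     && mv "$bundle/config.json.tmp" "$bundle/config.json"
```

Writing to a temporary file and renaming avoids truncating `config.json` while `jq` is still reading it.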
@@ -38,7 +38,11 @@ fix_mount() {
# https://systemd.io/CONTAINER_INTERFACE/
# however, we need other things from `docker run --privileged` ...
# and this flag also happens to make /sys rw, amongst other things
#
# EACCES on rootless is negligible.
set +o errexit
you already detect if we're in rootless or not below, instead detect that early on and save it, and switch on it here?
toggling errexit in scripts leads to bugs, it has unintuitive behavior.
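One way to act on this suggestion, sketched under assumptions: compute the rootless flag once, then route the `mount` call through a small wrapper that logs and continues on failure only when that flag is set. The wrapper name `run_tolerating_rootless` and the `"true"`/`"false"` flag convention are hypothetical, not taken from the PR.

```shell
set -o errexit

# Hypothetical wrapper: run the given command; if it fails and the first
# argument is "true" (rootless), log a warning and return success.
# Otherwise propagate the command's exit status. errexit stays enabled
# throughout because the failure is captured with `||`.
run_tolerating_rootless() {
  local rootless="$1"
  shift
  local rc=0
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    if [ "$rootless" = "true" ]; then
      echo "WARN: '$*' failed with status ${rc}; ignoring on rootless" >&2
      return 0
    fi
    return "$rc"
  fi
  return 0
}

# Possible use in fix_mount, assuming $rootless was computed earlier:
#   run_tolerating_rootless "$rootless" mount -o remount,ro /sys
```

Because the failing command sits on the left side of `||`, the shell does not abort, and nothing after the wrapper runs with errexit turned off.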
@@ -196,6 +197,18 @@ func (c *buildContext) buildImage(dir string) error {
	if err := createFile(cmder, containerdConfigPath, containerdConfig); err != nil {
		return err
	}
	containerdRootlessConfig, err := getContainerdConfig(containerdConfigTemplateData{
we should be able to do this without building a special node-image.
the entrypoint can rewrite this at runtime instead.
How can we edit TOML in the entrypoint? Is sed robust enough?
it should be, since the entrypoint is tied to the config, and at this point user patches have not yet been applied, so we know what the config looks like.
we can sed on default_runtime_name =.*
right?
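The `sed` approach proposed in this thread could look roughly like the sketch below. The helper name `set_default_runtime` and the runtime name used in the usage comment are assumptions for illustration; the pattern is the `default_runtime_name =.*` match mentioned above, and the helper assumes the runtime name contains no `/` or `&` characters.

```shell
# Hypothetical helper: rewrite the default_runtime_name line of a
# containerd config read from stdin and print the result on stdout.
# Leading indentation is preserved because the match starts at the key.
# Assumes the name contains no characters special to sed (/ or &).
set_default_runtime() {
  local name="$1"
  sed -e "s/default_runtime_name =.*/default_runtime_name = \"${name}\"/"
}

# Possible use in the entrypoint (the runtime name is illustrative):
#   set_default_runtime rootless < /etc/containerd/config.toml > /tmp/config.toml
#   cat /tmp/config.toml > /etc/containerd/config.toml
```

This relies on the point made in the thread: the entrypoint runs before user patches are applied, so the shape of the config line is known.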
yes, and I am not sure yet how to address it. Containers must be able to contact each other but at the same time be in different network namespaces
if os.Geteuid() != 0 {
	p.logger.Errorf("podman provider does not work properly in rootless mode")
	os.Exit(1)
	p.logger.Warn("support for rootless mode is experimental, some features may not work")
the PR body suggests that it doesn't work, if that's the case then this new message seems misleading.
is there somewhere we can track this?
checked up on the dependent PRs:
I've updated us again to the latest containerd changes, and I am working on redoing the containerd config in the image. kind on ZFS needs a similar automatic "if in this mode, modify the containerd config" change: #1719
opencontainers/runc#2522 is the most relevant one, but maybe we need more
👍 Could you open a PR?
Sorry, I got behind on all of this. I'm working to catch up, but the ZFS PR will be a little lower on the stack. We actually shouldn't need most of that anyhow now that opencontainers/runc#2522 is in?
Still blocked on upstream.
I will work on a PR related to 2) with a slightly better approach: instead of disabling errexit, we should catch and log this without failing, and we should disable the systemd mount / udev without depending on this. xref: #1474
@AkihiroSuda: PR needs rebase.
Opened a new PR: #1935
The new version works with vanilla Kubernetes (1.20.0-beta.2).
This PR adds support for running kind with Rootless Docker provider.
Requires cgroup v2 hosts.
Commits in this PR
1. "podman: unlock rootless"
Turn off the `"podman provider does not work properly in rootless mode"` error and print a warning message instead.
However, the Podman provider still doesn't work (see the bottom of this PR).
2. "base: ignore EACCES from `mount -o remount,ro /sys`"
`mount -o remount,ro /sys` fails with `permission denied` on rootless Docker and on rootless Podman, but the error is negligible.
3. "containerd: add /etc/containerd/config-rootless.toml"
`/etc/containerd/config-rootless.toml` is the config for running kind in rootless Docker/Podman.
* `ociwrapper` script is used to remove `.linux.resources.devices` from `config.json`, because `.linux.resources.devices` is meaningless on rootless and yet produces errors. Workaround until we get proper fixes in containerd and runc.
* `restrict_oom_score_adj` is set to true to ignore oom_score_adj errors.
The entrypoint overrides `/etc/containerd/config.toml` with `config-rootless.toml` when running in rootless Docker/Podman.
The rootless-ness is detected by comparing `/proc/1/uid_map` with `0 0 4294967295`.
How to test
Images
Base
$ docker build -t kind-base ./images/base
Available on Docker Hub as `akihirosuda/tmp-kind-base:g554d2e07`.
Built from https://github.com/AkihiroSuda/kind/commits/554d2e076b1ea0fb55fcdff5cf8d972933bb78df .
Node
Needs PR kubernetes/kubernetes#93012 and PR kubernetes/kubernetes#92863.
The `kubelet: new feature gate: Rootless` commit is not necessary for `kind`, because Rootless Docker itself sets up cgroup fs.
$ kind build node-image --base-image kind-base
Available on Docker Hub as `akihirosuda/tmp-kind-node:g554d2e07-g3c1dda52`.
Built from https://github.com/AkihiroSuda/kind/commits/554d2e076b1ea0fb55fcdff5cf8d972933bb78df + https://github.com/AkihiroSuda/kubernetes/commits/3c1dda52bb3a931acb4810e34fbfa1afee949ec5
Rootless Docker
* `systemd.unified_cgroup_hierarchy=1`
* `dockerd-rootless.sh`
* `export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock`, and make sure `docker info` shows "rootless" as a security option.
* `kind create cluster --image akihirosuda/tmp-kind-node:g554d2e07-g3c1dda52`
* `ps auxw` on the hosts, and make sure the kind processes are running as unprivileged users.
* `kubectl get pods -A` shows all pods as `Running`.
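Since the PR requires cgroup v2 hosts, a pre-flight check before running these steps may be useful. The sketch below is not part of the PR: the function name `is_cgroup_v2` is hypothetical, and it relies on the fact that a cgroup v2 (unified) hierarchy exposes a `cgroup.controllers` file at its root.

```shell
# Hypothetical pre-flight check (not from the PR). Succeeds when the given
# cgroup mount point looks like a cgroup v2 unified hierarchy, detected by
# the presence of cgroup.controllers at its root. The mount point defaults
# to /sys/fs/cgroup and can be overridden by passing a different path.
is_cgroup_v2() {
  local root="${1:-/sys/fs/cgroup}"
  [ -f "${root}/cgroup.controllers" ]
}

# Possible use before creating the cluster:
#   is_cgroup_v2 || echo "cgroup v2 is required for rootless kind" >&2
```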
Rootless Podman (doesn't work yet)
* `systemd.unified_cgroup_hierarchy=1`
* `KIND_EXPERIMENTAL_PROVIDER=podman kind create cluster --image akihirosuda/tmp-kind-node:g554d2e07-g3c1dda52`