Unclear error on 'kind cluster create' #2093

Closed
aronchick opened this issue Feb 26, 2021 · 14 comments
Labels: kind/bug (Categorizes issue or PR as related to a bug.)

@aronchick

Went through the standard install on Ubuntu 20.04

❯ kind create cluster --retain -v 1 --name kind202102251848
Creating cluster "kind202102251848" ...
DEBUG: docker/images.go:58] Image: kindest/node:v1.20.2@sha256:8f7ea6e7642c0da54f04a7ee10431549c0257315b3a634f6ef2fecaaedb19bab present locally
 ✓ Ensuring node image (kindest/node:v1.20.2) 🖼 
 ✓ Preparing nodes 📦  
 ✗ Writing configuration 📜 
ERROR: failed to create cluster: failed to generate kubeadm config content: failed to get kubernetes version from node: failed to get file: command "docker exec --privileged kind202102251848-control-plane cat /kind/version" failed with error: exit status 1
Command Output: Error response from daemon: Container 3f381d3b346ded528df202bfb68b6518a4c724375ece9a406639a61470163be1 is not running
Stack Trace: 
sigs.k8s.io/kind/pkg/errors.WithStack
        sigs.k8s.io/kind/pkg/errors/errors.go:51
sigs.k8s.io/kind/pkg/exec.(*LocalCmd).Run
        sigs.k8s.io/kind/pkg/exec/local.go:124
sigs.k8s.io/kind/pkg/cluster/internal/providers/docker.(*nodeCmd).Run
        sigs.k8s.io/kind/pkg/cluster/internal/providers/docker/node.go:146
sigs.k8s.io/kind/pkg/exec.OutputLines
        sigs.k8s.io/kind/pkg/exec/helpers.go:81
sigs.k8s.io/kind/pkg/cluster/nodeutils.KubeVersion
        sigs.k8s.io/kind/pkg/cluster/nodeutils/util.go:35
sigs.k8s.io/kind/pkg/cluster/internal/create/actions/config.getKubeadmConfig
        sigs.k8s.io/kind/pkg/cluster/internal/create/actions/config/config.go:172
sigs.k8s.io/kind/pkg/cluster/internal/create/actions/config.(*Action).Execute.func1.1
        sigs.k8s.io/kind/pkg/cluster/internal/create/actions/config/config.go:84
sigs.k8s.io/kind/pkg/errors.UntilErrorConcurrent.func1
        sigs.k8s.io/kind/pkg/errors/concurrent.go:30
runtime.goexit
        runtime/asm_amd64.s:1374

What happened:
Stack trace / failure to install

What you expected to happen:
Installed correctly

How to reproduce it (as minimally and precisely as possible):

curl -Lo /tmp/kind https://kind.sigs.k8s.io/dl/v0.10.0/kind-linux-amd64
chmod +x /tmp/kind
mv /tmp/kind /usr/local/bin/kind

Anything else we need to know?:
I DID have rootless Docker installed, but I have since uninstalled it, and Docker correctly reports as such.

Environment:


❯ kind export logs --name kind202102251854
ERROR: [command "docker exec --privileged kind202102251854-control-plane sh -c 'tar --hard-dereference -C /var/log/ -chf - . || (r=$?; [ $r -eq 1 ] || exit $r)'" failed with error: exit status 1, [command "docker exec --privileged kind202102251854-control-plane journalctl --no-pager -u kubelet.service" failed with error: exit status 1, command "docker exec --privileged kind202102251854-control-plane cat /kind/version" failed with error: exit status 1, command "docker exec --privileged kind202102251854-control-plane journalctl --no-pager" failed with error: exit status 1, command "docker exec --privileged kind202102251854-control-plane journalctl --no-pager -u containerd.service" failed with error: exit status 1]]

❯ kind version                                             
kind v0.10.0 go1.15.7 linux/amd64

❯ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"e87da0bd6e03ec3fea7933c4b5263d151aafd07c", GitTreeState:"clean", BuildDate:"2021-02-18T16:12:00Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

❯ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)

Server:
 Containers: 11
  Running: 2
  Paused: 0
  Stopped: 9
 Images: 19
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: false
 Logging Driver: json-file
 Cgroup Driver: none
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
  rootless
 Kernel Version: 5.4.0-1039-azure
 Operating System: Ubuntu 20.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.36GiB
 Name: cloudVMOCTO
 ID: VNN3:TAMI:I3CM:DEBY:CDG2:5Y2T:2RL3:JDFR:OAVS:ZMSH:CE5A:SUZT
 Docker Root Dir: /home/daaronch/.local/share/docker
 Debug Mode: false
 Username: aronchick
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: No swap limit support

❯ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
@aronchick aronchick added the kind/bug Categorizes issue or PR as related to a bug. label Feb 26, 2021
@BenTheElder
Member

Rootless is not supported yet. Kubernetes has not agreed to support rootless yet but there is a WIP PR to kind to implement support with workarounds.

@BenTheElder
Member

I'm not sure what error we can give here. All we know is that we failed to interact with the node. There are lots of possible root causes, and not much time to implement detection for them versus fixing the ones we can prevent.

@aronchick
Author

No, for sure. I deleted/uninstalled rootless, and docker reports it is running rootful. If you have any suggestions on additional logs I can provide, happy to!

(btw thank you so much for being so unbelievably responsive!)

@aronchick
Author

(Btw #1288 led me here, but I can’t quite figure out any workarounds in that thread that I could apply here)

@BenTheElder
Member

ACK thanks! I was AFK for a bit, let's root cause this:

  • Backing Filesystem: <unknown> is a little surprising. Do you know what filesystem /home/daaronch/.local/share/docker is on?

  • Since you ran with --retain we can grab kind export logs and see what the in-cluster logs show. Given that the node container appears to be no longer running, the most interesting log it exports in this case is equivalent to docker logs kind-control-plane

  • Is docker installed with a confined snap? https://kind.sigs.k8s.io/docs/user/known-issues/#docker-installed-with-snap

@BenTheElder BenTheElder self-assigned this Feb 26, 2021
@aronchick
Author

Sorry, i was afk now :)

OK - I started poking around at the first one - that seems like it HAS to be it. Unfortunately, no luck: I saw that 'share' had weird perms for a directory (group and other didn't have read), so I added it, but that didn't help.

❯ ls -la
total 28
drwx--x--x   5 daaronch daaronch  4096 Oct 12 16:39 .
drwxr-xr-x 130 daaronch daaronch 12288 Feb 26 15:49 ..
drwxrwxr-x   2 daaronch daaronch  4096 Dec 16 20:21 bin
drwx------   3 daaronch daaronch  4096 Oct 12 16:39 lib
drwx--x--x  10 daaronch daaronch  4096 Feb 12 17:47 share

❯ chmod go+r share

❯ ls -la
total 28
drwx--x--x   5 daaronch daaronch  4096 Oct 12 16:39 .
drwxr-xr-x 130 daaronch daaronch 12288 Feb 26 15:50 ..
drwxrwxr-x   2 daaronch daaronch  4096 Dec 16 20:21 bin
drwx------   3 daaronch daaronch  4096 Oct 12 16:39 lib
drwxr-xr-x  10 daaronch daaronch  4096 Feb 12 17:47 share

❯ kind create cluster --retain -v 1 --name kind202102260750
Creating cluster "kind202102260750" ...
DEBUG: docker/images.go:58] Image: kindest/node:v1.20.2@sha256:8f7ea6e7642c0da54f04a7ee10431549c0257315b3a634f6ef2fecaaedb19bab present locally
 ✓ Ensuring node image (kindest/node:v1.20.2) 🖼
 ✓ Preparing nodes 📦
 ✗ Writing configuration 📜
ERROR: failed to create cluster: failed to generate kubeadm config content: failed to get kubernetes version from node: failed to get file: command "docker exec --privileged kind202102260750-control-plane cat /kind/version" failed with error: exit status 1
Command Output: Error response from daemon: Container 421fe4dcc8297ceb7ec12b2296c23cd367348fa80323f6d4290e2d6d9439bc48 is not running
Stack Trace:
sigs.k8s.io/kind/pkg/errors.WithStack
        sigs.k8s.io/kind/pkg/errors/errors.go:51
sigs.k8s.io/kind/pkg/exec.(*LocalCmd).Run
        sigs.k8s.io/kind/pkg/exec/local.go:124
sigs.k8s.io/kind/pkg/cluster/internal/providers/docker.(*nodeCmd).Run
        sigs.k8s.io/kind/pkg/cluster/internal/providers/docker/node.go:146
sigs.k8s.io/kind/pkg/exec.OutputLines
        sigs.k8s.io/kind/pkg/exec/helpers.go:81
sigs.k8s.io/kind/pkg/cluster/nodeutils.KubeVersion
        sigs.k8s.io/kind/pkg/cluster/nodeutils/util.go:35
sigs.k8s.io/kind/pkg/cluster/internal/create/actions/config.getKubeadmConfig
        sigs.k8s.io/kind/pkg/cluster/internal/create/actions/config/config.go:172
sigs.k8s.io/kind/pkg/cluster/internal/create/actions/config.(*Action).Execute.func1.1
        sigs.k8s.io/kind/pkg/cluster/internal/create/actions/config/config.go:84
sigs.k8s.io/kind/pkg/errors.UntilErrorConcurrent.func1
        sigs.k8s.io/kind/pkg/errors/concurrent.go:30
runtime.goexit
        runtime/asm_amd64.s:1374

Here's the logs for the new cluster.

❯ kind export logs --name kind202102260750
ERROR: [command "docker exec --privileged kind202102260750-control-plane sh -c 'tar --hard-dereference -C /var/log/ -chf - . || (r=$?; [ $r -eq 1 ] || exit $r)'" failed with error: exit status 1, [command "docker exec --privileged kind202102260750-control-plane journalctl --no-pager -u containerd.service" failed with error: exit status 1, command "docker exec --privileged kind202102260750-control-plane cat /kind/version" failed with error: exit status 1, command "docker exec --privileged kind202102260750-control-plane journalctl --no-pager -u kubelet.service" failed with error: exit status 1, command "docker exec --privileged kind202102260750-control-plane journalctl --no-pager" failed with error: exit status 1]]

No to 'snap' - here's how I installed it (it's done in a bash script wrapped by Go). The last line is a test to make sure it worked properly under my existing user.

		apt-get update
		apt-get install -y \
			apt-transport-https \
			ca-certificates \
			curl \
			gnupg-agent \
			software-properties-common
		curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
		add-apt-repository \
			"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
			$(lsb_release -cs) \
			stable"
		apt-get update
		apt-get install -y docker-ce docker-ce-cli containerd.io
		usermod -aG docker ` + executingUser.Username + `
		sudo su ` + executingUser.Username + `
		docker run hello-world

@BenTheElder
Member

What do you see for:
docker logs kind-control-plane
stat -f -c %T /home/daaronch/.local/share/docker
mount -l

@aronchick
Author

❯ docker logs kind-control-plane
Error: No such container: kind-control-plane

❯ mount -l
/dev/sda1 on / type ext4 (rw,relatime,discard) [cloudimg-rootfs]
devtmpfs on /dev type devtmpfs (rw,relatime,size=16439588k,nr_inodes=4109897,mode=755)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,size=3288732k,mode=755)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=28,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11903)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
/dev/sda15 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro,discard) [UEFI]
/var/lib/snapd/snaps/lxd_19032.snap on /snap/lxd/19032 type squashfs (ro,nodev,relatime,x-gdu.hide)
/var/lib/snapd/snaps/core18_1944.snap on /snap/core18/1944 type squashfs (ro,nodev,relatime,x-gdu.hide)
/var/lib/snapd/snaps/snapd_10707.snap on /snap/snapd/10707 type squashfs (ro,nodev,relatime,x-gdu.hide)
/var/lib/snapd/snaps/lxd_19188.snap on /snap/lxd/19188 type squashfs (ro,nodev,relatime,x-gdu.hide)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)
/dev/sdb1 on /mnt type ext4 (rw,relatime,x-systemd.requires=cloud-init.service)
tmpfs on /run/snapd/ns type tmpfs (rw,nosuid,nodev,size=3288732k,mode=755)
nsfs on /run/snapd/ns/lxd.mnt type nsfs (rw)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=3288728k,mode=700,uid=1000,gid=1000)
/var/lib/snapd/snaps/core18_1988.snap on /snap/core18/1988 type squashfs (ro,nodev,relatime,x-gdu.hide)
/var/lib/snapd/snaps/snapd_11036.snap on /snap/snapd/11036 type squashfs (ro,nodev,relatime,x-gdu.hide)

❯ stat -f -c %T /home/daaronch/.local/share/docker
ext2/ext3

@BenTheElder
Member

Huh, even after --retain we can't get docker logs?
Can you try creating again and then try to get the container log?

It's also strange that docker isn't in /var/lib by default, maybe the rootless uninstall wasn't complete?
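
One way to check this is to ask the daemon for its data root and compare it with the rootless default location under the user's home directory. The sketch below is only an illustration, not part of kind: looks_like_user_data_root and check_docker_root are hypothetical helper names, and the pattern assumes home directories live under /home. A custom data root can be perfectly legitimate, so this only warns.

```shell
# Hypothetical helper: returns 0 if the path looks like the per-user data
# root that rootless Docker uses by default (~/.local/share/docker).
# Assumes home directories live under /home.
looks_like_user_data_root() {
  case "$1" in
    /home/*/.local/share/docker|/home/*/.local/share/docker/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: query the daemon's data root and warn if it looks
# like a leftover rootless location rather than /var/lib/docker.
check_docker_root() {
  local root
  root="$(docker info --format '{{.DockerRootDir}}')" || return 1
  if looks_like_user_data_root "$root"; then
    echo "WARNING: Docker Root Dir is $root (rootless-style location)" >&2
  else
    echo "Docker Root Dir: $root"
  fi
}
```

On the machine in this issue, docker info reported Docker Root Dir: /home/daaronch/.local/share/docker, so check_docker_root would be expected to print the warning.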

@aronchick
Author

YUP. I'm almost certain this is the case (rootless uninstall wasn't complete).

Before I do some slash and burn, I just wanted to give you an opportunity to debug - let me know and I'll start deleting a lot more stuff.

How should I get the container log? Just --retain and -v 1?

(Could also be a cheap and cheerful way for you to add detection code, e.g. exec.LookPath("docker") -> /var/lib/docker, or else 'log.warn'.)
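
As a rough shell illustration of that kind of detection (not kind's actual Go code), one could look for the rootless marker in the daemon's security options. The function names here are hypothetical, and the sketch assumes the daemon reports rootless mode as name=rootless in the SecurityOptions field of docker info, which corresponds to the "rootless" entry under Security Options in the output above.

```shell
# Hypothetical helper: returns 0 if the given SecurityOptions text contains
# the rootless marker (assumed to appear as "name=rootless").
security_options_rootless() {
  case "$1" in
    *name=rootless*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: warn when the daemon the docker CLI talks to
# reports rootless mode. Stays silent if the daemon cannot be queried.
warn_if_rootless() {
  local opts
  opts="$(docker info --format '{{.SecurityOptions}}' 2>/dev/null)" || return 0
  if security_options_rootless "$opts"; then
    echo "WARNING: docker daemon reports rootless mode" >&2
  fi
}
```

Because the check asks the daemon rather than inspecting the local filesystem, it should also work when dockerd runs on a different host than the client.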

@BenTheElder
Member

--retain should prevent kind from deleting the container on failure; then run docker logs kind-control-plane (with a custom --name, the container is <name>-control-plane).
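
A small sketch of that step, assuming the node container follows the <cluster>-control-plane naming seen in the error messages earlier in this thread (with "kind" as the default cluster name). node_container_name and dump_node_logs are hypothetical helper names, not kind commands.

```shell
# Hypothetical helper: derive the control-plane container name for a cluster.
# Falls back to the default cluster name "kind" when no name is given.
node_container_name() {
  local cluster="${1:-kind}"
  printf '%s-control-plane\n' "$cluster"
}

# Hypothetical helper: show the state and logs of a retained node container.
dump_node_logs() {
  local name
  name="$(node_container_name "${1:-}")"
  # Status and exit code show whether the container exited, and how.
  docker inspect --format '{{.State.Status}} (exit code {{.State.ExitCode}})' "$name"
  docker logs "$name"
}
```

For the cluster created above, that would be dump_node_logs kind202102260750 after a failed kind create cluster --retain run.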

Go ahead and burn at will, I'm curious what happened here but ultimately not surprised that rootless could cause this, rootless docker is still pretty new and the Kubernetes KEP is still blocked. Hopefully we'll enable workarounds by the next kind release. I need to go review those again 😅

People (including our CI sometimes) configure an alternate docker root dir fairly often with no issue. We could try to probe the filesystem, but we also still nominally support dockerd not being on the same host (useful in some CI environments to run a dockerd as a service available to containerized tests remotely).

@BenTheElder
Member

BTW for rootless: #1797, kubernetes/enhancements#2033 (we can ship the former without the latter, but we really want the latter as well)

@aronchick
Author

aronchick commented Feb 26, 2021

OK that got it...

For those coming here via Google - the following things probably are sticking around (even after removing rootless)

  • sudo apt-get remove docker-ce-rootless-extras
  • /etc/opt/microsoft/scx/conf/sudodir/dockerd-rootless-setuptool.sh uninstall --force
  • /home/YOUR_USER_NAME/bin/rootlesskit rm -rf /home/YOUR_USER_NAME/.local/share/docker

Then reinstall docker (maybe), and check the output of echo $DOCKER_HOST. It should NOT be pointing to /run/user/..., and it should probably be pointing to unix:///run/docker.sock
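
The DOCKER_HOST check can be sketched in shell as follows. The function names are hypothetical, and the pattern assumes the leftover rootless socket is a unix:// address under /run/user/, as described above.

```shell
# Hypothetical helper: returns 0 if the value points at a per-user socket
# under /run/user/, which is where a rootless daemon would listen.
docker_host_is_per_user() {
  case "$1" in
    unix:///run/user/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: report whether the current DOCKER_HOST still looks
# like a rootless leftover.
check_docker_host() {
  if docker_host_is_per_user "${DOCKER_HOST:-}"; then
    echo "DOCKER_HOST=${DOCKER_HOST} still points at a per-user socket" >&2
    return 1
  fi
  echo "DOCKER_HOST=${DOCKER_HOST:-<unset>}"
}
```

If check_docker_host fails, the variable is probably still being exported from a shell profile that the rootless setup modified.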

@BenTheElder
Member

Thanks!
