This repository has been archived by the owner on Jun 28, 2023. It is now read-only.

unable to create bootstrap cluster: failed to create kind cluster tkg-kind- #2138

Closed
syangsao opened this issue Oct 4, 2021 · 14 comments · Fixed by #2220
Labels: kind/docs (A change in documentation), owner/docs (Work executed by VMware documentation team)

Comments

syangsao commented Oct 4, 2021

Bug Report

Installation fails with the following error:

tanzu management-cluster create --ui

Validating the pre-requisites...
Serving kickstart UI at http://127.0.0.1:8080
Identity Provider not configured. Some authentication features won't work.
Validating configuration...
web socket connection established
sending pending 2 logs to UI
Using infrastructure provider docker:v0.3.23
Generating cluster configuration...
Setting up bootstrapper...
unable to set up management cluster, : unable to create bootstrap cluster: failed to create kind cluster tkg-kind-c5dmk170futgf1a95cv0: failed to init node with kubeadm: command "docker exec --privileged tkg-kind-c5dmk170futgf1a95cv0-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1

Expected Behavior

The kind bootstrap cluster should be created and the installation should run through.

Steps to Reproduce the Bug

  1. Install Fedora 34 workstation with base rpms
  2. Install docker CE following https://docs.docker.com/engine/install/fedora/
  3. sudo usermod -a -G docker <username>
  4. Reboot workstation
  5. Download and extract the tce-linux-amd64-v0.9.1.tar.gz
  6. Run the following command tanzu management-cluster create --ui

Screenshots or additional information and context


Environment Details

  • Build version (tanzu version): v0.2.1
  • Deployment (Managed/Standalone cluster): Standalone
  • Infrastructure Provider (Docker/AWS/Azure/vSphere): Docker
  • Operating System (client): Fedora 34 (5.14.9-200.fc34.x86_64)

Diagnostics and log bundle

The tanzu diagnostics collect fails to capture anything.

$ tanzu diagnostics collect
2021/10/04 16:02:09 Collecting bootstrap cluster diagnostics
2021/10/04 16:02:09 Error: kind program binary not found
2021/10/04 16:02:09 Error: One or more required program(s) missing
2021/10/04 16:02:09 Warn: skipping management cluster diagnostics: management cluster: name not set
2021/10/04 16:02:09 Warn: skipping workload cluster diagnostics: workload cluster: name not set
@syangsao syangsao added kind/bug A bug in an existing capability triage/needs-triage Needs triage by TCE maintainers labels Oct 4, 2021
github-actions bot commented Oct 4, 2021

Hey @syangsao! Thanks for opening your first issue. We appreciate your contribution and welcome you to our community! We are glad to have you here and to have your input on Tanzu Community Edition.

figo commented Oct 4, 2021

Hi @syangsao, could you do the following to help us understand the issue better?

  1. In your setup, install the kind CLI from https://github.com/kubernetes-sigs/kind.
  2. Run kind create cluster.

If that fails, it means your setup is not ready to create a kind cluster.
If the kind cluster can be created successfully with the kind CLI in your setup, we need more logs: please run tanzu management-cluster create -v 9.
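As a side note, the "kind program binary not found" error from tanzu diagnostics collect above is the same precondition as step 1. A minimal sketch of that PATH check (the helper name is mine, not part of any tool):

```python
import shutil


def kind_available() -> bool:
    """Return True if a `kind` executable is on the PATH (hypothetical helper)."""
    return shutil.which("kind") is not None


if __name__ == "__main__":
    if kind_available():
        print("kind found; try `kind create cluster` next")
    else:
        print("kind missing; install it first, then retry")
```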

syangsao commented Oct 4, 2021

Installed kind and ran the kind cluster create command without any failures.

The tanzu management-cluster create -v 9 shows the following output. I have not cleaned up the last attempted installation yet and it is still running in another window with the exit status 1 error. Should I restart the installation?

Screenshot from 2021-10-04 18-37-40

figo commented Oct 5, 2021

@syangsao sorry, please try to run kind create cluster to actually create the kind cluster.

tvanderka commented:
Just a guess: this is caused by the old containerd 1.3.x runtime in the VMware kind/node image, which does not support cgroup v2. Kind with containerd 1.5 works fine on Fedora 34. Similar issue from minikube: kubernetes/minikube#11310.
Exec into the kind container while the install is running and look at journalctl -fu containerd.
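This hypothesis can be checked from the host before installing anything: on a unified (v2) hierarchy, the file cgroup.controllers exists at the cgroup mount root, while on v1 it does not. A minimal sketch, assuming the standard /sys/fs/cgroup mount point (the function itself is mine):

```python
from pathlib import Path


def cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Return 2 for a unified cgroup v2 hierarchy, 1 otherwise.

    cgroup v2 exposes `cgroup.controllers` at the mount root; cgroup v1
    instead populates the root with per-controller directories.
    """
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1
```

On a default Fedora 34 install this reports 2, which is exactly the combination the containerd 1.3.x shim in the node image cannot handle.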

mstefany commented Oct 7, 2021

It seems it is really related to cgroup v2 (running Fedora 34):

Oct 07 18:12:53 tkg-kind-c5fjg6v01b0qpem9ta20-control-plane containerd[100]: time="2021-10-07T18:12:53.036761050Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-controller-manager-tkg-kind-c5fjg6v01b0qpem9ta20-control-plane,Uid:c20aaad9a0dba937a81e7bfdf96beb26,Namespace:kube-system,Attempt:0,}"
Oct 07 18:12:53 tkg-kind-c5fjg6v01b0qpem9ta20-control-plane containerd[100]: time="2021-10-07T18:12:53.058588353Z" level=info msg="starting signal loop" namespace=k8s.io path=/run/containerd/io.containerd.runtime.v2.task/k8s.io/e1940f974520601a6365a52411c223c39c23729be4874b2d6eb3bbf2f0006d12 pid=2767
Oct 07 18:12:53 tkg-kind-c5fjg6v01b0qpem9ta20-control-plane containerd[100]: time="2021-10-07T18:12:53.127894158Z" level=error msg="loading cgroup for 2791" error="cgroups: cgroup mountpoint does not exist"
Oct 07 18:12:53 tkg-kind-c5fjg6v01b0qpem9ta20-control-plane containerd[100]: time="2021-10-07T18:12:53.374520447Z" level=error msg="loading cgroup for 2791" error="cgroups: cgroup mountpoint does not exist"
Oct 07 18:12:53 tkg-kind-c5fjg6v01b0qpem9ta20-control-plane containerd[100]: time="2021-10-07T18:12:53.377250053Z" level=info msg="shim disconnected" id=e1940f974520601a6365a52411c223c39c23729be4874b2d6eb3bbf2f0006d12
Oct 07 18:12:53 tkg-kind-c5fjg6v01b0qpem9ta20-control-plane containerd[100]: time="2021-10-07T18:12:53.377303980Z" level=warning msg="cleaning up after shim disconnected" id=e1940f974520601a6365a52411c223c39c23729be4874b2d6eb3bbf2f0006d12 namespace=k8s.io
Oct 07 18:12:53 tkg-kind-c5fjg6v01b0qpem9ta20-control-plane containerd[100]: time="2021-10-07T18:12:53.377316494Z" level=info msg="cleaning up dead shim"
Oct 07 18:12:53 tkg-kind-c5fjg6v01b0qpem9ta20-control-plane containerd[100]: time="2021-10-07T18:12:53.377362452Z" level=error msg="Failed to delete sandbox container \"e1940f974520601a6365a52411c223c39c23729be4874b2d6eb3bbf2f0006d12\"" error="ttrpc: closed: unknown"
Oct 07 18:12:53 tkg-kind-c5fjg6v01b0qpem9ta20-control-plane containerd[100]: time="2021-10-07T18:12:53.381549023Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-controller-manager-tkg-kind-c5fjg6v01b0qpem9ta20-control-plane,Uid:c20aaad9a0dba937a81e7bfdf96beb26,Namespace:kube-system,Attempt:0,} failed, error" error="failed to start sandbox container task \"e1940f974520601a6365a52411c223c39c23729be4874b2d6eb3bbf2f0006d12\": ttrpc: closed: unknown"
Oct 07 18:12:53 tkg-kind-c5fjg6v01b0qpem9ta20-control-plane containerd[100]: time="2021-10-07T18:12:53.501095979Z" level=warning msg="cleanup warnings time=\"2021-10-07T18:12:53Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=2807\n"

syangsao commented Oct 7, 2021

@syangsao sorry, please try to run kind create cluster to actually create the kind cluster.

My bad, my syntax was incorrect. I re-ran kind create cluster and verified that it runs fine.


The installation still fails at the same error.

tanzu management-cluster create --ui

Downloading TKG compatibility file from 'projects.registry.vmware.com/tkg/framework-zshippable/tkg-compatibility'
Downloading the TKG Bill of Materials (BOM) file from 'projects.registry.vmware.com/tkg/tkg-bom:v1.4.0'
Downloading the TKr Bill of Materials (BOM) file from 'projects.registry.vmware.com/tkg/tkr-bom:v1.21.2_vmware.1-tkg.1'
ERROR 2021/10/07 14:19:35 svType != tvType; key=release, st=map[string]interface {}, tt=<nil>, sv=map[version:], tv=<nil>

Validating the pre-requisites...
Serving kickstart UI at http://127.0.0.1:8080
Identity Provider not configured. Some authentication features won't work.
Validating configuration...
web socket connection established
sending pending 2 logs to UI
Using infrastructure provider docker:v0.3.23
Generating cluster configuration...
Setting up bootstrapper...
unable to set up management cluster, : unable to create bootstrap cluster: failed to create kind cluster tkg-kind-c5fkgmn0futit0abc4ag: failed to init node with kubeadm: command "docker exec --privileged tkg-kind-c5fkgmn0futit0abc4ag-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1

journalctl -fu containerd shows the following:

-- Journal begins at Mon 2020-05-18 08:20:31 CDT. --
Oct 07 13:07:10 degobah containerd[1029]: time="2021-10-07T13:07:10.470406346-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
Oct 07 13:07:10 degobah containerd[1029]: time="2021-10-07T13:07:10.470413854-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
Oct 07 13:07:10 degobah containerd[1029]: time="2021-10-07T13:07:10.471241795-05:00" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
Oct 07 13:07:10 degobah containerd[1029]: time="2021-10-07T13:07:10.471295452-05:00" level=info msg=serving... address=/run/containerd/containerd.sock
Oct 07 13:07:10 degobah containerd[1029]: time="2021-10-07T13:07:10.471761532-05:00" level=info msg="containerd successfully booted in 0.338183s"
Oct 07 13:07:10 degobah systemd[1]: Started containerd container runtime.
Oct 07 14:10:16 degobah containerd[1029]: time="2021-10-07T14:10:16.852480151-05:00" level=info msg="starting signal loop" namespace=moby path=/run/containerd/io.containerd.runtime.v2.task/moby/cddd706f33140f00f29dc637d09d92c112c8efd2d975905d71f9a7d74e131d5c pid=4731
Oct 07 14:14:06 degobah containerd[1029]: time="2021-10-07T14:14:06.129039397-05:00" level=info msg="shim disconnected" id=cddd706f33140f00f29dc637d09d92c112c8efd2d975905d71f9a7d74e131d5c
Oct 07 14:14:06 degobah containerd[1029]: time="2021-10-07T14:14:06.129081387-05:00" level=error msg="copy shim log" error="read /proc/self/fd/11: file already closed"
Oct 07 14:20:19 degobah containerd[1029]: time="2021-10-07T14:20:19.192025058-05:00" level=info msg="starting signal loop" namespace=moby path=/run/containerd/io.containerd.runtime.v2.task/moby/53aa7941d1fae008cb7cb662fcaf28086366b5f873ac4f6d16586688953fff43 pid=9487
Oct 07 14:24:22 degobah containerd[1029]: time="2021-10-07T14:24:22.537754905-05:00" level=info msg="shim disconnected" id=53aa7941d1fae008cb7cb662fcaf28086366b5f873ac4f6d16586688953fff43
Oct 07 14:24:22 degobah containerd[1029]: time="2021-10-07T14:24:22.537835036-05:00" level=error msg="copy shim log" error="read /proc/self/fd/11: file already closed"

anibal-aguila commented Oct 7, 2021

Same issue here:

an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
        - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'

couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:114
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:234
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:152
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895
k8s.io/kubernetes/cmd/kubeadm/app.Run
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
        _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:225
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1371
error execution phase wait-control-plane
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:152
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895
k8s.io/kubernetes/cmd/kubeadm/app.Run
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
        _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:225
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1371

figo commented Oct 8, 2021

We are aware of this issue: the tkg/kind/node image does not support cgroup v2 yet. The workaround is to run the kind cluster on a Linux kernel with cgroup v1 (a slightly older Linux).

anibal-aguila commented Oct 8, 2021

Thanks @figo. After regenerating the grub config to use cgroup v1,
I get an error loop from mgmt-control-plane:

Oct 08 15:30:11 tce-mgmt-control-plane-pr26z kubelet[1027]: E1008 15:30:11.234381    1027 kubelet.go:1384] "Failed to start ContainerManager" err="failed to get rootfs info: failed to get device for dir \"/var/lib/kubelet\": could not find device with major: 0, minor: 29 in cached partitions map"
Oct 08 15:30:18 tce-mgmt-control-plane-pr26z kubelet[1063]: E1008 15:30:18.874508    1063 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
Oct 08 15:30:18 tce-mgmt-control-plane-pr26z kubelet[1063]: E1008 15:30:18.878613    1063 cri_stats_provider.go:369] "Failed to get the info of the filesystem with mountpoint" err="failed to get device for dir \"/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs\": could not find device with major: 0, minor: 29 in cached partitions map" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
Oct 08 15:30:18 tce-mgmt-control-plane-pr26z kubelet[1063]: E1008 15:30:18.878663    1063 kubelet.go:1306] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
Oct 08 15:30:18 tce-mgmt-control-plane-pr26z kubelet[1063]: E1008 15:30:18.879435    1063 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"tce-mgmt-control-plane-pr26z.16ac17e0bf268f99", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"tce-mgmt-control-plane-pr26z", UID:"tce-mgmt-control-plane-pr26z", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"tce-mgmt-control-plane-pr26z"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc05036e2b432ab99, ext:6377886713, loc:(*time.Location)(0x74bc600)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc05036e2b432ab99, ext:6377886713, loc:(*time.Location)(0x74bc600)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://172.18.0.3:6443/api/v1/namespaces/default/events": EOF'(may retry after sleeping)
Oct 08 15:30:18 tce-mgmt-control-plane-pr26z kubelet[1063]: E1008 15:30:18.884161    1063 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Oct 08 15:30:18 tce-mgmt-control-plane-pr26z kubelet[1063]: E1008 15:30:18.896533    1063 kubelet_network_linux.go:79] "Failed to ensure that nat chain exists KUBE-MARK-DROP chain" err="error creating chain \"KUBE-MARK-DROP\": exit status 3: modprobe: ERROR: could not insert 'ip6_tables': Exec format error\nip6tables v1.8.4 (legacy): can't initialize ip6tables table `nat': Table does not exist (do you need to insmod?)\nPerhaps ip6tables or your kernel needs to be upgraded.\n"
Oct 08 15:30:18 tce-mgmt-control-plane-pr26z kubelet[1063]: E1008 15:30:18.896631    1063 kubelet.go:1870] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Oct 08 15:30:18 tce-mgmt-control-plane-pr26z kubelet[1063]: E1008 15:30:18.911302    1063 manager.go:1123] Failed to create existing container: /docker/7b40b87a33e82c4e6d3c8a7c328f631c2c933a031e1defd9b653c802cf8d0418: failed to identify the read-write layer ID for container "7b40b87a33e82c4e6d3c8a7c328f631c2c933a031e1defd9b653c802cf8d0418". - open /var/lib/docker/image/btrfs/layerdb/mounts/7b40b87a33e82c4e6d3c8a7c328f631c2c933a031e1defd9b653c802cf8d0418/mount-id: no such file or directory
Oct 08 15:30:18 tce-mgmt-control-plane-pr26z kubelet[1063]: E1008 15:30:18.911949    1063 manager.go:1123] Failed to create existing container: /docker/7b40b87a33e82c4e6d3c8a7c328f631c2c933a031e1defd9b653c802cf8d0418/docker/7b40b87a33e82c4e6d3c8a7c328f631c2c933a031e1defd9b653c802cf8d0418: failed to identify the read-write layer ID for container "7b40b87a33e82c4e6d3c8a7c328f631c2c933a031e1defd9b653c802cf8d0418". - open /var/lib/docker/image/btrfs/layerdb/mounts/7b40b87a33e82c4e6d3c8a7c328f631c2c933a031e1defd9b653c802cf8d0418/mount-id: no such file or directory
Oct 08 15:30:18 tce-mgmt-control-plane-pr26z kubelet[1063]: E1008 15:30:18.935350    1063 manager.go:1123] Failed to create existing container: /docker/7b40b87a33e82c4e6d3c8a7c328f631c2c933a031e1defd9b653c802cf8d0418/docker/7b40b87a33e82c4e6d3c8a7c328f631c2c933a031e1defd9b653c802cf8d0418: failed to identify the read-write layer ID for container "7b40b87a33e82c4e6d3c8a7c328f631c2c933a031e1defd9b653c802cf8d0418". - open /var/lib/docker/image/btrfs/layerdb/mounts/7b40b87a33e82c4e6d3c8a7c328f631c2c933a031e1defd9b653c802cf8d0418/mount-id: no such file or directory
Oct 08 15:30:18 tce-mgmt-control-plane-pr26z kubelet[1063]: E1008 15:30:18.936557    1063 manager.go:1123] Failed to create existing container: /docker/7b40b87a33e82c4e6d3c8a7c328f631c2c933a031e1defd9b653c802cf8d0418: failed to identify the read-write layer ID for container "7b40b87a33e82c4e6d3c8a7c328f631c2c933a031e1defd9b653c802cf8d0418". - open /var/lib/docker/image/btrfs/layerdb/mounts/7b40b87a33e82c4e6d3c8a7c328f631c2c933a031e1defd9b653c802cf8d0418/mount-id: no such file or directory
Oct 08 15:30:18 tce-mgmt-control-plane-pr26z kubelet[1063]: E1008 15:30:18.951003    1063 kubelet.go:1384] "Failed to start ContainerManager" err="failed to get rootfs info: failed to get device for dir \"/var/lib/kubelet\": could not find device with major: 0, minor: 29 in cached partitions map"


Change to use cgroup v1 (add systemd.unified_cgroup_hierarchy=0, regenerate the grub config, reboot):

sudo vim /etc/default/grub
    GRUB_CMDLINE_LINUX=" ...  systemd.unified_cgroup_hierarchy=0"
sudo grub-mkconfig -o /boot/grub/grub.cfg
reboot

manifest-file.yaml

CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: tce-mgmt
ENABLE_MHC: "false"
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: docker
LDAP_BIND_DN: ""
LDAP_BIND_PASSWORD: ""
LDAP_GROUP_SEARCH_BASE_DN: ""
LDAP_GROUP_SEARCH_FILTER: ""
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: ""
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: ""
LDAP_ROOT_CA_DATA_B64: ""
LDAP_USER_SEARCH_BASE_DN: ""
LDAP_USER_SEARCH_FILTER: ""
LDAP_USER_SEARCH_NAME_ATTRIBUTE: ""
LDAP_USER_SEARCH_USERNAME: userPrincipalName
OIDC_IDENTITY_PROVIDER_CLIENT_ID: ""
OIDC_IDENTITY_PROVIDER_CLIENT_SECRET: ""
OIDC_IDENTITY_PROVIDER_GROUPS_CLAIM: ""
OIDC_IDENTITY_PROVIDER_ISSUER_URL: ""
OIDC_IDENTITY_PROVIDER_NAME: ""
OIDC_IDENTITY_PROVIDER_SCOPES: ""
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM: ""
OS_ARCH: ""
OS_NAME: ""
OS_VERSION: ""
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: "true"
CLUSTER_PLAN: dev
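One sanity check worth running on a config like the one above: CLUSTER_CIDR and SERVICE_CIDR must not overlap. A quick sketch with Python's stdlib ipaddress module (the check is mine, not something the installer exposes):

```python
import ipaddress

# Values copied from the manifest above.
cluster_cidr = ipaddress.ip_network("100.96.0.0/11")   # CLUSTER_CIDR
service_cidr = ipaddress.ip_network("100.64.0.0/13")   # SERVICE_CIDR

# Pod and service ranges must be disjoint.
if cluster_cidr.overlaps(service_cidr):
    raise ValueError("CLUSTER_CIDR and SERVICE_CIDR overlap")
print(f"{cluster_cidr} and {service_cidr} are disjoint")
```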

syangsao commented Oct 8, 2021

We are aware of this issue, the tkg/kind/node does not support cgroup v2 yet, the workaround is to run the kind cluster on Linux kernel with cgroup v1 (slightly older linux)

With Fedora 34, you don't need to run an older kernel release. There is a method to use cgroups v1 [1].

I ran the following command from the link and rebooted.

sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"

Make sure you verify this is configured upon reboot.

cat /proc/cmdline

BOOT_IMAGE=(hd0,msdos6)/vmlinuz-5.14.9-200.fc34.x86_64 root=/dev/mapper/fedora_localhost--live-root ro resume=/dev/mapper/fedora_localhost--live-swap rd.lvm.lv=fedora_localhost-live/root rd.lvm.lv=fedora_localhost-live/swap rhgb quiet systemd.unified_cgroup_hierarchy=0
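The same verification can be scripted if you manage several workstations. A small sketch that parses a kernel command line for the flag (the function is mine; the sample string is an abridged copy of the /proc/cmdline output above):

```python
def cgroup_v1_forced(cmdline: str) -> bool:
    """True if the kernel command line pins systemd to the legacy cgroup v1 hierarchy."""
    return "systemd.unified_cgroup_hierarchy=0" in cmdline.split()


# Abridged from the /proc/cmdline output above.
sample = ("BOOT_IMAGE=(hd0,msdos6)/vmlinuz-5.14.9-200.fc34.x86_64 "
          "root=/dev/mapper/fedora_localhost--live-root ro rhgb quiet "
          "systemd.unified_cgroup_hierarchy=0")
assert cgroup_v1_forced(sample)
```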

cat /etc/default/grub

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="resume=/dev/mapper/fedora_localhost--live-swap rd.lvm.lv=fedora_localhost-live/root rd.lvm.lv=fedora_localhost-live/swap rhgb quiet systemd.unified_cgroup_hierarchy=0"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true

I just verified that the installation finished and is working for me on Fedora 34 with cgroups v1.

[1] https://fedoramagazine.org/docker-and-fedora-32/

martingruening commented:
I have exactly the same issue, but not on Fedora: on Debian 11 (amd64) with a 5.10 kernel / Docker 20.10.9.
It took me quite some time of research to find this issue. It would be great to have this information in the Getting Started section of the documentation (together with the other Docker-specific requirements).

figo commented Oct 11, 2021

cc @joshrosso @dvonthenen

davidvonthenen commented Oct 11, 2021

Yup, we can definitely add that in the getting started guide so people aren't spinning their wheels.
cc: @kcoriordan

@davidvonthenen davidvonthenen added proposal/acccepted Change is accepted owner/docs Work executed by VMware documentation team kind/docs A change in documentation and removed kind/bug A bug in an existing capability triage/needs-triage Needs triage by TCE maintainers labels Oct 11, 2021
@kcoriordan kcoriordan added this to the v0.10.0 milestone Oct 12, 2021
@joshrosso joshrosso removed the proposal/acccepted Change is accepted label Jan 14, 2022