Document known issue around BTRFS #2584

Merged
merged 2 commits into from
Feb 6, 2022
36 changes: 36 additions & 0 deletions site/content/docs/user/known-issues.md
@@ -23,6 +23,7 @@ description: |-


## Contents
* [Troubleshooting Kind](#troubleshooting-kind)
* [Kubectl Version Skew](#kubectl-version-skew) (Kubernetes limits supported version skew)
* [Older Docker Installations](#older-docker-installations) (untested, known to have bugs)
* [Docker Installed With Snap](#docker-installed-with-snap) (snap filesystem restrictions problematic)
@@ -39,6 +40,12 @@ description: |-
* [IPv6 Port Forwarding](#ipv6-port-forwarding) (docker doesn't seem to implement this correctly)
* [Couldn't find an alternative telinit implementation to spawn](#docker-init-daemon-config)
* [Fedora](#fedora) (various)
* [Failed to get rootfs info](#failed-to-get-rootfs-info--stat-failed-on-dev)

## Troubleshooting Kind

If the cluster fails to create, try again with the `--retain` option (preserving the failed container),
then run `kind export logs` to export the logs from the container to a temporary directory on the host.
aojea marked this conversation as resolved.

## Kubectl Version Skew

@@ -306,6 +313,34 @@ your workloads inside the cluster via the nodes IPv6 addresses.

See Previous Discussion: [kind#1326]

## Failed to get rootfs info / "stat failed on /dev/..."

On some systems, creating a cluster times out with these errors in kubelet.log (device varies):

```
stat failed on /dev/nvme0n1p3 with error: no such file or directory
"Failed to start ContainerManager" err="failed to get rootfs info: failed to get device for dir \"/var/lib/kubelet\": could not find device with major: 0, minor: 40 in cached partitions map"
```

Kubernetes needs access to storage device nodes for some operations, e.g. tracking free disk space. Kind therefore needs to mount the necessary device nodes from the host into the control-plane container. However, it cannot always determine which device Kubernetes requires, since this varies with the host filesystem. For example, Kind doesn't handle BTRFS, which is the default for modern Fedora.
Member

modern Fedora -> modern Fedora on Desktop, IIUC?

Contributor Author

Desktop, certainly... I don't know about other variants. But I don't think Fedora itself is relevant to the issue... I mention it only as an example of a common distro where the problem will occur on a stock configuration. I can adjust the wording if you have any suggestions?


This can be worked around by including the necessary device as an extra mount in the cluster configuration file.
Member

I guess setting $KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER to native or fuse-overlayfs may work too?

Contributor Author
@simon-geard Jan 11, 2022

That I don't know, and can't test right now (it's late at night).

Is that setting documented somewhere? I've been using Kind for about 2 days now, so not an expert by any means.

Contributor Author

Ok, did some quick experimenting today. Assuming you're expecting something like:

KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER=native kind create cluster --retain

...then neither native nor fuse-overlayfs has any apparent effect. Both fail with the original problem, "stat failed on /dev/nvme0n1p3 with error".


```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /dev/nvme0n1p3
    containerPath: /dev/nvme0n1p3
    propagation: HostToContainer
```

Comment on lines +335 to +336
Contributor
@aojea Jan 11, 2022

Should we add some detailed instructions for users on how to obtain the device path?

Contributor Author

That's in the following paragraph, where I state "the expected device is named in the error message".

@e-minguez Jan 11, 2022

Just in case, on my Fedora 35 it complained about /dev/mapper/luks-903aad3d-... and using hostPath/containerPath: /dev/mapper/luks-903aad3d-.. didn't work because it is a symlink to /dev/dm-0. Using /dev/dm-0 worked.

ls -l /dev/mapper/
total 0
crw-------. 1 root root 10, 236 ene 11 08:06 control
lrwxrwxrwx. 1 root root       7 ene 11 08:06 luks-903aad3d-... -> ../dm-0
cat << EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /dev/dm-0
    containerPath: /dev/dm-0
    propagation: HostToContainer
EOF

Contributor Author

So, two variations: if the device cited in the error message is under /dev/mapper, the extra mount should be the /dev/dm-* device it points to; otherwise it should be the device from the message?

I guess it needs to be the real device, not a symlink. So if the error complains about /dev/foo/bar, the config needs to have the real device, as given by readlink -f /dev/foo/bar

MYDEVICE=$(readlink -f /dev/mapper/luks-903aad3d-...)
cat << EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: ${MYDEVICE}
    containerPath: ${MYDEVICE}
    propagation: HostToContainer
EOF
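
The symlink point raised in the review thread can be sketched as a small shell helper. This is an illustrative sketch, not part of the PR: the function name `kind_config_for_device` is hypothetical, and it assumes a `readlink` that supports `-f` (as used in the comment above). It resolves the path named in the error message to the real device and prints a cluster config that mounts it.

```shell
#!/bin/sh
# Hypothetical helper (not part of kind): print a cluster config that mounts
# the real device behind the given path. Symlinks such as /dev/mapper/...
# are resolved with `readlink -f`, as suggested in the review thread.
kind_config_for_device() {
    device=$(readlink -f "$1") || return 1
    cat <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: ${device}
    containerPath: ${device}
    propagation: HostToContainer
EOF
}

# Usage (take the device path from your own error message):
#   kind_config_for_device /dev/nvme0n1p3 | kind create cluster --config=-
```

Printing the config rather than creating the cluster directly makes it possible to inspect the resolved device path before passing it to `kind create cluster --config=-`.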

The expected device is named in the error message; it is typically the device backing the location where container volumes are stored. For rootless Docker or Podman, this is usually the device that holds `$HOME`.
Member

Does this workaround really work on rootless?
I also guess rootless may not require this workaround, as it uses fuse-overlayfs?

Contributor Author

It certainly works for Podman on Fedora - and yes, the workaround is definitely required; otherwise I wouldn't have known this issue existed.
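
Since the expected device is named in the error message, it can be pulled out of exported logs mechanically. The following is a minimal sketch under stated assumptions: the function name `extract_stat_failed_devices` is hypothetical, and the pattern relies only on the `stat failed on /dev/... with error` wording shown in the sample log on this page, so other kubelet versions may phrase the message differently.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print each distinct device
# path that appears in a "stat failed on <device> with error" message.
# The message format is assumed from the sample kubelet.log lines above.
extract_stat_failed_devices() {
    sed -n 's|.*stat failed on \(/dev/[^ ]*\) with error.*|\1|p' | sort -u
}

# Example using the sample message from this page:
printf '%s\n' 'stat failed on /dev/nvme0n1p3 with error: no such file or directory' \
    | extract_stat_failed_devices
```

After a failed `kind create cluster --retain`, the logs written by `kind export logs` could be searched in the same way, for example with `grep -rh 'stat failed on' <log-dir> | extract_stat_failed_devices`, where `<log-dir>` stands for whatever directory the logs were exported to.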


See Previous Discussion: [kind#2411]

## Fedora

### Firewalld
@@ -348,6 +383,7 @@ Although the policy has been fixed in Fedora 34, the fix has not been backported
[kind#1179]: https://github.com/kubernetes-sigs/kind/issues/1179
[kind#1326]: https://github.com/kubernetes-sigs/kind/issues/1326
[kind#2296]: https://github.com/kubernetes-sigs/kind/issues/2296
[kind#2411]: https://github.com/kubernetes-sigs/kind/issues/2411
[moby#9939]: https://github.com/moby/moby/issues/9939
[moby#17666]: https://github.com/moby/moby/issues/17666
[Docker resource lims]: https://docs.docker.com/docker-for-mac/#advanced