NodeStage might not always be idempotent for nitro instances #1076
Comments
Similar to issue #1027.

The way to reproduce is to have a pod with an EBS CSI volume mounted and restart kubelet. The instance must be nitro and kubelet must be >= 1.20. In 1.19 and below, kubelet checked whether the dir was mounted for us before calling NodeStage. In 1.20 it calls without checking, assuming our implementation is idempotent. Ours really is not, due to this nitro-specific bug, so I'll submit a fix to make it consistent with the newly idempotent NodePublish (a sketch of the expected pattern follows).

I also reproduced the stuck-terminating part of #1027 (after kubelet has restarted and is printing those "already mounted" errors, just try deleting the pod).
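As an illustration of the contract kubelet 1.20+ assumes, here is a minimal Go sketch of an idempotent NodeStage mount step. This is not the driver's actual code: the function name, the staging path, and the use of k8s.io/mount-utils are assumptions made for the example.

```go
package main

import (
	"fmt"
	"log"

	mountutils "k8s.io/mount-utils"
)

// nodeStageSketch is a hypothetical stand-in for NodeStageVolume's mount
// step. Kubelet >= 1.20 may call NodeStage again for a volume that is
// already staged, so an already-mounted staging target must be treated
// as success rather than an error.
func nodeStageSketch(stagingTargetPath string) error {
	mounter := mountutils.New("")

	// IsLikelyNotMountPoint reports whether the path is (likely) not a
	// mount point; it returns an error if the path does not exist.
	notMnt, err := mounter.IsLikelyNotMountPoint(stagingTargetPath)
	if err != nil {
		return err
	}
	if !notMnt {
		// Already staged by a previous call: return success.
		fmt.Println("already staged; treating as success")
		return nil
	}

	// ... find the device path, format if needed, and mount it here ...
	return nil
}

func main() {
	// Hypothetical staging path, for illustration only.
	if err := nodeStageSketch("/var/lib/kubelet/plugins/ebs.csi.aws.com/staging"); err != nil {
		log.Fatal(err)
	}
}
```

The point is only that a repeated NodeStage call against an already-mounted staging target must succeed, which is what the nitro path-comparison bug below breaks.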
/kind bug
What happened?
The logs aren't mine, and I am ASSUMING that /dev/xvdba is a symlink to /dev/nvme3n1. But from reading the code I think there is potential for a bug, so I will try to reproduce it and get my own logs.
The check here:

aws-ebs-csi-driver/pkg/driver/node.go
Lines 199 to 202 in c428c04

`source`, returned by `findDevicePath`, might be of the form "/dev/xvdcf", where "/dev/xvdcf" is a symlink to "/dev/nvme1n1". That happens because `findDevicePath` naively assumes that if /dev/xvdcf exists, then the volume is not NVMe:

aws-ebs-csi-driver/pkg/driver/node_linux.go
Line 42 in edb4fbe

Meanwhile, `devicePath`, returned by `GetDeviceNameFromMount`, will be the "canonical" path "/dev/nvme1n1", because we `mount` with the symlink and the canonical path is what shows up in the mount table:

aws-ebs-csi-driver/pkg/driver/node.go
Line 664 in edb4fbe

So since "/dev/xvdcf" != "/dev/nvme1n1", the check will fail.
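To make the mismatch concrete, here is a minimal Go sketch. The function names are hypothetical, and the paths assume a nitro host where udev created the /dev/xvdcf -> /dev/nvme1n1 symlink; this is an illustration of the comparison described above, not the driver's code.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// isSameDevice mirrors the failing comparison: it compares the path
// findDevicePath would return ("/dev/xvdcf") against the canonical path
// from the mount table ("/dev/nvme1n1") as plain strings.
func isSameDevice(source, devicePath string) bool {
	// On nitro, source is a symlink to the NVMe device node, so this
	// fails even though both names refer to the same block device.
	return source == devicePath
}

// isSameDeviceResolved resolves symlinks before comparing, which is one
// way the check could be made to recognize the same device on nitro.
func isSameDeviceResolved(source, devicePath string) (bool, error) {
	resolved, err := filepath.EvalSymlinks(source)
	if err != nil {
		return false, err
	}
	return resolved == devicePath, nil
}

func main() {
	fmt.Println(isSameDevice("/dev/xvdcf", "/dev/nvme1n1")) // false: check fails

	// On a nitro host where /dev/xvdcf -> /dev/nvme1n1 exists, this
	// resolves the symlink and prints true; elsewhere it errors out.
	if ok, err := isSameDeviceResolved("/dev/xvdcf", "/dev/nvme1n1"); err != nil {
		fmt.Println("cannot resolve:", err)
	} else {
		fmt.Println(ok)
	}
}
```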
What you expected to happen?

NodeStage should be idempotent.
How to reproduce it (as minimally and precisely as possible)?
WIP... I don't have clear repro steps yet. If I restart kubelet, I would expect it to reconstruct the actual state of the world and not bother calling NodeStage again, so at the moment I'm not sure how to force it to call NodeStage twice and trigger this bug.
Anything else we need to know?:
Environment

- Kubernetes version (use `kubectl version`): 1.20