-
Notifications
You must be signed in to change notification settings - Fork 807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Search for nvme device path even if non-nvme exists #1082
Search for nvme device path even if non-nvme exists #1082
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: wongma7 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
25dbf8e
to
01d8c8d
Compare
01d8c8d
to
f3db9e8
Compare
537abf2
to
35e6aa8
Compare
35e6aa8
to
9ea0250
Compare
Also verified that if a pod is stuck Terminating because NodeStage keeps failing after the kubelet restart, upgrading to driver with this fix unsticks it.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
This code does not actually resolve the issue for me. @wongma7 can you paste the udev rules on your system that create the
however, the code is searching for:
I'm really not sure why the code bothers to dive into that logic however, it seems like at the point where it determines that the Another solution which would potentially be more reliable than the path games is if the code that compares the mount device to the symlink actually called |
Is this a bug fix or adding new feature? /bug
What is this PR about? / Why do we need it? (intended to) fix #1076 and maybe #1027.
NodeStage is not idempotent for nvme volumes. This is most apparent in kubelet 1.20 because it is guaranteed to call NodeStage again for the same volume when it restarts. Then you will see the errors in #1076 because NodeStage compares the path /dev/xvbda to the path /dev/nvme1n1, when the intention is to compare /dev/nvme1n1 with /dev/nvme1n1 and return NodeStage as no-op success.
The fix is to ensure that the path we compare is the "canonical" one, so even if we see that /dev/xvbda exists we still need to search for the "canonical" nvme representation of the device because that is what will show up in the mount table. Then we are comparing what we find with what is in the mount table and it should match.
What testing is done?
e2e In CI:
I updated the test to create nitro c5 instances so this code will get exercised
e2e on my cluster:
built and pushed an image to my registry
restarted kubelet to repro the issue.
deployed my image and the errors stopped.
unit tests:
see latest 2 commits