-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSI: skip node unpublish on GC'd or down nodes #13301
Conversation
eec9b4b
to
e1d6b40
Compare
If the node has been GC'd or is down, we can't send it a node unpublish. The CSI spec requires that we don't send the controller unpublish before the node unpublish, but in the case where a node is gone we can't know the final fate of the node unpublish step. The `csi_hook` on the client will unpublish if the allocation has stopped and if the host is terminated there's no mount for the volume anyways. So we'll now assume that the node has unpublished at its end. If it hasn't, any controller unpublish will potentially hang or error and need to be retried.
e1d6b40
to
9de444e
Compare
@@ -0,0 +1,3 @@ | |||
```release-note:bug | |||
csi: Fixed a bug where volume claims on lost or garbage collected nodes could not be freed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for reviewers: I'm torn on whether to call this a bug or improvement but calling it a bug makes it something we can backport so I'm leaning that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about other possible node status (like initializing or draining) but those would be fine since the node is still around to handle the request right?
Exactly! The |
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Fixes #13264
If the node has been GC'd or is down, we can't send it a node
unpublish. The CSI spec requires that we don't send the controller
unpublish before the node unpublish, but in the case where a node is
gone we can't know the final fate of the node unpublish step.
The
csi_hook
on the client will unpublish if the allocation hasstopped and if the host is terminated there's no mount for the volume
anyways. So we'll now assume that the node has unpublished at its
end. If it hasn't, any controller unpublish will potentially hang or
error and need to be retried.
(Note that while this behavior isn't ideal, it appears to match user
expectations and the behavior reported by k8s users.)