There's no way for the server to send a node unpublish command to the node plugin running on a down node. What's becoming clear is that the CSI specification as written cannot guarantee correct behavior when nodes go down, so what K8s apparently does is give up and let you potentially corrupt your data in the multi-writer volume use case. User expectations seem to agree, so we may need to rework the design to make it possible to discard claims for lost nodes and assume they're never coming back. This is why we built `stop_after_client_disconnect`, but we may need to enforce that setting for CSI-claiming tasks unless the user specifically disables it.
We'll alter the claim Unpublish workflow to skip the Node Unpublish RPCs for any node that is marked as `lost`, `disconnected`, or `nil` (GC'd). In those cases, just log the fault and send the controller RPCs, if any.
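As a rough illustration of the workflow change described above, here is a minimal Go sketch. It is not Nomad's actual code: the `Node`, `Claim`, and `RPCs` types, the status strings, and the `Unpublish` function are all hypothetical stand-ins chosen to show the control flow only.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for the server's record of a client node.
// A nil *Node represents a node that has already been garbage collected.
type Node struct {
	ID     string
	Status string // assumed values: "ready", "lost", "disconnected"
}

// Claim is a hypothetical volume claim held by an allocation on a node.
type Claim struct {
	VolumeID string
	NodeID   string
}

// RPCs groups the two unpublish calls. ControllerUnpublish may be nil when
// the plugin has no controller component.
type RPCs struct {
	NodeUnpublish       func(Claim) error
	ControllerUnpublish func(Claim) error
}

// nodeReachable reports whether a node unpublish RPC can be delivered.
func nodeReachable(n *Node) bool {
	if n == nil {
		return false // GC'd node
	}
	return n.Status != "lost" && n.Status != "disconnected"
}

// Unpublish releases a claim. For unreachable nodes it logs the fault and
// skips the node RPC, then sends the controller RPC if one is configured.
func Unpublish(c Claim, n *Node, rpcs RPCs, logf func(string, ...interface{})) error {
	if nodeReachable(n) {
		if err := rpcs.NodeUnpublish(c); err != nil {
			return fmt.Errorf("node unpublish for volume %s: %w", c.VolumeID, err)
		}
	} else {
		logf("skipping node unpublish for volume %s: node %s is unreachable",
			c.VolumeID, c.NodeID)
	}
	if rpcs.ControllerUnpublish != nil {
		if err := rpcs.ControllerUnpublish(c); err != nil {
			return fmt.Errorf("controller unpublish for volume %s: %w", c.VolumeID, err)
		}
	}
	return nil
}
```

Note that the sketch deliberately does not treat an unreachable node as an error: the claim is released on the server side even though the volume may still be mounted on the client, which is the trade-off discussed above.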
If users want correct behavior on the clients, they'll include `stop_after_client_disconnect`. We should consider making this the default behavior for allocations that include CSI volume claims, but this might cause us backwards compatibility grief.
In later work, as a performance optimization, we should also update the `volumewatcher` to watch for updates to the Node table. When a node is GC'd or marked as `lost`, emit a volume reap on any volumes that have claims on that node.
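The node-watching idea could look something like the following Go sketch. Everything here is an assumption for illustration: `NodeEvent`, `VolumesToReap`, `WatchNodes`, the `"lost"` status string, and the node-to-volume claim index are hypothetical and do not reflect the real `volumewatcher` or state store API.

```go
package main

// NodeEvent is a hypothetical notification of a change in the Node table.
type NodeEvent struct {
	NodeID  string
	Status  string // assumed values: "ready", "lost", ...
	Deleted bool   // true when the node has been garbage collected
}

// VolumesToReap returns the IDs of volumes with claims on the node if the
// event shows the node was GC'd or marked lost, and nil otherwise.
// claimsByNode is an assumed index from node ID to claimed volume IDs.
func VolumesToReap(ev NodeEvent, claimsByNode map[string][]string) []string {
	if !ev.Deleted && ev.Status != "lost" {
		return nil
	}
	vols := claimsByNode[ev.NodeID]
	out := make([]string, len(vols))
	copy(out, vols)
	return out
}

// WatchNodes consumes node events until the channel is closed and calls
// reap once for each volume that has a claim on a lost or GC'd node.
func WatchNodes(events <-chan NodeEvent, claimsByNode map[string][]string, reap func(volumeID string)) {
	for ev := range events {
		for _, id := range VolumesToReap(ev, claimsByNode) {
			reap(id)
		}
	}
}
```

The point of the sketch is that the reap is driven by the node update itself, rather than waiting for some later event on the volume to notice that the claiming node is gone.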
The draft branch for the unpublish workflow changes is `csi-discard-claims-on-gcd-nodes`, but it needs a lot of testing and probably some rework around whether we snapshot the state store for the entire `CSIVolume.Unpublish` RPC.
Update: I've bench-tested this on a real cluster to my satisfaction. I'm going to follow up with some automated testing to probe the edge cases and then open a PR.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
From #12346 (comment):