CSI: allocrunner w/ volumes fails to restore in csi_hook after client restart #11477
Comments
Following up on this because #10833 has been closed out: on further review, it's pretty clear we should be handling the case where the servers are disconnected more safely. The changes in #11892 will partially help here, but we'll also need this for the upcoming work on disconnected client handling anyway. I'll be looking into this as part of other plugin work going on these next few weeks.
Will be fixed by #12113, expected to ship in Nomad 1.3.0.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
The issue is also present in versions 1.1.2 through 1.2.0 Beta.
Operating system and Environment details
Issue
When the Nomad client is restarted without draining the node, an allocation that uses a volume mounted through a CSI plugin fails to restore, and Nomad leaves the task process running.
Reproduction steps
1. Use a CSI plugin to mount a volume into a task.
2. Restart the Nomad client process without draining the node.
Expected Result
Allocation is restored
Actual Result
The allocation is marked as failed, but the task process keeps running and the volume remains mounted.
Nomad Client logs (if appropriate)
Possibly related to #10833
Specifically, it appears that the csi_hook prerun step requires the retry join to have completed before it can make the `CSIVolume.Claim` RPC call. However, the goroutines for the retry join and for restoring allocations race with each other.