CSI volume keeps references to failed allocations #8145
Comments
The problem seems to be that allocations from failed jobs are not removed from the volume. I could reproduce the problem with these steps:
=> The volume still holds the allocation for the job. In my case, I registered a volume that could not be mounted because no volume exists for the external ID.
Hi @mkrueger-sabio! Thanks for opening this issue. This is definitely unexpected behavior and I'll be digging into this.
This is an interesting detail.
So what we end up with is a Nomad-registered volume that has no physical counterpart, but because of that it can't clean up the allocs that claimed it? It shouldn't be possible to write the claim in that case, but that may be where the bug is.
I encountered much the same problem, but with "running" allocations that I stopped.
Running into the same issue. Volumes reference a non-existent allocation and cannot be removed. I'm not sure of any way to manually force these volumes out of existence (the deregister force option doesn't work, unfortunately), so I'm assuming they will be stuck there until a fix is released.
Hey folks, just FYI we shipped a
Thanks, this helps to remove a lot of volumes. I still have the problem that I cannot remove a volume which has a pending allocation, even though the allocation does not exist anymore.
Understood. I'm pretty sure I know what's going on there and I'm working on a fix for this set of problems.
Wanted to give a quick status update. I've landed a handful of PRs that will be released as part of the upcoming 0.12.2 release:
I believe these fixes combined should get us into pretty good shape, and #8584 will give you an escape hatch to manually detach the volume via
For the sake of our planning, I'm going to close this issue. We'll continue to track progress of this set of problems in #8100.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
0.11.3
Issue
I don't know exactly what I did afterwards. I probably removed the volume while the job was still pending. The problem now is that I cannot remove the volume because it is still pending, and it stays that way.
I have stopped the plugin and the job, but the allocation is still there. It is only visible in the UI; when I query the allocation with the nomad client, I cannot find it.
I tried running gc and restarting the client and server, but nothing changed.
How can I remove the allocation?
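The symptom described in this issue (a volume still listing an allocation that can no longer be queried) can be checked mechanically. The following is only a minimal sketch, not Nomad's own logic: it assumes the volume has already been fetched as a JSON-like dict whose `ReadAllocs` and `WriteAllocs` fields are mappings keyed by allocation ID (an assumption about the response shape), and that the IDs of allocations that still exist are known. The function name `stale_claims` and the sample data are hypothetical.

```python
from typing import Iterable, Mapping, Set


def stale_claims(volume: Mapping, live_alloc_ids: Iterable[str]) -> Set[str]:
    """Return allocation IDs the volume still references but that no longer exist."""
    live = set(live_alloc_ids)
    claimed: Set[str] = set()
    # Assumed field names; each is treated as a mapping keyed by allocation ID.
    for field in ("ReadAllocs", "WriteAllocs"):
        claimed.update((volume.get(field) or {}).keys())
    return claimed - live


if __name__ == "__main__":
    # Hypothetical sample data: the volume claims two allocations, one of which is gone.
    volume = {
        "ID": "example-volume",
        "ReadAllocs": {},
        "WriteAllocs": {"alloc-1": None, "alloc-2": None},
    }
    print(sorted(stale_claims(volume, ["alloc-2"])))  # prints ['alloc-1']
```

Any ID returned here is a claim the volume holds for an allocation that is no longer present, which matches the state reported above.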