csi: volume claim garbage collection #7125
Conversation
Force-pushed from 877e159 to ef11c8e.
Force-pushed from e213e55 to 1942fc3.
Force-pushed from 33571ea to 44bdb3c.
Force-pushed from 1942fc3 to 37af303.
err = core.Process(eval)
require.NoError(t, err)

// Verify the claim was released
@langmartin wanted to flag this for you in particular because you implemented the state store work. As far as I can tell there's no real reason to bother with tracking `PastAllocs` with this implementation, and we can swap out the implementation of `structs.CSIVolume.ClaimRelease` with that of `structs.CSIVolume.GCAlloc`. Do you have any thoughts here?
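A rough sketch of what collapsing the two would look like (hypothetical, simplified types only, not the actual `structs` definitions): with no `PastAllocs` bookkeeping, releasing a claim is just deleting the alloc from the claim maps, which is all `GCAlloc` would do anyway.

```go
// Hypothetical sketch only: simplified types, not Nomad's structs package.
package sketch

type Allocation struct{ ID string }

// CSIVolume here tracks only the active claims; no PastAllocs at all.
type CSIVolume struct {
	ReadAllocs  map[string]*Allocation
	WriteAllocs map[string]*Allocation
}

// ClaimRelease drops the terminal alloc's claim so the volume is schedulable
// again; without PastAllocs this is the same work GCAlloc would do.
func (v *CSIVolume) ClaimRelease(alloc *Allocation) {
	delete(v.ReadAllocs, alloc.ID)
	delete(v.WriteAllocs, alloc.ID)
}
```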
`PastAllocs` at this point is there to serve as a debugging tool. If there's a user-perceptible gap of several minutes between the `reapVolumes` pass that would detect the `ReadAlloc` + terminal state and the eval reap that deletes the allocations, it's worth keeping both stages.
> `PastAllocs` at this point is there to serve as a debugging tool. If there's a user-perceptible gap of several minutes between the `reapVolumes` pass that would detect the `ReadAlloc` + terminal state and the eval reap that deletes the allocations, it's worth keeping both stages.

I was trying to wrap up the RFC section to explain this, but `PastAllocs` as it stands right now doesn't get us anything. When we get `Node.UpdateAlloc` for a terminal alloc, we can't move that alloc out of `Read/WriteAllocs` because that would make it eligible for scheduling before we've released the claim. At the very least we'd need to have a `PastReadAllocs` and a `PastWriteAllocs` and check those during scheduling, but right now we're not looking in `PastAllocs` at all during scheduling.

Alternately, we could add the alloc to `PastAllocs` but not remove it from `ReadAllocs/WriteAllocs` until it's GC'd, but I'm not sure that helps us in any way that isn't better served by checking if the alloc is terminal.
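To illustrate the scheduling concern, here's a hypothetical feasibility check (the type, field, and max-writers limit are assumptions for illustration, not the real scheduler code): because only the active claim maps are consulted, removing a terminal alloc from them before the claim is actually released would make the volume look free too early.

```go
package sketch

// csiVolume is a minimal stand-in for the volume's claim tracking; the field
// and the max-writers limit are assumptions for illustration only.
type csiVolume struct {
	WriteAllocs map[string]struct{}
}

// canClaimWrite consults only the active claim map. If a terminal alloc were
// moved out of WriteAllocs before its claim was actually released, the volume
// would look free here and could be re-placed too early.
func canClaimWrite(vol *csiVolume, maxWriters int) bool {
	return len(vol.WriteAllocs) < maxWriters
}
```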
I think the way you have it working is correct. Here's how I think it should go:

1. `UpdateAlloc` informs the state store that the alloc is terminal
2. The volume and eval are marked for GC
3. If it's unclaimed, we `ControllerUnpublish`
4. We `ClaimRelease`, and move the alloc to `PastAllocs`
5. `EvalGCThreshold` later, the eval is garbage collected, and on alloc GC we delete it from `PastAllocs`

The gap between 4 & 5 is the user-perceivable bit that `PastAllocs` gets us, I think. I'm a bit fuzzy on the details of where the eval or job is marked for GC to actually reap the eval (and allocs), but I think that configured duration kicks in either way.
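A rough sketch of the two-stage bookkeeping in steps 4 and 5 (hypothetical, simplified types with the maps assumed initialized; not the real state store code):

```go
package sketch

type Allocation struct{ ID string }

// CSIVolume keeps PastAllocs purely as a debugging record of recent claims.
// All maps are assumed to be initialized elsewhere.
type CSIVolume struct {
	ReadAllocs  map[string]*Allocation
	WriteAllocs map[string]*Allocation
	PastAllocs  map[string]*Allocation
}

// ClaimRelease (step 4): release the claim and park the alloc in PastAllocs
// so an operator can still see that it was recently attached.
func (v *CSIVolume) ClaimRelease(alloc *Allocation) {
	delete(v.ReadAllocs, alloc.ID)
	delete(v.WriteAllocs, alloc.ID)
	v.PastAllocs[alloc.ID] = alloc
}

// GCAlloc (step 5): once the eval and alloc are garbage collected
// (EvalGCThreshold later), drop the debugging record too.
func (v *CSIVolume) GCAlloc(alloc *Allocation) {
	delete(v.PastAllocs, alloc.ID)
}
```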
Ok, I think I get what you're proposing at least.

But when we GC the job/alloc we end up having to run the volume GC process anyways, because we don't have any guarantees they're not running concurrently (we can interleave transactions with the lengthy `ControllerUnpublish`). So the `PastAllocs` might be useful but could just as easily be removed instantly depending on timing, which makes for an unreliable debugging instrument that we have to pay for in extra raft transactions.
Also, without anything currently consuming `PastAllocs`, I'm feeling extra-skeptical about its use at this stage of the design.
Ok, that's fair. We can always re-introduce it if we need to.
Force-pushed from 37af303 to 9333bcf.
Force-pushed from c06d075 to 6b219bd.
Force-pushed from 8eb6aaa to 5773db2.
Force-pushed from 6b219bd to 5a3a9b0.
Force-pushed from e42f01e to a976066.
When an alloc is marked terminal (and after node unstage/unpublish have been called), the client syncs the terminal alloc state with the server via the `Node.UpdateAlloc` RPC. For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC handler at the server will emit an eval for a new core job to garbage collect CSI volume claims. When this eval is handled on the core scheduler, it will call a `volumeReap` method to release the claims for all terminal allocs on the job. The volume reap will issue a `ControllerUnpublishVolume` RPC for any node that has no alloc claiming the volume. Once this returns (or is skipped), the volume reap will send a new `CSIVolume.Claim` RPC that releases the volume claim for that allocation in the state store, making it available for scheduling again. This same `volumeReap` method will be called from the core job GC, which gives us a second chance to reclaim volumes during GC if there were controller RPC failures.
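A rough sketch of the server-side piece described above, with hypothetical helper names, simplified types, and placeholder eval fields and trigger string (not Nomad's actual API):

```go
package sketch

import "time"

type Allocation struct {
	JobID        string
	ClientStatus string
}

type Evaluation struct {
	JobID       string
	Type        string
	TriggeredBy string
	CreateTime  int64
}

// emitClaimGCEvals mirrors the behavior described above: one eval for the
// claim-GC core job per job that has a terminal alloc. Terminal detection is
// simplified to two client statuses for illustration.
func emitClaimGCEvals(updated []*Allocation, enqueue func(*Evaluation)) {
	seen := map[string]bool{}
	for _, alloc := range updated {
		terminal := alloc.ClientStatus == "complete" || alloc.ClientStatus == "failed"
		if !terminal || seen[alloc.JobID] {
			continue
		}
		seen[alloc.JobID] = true
		enqueue(&Evaluation{
			JobID:       alloc.JobID,
			Type:        "_core",               // core job eval, per the description
			TriggeredBy: "csi-volume-claim-gc", // placeholder trigger name
			CreateTime:  time.Now().UnixNano(),
		})
	}
}
```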
Force-pushed from a976066 to a4a4b75.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
When an alloc is marked terminal (and after node unstage/unpublish have been called), the client syncs the terminal alloc state with the server via the `Node.UpdateAlloc` RPC.

For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC handler at the server will emit an eval for a new core job to garbage collect CSI volume claims. When this eval is handled on the core scheduler, it will call a `volumeReap` method to release the claims for all terminal allocs on the job.

The volume reap will issue a `ControllerUnpublishVolume` RPC for any alloc that has volumes with a controller plugin. Once this returns (or is skipped), the volume reap will send a new `CSIVolume.Claim` RPC that releases the volume claim for that allocation in the state store, making it available for scheduling again.

This same `volumeReap` method will be called from the core job GC, which gives us a second chance to reclaim volumes during GC if there were controller RPC failures.
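As a rough illustration of that flow (placeholder interfaces and names; the real `volumeReap`, `ControllerUnpublishVolume`, and `CSIVolume.Claim` signatures differ):

```go
package sketch

// volumeClaim is a simplified record of one alloc's claim on one volume.
type volumeClaim struct {
	VolumeID            string
	AllocID             string
	NodeID              string
	HasControllerPlugin bool
}

// csiRPC stands in for the two RPCs named above.
type csiRPC interface {
	ControllerUnpublish(volID, nodeID string) error // ControllerUnpublishVolume
	ReleaseClaim(volID, allocID string) error       // CSIVolume.Claim (release)
}

// volumeReap releases claims held by terminal allocs. It leaves a claim in
// place when the controller RPC fails, which is what makes the second pass
// from the core job GC useful as a retry.
func volumeReap(claims []volumeClaim, rpc csiRPC) error {
	var lastErr error
	for _, c := range claims {
		if c.HasControllerPlugin {
			if err := rpc.ControllerUnpublish(c.VolumeID, c.NodeID); err != nil {
				lastErr = err
				continue // keep the claim; job GC retries later
			}
		}
		if err := rpc.ReleaseClaim(c.VolumeID, c.AllocID); err != nil {
			lastErr = err
		}
	}
	return lastErr
}
```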