csi: plugins track jobs in addition to allocations, and use job information to set expected counts #8699
Overall this looks like a good change. I've left a question about how the scheduler considers the plugin updates from job summaries.
The test code we've added here covers the low-level bits pretty well, but I feel like with the size of this changeset we'd benefit from having test coverage at the "boundary" of the RPC endpoints in the `nomad` package (maybe at `nomad/csi_endpoint.go` or `nomad/job_endpoint.go`?)
comment in English (Co-authored-by: Tim Gross)
Well, we'd definitely benefit from the bigger test! This change introduces an edge case where a plugin has been created by a job and that job is deleted before any allocation fingerprints make it back to the state store. The plugin will exist, but not have any allocations to use for GC. Working on a fix now.
Assuming we've done some end-to-end testing of this, it looks ok to me. I've left a question about the expected behavior around updates.
End to end is looking good.
The CSI HTTP API has to transform the CSI volume to redact secrets, remove the claims fields, and consolidate the allocation stubs into a single slice of alloc stubs. This was done manually in #8590, but it is a large amount of code that has proven very bug-prone (see #8659, #8666, #8699, #8735, and #12150) and that requires updating lots of code every time we add a field to volumes or plugins. In #10202 we introduced encoding improvements for the `Node` struct that allow a more minimal transformation. Apply this same approach to serializing `structs.CSIVolume` to API responses. Also, the original reasoning behind #8590 for plugins no longer holds because the counts are now denormalized within the state store, so we can simply remove this transformation entirely.
Expected counts are derived from jobs, so we may expect plugins for which
we have no valid fingerprints and therefore no allocations.
On job update, update the plugin's job collection and recount the expected totals.
For system jobs, the expected count is the sum of currently running
allocations and blocked evals. The blocked evals number will become more
accurate once blocking accounts for driver start time.
It is called in `updateJobSummaryByAllocation`.
This does require keeping the expected count indexed by jobID so
that we can update on each allocation change.
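The per-job bookkeeping described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `expectedTracker`, `systemJobExpected`, `update`, and `remove` are hypothetical names, and the real state store tracks more than a single integer per job. The point is that indexing by job ID lets one job's contribution be replaced without recounting every other job.

```go
package main

import "fmt"

// expectedTracker is a hypothetical, simplified stand-in for the per-plugin
// bookkeeping: it remembers what each job contributes to the expected count.
type expectedTracker struct {
	byJob map[string]int // expected count contributed by each job ID
	total int            // sum of all entries in byJob
}

func newExpectedTracker() *expectedTracker {
	return &expectedTracker{byJob: make(map[string]int)}
}

// systemJobExpected follows the rule stated in the commit message:
// currently running allocations plus blocked evals.
func systemJobExpected(running, blockedEvals int) int {
	return running + blockedEvals
}

// update replaces one job's contribution and adjusts the total by the delta.
func (t *expectedTracker) update(jobID string, expected int) {
	t.total += expected - t.byJob[jobID]
	t.byJob[jobID] = expected
}

// remove drops a job's contribution, for example when the job is deleted.
func (t *expectedTracker) remove(jobID string) {
	t.total -= t.byJob[jobID]
	delete(t.byJob, jobID)
}

func main() {
	t := newExpectedTracker()
	t.update("plugin-controller", 2)                 // service job, count = 2
	t.update("plugin-node", systemJobExpected(3, 1)) // system job: 3 running, 1 blocked
	t.update("plugin-node", systemJobExpected(4, 0)) // blocked eval became a running alloc
	t.remove("plugin-controller")
	fmt.Println(t.total)
}
```

Because each allocation change only touches the entry for its own job, the update stays cheap even when many jobs provide the same plugin.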
Plugin emptiness checks account for jobs as well as allocations.
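A minimal sketch of an emptiness check that considers jobs, assuming a plugin record that tracks both the allocations that have fingerprinted it and the jobs that define it. The `plugin` type, its fields, and `isEmpty` are hypothetical names for illustration and are not taken from the Nomad source.

```go
package main

import "fmt"

// plugin is a hypothetical, simplified plugin record.
type plugin struct {
	allocs map[string]struct{} // IDs of allocations that have fingerprinted the plugin
	jobs   map[string]struct{} // IDs of jobs that define the plugin
}

// isEmpty reports whether nothing references the plugin any more.
// A plugin with a job but no fingerprinted allocations is not empty.
func (p *plugin) isEmpty() bool {
	return len(p.allocs) == 0 && len(p.jobs) == 0
}

func main() {
	p := &plugin{
		allocs: map[string]struct{}{},
		jobs:   map[string]struct{}{"plugin-job": {}},
	}
	fmt.Println(p.isEmpty()) // the job is registered, no fingerprints yet

	delete(p.jobs, "plugin-job") // job deleted before any alloc fingerprinted
	fmt.Println(p.isEmpty())     // nothing references the plugin any more
}
```

Under this assumption, the edge case raised in review (a job deleted before any fingerprint arrives) leaves the plugin with neither jobs nor allocations, so it can be treated as empty and collected.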
Closes #8503 See also #7974