csi: allow more than 1 writer claim for multi-writer mode #9040
Fixes a bug where CSI volumes with the `MULTI_NODE_MULTI_WRITER` access mode were using the same logic as `MULTI_NODE_SINGLE_WRITER` to determine whether the volume had writer claims available for scheduling.
// the CSI spec doesn't allow for setting a max number of writers.
// we track node resource exhaustion through v.ResourceExhausted
// which is checked in WriteSchedulable
return true
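To illustrate the distinction the diff above draws, here is a minimal, self-contained sketch of a per-access-mode writer check. It is not Nomad's actual code: the `accessMode` constants, the `volume` struct, and `writeFreeClaims` are hypothetical stand-ins that only model the behavior described in this PR (single-writer modes allow one write claim, multi-writer mode places no cap).

```go
package main

import "fmt"

// accessMode is a hypothetical stand-in for a CSI volume access mode.
type accessMode int

const (
	singleNodeWriter accessMode = iota
	multiNodeSingleWriter
	multiNodeMultiWriter
)

// volume is a simplified, hypothetical model of a volume's claim state.
type volume struct {
	mode        accessMode
	writeClaims int // allocations currently holding a write claim
}

// writeFreeClaims reports whether another write claim may be granted.
func writeFreeClaims(v volume) bool {
	switch v.mode {
	case singleNodeWriter, multiNodeSingleWriter:
		// Single-writer modes: free only while nobody holds a write claim.
		return v.writeClaims == 0
	case multiNodeMultiWriter:
		// Multi-writer mode: no maximum writer count is enforced here.
		return true
	default:
		// Unknown or read-only modes never grant write claims in this sketch.
		return false
	}
}

func main() {
	single := volume{mode: multiNodeSingleWriter, writeClaims: 1}
	multi := volume{mode: multiNodeMultiWriter, writeClaims: 1}

	fmt.Println(writeFreeClaims(single)) // false: the one writer slot is taken
	fmt.Println(writeFreeClaims(multi))  // true: a second writer is allowed
}
```

The bug described in this PR corresponds to the multi-writer case falling into the same branch as the single-writer cases, which would block the second writer.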
That's a bit unfortunate, but it makes sense. Would it be worth having a higher-level integration or E2E test to cover these semantics?
It wasn't originally E2E tested because very few CSI plugins actually allow this mode (typically on-prem solutions), so an E2E test for it isn't feasible. But we could probably add this dimension to one of the existing integration-style tests in the `nomad` package. Let me take a look at that.
I dug into this a bit. The node resource exhaustion check isn't made during the claim but during scheduling (in the feasibility checker). In retrospect that makes sense: an RPC only validates its arguments, and you can add resources to a cluster after the job has been submitted to make it feasible.
I've swapped one of the volumes in the existing feasibility checker tests to multi-writer and verified that this code path is getting hit. While I'm in here, I've also extended one of the existing RPC tests so that the logic for `ReadFreeClaims` is checked more thoroughly. Our CSI E2E tests could use some work in general, and adding exhaustion checking would be a good idea for future work there.
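As a rough illustration of the split described above, the sketch below separates a claim-time check from a scheduling-time feasibility check. Everything in it is an assumption made for illustration: `vol`, `claimHasFreeWriter`, and `feasibleForWrite` are hypothetical names, and the exhaustion flag is modeled as a plain boolean rather than whatever representation `v.ResourceExhausted` actually uses.

```go
package main

import "fmt"

// vol is a hypothetical, simplified multi-writer volume.
type vol struct {
	healthy           bool // the plugin backing the volume is usable
	resourceExhausted bool // a node reported it cannot take more volumes
}

// claimHasFreeWriter models the claim-time check for a multi-writer volume:
// it does not look at node resource exhaustion at all.
func claimHasFreeWriter(v vol) bool {
	return true
}

// feasibleForWrite models the scheduling-time check, which is where
// resource exhaustion is taken into account in this sketch.
func feasibleForWrite(v vol) bool {
	return v.healthy && !v.resourceExhausted && claimHasFreeWriter(v)
}

func main() {
	v := vol{healthy: true, resourceExhausted: true}
	fmt.Println(claimHasFreeWriter(v)) // true: the claim itself is not blocked
	fmt.Println(feasibleForWrite(v))   // false: placement is infeasible for now

	// Capacity is added to the cluster after the job was submitted.
	v.resourceExhausted = false
	fmt.Println(feasibleForWrite(v)) // true: the same job can now be placed
}
```

Keeping exhaustion out of the claim-time check is what lets a job that was infeasible at submission become schedulable once capacity is added.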
…9040)
Fixes a bug where CSI volumes with the `MULTI_NODE_MULTI_WRITER` access mode were using the same logic as `MULTI_NODE_SINGLE_WRITER` to determine whether the volume had writer claims available for scheduling.
Extends CSI claim endpoint test to exercise multi-reader and make sure `WriteFreeClaims` is exercised for multi-writer in feasibility test.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
Fixes #8968
Fixes a bug where CSI volumes with the `MULTI_NODE_MULTI_WRITER` access mode were using the same logic as `MULTI_NODE_SINGLE_WRITER` to determine whether the volume had writer claims available for scheduling.