Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

csi: allow more than 1 writer claim for multi-writer mode #9040

Merged
merged 3 commits into from
Oct 7, 2020

Conversation

tgross
Copy link
Member

@tgross tgross commented Oct 7, 2020

Fixes #8968

Fixes a bug where CSI volumes with the MULTI_NODE_MULTI_WRITER access mode
were using the same logic as MULTI_NODE_SINGLE_WRITER to determine whether
the volume had writer claims available for scheduling.

@vercel
Copy link

vercel bot commented Oct 7, 2020

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/hashicorp/nomad/ijjeq018w
✅ Preview: https://nomad-git-b-csi-multiwriter-count.hashicorp.vercel.app

@tgross tgross added this to the 0.13 milestone Oct 7, 2020
@tgross tgross requested review from notnoop and shoenig October 7, 2020 13:25
@tgross tgross force-pushed the b-csi-multiwriter-count branch from 617c0e7 to ba2eac6 Compare October 7, 2020 13:26
@tgross tgross force-pushed the b-csi-multiwriter-count branch from ba2eac6 to 51007f5 Compare October 7, 2020 13:36
Fixes a bug where CSI volumes with the `MULTI_NODE_MULTI_WRITER` access mode
were using the same logic as `MULTI_NODE_SINGLE_WRITER` to determine whether
the volume had writer claims available for scheduling.
// the CSI spec doesn't allow for setting a max number of writers.
// we track node resource exhaustion through v.ResourceExhausted
// which is checked in WriteSchedulable
return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bit unfortunate - but it makes sense. Does it make sense to have an higher level/integration/e2e test to test the semantics?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason it wasn't originally E2E tested was because very few CSI plugins actually allow this mode (typically on-prem solutions), so we can't feasibly do an E2E test for it. But we could probably add this dimension to one of the existing integration-style tests in the nomad package. Let me take a look at that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug into this a bit and that node resource exhaustion check isn't being made during the claim but during scheduling (in the feasibility checker), which in retrospect makes sense as we only validate arguments in an RPC and you can add resources to a cluster after the job submission has been made to make it feasible.

I've swapped one of the volumes in the existing feasibility checker tests to multiwriter and verified this code path is getting hit. I've also extended one of the existing RPC tests a bit to make sure the logic for ReadFreeClaims is being checked better while I'm in here. Our CSI E2E tests could use some work in general, but adding exhaustion checking would be a good idea for future work there.

@tgross tgross merged commit 7cff2de into master Oct 7, 2020
@tgross tgross deleted the b-csi-multiwriter-count branch October 7, 2020 14:43
fredrikhgrelland pushed a commit to fredrikhgrelland/nomad that referenced this pull request Oct 22, 2020
…9040)

Fixes a bug where CSI volumes with the `MULTI_NODE_MULTI_WRITER` access mode
were using the same logic as `MULTI_NODE_SINGLE_WRITER` to determine whether
the volume had writer claims available for scheduling.

Extends CSI claim endpoint test to exercise multi-reader and make sure `WriteFreeClaims`
is exercised for multi-writer in feasibility test.
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

multi-writer volume does not calculate free claims correctly
2 participants