CSI: Expand Volume (Node edition) #18522
This is looking really good @gulducat! I've made a comment on nomad/csi_endpoint.go that probably needs some thinking through.
nomad/csi_endpoint.go (Outdated)
// log instead of return err, because even if there are node failures,
// the real volume itself has been updated, and we should update raft
// up in Register/Create to reflect that new reality.
logger.Error("error from nodeExpandVolume", "error", err)
I'm not sure about this in terms of consistency, but there aren't any obviously good answers other than writing intermediate states to Raft:
- If we don't successfully Node Expand, then we'll record that the volume has expanded to the state store but it won't have had any file system resize operations completed.
- The user can't then just run Register again, because the state store will have the updated size and won't detect that as a change.
- Or, if we didn't write the changed volume size to the state store, the user could Register again, but that would force us to send the Controller Expand again. Can we do that safely?
Looking at the possible errors that can be returned by Node Expand, there are some errors that the operator needs to see to change their request. Those errors wouldn't be surfaced to the operator (they may not have access to the Nomad server logs).
Yeah I've had awkward feelings about this too -- both that the user won't see it (it appears as a successful API call), and if something actually could be done to correct it, there's no mechanism to do so.
Looking at the possible errors that can be returned by Node Expand, there are some errors that the operator needs to see to change their request.
I thiiink in reality, the values that the user can set (only min and max, maybe capabilities?) should be effectively validated by us and by the controller, before we move along to node expansion. I think most of the validation in this PR is to wrap it in the expectations of the spec, much of which is redundant by this point in the process, as far as user input goes. These validations are more to reveal bugs from programmer error.
From what I can tell, the spec does not say how controller plugins should handle repeated identical calls, but for example, AWS has a pretty brutal rate limit on modifying EBS volumes (once per 6 hours(!)), so I erred on the side of caution there.
But I suppose I could just let it happen, if that's what the controller wants to do, and let the user see it?
As is, the hacky escape hatch for that would be to increment min_capacity by 1 <byte or unit> to kick the whole thing again from the start, which could definitely be painful if you're having to wait 6 hours between attempts 😬
I could add new field(s) to the volume, to track intermediate state as you say, then retry only the node side of the equation if min_capacity = real capacity but the last attempt at node expansion was not successful. Or maybe new field(s) on the claim? 🤔
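Roughly the shape I'm imagining, with a purely hypothetical field name, not anything in this PR:

type CSIVolume struct {
	// ...existing fields...

	// NodeExpansionPending (hypothetical) would record that the controller
	// expand succeeded but at least one node-side expand has not, so a
	// re-register could retry only NodeExpandVolume.
	NodeExpansionPending bool
}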
Oh, I think we're in luck. From the spec on ControllerExpandVolume:
This operation MUST be idempotent. If a volume corresponding to the specified volume ID is already larger than or equal to the target capacity of the expansion request, the plugin SHOULD reply 0 OK.
So I think it's safe for us to return the NodeExpandVolume error here and not write the change to the Nomad state. That way the user can retry, and if we end up sending the request to the controller plugin again, no worries.
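Concretely, something along these lines in the endpoint — sketched from the snippet quoted above rather than the literal diff, so the function name and arguments are approximations:

// instead of logging and carrying on:
//   logger.Error("error from nodeExpandVolume", "error", err)
// return the error and skip writing the resized volume to raft, so the
// user sees the failure and a re-register safely re-sends the (idempotent)
// ControllerExpandVolume before retrying the node side.
if err := nodeExpandVolume(vol, plugin, req); err != nil {
	return fmt.Errorf("unable to update volume: %w", err)
}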
And if we have more than one claim so that we end up with multiple NodeExpandVolume calls but only one of them fails, no worries there either, from the NodeExpandVolume spec:
This operation MUST be idempotent. If a volume corresponding to the specified volume ID is already larger than or equal to the target capacity of the expansion request, the plugin SHOULD reply 0 OK.
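So collecting the per-claim node errors and returning them all should be fine; a rough sketch using go-multierror, with names illustrative rather than the actual PR code:

var mErr *multierror.Error
for _, claim := range claims { // each claim that needs a node-side expand
	if err := nodeExpandVolume(vol, plugin, claim); err != nil {
		mErr = multierror.Append(mErr, err)
	}
}
// claims that already expanded are safe to hit again on a retry,
// since NodeExpandVolume is idempotent per the spec.
return mErr.ErrorOrNil()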
Gee whiz, how did I miss that line? Alrighty, I'll go ahead and try that on a real EBS volume just out of paranoia, but that would sure be a convenient state of affairs!
Yep, very good, the EBS controller plugin logs
cloud.go:1050] [Debug] Volume "vol-0eb67494a47872eb3" current size (15 GiB) is greater or equal to the new size (15 GiB)
so it looks to be behaving like its idempotent self.
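That log line is the spec's idempotency clause in action; paraphrasing the plugin's behavior (not its actual code), it's essentially:

// if the volume is already at or above the requested size, report success
// without resizing anything.
if currentSizeGiB >= requestedSizeGiB {
	return &csi.ControllerExpandVolumeResponse{
		CapacityBytes:         currentSizeBytes,
		NodeExpansionRequired: nodeExpansionRequired,
	}, nil
}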
And returning an error here, if a node plugin is offline:
$ nomad volume register ebs-vol.hcl
Error registering volume: Unexpected response code: 500 (unable to update volume: 1 error occurred:
* CSI.NodeExpandVolume error: csi-node plugin 'aws-ebs0' did not become ready: context deadline exceeded)
which is quite useful -- hopefully enough for the operator to understand that they may need to check on their node plugin? Then re-running the create/register tries everything again, since state was not updated in the error case.
I squashed the change and force-pushed, but I added this test:
76a5e55#diff-214766af461ea207f297f2904489f01ac55b814950f301577516c1037501396fR1919-R1931
and it passes with the error returned here:
76a5e55#diff-6a3c30b17bb256ab9e3d5bab1620c3b9afee29aaa06371eae6cae0951a2fb12aL1275-R1275
LGTM 👍
Sweeeet, thanks for calling this out!
Unless there's anything else for me to address in this PR, mind pressing Approve for me?
Oops, forgot to do that. Done!
func (a *ClientCSI) sendCSINodeRPC(nodeID, method, fwdMethod, op string, args any, reply any) error {
Nice 👍
following ControllerExpandVolume in c6dbba7, which expands the disk at e.g. a cloud vendor, the controller plugin may say that we also need to issue NodeExpandVolume for the node plugin to make the new disk space available to task(s) that have claims on the volume by e.g. expanding the filesystem on the node. csi spec: https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#nodeexpandvolume
Force-pushed from a2069a5 to 76a5e55
Follow-up to #18359, wrapping up the full feature requested in #10324 (sans docs)
After the controller plugin has expanded the disk at e.g. a cloud vendor, it may say that we also need to issue NodeExpandVolume for the node plugin to make the new disk space available to task(s) that have claims on the volume by e.g. expanding the filesystem on the node.
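The overall shape of the flow, as a sketch — names are illustrative, not the exact functions in the diff:

// 1. controller plugin grows the backing disk (cloud volume, etc.)
resp, err := controllerExpandVolume(ctx, vol, newCapacity)
if err != nil {
	return err
}
// 2. if the plugin says so, each node with a claim grows the filesystem
if resp.NodeExpansionRequired {
	if err := nodeExpandVolume(ctx, vol, claims); err != nil {
		// state is not written on error, so create/register can be retried
		return err
	}
}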
The main gap in test coverage is sendCSINodeRPC, which I split out from ClientCSI.NodeDetachVolume and which was not actually tested before. I know that it works, but I'm open to (figuring out how to) add proper coverage there, amongst all the other layers with greater coverage, if it's a considerable concern.
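For context on that helper, the wrappers end up thin; roughly this shape, where the request/response types and the forwarding/op strings are placeholders rather than the literal call sites:

// illustrative wrapper, not the literal diff: NodeDetachVolume and
// NodeExpandVolume both delegate to the shared sendCSINodeRPC helper,
// differing only in method names and payload types.
func (a *ClientCSI) NodeExpandVolume(args *nodeExpandArgs, reply *nodeExpandReply) error {
	// nodeExpandArgs / nodeExpandReply stand in for the real structs.
	return a.sendCSINodeRPC(args.NodeID,
		"CSI.NodeExpandVolume",       // client RPC name (seen in the error output above)
		"ClientCSI.NodeExpandVolume", // placeholder forwarding method
		"expand_volume",              // placeholder op label for metrics/logging
		args, reply)
}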