CSI: Expand Volume (Node edition) #18522
This is looking really good @gulducat! I've made a comment on nomad/csi_endpoint.go that probably needs some thinking through.
nomad/csi_endpoint.go (Outdated)
// log instead of return err, because even if there are node failures,
// the real volume itself has been updated, and we should update raft
// up in Register/Create to reflect that new reality.
logger.Error("error from nodeExpandVolume", "error", err)
I'm not sure about this in terms of consistency, but there aren't any obviously good answers other than writing intermediate states to Raft:
- If we don't successfully Node Expand, then we'll record that the volume has expanded to the state store but it won't have had any file system resize operations completed.
- The user can't then just run Register again, because the state store will have the updated size and won't detect that as a change.
- Or, if we didn't write the changed volume size to the state store, the user could Register again, but that would force us to send the Controller Expand again. Can we do that safely?
Looking at the possible errors that can be returned by Node Expand, there are some errors that the operator needs to see to change their request. Those errors wouldn't be surfaced to the operator (they may not have access to the Nomad server logs).
Yeah I've had awkward feelings about this too -- both that the user won't see it (it appears as a successful API call), and if something actually could be done to correct it, there's no mechanism to do so.
Looking at the possible errors that can be returned by Node Expand, there are some errors that the operator needs to see to change their request.
I thiiink in reality, the values that the user can set (only min and max, maybe capabilities?) should be effectively validated by us and by the controller, before we move along to node expansion. I think most of the validation in this PR is to wrap it in the expectations of the spec, much of which is redundant by this point in the process, as far as user input goes. These validations are more to reveal bugs from programmer error.
From what I can tell, the spec does not say how controller plugins should handle repeated identical calls, but for example, AWS has a pretty brutal rate limit on modifying EBS volumes (once per 6 hours(!)), so I erred on the side of caution there.
But I suppose I could just let it happen, if that's what the controller wants to do, and let the user see it?
As is, the hacky escape hatch for that would be to increment min_capacity by 1 <byte or unit> to kick the whole thing again from the start, which could definitely be painful if you're having to wait 6 hours between attempts 😬
I could add new field(s) to the volume, to track intermediate state as you say, then retry only the node side of the equation if min_capacity = real capacity but the last attempt at node expansion was not successful. Or maybe new field(s) on the claim? 🤔
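Roughly the shape I'm imagining, with a purely hypothetical field name, not anything in this PR:

type CSIVolume struct {
	// ...existing fields...

	// NodeExpansionPending (hypothetical) would record that the controller
	// expand succeeded but at least one node-side expand has not, so a
	// re-register could retry only NodeExpandVolume.
	NodeExpansionPending bool
}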
Oh, I think we're in luck. From the spec on ControllerExpandVolume:
This operation MUST be idempotent. If a volume corresponding to the specified volume ID is already larger than or equal to the target capacity of the expansion request, the plugin SHOULD reply 0 OK.
So I think it's safe for us to return the NodeExpandVolume error here and not write the change to the Nomad state. That way the user can retry, and if we end up sending the request to the controller plugin again, no worries.
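Concretely, something along these lines in the endpoint — sketched from the snippet quoted above rather than the literal diff, so the function name and arguments are approximations:

// instead of logging and carrying on:
//   logger.Error("error from nodeExpandVolume", "error", err)
// return the error and skip writing the resized volume to raft, so the
// user sees the failure and a re-register safely re-sends the (idempotent)
// ControllerExpandVolume before retrying the node side.
if err := nodeExpandVolume(vol, plugin, req); err != nil {
	return fmt.Errorf("unable to update volume: %w", err)
}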
And if we have more than one claim so that we end up with multiple NodeExpandVolume calls but only one of them fails, no worries there either, from the NodeExpandVolume spec:
This operation MUST be idempotent. If a volume corresponding to the specified volume ID is already larger than or equal to the target capacity of the expansion request, the plugin SHOULD reply 0 OK.
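So collecting the per-claim node errors and returning them all should be fine; a rough sketch using go-multierror, with names illustrative rather than the actual PR code:

var mErr *multierror.Error
for _, claim := range claims { // each claim that needs a node-side expand
	if err := nodeExpandVolume(vol, plugin, claim); err != nil {
		mErr = multierror.Append(mErr, err)
	}
}
// claims that already expanded are safe to hit again on a retry,
// since NodeExpandVolume is idempotent per the spec.
return mErr.ErrorOrNil()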
Gee whiz, how did I miss that line? Alrighty, I'll go ahead and try that on a real EBS volume just out of paranoia, but that would sure be a convenient state of affairs!
Yep, very good, the EBS controller plugin logs
cloud.go:1050] [Debug] Volume "vol-0eb67494a47872eb3" current size (15 GiB) is greater or equal to the new size (15 GiB)
so it looks to be behaving like its idempotent self.
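That log line is the spec's idempotency clause in action; paraphrasing the plugin's behavior (not its actual code), it's essentially:

// if the volume is already at or above the requested size, report success
// without resizing anything.
if currentSizeGiB >= requestedSizeGiB {
	return &csi.ControllerExpandVolumeResponse{
		CapacityBytes:         currentSizeBytes,
		NodeExpansionRequired: nodeExpansionRequired,
	}, nil
}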
And returning an error here, if a node plugin is offline:
$ nomad volume register ebs-vol.hcl
Error registering volume: Unexpected response code: 500 (unable to update volume: 1 error occurred:
* CSI.NodeExpandVolume error: csi-node plugin 'aws-ebs0' did not become ready: context deadline exceeded)
which is quite useful -- hopefully enough for the operator to understand that they may need to check on their node plugin? Then re-running the create/register tries everything again, since state was not updated in the error case.
I squashed the change and force-pushed, but I added this test:
76a5e55#diff-214766af461ea207f297f2904489f01ac55b814950f301577516c1037501396fR1919-R1931
and it passes with the error returned here:
76a5e55#diff-6a3c30b17bb256ab9e3d5bab1620c3b9afee29aaa06371eae6cae0951a2fb12aL1275-R1275
LGTM 👍
Sweeeet, thanks for calling this out!
Unless there's anything else for me to address in this PR, mind pressing Approve for me?
Oops, forgot to do that. Done!
func (a *ClientCSI) sendCSINodeRPC(nodeID, method, fwdMethod, op string, args any, reply any) error {
Nice 👍
following ControllerExpandVolume in c6dbba7, which expands the disk at e.g. a cloud vendor, the controller plugin may say that we also need to issue NodeExpandVolume for the node plugin to make the new disk space available to task(s) that have claims on the volume by e.g. expanding the filesystem on the node. csi spec: https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#nodeexpandvolume
Force-pushed from a2069a5 to 76a5e55
Follow-up to #18359, wrapping up the full feature requested in #10324 (sans docs)
After the controller plugin has expanded the disk at e.g. a cloud vendor, it may say that we also need to issue NodeExpandVolume for the node plugin to make the new disk space available to task(s) that have claims on the volume by e.g. expanding the filesystem on the node.
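The overall shape of the flow, as a sketch — names are illustrative, not the exact functions in the diff:

// 1. controller plugin grows the backing disk (cloud volume, etc.)
resp, err := controllerExpandVolume(ctx, vol, newCapacity)
if err != nil {
	return err
}
// 2. if the plugin says so, each node with a claim grows the filesystem
if resp.NodeExpansionRequired {
	if err := nodeExpandVolume(ctx, vol, claims); err != nil {
		// state is not written on error, so create/register can be retried
		return err
	}
}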
The main gap in test coverage is sendCSINodeRPC, which I split out from ClientCSI.NodeDetachVolume and which was not actually tested before. I know that it works, but I'm open to (figuring out how to) add proper coverage there, amongst all the other layers with greater coverage, if it's a considerable concern.
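For context on that helper, the wrappers end up thin; roughly this shape, where the request/response types and the forwarding/op strings are placeholders rather than the literal call sites:

// illustrative wrapper, not the literal diff: NodeDetachVolume and
// NodeExpandVolume both delegate to the shared sendCSINodeRPC helper,
// differing only in method names and payload types.
func (a *ClientCSI) NodeExpandVolume(args *nodeExpandArgs, reply *nodeExpandReply) error {
	// nodeExpandArgs / nodeExpandReply stand in for the real structs.
	return a.sendCSINodeRPC(args.NodeID,
		"CSI.NodeExpandVolume",       // client RPC name (seen in the error output above)
		"ClientCSI.NodeExpandVolume", // placeholder forwarding method
		"expand_volume",              // placeholder op label for metrics/logging
		args, reply)
}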