Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSI: Expand Volume (Node edition) #18522

Merged
merged 1 commit into from
Sep 18, 2023
Merged

CSI: Expand Volume (Node edition) #18522

merged 1 commit into from
Sep 18, 2023

Conversation

gulducat
Copy link
Member

Follow-up to #18359, wrapping up the full feature requested in #10324 (sans docs)

After the controller plugin has expanded the disk at e.g. a cloud vendor, it may say that we also need to issue NodeExpandVolume for the node plugin to make the new disk space available to task(s) that have claims on the volume by e.g. expanding the filesystem on the node.

The main gap in test coverage is sendCSINodeRPC which I split out from ClientCSI.NodeDetachVolume, which was not actually tested before. I know that it works, but I'm open to (figuring out how to) add proper coverage there, amongst all the other layers with greater coverage, if it's a considerable concern.

@gulducat gulducat added the backport/1.6.x backport to 1.6.x release line label Sep 15, 2023
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good @gulducat! I've made a comment on nomad/csi_endpoint.go that probably needs some thinking through.

Comment on lines 1276 to 1279
// log instead of return err, because even if there are node failures,
// the real volume itself has been updated, and we should update raft
// up in Register/Create to reflect that new reality.
logger.Error("error from nodeExpandVolume", "error", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this in terms of consistency, but there aren't any obviously good answers other than writing intermediate states to Raft:

  • If we don't successfully Node Expand, then we'll record that the volume has expanded to the state store but it won't have had any file system resize operations completed.
  • The user can't then just run Register again, because the state store will have the updated size and won't detect that as a change.
  • Or, if we didn't write the changed volume size to the state store, the user could Register again but that would force us to send the Controller Expand again. Can we do that that safely?

Looking at the possible errors that can be returned by Node Expand, there are some errors that the operators needs to see to change their request. Those errors wouldn't be surfaced to the operator (they may not have access to the Nomad server logs).

Copy link
Member Author

@gulducat gulducat Sep 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I've had awkward feelings about this too -- both that the user won't see it (it appears as a successful API call), and if something actually could be done to correct it, there's no mechanism to do so.

Looking at the possible errors that can be returned by Node Expand, there are some errors that the operators needs to see to change their request.

I thiiink in reality, the values that the user can set (only min and max, maybe capabilities?) should be effectively validated by us and by the controller, before we move along to node expansion. I think most of the validation in this PR is to wrap it in the expectations of the spec, much of which is redundant by this point in the process, as far as user input goes. These validations are more to reveal bugs from programmer error.

From what I can tell, the spec does not say how controller plugins should handle repeated identical calls, but for example, AWS has a pretty brutal rate limit on modifying EBS volumes (once per 6 hours(!)), so I erred on the side of caution there.

But I suppose I could just let it happen, if that's what the controller wants to do, and let the user see it?

As is, the hacky escape hatch for that would be to increment min_capacity by 1 <byte or unit> to kick the whole thing again from the start, which could definitely be painful if you're having to wait 6 hours between attempts 😬

I could add new field(s) to the volume, to track intermediate state as you say, then retry only the node side of the equation if min_capacity = real capacity but the last attempt at node expansion was not successful. or maybe new field(s) on the claim? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think we're in luck. From the spec on ControllerExpandVolume:

This operation MUST be idempotent. If a volume corresponding to the specified volume ID is already larger than or equal to the target capacity of the expansion request, the plugin SHOULD reply 0 OK.

So I think it's safe for us to return the NodeExpandVolume error here and not write the change to the Nomad state. That way the user can retry and if we end up sending the request to the controller plugin again, no worries.

Copy link
Member

@tgross tgross Sep 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if we have more than one claim so that we end up with multiple NodeExpandVolume but only one of them fails, no worries there either, from the NodeExpandVolume spec:

This operation MUST be idempotent. If a volume corresponding to the specified volume ID is already larger than or equal to the target capacity of the expansion request, the plugin SHOULD reply 0 OK.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gee wiz, how did I miss that line? Alrighty, I'll go ahead and try that on a real EBS volume just out of paranoia, but that would sure be a convenient state of affairs!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, very good, the EBS controller plugin logs

cloud.go:1050] [Debug] Volume "vol-0eb67494a47872eb3" current size (15 GiB) is greater or equal to the new size (15 GiB)

so looks to behave its idempotent self.

And returning an error here, if a node plugin is offline:

$ nomad volume register ebs-vol.hcl
Error registering volume: Unexpected response code: 500 (unable to update volume: 1 error occurred:
        * CSI.NodeExpandVolume error: csi-node plugin 'aws-ebs0' did not become ready: context deadline exceeded)

which is quite useful -- hopefully enough for the operator to understand that they may need to check on their node plugin? Then re-running the create/register tries everything again, since state was not updated in the error case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweeeet, thanks for calling this out!

Unless there's anything else for me to address in this PR, mind pressing Approve for me?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, forgot to do that. Done!


func (a *ClientCSI) sendCSINodeRPC(nodeID, method, fwdMethod, op string, args any, reply any) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

client/pluginmanager/csimanager/testing.go Show resolved Hide resolved
following ControllerExpandVolume
in c6dbba7,
which expands the disk at e.g. a cloud vendor,
the controller plugin may say that we also need
to issue NodeExpandVolume for the node plugin to
make the new disk space available to task(s) that
have claims on the volume by e.g. expanding
the filesystem on the node.

csi spec:
https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#nodeexpandvolume
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.6.x backport to 1.6.x release line
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants