-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSI: Expand Volume (Controller only) #18359
Conversation
6044d21
to
1ecc3aa
Compare
CSIControllerQuery | ||
} | ||
|
||
func (req *ClientCSIControllerExpandVolumeRequest) ToCSIRequest() *csi.ControllerExpandVolumeRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to perform a nil check on the req
here, or is that always non-nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do see some other nil checks in similar methods in this file, so maybe there's a reason for them? The only place this one is called (currently) already assumes non-nil to get req.PluginID
, so I think possible-nil-ness would be bad there too, unless checked prior to that? I'm not sure how much our RPC boundaries increase the possibility of that kind of nil pointer...
if vol == nil || plugin == nil || capacity == nil { | ||
return errors.New("unexpected nil value") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to expand (hehe) on this error? I wonder how these nil values can manifest, and if they are user related, whether we can provide a more actionable message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should never happen -- this is my fear of panics showing. I actually wonder if a panic would be preferable? because at least then it'd provide a full stack trace if there is an unexpected bug...
but server panics are especially scary to me, and extra-especially in a spot like this, which might not get tripped until some poor operator is trying to do a reasonable thing but ends up taking down a whole server, which might be concurrently doing other important things. 😬
@@ -228,7 +231,7 @@ func (v *CSIVolume) Get(args *structs.CSIVolumeGetRequest, reply *structs.CSIVol | |||
return v.srv.blockingRPC(&opts) | |||
} | |||
|
|||
func (v *CSIVolume) pluginValidateVolume(req *structs.CSIVolumeRegisterRequest, vol *structs.CSIVolume) (*structs.CSIPlugin, error) { | |||
func (v *CSIVolume) pluginValidateVolume(vol *structs.CSIVolume) (*structs.CSIPlugin, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I'm looking at this signature, I'm realizing this is a very silly method name. I suspect at one time it did validate the volume somehow, but we never changed the name?
1ecc3aa
to
713b270
Compare
8ca10be
to
e7d689f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
the first half of volume expansion, this allows a user to update requested capacity ("capacity_min" and "capacity_max") in a volume specification file, and re-issue either Register or Create volume commands (or api calls). the requested capacity will now be "reconciled" with the current real capacity of the volume, issuing a ControllerExpandVolume RPC call to a running controller plugin, if requested "capacity_min" is higher than the current capacity on the volume in state. csi spec: https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#controllerexpandvolume note: this does not yet cover NodeExpandVolume
e7d689f
to
f71a248
Compare
Addresses #10324 with declarative config (instead of a new imperative command/api route).
I still need to update docs, but to trigger a volume increase, modify the volume specification that was used initially to create/register the volume with a higher
capacity_min
(and optionallycapacity_max
), then issuecreate
orregister
again.This introduces a comparatively very slow process into the create/register flow (if expand is actually needed), and modifies the behavior of
create
to be more idempotent than it currently is (presently it always issues create to the controller plugin).I hemmed and hawed a lot on where the validation logic should live, so I'd be happy to hear any suggestions to move any of it. I mainly want to avoid putting it right before writing to raft, because by that point reality (a real live disk in the world) may have already changed.
The commit history is messy right now; I intend to squash it before merging.
So far this covers
ControllerExpandVolume
to increase volumes outside of the instance (e.g. an AWS EBS volume), but does not yet complete the process (when required) withNodeExpandVolume
on the node where the volume is mounted. I think I'll make that a separate PR.