Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ControllerModifyVolume Status API #4298

Merged
merged 4 commits into from
Oct 31, 2023

Conversation

sunnylovestiramisu
Copy link
Contributor

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Oct 17, 2023
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 17, 2023
@sunnylovestiramisu sunnylovestiramisu force-pushed the kep-3751 branch 2 times, most recently from 406e8eb to dc6c59c Compare October 17, 2023 21:11
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 17, 2023
@sunnylovestiramisu sunnylovestiramisu changed the title [WIP] Update ControllerModifyVolume Status API Update ControllerModifyVolume Status API Oct 17, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 17, 2023
@sunnylovestiramisu
Copy link
Contributor Author

/assign @msau42

@@ -177,22 +177,27 @@ type PersistentVolumeClaimStatus struct {
...
// The VolumeAttributesClassName string cannot be empty
VolumeAttributesClassName string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this field since it is now moved to actualclassname

...
}

type ModifyVolumeStatus struct {
ActualClassName string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace Class with VolumeAttributesClass

// +enum
type PersistentVolumeClaimModifyVolumeStatus string

const (
// When modify volume is complete, the empty string is set by modify volume controller.
PersistentVolumeClaimNoModifyVolumeInProgress PersistentVolumeClaimModifyVolumeStatus = ""
PersistentVolumeClaimNoModifyVolumeInSucceeded PersistentVolumeClaimModifyVolumeStatus = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's either succeeded or not started. I don't know if we need a const for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between a nil value and an empty value? the Status field of the ModifyVolumeStatus is pointer type.

...
}

type ModifyVolumeStatus struct {
ActualClassName string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether we should actually make this field an object reference instead of just a name.

I am thinking of the scenario where someone may delete and recreate the VAC with the same name but different content. But actually, if we really want to keep track of the attributes are, we may need to keep track of it in the PV object. @jsafrane @gnufied wdyt?

Copy link
Contributor Author

@sunnylovestiramisu sunnylovestiramisu Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is uuid enough to distinguish instead of doing comparison of the whole VAC object?

Usually the protection controller will prevent the VAC to be deleted but an admin can force delete.

@msau42
Copy link
Member

msau42 commented Oct 17, 2023

/assign @gnufied

// +enum
type PersistentVolumeClaimModifyVolumeStatus string

const (
// When modify volume is complete, the empty string is set by modify volume controller.
PersistentVolumeClaimNoModifyVolumeInProgress PersistentVolumeClaimModifyVolumeStatus = ""
PersistentVolumeClaimNoModifyVolumeInSucceeded PersistentVolumeClaimModifyVolumeStatus = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between a nil value and an empty value? the Status field of the ModifyVolumeStatus is pointer type.

keps/sig-storage/3751-volume-attributes-class/README.md Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, If succeed, pvc.ModifyVolumeStatus.ActualClassName should be updated to pvc.ModifyVolumeStatus.TargetClassName and pvc.ModifyVolumeStatus.Status should be changed to nil.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes both the status and actualClassName will be set as you said.

@carlory
Copy link
Member

carlory commented Oct 18, 2023

A user creates a pvc with a VolumeAttributesClassName set. if the parameters of the PVC are not accepted by the lower version csi-driver but are accepted by the higher version csi-driver, the status of the PVC may be stuck in infeasible state. If the admin upgrades the csi-driver to the higher version, the PVC how to recover the PVC to user expected state? To modify the PVC to a different VolumeAttributesClass which parameters is same as the previous one is not a good UX for the user.

@msau42
Copy link
Member

msau42 commented Oct 18, 2023

A user creates a pvc with a VolumeAttributesClassName set. if the parameters of the PVC are not accepted by the lower version csi-driver but are accepted by the higher version csi-driver, the status of the PVC may be stuck in infeasible state. If the admin upgrades the csi-driver to the higher version, the PVC how to recover the PVC to user expected state? To modify the PVC to a different VolumeAttributesClass which parameters is same as the previous one is not a good UX for the user.

I am thinking that even in the Infeasible case, we will still have to periodically retry, but at a lower frequency than if it was InProgress

@gnufied
Copy link
Member

gnufied commented Oct 18, 2023

The ModifyStatus field on the image diagram is not in right place.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 20, 2023
const (
...
// PersistentVolumeClaimModifyVolume - a user trigger modify volume of pvc has been started
PersistentVolumeClaimModifyVolume PersistentVolumeClaimConditionType = "ModifyVolume"
Copy link
Member

@carlory carlory Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble understanding this name of the new condition type. Is this saying that all the moditication of the volume which includes the reszie operation or just for changing volume attributes? If it's the latter, I think we should rename it to something like ModifyVolumeAttributes or ModifyVolumeAttributesClass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have a strong opinion of the naming, I was following the previous naming by using actions instead of a noun. This condition is used to document the last error the controller sees.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in the sig-storage meeting, we will name it "VolumeAttriburesModifyError"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are not going to change status when we retry ModifyVolume after failing (and status is set to Infeasible), then we need to somehow communicate to the user that, we are again retrying ModifyVolume (albeit slowly), so we will I think need a condition for that too. cc @msau42

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do you propose we set a ModifyVolume condition to true and false? From api-conventions, it says:

For state transitions which take a long period of time (e.g. more than 1 minute), it is reasonable to treat the transition itself as an observed state. In these cases, the Condition (such as "Resizing") itself should not be transient, and should instead be signalled using the True/False/Unknown pattern. This allows other observers to determine the last update from the controller, whether successful or failed. In cases where the state transition is unable to complete and continued reconciliation is not feasible, the Reason and Message should be used to indicate that the transition failed.

I interpret the last sentence to mean that if we still intend to retry, the status should still be True.

So the main benefits I see of having the condition is:

  1. that the timestamp can be updated every time we retry an operation.
  2. we could potentially return false for some preconditions like VAC doesn't exist or is invalid, PVC not bound, etc.

// +enum
// When a controller receives persistentvolume claim update with PersistentVolumeClaimModifyVolumeStatus for a resource
// that it does not recognizes, then it should ignore that update and let other controllers
// handle it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case the controller cannot recognize the status value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not happen but like if someone setting their own status in a fork etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is for telling consumers that how to handle the scenario where we may add new statuses in the future. I think it's simpler to reword it like "New statuses can be added in the future. Consumers should check for unknown statuses and fail appropriately".

type PersistentVolumeClaimModifyVolumeStatus string

const (
// When modify volume is complete, the empty string is set by modify volume controller.
PersistentVolumeClaimNoModifyVolumeInProgress PersistentVolumeClaimModifyVolumeStatus = ""
// State set when modify volume controller wants to modify the volume in control-plane
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the modify volume controller? pv controller in kube-controller-manager or CSI driver controller?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is the resize controller actually, we are putting this modify volume controller logic in the external-resizer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to specify exactly which controller is handling what. That's more of an implementation detail. I think we can reword it to say something like "Pending indicates that the PersistentVolumeClaim cannot be modified due to requirements not being met, such as the PersistentVolumeClaim being in an invalid state or the specified VolumeAttributesClass is existing"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto below. I think we can remove references to specific controllers in the comments.

// State set when modify volume controller starts modifying the volume in control-plane
// When the request has been rejected as invalid by the CSI driver. To resolve this error,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here what?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should apply to the "Infeasible" one, not "InProgress"

// parameters it intends to modify. SPs MUST NOT modify volumes based
// on the absence of keys, only keys that are specified should result
// in modifications to the volume.
map<string, string> mutable_parameters = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the parameters field of the VAC optional? If the mutable_parameters is required, the field in the VAC definition may be required as well and at least contains one key.

from chatgpt:

企业微信截图_31b09e72-e351-4243-b8c3-dcdd7e993fb8

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m curious why the required and optional keyword is not used in fields in the csi proto file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add it, but that is the CSI spec update. The CSI Spec release has been cut already. We can revisit at next CSI meeting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually in the CSI spec, there are some extra paragraph I just did not copy pasted here because it is stand alone:

// Plugin specific creation-time parameters passed in as opaque 
  // key-value pairs. These mutable_parameteres MAY also be
  // changed during the lifetime of the volume via a subsequent
  // `ControllerModifyVolume` RPC. This field is OPTIONAL.
  // The Plugin is responsible for parsing and validating these
  // parameters. COs will treat these as opaque.

Please refer to the CSI spec for accuracy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also in CSI spec:

message ControllerModifyVolumeRequest {
  option (alpha_message) = true;

  // Contains identity information for the existing volume.
  // This field is REQUIRED.
  string volume_id = 1;

  // Secrets required by plugin to complete modify volume request.
  // This field is OPTIONAL. Refer to the `Secrets Requirements`
  // section on how to use this field.
  map<string, string> secrets = 2 [(csi_secret) = true];

  // Plugin specific volume attributes to mutate, passed in as 
  // opaque key-value pairs. 
  // This field is REQUIRED. The Plugin is responsible for
  // parsing and validating these parameters. COs will treat these
  // as opaque. The CO SHOULD specify the intended values of all mutable
  // parameters it intends to modify. SPs MUST NOT modify volumes based
  // on the absence of keys, only keys that are specified should result
  // in modifications to the volume.
  map<string, string> mutable_parameters = 3;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we defer this to CSI meeting in general? This PR is focusing on unblocking the k8s API change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in CreateVolume it is optional, but in ModifyVolume it is required. I think that makes sense.

From a k8s POV, if a user doesn't specify any vacName in the PVC, then that results in empty mutable_parameters.

A VAC having empty parameters doesn't make sense. If there is nothing to change then why would someone specify a vacName in their PVC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to remove this session in the doc and please refer to the CSI spec.md

// Condition is used to document the last error controller sees
const (
...
// PersistentVolumeClaimVolumeAttriburesModifyError - a user trigger modify volume of pvc has been started
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment needs to be updated

// ModifyVolumeStatus represents the status object of ControllerModifyVolume operation
type ModifyVolumeStatus struct {
// ActualVolumeAttributesClassName is the actual name of the VolumeAttributesClass the PVC is using
ActualVolumeAttributesClassName string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "Actual", I would use "Current" to be consistent with how other resources specify current status (like StatefulSet.status.currentReplicas)

type ModifyVolumeStatus struct {
// ActualVolumeAttributesClassName is the actual name of the VolumeAttributesClass the PVC is using
ActualVolumeAttributesClassName string
// TargetVolumeAttributesClassName is the target name of the VolumeAttributesClass the PVC wants to use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not quite what was requested in the PVC. It is what is currently being reconciled, which may not necessarily match. For example, if user sets PVC.spec.vacName = A, then B, then C, we may still being trying to reconcile B even though the user has set C.

// +enum
// When a controller receives persistentvolume claim update with PersistentVolumeClaimModifyVolumeStatus for a resource
// that it does not recognizes, then it should ignore that update and let other controllers
// handle it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is for telling consumers that how to handle the scenario where we may add new statuses in the future. I think it's simpler to reword it like "New statuses can be added in the future. Consumers should check for unknown statuses and fail appropriately".

type PersistentVolumeClaimModifyVolumeStatus string

const (
// When modify volume is complete, the empty string is set by modify volume controller.
PersistentVolumeClaimNoModifyVolumeInProgress PersistentVolumeClaimModifyVolumeStatus = ""
// State set when modify volume controller wants to modify the volume in control-plane
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to specify exactly which controller is handling what. That's more of an implementation detail. I think we can reword it to say something like "Pending indicates that the PersistentVolumeClaim cannot be modified due to requirements not being met, such as the PersistentVolumeClaim being in an invalid state or the specified VolumeAttributesClass is existing"

// State set when modify volume controller starts modifying the volume in control-plane
// When the request has been rejected as invalid by the CSI driver. To resolve this error,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should apply to the "Infeasible" one, not "InProgress"

type PersistentVolumeClaimModifyVolumeStatus string

const (
// When modify volume is complete, the empty string is set by modify volume controller.
PersistentVolumeClaimNoModifyVolumeInProgress PersistentVolumeClaimModifyVolumeStatus = ""
// State set when modify volume controller wants to modify the volume in control-plane
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto below. I think we can remove references to specific controllers in the comments.

@@ -409,12 +449,25 @@ The resource quota controller is the only component capable of monitoring and re
### 3. Add new statuses in PVC API to indicate changes of VolumeAttributesClass and the status of the ModifyVolume operation.

```
type VolumeAttributesClassStatus string
type ModifyVolumeStatus struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would save this for a future cleanup, but I'm not sure why we want to define the api definitions in multiple places in the doc. It means if we want to make changes like this, we need to remember to modify it everywhere.

PersistentVolumeClaimNoModifyVolumeInProgress PersistentVolumeClaimModifyVolumeStatus = ""
// State set when modify volume controller wants to modify the volume in control-plane
// but the VolumeAttributesClass does not exist.
PersistentVolumeClaimControllerModifyVolumePending PersistentVolumeClaimModifyVolumeStatus = "ControllerModifyVolumePending"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we need the "Controller" in the name since we're not going to support Node operations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not going to but will we ever consider supporting node operations?

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 24, 2023
@carlory
Copy link
Member

carlory commented Oct 25, 2023

@sunnylovestiramisu

Accroding to the discussion on the API PR, we also need to change the VolumeAttributesClassName description in the PVC and PV definition.

A VAC having empty parameters doesn't make sense. If there is nothing to change then why would someone specify a vacName in their PVC?

On the other hand, we need change the parameters field of the VolumeAttributesClass to required from optional. Users must provide at least one key-value pair for the parameters field.

// TargetVolumeAttributesClassName is the name of the VolumeAttributesClass the PVC currently being reconciled
TargetVolumeAttributesClassName string
// Status is the status of the ControllerModifyVolume operation
Status *PersistentVolumeClaimModifyVolumeStatus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunnylovestiramisu @msau42

Why use a pointer type? Do we want to distinguish a nil value and a zero value? If so, why? If not, why not use a non-pointer type?

Copy link
Contributor Author

@sunnylovestiramisu sunnylovestiramisu Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@sunnylovestiramisu sunnylovestiramisu Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm actually this should not be a pointer. It should be consistent of the AllocatedResourceStatus

VolumeAttributesClassName string
ModifyVolumeStatus *PersistentVolumeClaimModifyVolumeStatus
// ModifyVolumeStatus represents the status object of ControllerModifyVolume operation
ModifyVolumeStatus ModifyVolumeStatus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const (
...
// PersistentVolumeClaimVolumeAttriburesModifyError - a user trigger modify volume of pvc has error
PersistentVolumeClaimVolumeAttriburesModifyError PersistentVolumeClaimConditionType = "VolumeAttriburesModifyError"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also want to add the "VolumeAttributesModifying" condition based on this discussion with @gnufied #4298 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a modifying? We cannot just say "Err but we are retrying" in the messaging?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does an operator that is possibly monitoring the pvc knows volume is being modified or not? By regexp. for message on the condition? Seems fragile.

...
}

// ModifyVolumeStatus represents the status object of ControllerModifyVolume operation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't want to distinct the ControllerModifyVolume and XXXModifyVolume, Is it possible to drop the Controller prefix in the ModifyVolumeStatus description?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is already dropped in the newest revision?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ControllerModifyVolume word still exists in L197 and L201

@msau42
Copy link
Member

msau42 commented Oct 31, 2023

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 31, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, sunnylovestiramisu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 31, 2023
@k8s-ci-robot k8s-ci-robot merged commit e6ac4bd into kubernetes:master Oct 31, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants