Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recover expansion failure #106154

Merged
merged 3 commits into from
Nov 16, 2021

Conversation

gnufied
Copy link
Member

@gnufied gnufied commented Nov 4, 2021

Implement changes for recovering from volume expansion failure.

xref kubernetes/enhancements#1790

Note to reviewers - there is a detailed flow chart diagram attached in the KEP - https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1790-recover-resize-failure/expansion_flow.pdf which describes the full control flow.

Implement support for recovering from volume expansion failures

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 4, 2021
@gnufied gnufied force-pushed the recover-expansion-failure-123 branch from efc63c8 to 199ea31 Compare November 4, 2021 15:14
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 4, 2021
@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@gnufied
Copy link
Member Author

gnufied commented Nov 4, 2021

cc @kubernetes/sig-storage-api-reviews

@fedebongio
Copy link
Contributor

/remove-sig api-machinery
This seems to be mostly sig storage. BTW, @gnufied can you link to an issue that you are trying to fix (or create one?).
Thanks!

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Nov 4, 2021
@gnufied
Copy link
Member Author

gnufied commented Nov 5, 2021

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 5, 2021
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API changes look good overall, just small picks. When this is all LGTM I can approve.

@@ -472,6 +472,9 @@ type PersistentVolumeClaimSpec struct {
// +optional
Selector *metav1.LabelSelector `json:"selector,omitempty" protobuf:"bytes,4,opt,name=selector"`
// Resources represents the minimum resources the volume should have.
// If RecoverVolumeExpansionFailure is enabled users are allowed to specify resource requirements
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unclear what If RecoverVolumeExpansionFailure is enabled means - can you say something like "If the RecoverVolumeExpansionFailure feature gate is enabled..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -526,6 +529,25 @@ const (
PersistentVolumeClaimFileSystemResizePending PersistentVolumeClaimConditionType = "FileSystemResizePending"
)

type PersistentVolumeClaimResizeStatus string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a comment and should probably be flagged +enum (@jiahuif FYI) as in #105057.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added +enum here. I wonder if I should have waited for other feature, we will see.


const (
// When expansion is complete, the empty string is set by resize controller or kubelet.
PersistentVolumeClaimNoExpansionInProgress PersistentVolumeClaimResizeStatus = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why empty string and not nil?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my original idea was, I thought I needed an invariant which I can use as default (without risking dereferencing a nil). This would also serve as a terminal condition when resizing is complete (for informing user). But I see how this can be problematic, I have marked this as a follow up item - #106429 before the feature goes beta.

// lowered if there are no expansion operations in progress and if the actual volume capacity
// is equal or lower than the requested capacity.
// This is an alpha field and requires enabling RecoverVolumeExpansionFailure feature.
// +optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+featureGate=RecoverVolumeExpansionFailure here and below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// +optional
AllocatedResources ResourceList `json:"allocatedResources,omitempty" protobuf:"bytes,5,rep,name=allocatedResources,casttype=ResourceList,castkey=ResourceName"`
// ResizeStatus stores status of resize operation.
// Defaults to nil but when expansion is complete resizeStatus is set to empty string by resize controller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"nil" is not meaningful to API clients. How about "Is not set by default" or "Has no value by default"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@gnufied gnufied force-pushed the recover-expansion-failure-123 branch from 017b897 to 393451e Compare November 15, 2021 23:39
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API changes LGTM

default:
newSize = pvcSpecSize
}
} else {
Copy link
Contributor

@jingxu97 jingxu97 Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if pvSize.Cmp(pvcSpecSize) == 0, I am wondering the following logic should be the same? Or it could be skipped and do nothing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment below covers why the code is necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the following logic for cases of pvSize.Cmp(pvcSpecSize) == 0 and pvSize.Cmp(pvcSpecSize) > 0 might be different. Checking the code below, it should be ok, just in case of pvSize.Cmp(pvcSpecSize) == 0, expansion should be already finished successfully there are some calls could be skipped (such as MarkControllerReisizeInProgress, ExpandVolumeDevice, and UpdatePVSize)?
I am ok to optimize this later..

Copy link
Member Author

@gnufied gnufied Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing is - equality comparison of resource.Quantity is sometimes problematic, esp. because size stored in PV is size reported by volume plugin after expansion. A plugin may align size request to GB/GiB/MB boundaries and return updated size which is greater than what user request by just a fraction and hence I am not special casing equality of pvSize and pvcSpecSize. But I agree we could revisit this later. I will update GH issue I have created to track feature going beta.

},
{
name: "pvc.spec.size = pv.spec.size, recover_expansion=on",
pvc: getTestPVC("test-vol0", "1G", "1G", "", v1.PersistentVolumeClaimNoExpansionInProgress),
Copy link
Contributor

@jingxu97 jingxu97 Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case, I feel resize should not be triggered?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed odd. If there is no expansion in progress, should it just return?

Copy link
Member Author

@gnufied gnufied Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this falls under - we should have not ended up in this state ever

What happened is - pvc.status.cap < pvc.spec.cap and pvc.spec.cap == pv.spec.cap and yet resizeStatus is empty. The test and condition exists for completeness purpose, because we still need to decide what to do when this happens.In this case IMO - it is best to let whole resize operation run and set appropriate resizeStatus on pvc and let node adjust pvc.status.cap if necessary. Since we don't shrink volumes ever and resize operations are idempotent, we should be fine and k8s should correct this weird state.

Edited - The reference to line# was incorrect, fixed it. But my main reasoning is still the case.

recoverFeatureGate: true,
expectedResizeStatus: v1.PersistentVolumeClaimNodeExpansionPending,
expectedAllocatedSize: resource.MustParse("2G"),
expectResizeCall: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could also check expected pv.spec.size?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have filed this as a follow up in - #106429 and I will make sure we test pv size in external-resizer.

case v1.PersistentVolumeClaimControllerExpansionInProgress:
case v1.PersistentVolumeClaimControllerExpansionFailed:
case v1.PersistentVolumeClaimNoExpansionInProgress:
// This is case#2
Copy link
Member

@jsafrane jsafrane Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This can be case #4 if user has shrunk the volume while controller expansion was in progress

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@jsafrane
Copy link
Member

almost LGTM from storage POV
/approve

@jingxu97
Copy link
Contributor

/lgtm
thank you for the PR!!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2021
@gnufied gnufied force-pushed the recover-expansion-failure-123 branch from 393451e to 1ddd598 Compare November 16, 2021 16:06
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2021
@jsafrane
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2021
@gnufied
Copy link
Member Author

gnufied commented Nov 16, 2021

/retest

@thockin
Copy link
Member

thockin commented Nov 16, 2021

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied, jsafrane, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 16, 2021
@Jefftree Jefftree mentioned this pull request Nov 16, 2021
@thockin
Copy link
Member

thockin commented Nov 16, 2021

/milestone v1.23

@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Nov 16, 2021
@k8s-ci-robot k8s-ci-robot merged commit f151a40 into kubernetes:master Nov 16, 2021
if volumeToMount.VolumeSpec.ReadOnly {
simpleMsg, detailedMsg := volumeToMount.GenerateMsg("MountVolume.NodeExpandVolume failed", "requested read-only file system")
klog.Warningf(detailedMsg)
og.recorder.Eventf(volumeToMount.Pod, v1.EventTypeWarning, kevents.FileSystemResizeFailed, simpleMsg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gnufied @jsafrane @thockin

the change here seems to generate warning event for all Pods which are mounting an PVC in read-only mode, regardless of whether there was an actual PVC/filesystem resize request or not. Didn't used to be the case, before v1.23.
Before this change, pvcStatusCap.Cmp(pvSpecCap) < 0 check used to consider resize only for volumes for which it was actually requested.

we wouldn't have noticed this, if this was just logging, but events are consumed by higher level tools, and is confusing our users, even though it is an non-issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants