skip resizing volume when extent is less than the minimum of 1MB #196

divyenpatel · 2022-01-27T18:45:50Z

What type of PR is this?
/kind bug

What this PR does / why we need it:
This PR skips resizing the volume when the extent is less than the 1 MB minimum.

Which issue(s) this PR fixes:

Fixes #195

Special notes for your reviewer:
Validated changes with vSphere CSI Driver. Volume is getting expanded when the volume is resized before creating a filesystem on it.

Does this PR introduce a user-facing change?:

skip resizing volume when extent is less than the minimum of 1MB

k8s-ci-robot · 2022-01-27T18:45:57Z

Welcome @divyenpatel!

It looks like this is your first PR to kubernetes-csi/csi-proxy 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/csi-proxy has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2022-01-27T18:45:58Z

Hi @divyenpatel. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mauriciopoppe

It'd be nice to have an integration test too, the code is in integrationtests/volume_v2alpha1_test.go

pkg/os/volume/api.go

mauriciopoppe · 2022-01-27T19:03:41Z

/ok-to-test
/test pull-kubernetes-csi-csi-proxy-integration

xing-yang · 2022-01-27T19:08:06Z

/ok-to-test

xing-yang · 2022-01-27T19:16:58Z

/assign @jingxu97

mauriciopoppe · 2022-01-27T19:35:08Z

pull-kubernetes-csi-csi-proxy-integration failed because of a pending merge of #186 or #189

k8s-ci-robot · 2022-01-27T19:50:39Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: divyenpatel
To complete the pull request process, please ask for approval from jingxu97 after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pkg/os/volume/api.go

divyenpatel · 2022-01-28T18:16:23Z

/test pull-kubernetes-csi-csi-proxy-integration

jingxu97 · 2022-01-28T19:08:14Z

Just want to understand when this scenario might happen. In your test case, you increase from 2GB to 3GB; how did that cause the failure?

divyenpatel · 2022-01-28T23:01:20Z

> Just want to understand when this scenario might happen. In your test case, you increase from 2GB to 3GB; how did that cause the failure?

We created a 2 GB volume and then expanded it to 3 GB without creating a filesystem on it.
Then we created a Pod. While the Pod was being created, the volume was formatted and the filesystem was created with 3 GB (around 3204382720 bytes). Later, when the volume expansion was attempted, the final size was determined to be 3204431360 bytes. The difference (48640 bytes) is less than the 1 MB minimum required by Resize-Partition.

The issue does not happen when we create a 2 GB volume, create a Pod using this volume, then expand the disk to 3 GB and re-create the Pod to expand the filesystem.
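
For readers following the numbers, here is a minimal sketch of the arithmetic behind the failure. It assumes the "1 MB" minimum means 1 MiB (1024 * 1024 bytes); the names `minimumExpandSize` and `extentGap` are illustrative only and are not taken from the csi-proxy code.

```go
package main

import "fmt"

// minimumExpandSize is an assumption: the thread only says "1 MB",
// which is taken here to mean 1 MiB.
const minimumExpandSize int64 = 1024 * 1024

// extentGap returns how many bytes the partition could still grow by.
func extentGap(currentSize, maxSize int64) int64 {
	return maxSize - currentSize
}

func main() {
	currentSize := int64(3204382720) // size after formatting, from the comment above
	maxSize := int64(3204431360)     // final size determined for the expansion

	gap := extentGap(currentSize, maxSize)
	fmt.Printf("gap=%d bytes, below minimum: %t\n", gap, gap < minimumExpandSize)
}
```

With the two sizes reported above, the gap is 48640 bytes, far below the assumed minimum, which is why the resize is rejected.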

pkg/os/volume/api.go

divyenpatel · 2022-02-03T00:06:27Z

@jingxu97 @mauriciopoppe I have squashed all commits. Can we merge this PR?

mauriciopoppe · 2022-02-03T00:16:11Z

it looks good to me, @jingxu97 will lgtm and approve

jingxu97 · 2022-02-03T01:56:06Z

@divyenpatel this failure only happens in the Windows case, right?

jingxu97 · 2022-02-03T01:58:40Z

> > Just want to understand when this scenario might happen. In your test case, you increase from 2GB to 3GB; how did that cause the failure?
>
> We created a 2 GB volume and then expanded it to 3 GB without creating a filesystem on it. Then we created a Pod. While the Pod was being created, the volume was formatted and the filesystem was created with 3 GB (around 3204382720 bytes). Later, when the volume expansion was attempted, the final size was determined to be 3204431360 bytes. The difference (48640 bytes) is less than the 1 MB minimum required by Resize-Partition.
>
> The issue does not happen when we create a 2 GB volume, create a Pod using this volume, then expand the disk to 3 GB and re-create the Pod to expand the filesystem.

cc @gnufied Do you think the resize controller can avoid triggering a resize in this scenario? When the volume is provisioned with 2GB, it is not formatted. Then it gets resized to 3GB and formatted. In that case, no resize of the file system is needed.

jingxu97 · 2022-02-03T01:59:35Z

> @divyenpatel this failure only happens in the Windows case, right?

If we return an error from the csi-proxy API, then with the current logic every driver might need to be modified to handle this case; otherwise it will fail here.

gnufied · 2022-02-03T02:06:06Z

> when I resize the volume to 3 GB, no file system was present on the volume.
> later when I created a Pod using extended volume, Pod got stuck at MountVolume.MountDevice failed while expanding volume for volume "pvc-021b5d9c-ba3a-4c9e-ae22-df2cbfce9e67" : Expander.NodeExpand failed

Isn't NodeExpand supposed to be idempotent? If the volume was already 3GB and expansion on the node is called again for 3GB, the driver should return success and everything should fall into place.

jingxu97 · 2022-02-03T02:14:38Z

> > when I resize the volume to 3 GB, no file system was present on the volume.
> > later when I created a Pod using extended volume, Pod got stuck at MountVolume.MountDevice failed while expanding volume for volume "pvc-021b5d9c-ba3a-4c9e-ae22-df2cbfce9e67" : Expander.NodeExpand failed
>
> Isn't NodeExpand supposed to be idempotent? If the volume was already 3GB and expansion on the node is called again for 3GB, the driver should return success and everything should fall into place.

I might be wrong, but in this case NodeExpand is only called once. The volume is first formatted with 3GB, but on Windows the actual file system size will be less than 3GB. Then it calls NodeExpand to expand to 3GB and hits the error.

gnufied · 2022-02-03T02:41:39Z

> I might be wrong, but in this case NodeExpand is only called once. The volume is first formatted with 3GB, but on Windows the actual file system size will be less than 3GB. Then it calls NodeExpand to expand to 3GB and hits the error.

It sounds like a driver bug. If the file system was on a 3GB disk (i.e. I am not talking about the file system size but rather the total disk space on which the file system exists) and kubelet asked to expand to 3GB again, then the driver should return success. Why does the driver return an error? Given the current behaviour, NodeExpand is never going to be idempotent, because the file system size is always going to be smaller than the disk size.

gnufied · 2022-02-03T02:53:33Z

pkg/os/volume/api.go

+	if finalSize-currentSize < MinimumExpandSize {
+		return status.Errorf(codes.OutOfRange,
+			"minimum extent size is 1 MB. Skip resize for volume %s from currentBytes=%d to wantedBytes=%d ", volumeID, currentSize, finalSize)
+	}


Does this fix the issue we observed, btw? It sounds like this will prevent node expansion if the size increase is less than 1MB. Basically it is not skipping; we are now returning an error.

This is also not consistent with how sizing works in other places in k8s. For example, on EBS I can't allocate a 1MB disk, and if a user asks for a 1MB disk, we make a 1GB disk and provide it rather than throwing an error.

jingxu97 · 2022-02-03T03:59:28Z

> > I might be wrong, but in this case NodeExpand is only called once. The volume is first formatted with 3GB, but on Windows the actual file system size will be less than 3GB. Then it calls NodeExpand to expand to 3GB and hits the error.
>
> It sounds like a driver bug. If the file system was on a 3GB disk (i.e. I am not talking about the file system size but rather the total disk space on which the file system exists) and kubelet asked to expand to 3GB again, then the driver should return success. Why does the driver return an error? Given the current behaviour, NodeExpand is never going to be idempotent, because the file system size is always going to be smaller than the disk size.

I checked some of the code here. FormatVolume uses the following command; no size needs to be specified during format. The system should format as much of the disk as it can after reserving some space for system use. With a 3GB disk, the file system size appears to be 3204382720 bytes.

Get-Volume -UniqueId \"%s\" | Format-Volume -FileSystem ntfs -Confirm:$false", volumeID

When the driver calls ResizeVolume, it does not pass a size either, which means resize to the maximum size the disk supports. maxSize is calculated with the following command. For a 3GB disk, it returns 3204431360 bytes.

Get-Volume -UniqueId \"%s\" | Get-partition | Get-PartitionSupportedSize | Select SizeMax | ConvertTo-Json", volumeID

The code also gets the current file system size with the following command; the value is 3204382720 bytes.

(Get-Volume -UniqueId \"%s\" | Get-partition).Size", volumeID

If maxSize is the same as the current size, the current logic skips the resize and returns success. In this case maxSize happens to be slightly larger than the current size, so it goes on to run the resize command, which fails.

I am wondering how Linux handles this case, when it tries to resize a volume that is already formatted for the same disk capacity, using a command like "resize2fs" for ext file systems.

Two things I am thinking about:

1. Whether we can avoid calling resize if the volume is already formatted for the same disk capacity.
2. When resize is called for the same disk capacity that the file system is already formatted with, how to identify this situation and return success instead of an error. For example, when resize is called without a size, that means use the maximum the disk supports, and if that is very close to the existing value, as in this case, return success. If a value is passed, which means the user explicitly wants to resize by that much, it can return an error. (Is this case ever used in any driver?)
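
The flow described in this comment can be sketched roughly as follows. This is only an illustration of the described behaviour, not the csi-proxy implementation: `fakeResizePartition` is a hypothetical stand-in for the PowerShell Resize-Partition call, and the 1 MiB minimum extent is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// minimumExtent is assumed to be 1 MiB; the thread only says "1 MB".
const minimumExtent int64 = 1024 * 1024

var errSizeNotSupported = errors.New("resize-partition: size not supported")

// fakeResizePartition is a hypothetical stand-in for Resize-Partition.
// It rejects any growth smaller than the assumed minimum extent.
func fakeResizePartition(currentSize, targetSize int64) error {
	if targetSize-currentSize < minimumExtent {
		return errSizeNotSupported
	}
	return nil
}

// resizeAsDescribed mirrors the described logic: the resize is skipped
// only when the sizes are exactly equal, otherwise it is attempted.
func resizeAsDescribed(currentSize, maxSize int64) error {
	if currentSize == maxSize {
		return nil // nothing to do
	}
	return fakeResizePartition(currentSize, maxSize)
}

func main() {
	// Values reported in the thread for a 3GB disk.
	err := resizeAsDescribed(3204382720, 3204431360)
	fmt.Println("resize result:", err)
}
```

Because the equality check is exact, a gap of a few kilobytes falls through to the resize call and surfaces as an error, which matches the failure reported in #195.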

divyenpatel · 2022-02-03T20:44:35Z

@jingxu97 I have checked 4 CSI drivers, all of them call resize with SizeBytes 0 to use the maximum available size. &volume.ResizeVolumeRequest{VolumeId: devicePath, SizeBytes: 0}

https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/master/pkg/mounter/safe_mounter_windows.go#L329
https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/resizefs/resizefs_windows.go#L70-L72
https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/mounter/safe_mounter_windows.go#L311
https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/master/pkg/csi/service/mounter/mounter_windows.go#L406

So this affects all of these drivers.

So, to be on the safe side, do you suggest the following: if the supplied SizeBytes is 0 and finalSize-currentSize < MinimumExpandSize, do not error out; error out only when the supplied SizeBytes is non-zero and finalSize-currentSize < MinimumExpandSize?
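
A minimal sketch of the rule proposed above, under stated assumptions: the `action` type and the `decide` function are hypothetical names, MinimumExpandSize is taken to be 1 MiB, and the exact-equality skip is carried over from the existing behaviour described earlier in the thread. Shrinking is not handled specially here.

```go
package main

import "fmt"

// minimumExpandSize is assumed to be 1 MiB; the thread only says "1 MB".
const minimumExpandSize int64 = 1024 * 1024

// action is a hypothetical type naming the three possible outcomes.
type action int

const (
	doResize     action = iota // run the resize
	skipResize                 // return success without resizing
	rejectResize               // return an error to the caller
)

// decide applies the proposed rule. requestedSize == 0 means
// "grow to the maximum size the disk supports".
func decide(requestedSize, currentSize, maxSize int64) action {
	finalSize := requestedSize
	if requestedSize == 0 {
		finalSize = maxSize
	}
	if finalSize == currentSize {
		return skipResize // existing behaviour: nothing to do
	}
	if finalSize-currentSize >= minimumExpandSize {
		return doResize
	}
	if requestedSize == 0 {
		return skipResize // implicit maximum, gap too small: treat as done
	}
	return rejectResize // explicit size that cannot be satisfied
}

func main() {
	fmt.Println(decide(0, 3204382720, 3204431360) == skipResize)
	fmt.Println(decide(3204431360, 3204382720, 3204431360) == rejectResize)
}
```

Since all four drivers listed above pass SizeBytes 0, they would take the skip path under this rule and would not need changes.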

gnufied · 2022-02-03T20:56:23Z

> When driver calls ResizeVolume, it does not pass a size either which means resize to the max value that this disk can. So maxSize is calculated with following command. For 3GB disk, it returns 3204431360bytes

Why doesn't ResizeVolume get a size here? Kubelet always passes size to the CSI driver btw.

> The code also get the current file system size with following command. If maxSize is the same as current size, the current logic will skip resize and return success. In this case it happens that maxSize is a little bit larger than current size, so it continues to run resize command and failed.

I wonder if there is a way to get disk size on which file system is, vs file system size. In Linux it is possible to get disk size on which file system exists vs file system size. That is one way to determine if file system on the disk needs expansion. This is similar to problem we ran into when a snapshot can be restored to size larger than original volume and hence we had to calculate disk space on which file system exists (which is very different from file system size).

Code in Linux - https://github.com/kubernetes/mount-utils/blob/master/resizefs_linux.go#L106
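
The idea of comparing the disk size with the space the file system covers can be sketched like this. It is a simplified illustration and not the mount-utils code linked above; the function name and the sample values (a 3 GiB device with 4 KiB blocks) are hypothetical.

```go
package main

import "fmt"

// needsExpansion reports whether the block device is larger than the
// space currently covered by the file system (block count * block size).
func needsExpansion(deviceSizeBytes, fsBlockCount, fsBlockSize uint64) bool {
	fsSizeBytes := fsBlockCount * fsBlockSize
	return deviceSizeBytes > fsSizeBytes
}

func main() {
	const deviceSize = 3 * 1024 * 1024 * 1024 // hypothetical 3 GiB device

	// File system already spans the whole device: no expansion needed.
	fmt.Println(needsExpansion(deviceSize, 786432, 4096))

	// File system still covers only 2 GiB of the device: expansion needed.
	fmt.Println(needsExpansion(deviceSize, 524288, 4096))
}
```

The check works on the device and the file system geometry rather than on the usable file system size, so repeating the same expansion request becomes a no-op.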

k8s-ci-robot · 2022-02-08T06:54:28Z

@divyenpatel: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jingxu97 · 2022-02-09T00:29:02Z

resize2fs

We checked a few drivers; none of them passes a size value when calling resize. I think it tries to resize to the maximum size the disk can possibly hold.

I tried the resize2fs command on a disk that already has a file system, and it returns:

resize2fs 1.46.2 (28-Feb-2021)
The filesystem is already 1572864 (4k) blocks long.  Nothing to do!

So on Linux, resizing a filesystem that is already at the maximum size for the disk passes without error. This was also confirmed with @divyenpatel.

jingxu97 · 2022-02-09T00:41:40Z

> @jingxu97 I have checked 4 CSI drivers, all of them call resize with SizeBytes 0 to use the maximum available size. &volume.ResizeVolumeRequest{VolumeId: devicePath, SizeBytes: 0}
>
> https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/master/pkg/mounter/safe_mounter_windows.go#L329
> https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/resizefs/resizefs_windows.go#L70-L72
> https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/mounter/safe_mounter_windows.go#L311
> https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/master/pkg/csi/service/mounter/mounter_windows.go#L406
>
> So this affects all of these drivers.
>
> So, to be on the safe side, do you suggest the following: if the supplied SizeBytes is 0 and finalSize-currentSize < MinimumExpandSize, do not error out; error out only when the supplied SizeBytes is non-zero and finalSize-currentSize < MinimumExpandSize?

I agree with the logic. It seems Linux handles this well without any extra checks, but on Windows we have to handle it ourselves.

jingxu97 · 2022-03-21T22:10:01Z

@divyenpatel just to check the status of this PR?

k8s-triage-robot · 2022-06-19T23:09:20Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

jingxu97 · 2022-06-30T16:29:04Z

do we still need to merge this change?

knabben · 2022-07-20T12:23:19Z

/remove-lifecycle stale

cphvmware · 2022-10-12T03:13:21Z

Can we accept that this patch is adequate for a temporary fix and that we'll fix it with a better one soon?
We are under some pressure to resolve this issue for a big customer.

If that's OK, I'll file another bug to create a better patch and work on it next.

jingxu97 · 2022-10-12T03:48:54Z

I don't have a strong objection. @gnufied WDYT?
But the PR's merge conflict needs to be resolved, @divyenpatel.

mauriciopoppe · 2022-10-12T05:45:40Z

Development of the client/server model has moved to the v1.x branch, please change the base branch to v1.x

k8s-triage-robot · 2023-01-19T22:11:26Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 27, 2022

k8s-ci-robot requested review from kkmsft and xing-yang January 27, 2022 18:45

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 27, 2022

k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jan 27, 2022

mauriciopoppe reviewed Jan 27, 2022

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 27, 2022

k8s-ci-robot assigned jingxu97 Jan 27, 2022

k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 27, 2022

mauriciopoppe reviewed Jan 27, 2022

divyenpatel force-pushed the fix-resize-less-than-1MB branch from eff216a to 9362a2b Compare January 28, 2022 00:13

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 28, 2022

divyenpatel force-pushed the fix-resize-less-than-1MB branch from 9362a2b to 8b6df23 Compare January 28, 2022 00:17

mauriciopoppe reviewed Jan 28, 2022

divyenpatel mentioned this pull request Jan 31, 2022

Pod remains in the Pending state with error - Resize-Partition : Size Not Supported #195

Closed

divyenpatel force-pushed the fix-resize-less-than-1MB branch from 603b06b to fd11306 Compare February 3, 2022 00:05

gnufied reviewed Feb 3, 2022

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 8, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 19, 2022

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 20, 2022

cphvmware mentioned this pull request Oct 20, 2022

Skip resizing volume when extent is less than the minimum of 1MB. #271

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2023

divyenpatel closed this Jan 26, 2023
