
Max number of volumes calculation is incorrect #427

Closed
tenitski opened this issue Dec 9, 2019 · 26 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@tenitski

tenitski commented Dec 9, 2019

/kind bug

What happened?
Unable to start a pod which uses a volume:

Warning FailedAttachVolume 22m (x3 over 70m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-?????" : timed out waiting for the condition
Warning FailedMount 119s (x41 over 92m) kubelet, ip-???????.ap-southeast-2.compute.internal Unable to mount volumes for pod "mysql-0_example(?????)": timeout expired waiting for volumes to attach or mount for pod "example"/"mysql-0". list of unmounted volumes=[mysql-data]. list of unattached volumes=[mysql-data default-token-????]

There are over 20 volumes mounted on the instance, and no more can be mounted.

What you expected to happen?

When mounting volumes on nodes that use multiple ENIs, the max volume limit is calculated incorrectly because ENIs consume some of the available attachment slots (this is described in detail in kubernetes/kubernetes#80967).

The driver should check how many ENIs are in use and decrease the reported volume limit accordingly.
Another option is to allow an external parameter so an admin can cap the number of volumes.
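
To make the first option concrete, here is a minimal Go sketch of the arithmetic the node plugin could report via NodeGetInfo. The attachment budget of 28, the single reserved root-volume slot, and the function name are illustrative assumptions, not the driver's actual behaviour:

```go
package main

import "fmt"

// maxVolumeLimit sketches how the node plugin could derive the value it
// reports as max_volumes_per_node in NodeGetInfo: start from the shared
// ENI/EBS attachment budget of the instance type and subtract the slots
// already consumed by attached ENIs and the root volume.
// All numbers and names here are illustrative assumptions, not driver code.
func maxVolumeLimit(attachmentBudget, attachedENIs, reservedRootVolumes int) int {
	limit := attachmentBudget - attachedENIs - reservedRootVolumes
	if limit < 0 {
		return 0
	}
	return limit
}

func main() {
	// Example: many Nitro instances share roughly 28 attachment slots between
	// ENIs and EBS volumes (illustrative figure). With 3 ENIs attached and one
	// slot reserved for the root volume, only 24 data volumes fit.
	fmt.Println(maxVolumeLimit(28, 3, 1)) // 24
}
```

The hard part, as the discussion below shows, is obtaining an accurate, up-to-date ENI count for the instance.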

Environment

  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T23:42:50Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.8-eks-b7174d", GitCommit:"b7174db5ee0e30c94a0b9899c20ac980c0850fc8", GitTreeState:"clean", BuildDate:"2019-10-18T17:56:01Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 9, 2019
@leakingtapan
Contributor

@tenitski Thanks for creating the issue here. As I mentioned in the other thread:

I agree with @justinsb that, to make this work correctly, the challenge is that the storage driver needs to be aware of the CNI attachments and vice versa.

We need a way for the CNI plugin and the CSI driver to share this attach limit so that both sides can work correctly. The current CSI attach limit feature doesn't address this issue at all.

@gnufied
Contributor

gnufied commented Dec 13, 2019

I do not think we will have a design that allows the flexibility we are looking for in accounting for the CNI and the impact of other network interfaces (at least in the short/medium term).

We should fix #347.

@otterley

otterley commented Dec 15, 2019

I disagree with the approach suggested in #347. I don't believe we should burden our customers with the responsibility of determining the correct value, especially when we can calculate it for them, and because the underlying limits could be raised in the future.

It could also lull customers into a false sense of correctness and reliability. If they set the value too high based on how many ENIs are attached while the node is young, a later ENI attachment that occurs unbeknownst to the customer could make the configured limit incorrect and lead to more pod scheduling errors. We should strive for the most reliable solution possible.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 14, 2020
@leakingtapan
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 14, 2020
@excieve

excieve commented Jun 5, 2020

I've just bumped into this issue on an EKS 1.14.9 cluster with the VPC CNI plugin installed. After reading the (closed) kubernetes/kubernetes#80967 issue, I understand that the dynamic limit is GA as of 1.17, but does the EBS CSI driver actually report a correct limit for it?

A more general question: is there a way to use the EBS CSI driver reliably at all? Right now volumes just get stuck in "attaching" once the node is over the "real" limit.
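
One way to see the limit the driver actually reports (on clusters where the storage.k8s.io/v1 CSINode API and the attach-limit feature are available) is to read the allocatable count from the node's CSINode object, which is the value the scheduler consumes. A minimal client-go sketch, assuming a kubeconfig path in $KUBECONFIG and the node name as the first argument:

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the kubeconfig pointed to by $KUBECONFIG;
	// the node name is taken from the first command-line argument.
	config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// The CSINode object carries the per-driver allocatable volume count
	// that the scheduler uses when the attach-limit feature is enabled.
	csiNode, err := clientset.StorageV1().CSINodes().Get(context.TODO(), os.Args[1], metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for _, d := range csiNode.Spec.Drivers {
		if d.Allocatable != nil && d.Allocatable.Count != nil {
			fmt.Printf("%s reports max volumes: %d\n", d.Name, *d.Allocatable.Count)
		}
	}
}
```

The same field can also be inspected with `kubectl get csinode <node-name> -o yaml` under `spec.drivers[].allocatable.count`.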

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 3, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 3, 2020
@otterley

otterley commented Oct 3, 2020

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 3, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 1, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 31, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ayberk
Contributor

ayberk commented Mar 2, 2021

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Mar 2, 2021
@k8s-ci-robot
Contributor

@ayberk: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ayberk
Contributor

ayberk commented Mar 2, 2021

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 2, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 31, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 1, 2021
@wongma7
Contributor

wongma7 commented Jul 1, 2021

/remove-lifecycle rotten
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jul 1, 2021
@ashaker-cig

This issue forces cluster admins to set a static, low value for volume-attach-limit to avoid running into attachment issues on nodes using the AWS VPC CNI, which leads to premature node scale-out and inefficient use of resources.
It would be great if the driver became aware of the current attachments (ENI and EBS) on its EC2 instance and calculated the max number accordingly.

@ashaker-cig

/remove-lifecycle frozen

@k8s-ci-robot k8s-ci-robot removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Dec 5, 2021
@ashaker-cig

I just noticed that fixing and merging #1075 might address this issue on nodes whose IAM roles grant DescribeInstances. Right?
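
For reference, the kind of lookup that needs the ec2:DescribeInstances permission looks roughly like the sketch below. This is an illustration using aws-sdk-go v1, not the actual code from #1075, and the instance ID is a placeholder:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// countAttachments returns how many ENIs and block devices are currently
// attached to the given instance. The DescribeInstances call is what
// requires the ec2:DescribeInstances permission on the node's IAM role.
func countAttachments(svc *ec2.EC2, instanceID string) (enis, volumes int, err error) {
	out, err := svc.DescribeInstances(&ec2.DescribeInstancesInput{
		InstanceIds: []*string{aws.String(instanceID)},
	})
	if err != nil {
		return 0, 0, err
	}
	for _, r := range out.Reservations {
		for _, inst := range r.Instances {
			enis += len(inst.NetworkInterfaces)
			volumes += len(inst.BlockDeviceMappings)
		}
	}
	return enis, volumes, nil
}

func main() {
	sess := session.Must(session.NewSession())
	svc := ec2.New(sess)
	// "i-0123456789abcdef0" is a placeholder instance ID.
	fmt.Println(countAttachments(svc, "i-0123456789abcdef0"))
}
```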

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 5, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 4, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
