Max number of volumes calculation is incorrect #427
Comments
@tenitski Thanks for creating the issue here. As I mentioned in the other thread:
We need a way to let the CNI plugin and the CSI driver share this attach limit so that both sides can work correctly. The current CSI attach limit feature doesn't address this issue at all.
I do not think we will have a design that allows the flexibility we are looking for in accounting for the CNI and the impact of other network interfaces (at least in the short/medium term). We should fix #347.
I disagree with the approach suggested in #347. I don't believe we should burden our customers with the responsibility of determining the correct value, especially when we can calculate it for them, and because the underlying limits could be raised in the future. It could also lull customers into a false sense of correctness and reliability: if they set the value based on how many ENIs are attached while the node is young, a later ENI attachment, unbeknownst to the customer, could make the configured limit incorrect and lead to more errors in scheduling pods. We should strive for the most reliable solution possible.
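To make the dynamic approach concrete, here is a minimal sketch in Go (the driver's language). The 28-attachment budget shared by ENIs, EBS volumes, and NVMe instance store on most Nitro instances is from the EC2 documentation; the function name and the simplified Nitro/Xen split are assumptions for illustration, not the driver's actual code:

```go
package main

import "fmt"

// availableVolumeSlots estimates how many more EBS volumes fit on a node
// once attached ENIs and the root volume are accounted for.
func availableVolumeSlots(isNitro bool, attachedENIs int) int {
	if isNitro {
		// Most Nitro instance types share a budget of 28 attachments
		// among ENIs, EBS volumes, and NVMe instance store volumes.
		const nitroAttachmentBudget = 28
		return nitroAttachmentBudget - attachedENIs - 1 // minus the root volume
	}
	// Older (Xen-based) Linux instances support up to 39 non-root
	// volumes, and ENIs do not count against that limit.
	return 39
}

func main() {
	// Example: a Nitro node where the VPC CNI has grown to 4 ENIs.
	fmt.Println(availableVolumeSlots(true, 4)) // prints 23
}
```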
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
/remove-lifecycle stale
I've just bumped into this issue on an EKS 1.14.9 cluster with the VPC CNI plugin installed. After reading the (closed) kubernetes/kubernetes#80967 issue, I understand that the dynamic limit is GA as of 1.17, but does the EBS CSI driver actually report a correct limit for it? More general question: is there a way to use the EBS CSI driver reliably at all? Right now volumes just get stuck at "attaching" once the node is over the "real" limit.
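For background on where that limit is reported: a CSI driver advertises it to the scheduler through the MaxVolumesPerNode field of the NodeGetInfo response. A minimal sketch, assuming the standard CSI Go bindings; nodeService and computeVolumeLimit are hypothetical names for illustration, not the driver's actual code:

```go
package driver

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

type nodeService struct {
	instanceID string
}

// computeVolumeLimit is a hypothetical stand-in for the driver's limit
// logic; fixing this issue means subtracting attached ENIs here instead
// of returning a value derived only from the instance type.
func computeVolumeLimit(instanceID string) int64 {
	return 25 // placeholder only
}

// NodeGetInfo is the CSI RPC whose MaxVolumesPerNode field the
// scheduler consumes once CSI volume limits are GA (Kubernetes 1.17+).
func (d *nodeService) NodeGetInfo(ctx context.Context, req *csi.NodeGetInfoRequest) (*csi.NodeGetInfoResponse, error) {
	return &csi.NodeGetInfoResponse{
		NodeId:            d.instanceID,
		MaxVolumesPerNode: computeVolumeLimit(d.instanceID),
	}, nil
}
```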
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten
/remove-lifecycle rotten
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to sig-contributor-experience at kubernetes/community. /close
@fejta-bot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@ayberk: Reopened this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle rotten
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
/remove-lifecycle rotten
This issue forces cluster admins to set a static, low volume-attach-limit to avoid running into attachment failures on nodes using the AWS VPC CNI, which leads to premature node scale-out and inefficient use of resources.
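That static workaround exists as the driver's volume-attach-limit option. A minimal sketch of the override semantics, assuming a flag that, when set, wins over the computed value (the flag name matches the driver's documented option, but the surrounding code is illustrative, not the driver's actual implementation):

```go
package main

import (
	"flag"
	"fmt"
)

// computeDynamicLimit stands in for the per-instance-type calculation;
// the value is a placeholder only.
func computeDynamicLimit() int64 { return 25 }

func main() {
	volumeAttachLimit := flag.Int64("volume-attach-limit", -1,
		"static value to report as MaxVolumesPerNode (-1 = compute dynamically)")
	flag.Parse()

	limit := computeDynamicLimit()
	if *volumeAttachLimit >= 0 {
		// A static value has to be sized for the worst case (the
		// maximum ENIs the CNI might attach), which is what wastes
		// attach slots and forces the premature scale-out described
		// in the comment above.
		limit = *volumeAttachLimit
	}
	fmt.Println("reporting MaxVolumesPerNode =", limit)
}
```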
/remove-lifecycle frozen
I just noticed that fixing and merging #1075 might address this issue, at least on nodes whose IAM roles grant ec2:DescribeInstances. Right?
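I haven't verified what #1075 actually implements, but a DescribeInstances-based ENI count would look roughly like this minimal sketch (aws-sdk-go v1; the instance ID in main is a placeholder). As noted above, it needs ec2:DescribeInstances in the node's IAM role:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// attachedENIs returns how many network interfaces are attached to the
// given instance, which the driver could subtract from the attachment
// budget before reporting MaxVolumesPerNode.
func attachedENIs(svc *ec2.EC2, instanceID string) (int, error) {
	out, err := svc.DescribeInstances(&ec2.DescribeInstancesInput{
		InstanceIds: []*string{aws.String(instanceID)},
	})
	if err != nil {
		return 0, err
	}
	if len(out.Reservations) == 0 || len(out.Reservations[0].Instances) == 0 {
		return 0, fmt.Errorf("instance %s not found", instanceID)
	}
	return len(out.Reservations[0].Instances[0].NetworkInterfaces), nil
}

func main() {
	svc := ec2.New(session.Must(session.NewSession()))
	n, err := attachedENIs(svc, "i-0123456789abcdef0") // placeholder ID
	if err != nil {
		panic(err)
	}
	fmt.Println("attached ENIs:", n)
}
```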
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /close
@k8s-triage-robot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Original issue description
/kind bug
What happened?
Unable to start a pod which uses a volume. Attachment fails with:
"There are over 20 volumes mounted on the instance, no more can be mounted."
What you expected to happen:
When mounting volumes on nodes that use multiple ENIs, the max limit is calculated incorrectly, because ENIs consume some of the same attachment resources (this is described very well in kubernetes/kubernetes#80967).
The driver should check how many ENIs are in use and decrease the number of volumes accordingly.
Another option is to allow an external parameter so an admin can limit the number of volumes.
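For a concrete, illustrative example (the attachment budget is from the EC2 documentation; the ENI count is made up): most Nitro instance types share a budget of 28 attachments among ENIs, EBS volumes, and NVMe instance store volumes. If the VPC CNI has attached 4 ENIs and the node has its root volume, only 28 - 4 - 1 = 23 data volumes fit, so a driver that reports 25 lets the scheduler place two pods whose volumes can never attach.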
Environment
- Kubernetes version (use kubectl version):
- Driver version: I'm not sure what specific driver version it is. The cluster was created using the terraform-aws-eks v6.0.2 module (https://github.com/terraform-aws-modules/terraform-aws-eks?ref=v6.0.2).