
Allow overriding node attach limits as driver CLI parameter #347

Closed
gnufied opened this issue Aug 15, 2019 · 16 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@gnufied
Contributor

gnufied commented Aug 15, 2019

Given all the complexities around how limits will be calculated on the node, it would be a good idea to simply allow an admin to override the limits when starting the csi-ebs driver.

@leakingtapan
Contributor

Not sure how this is going to work if the worker nodes are of different instance types, each with a different attach limit. I think this should be a per-node attribute.

@gnufied
Contributor Author

gnufied commented Aug 15, 2019

The CSINode object where limits are actually saved is created on a per-node basis, so we should be able to support this as a per-node attribute.

Are you referring to how this will be set at deploy time?

@gnufied
Contributor Author

gnufied commented Aug 15, 2019

The flow of this code is:

csi-ebs-driver-on-node --> store limit --> CSINode --> scheduler uses the CSINode object when counting volumes.

So it is perfectly possible to support this on a per-node basis. How tools like kops will surface this configuration parameter is another problem, though.
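
A minimal sketch (not the driver's actual code) of how a per-node CLI override could flow through that path: whatever the driver reports in NodeGetInfo's MaxVolumesPerNode field is what kubelet writes into the node's CSINode object and what the scheduler later counts against. The --volume-attach-limit flag name and the detectAttachLimit fallback below are illustrative assumptions.

```go
package main

import (
	"context"
	"flag"
	"fmt"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
)

// volumeAttachLimit is an illustrative flag name, not necessarily the one the
// driver would end up exposing.
var volumeAttachLimit = flag.Int64("volume-attach-limit", -1,
	"value to report as the node's maximum attachable volume count (-1 = auto-detect)")

type nodeService struct {
	nodeID string
}

// NodeGetInfo is where the limit leaves the driver: kubelet copies
// MaxVolumesPerNode into the node's CSINode object, and the scheduler reads
// CSINode when counting volumes.
func (d *nodeService) NodeGetInfo(ctx context.Context, req *csi.NodeGetInfoRequest) (*csi.NodeGetInfoResponse, error) {
	limit := detectAttachLimit() // instance-type heuristic, elided here
	if *volumeAttachLimit >= 0 {
		limit = *volumeAttachLimit // admin override from the CLI wins
	}
	return &csi.NodeGetInfoResponse{
		NodeId:            d.nodeID,
		MaxVolumesPerNode: limit,
	}, nil
}

// detectAttachLimit stands in for whatever heuristic the driver already uses.
func detectAttachLimit() int64 { return 39 }

func main() {
	flag.Parse()
	d := &nodeService{nodeID: "i-0123456789abcdef0"}
	resp, _ := d.NodeGetInfo(context.Background(), &csi.NodeGetInfoRequest{})
	fmt.Println("reporting MaxVolumesPerNode =", resp.MaxVolumesPerNode)
}
```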

@leakingtapan
Contributor

Maybe I missed something. How could a cluster operator set this new CLI parameter differently for drivers on different nodes? Since the driver is deployed as a DaemonSet, I am assuming the pods in the DaemonSet are identical.

@gnufied
Contributor Author

gnufied commented Aug 27, 2019

Could this be solved by running one DaemonSet whose selector targets one kind of node and another DaemonSet whose selector targets the other kind? I do not think it should be too difficult for a kube admin who is running into this kind of issue...
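
A rough sketch of that approach, assuming a node label (here ebs.csi.aws/attach-limit-group) that the operator applies per node group and the proposed --volume-attach-limit flag; none of these names are confirmed by the driver. Each DaemonSet copy targets one group of nodes and passes that group's limit:

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// nodeDaemonSet builds a DaemonSet that only schedules onto nodes labeled
// with the given group and passes that group's attach-limit override.
func nodeDaemonSet(group string, attachLimit int) *appsv1.DaemonSet {
	labels := map[string]string{"app": "ebs-csi-node", "group": group}
	return &appsv1.DaemonSet{
		ObjectMeta: metav1.ObjectMeta{Name: "ebs-csi-node-" + group, Namespace: "kube-system"},
		Spec: appsv1.DaemonSetSpec{
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					// Only nodes carrying this (hypothetical) label get this copy.
					NodeSelector: map[string]string{"ebs.csi.aws/attach-limit-group": group},
					Containers: []corev1.Container{{
						Name:  "ebs-plugin",
						Image: "amazon/aws-ebs-csi-driver:latest",
						Args:  []string{"node", fmt.Sprintf("--volume-attach-limit=%d", attachLimit)},
					}},
				},
			},
		},
	}
}

func main() {
	// e.g. one group of nodes reports 25 attachable volumes, another 39
	small := nodeDaemonSet("small", 25)
	large := nodeDaemonSet("large", 39)
	fmt.Println(small.Name, large.Name)
}
```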

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 25, 2019
@leakingtapan
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 25, 2019
@otterley

On EC2 Nitro instances, the number of attachable EBS volumes is not a fixed value that can be calculated once when the CSI driver is launched and remain correct for the lifetime of the node. Because the formula is MaxEbsAttachableVolumes = MaxAttachments - AttachedENIs, where MaxAttachments depends on the EC2 instance type, the value must be recalculated each time an ENI is attached to or detached from the instance.
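
For concreteness, a small sketch of the recalculation this implies; the MaxAttachments value here is an assumed per-instance-type constant, and the ENI count would come from instance metadata or EC2 APIs in a real driver:

```go
package main

import "fmt"

// maxEbsAttachableVolumes applies MaxEbsAttachableVolumes = MaxAttachments - AttachedENIs.
// It has to be re-evaluated whenever the number of attached ENIs changes.
func maxEbsAttachableVolumes(maxAttachments, attachedENIs int) int {
	limit := maxAttachments - attachedENIs
	if limit < 0 {
		limit = 0
	}
	return limit
}

func main() {
	const maxAttachments = 28 // assumed total attachment slots for some Nitro type
	for _, enis := range []int{1, 3, 5} {
		fmt.Printf("with %d ENIs attached, EBS limit = %d\n", enis, maxEbsAttachableVolumes(maxAttachments, enis))
	}
}
```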

@leakingtapan
Contributor

Agree that this proposal is very suboptimal

@gnufied
Contributor Author

gnufied commented Dec 15, 2019

Agree that this proposal is very suboptimal

Designing a generic mechanism to calculate the number of attachments as a shared resource is a tricky problem. There are various ways overrides can be configured, such as pre-allocating a number of ENIs or a percentage of attachments (and subtracting it from the max). It would be nice if AWS had an API for determining the MaxEbsAttachableVolumes an instance supports, so that we could remove the hacks in this driver and in-tree.
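
A hedged sketch of what such overrides could look like as CLI parameters; the flag names (--reserved-enis, --reserved-attachment-percent) are illustrative only, not a proposed interface:

```go
package main

import (
	"flag"
	"fmt"
)

var (
	reservedENIs    = flag.Int("reserved-enis", 0, "attachment slots to hold back for ENIs")
	reservedPercent = flag.Int("reserved-attachment-percent", 0, "percentage of max attachments to hold back")
)

// effectiveLimit subtracts whichever reservation the admin configured from the
// instance's maximum attachment count.
func effectiveLimit(maxAttachments int) int {
	limit := maxAttachments - *reservedENIs
	limit -= maxAttachments * *reservedPercent / 100
	if limit < 0 {
		limit = 0
	}
	return limit
}

func main() {
	flag.Parse()
	fmt.Println("effective EBS attach limit:", effectiveLimit(28)) // 28 is an assumed max
}
```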

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 14, 2020
@leakingtapan
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 13, 2020
@rfranzke
Contributor

rfranzke commented Jun 3, 2020

Given that prior to CSI there was no way to override the volume limits per node, only globally for the whole cluster (https://v1-18.docs.kubernetes.io/docs/concepts/storage/storage-limits/#custom-limits), and given the complexities discussed above: would it make sense to proceed with the suggested CLI flag for now, until a more sophisticated solution can be implemented? This way, at least the existing clusters that were leveraging the KUBE_MAX_PD_VOLS method can continue to work as before when they migrate to CSI. It wouldn't make things worse. WDYT?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 1, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 1, 2020
@gnufied
Contributor Author

gnufied commented Oct 1, 2020

This is fixed by #522.

@gnufied gnufied closed this as completed Oct 1, 2020