
Ignore topology.ebs.csi.aws.com/zone label for Balance-Similar-NodeGroups #3230

Closed
TBBle opened this issue Jun 18, 2020 · 25 comments · Fixed by #4458

Comments

@TBBle

TBBle commented Jun 18, 2020

The topology.ebs.csi.aws.com/zone label is added by the AWS EBS CSI driver as a target for Persistent Volume Node Affinity.

If I understand correctly, a Pod that claims such a volume will only be considered by CA for node groups that have this label, but Pods that do not claim such a volume can be scheduled anywhere, so this should not block node-group-similarity in CreateAwsNodeInfoComparator.
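For context, the PVs the driver provisions carry a required node affinity on this label; here's a minimal sketch of such a PV (illustrative names and values, not taken from a real cluster):

```sh
# Sketch: the shape of an EBS CSI PV and its node affinity on
# topology.ebs.csi.aws.com/zone (volume ID and zone are placeholders).
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-ebs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0
    fsType: ext4
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone
              operator: In
              values:
                - us-east-1a
EOF
```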

It probably should be added to buildGenericLabels like failure-domain.beta.kubernetes.io/zone, so that scale-from-zero works correctly, assuming this won't break systems that don't have the AWS EBS CSI driver installed. That would save having to add the label by-hand to the per-AZ ASGs needed when using EBS. I guess it might behave oddly if only some of the node groups run the CSI DaemonSet, but I'm not sure that's a rational/expected setup?

I just saw #2928 and #2493 (which, to be fair, is not listed in the FAQ's args list), so I'd understand if the call is to leave this to user config, and just document it on the AWS page for AWS EBS CSI users. It would still need to be manually added to the ASG tags for the scale-from-zero case.

I'm on k8s 1.15 so unless this or #2493 is backported a long way, it's not going to affect me immediately anyway.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 16, 2020
@TBBle
Author

TBBle commented Sep 16, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 16, 2020
@bcha

bcha commented Sep 23, 2020

Any progress or ETA on this? After some initial troubleshooting I think this label is the cause of the balancing issues that we've been seeing in our EKS clusters.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 22, 2020
@TBBle
Author

TBBle commented Dec 22, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 22, 2020
@westernspion

Bump.

This is rather inconvenient, but my workaround is to manually provision the first node of an ASG that's set to 0.

@westernspion

> That would save having to add the label by-hand to the per-AZ ASGs needed when using EBS

Does this mean there is a way to manually label an ASG for the cluster-autoscaler to consider? I'm not using EKS node groups (to get around the 1-node minimum).

@TBBle
Author

TBBle commented Feb 5, 2021

@westernspion Yes, that's #2493 that I mentioned in the original request.

From Autoscaler 1.19, there is a --balancing-ignore-label parameter; you can see an example of its use (for precisely this issue) at #3515 (comment), and the equivalent of this ticket is

--balancing-ignore-label=topology.ebs.csi.aws.com/zone

If you're using this for multiple labels, and deploying with Helm, I believe you'll hit #3673, but adding just a single label to ignore should work fine with Helm.
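With the Helm chart, the single-label case looks something like this (a sketch; release and cluster names are placeholders, and it assumes the chart's extraArgs map, whose key-value shape is exactly why repeating the flag hits #3673):

```sh
# Sketch: passing the flag through the cluster-autoscaler Helm chart.
# extraArgs is a map, so the flag can only appear once this way (#3673).
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
  --set autoDiscovery.clusterName=my-cluster \
  --set extraArgs.balance-similar-node-groups=true \
  --set "extraArgs.balancing-ignore-label=topology.ebs.csi.aws.com/zone"
```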

@westernspion

westernspion commented Feb 5, 2021

I believe --balancing-ignore-label=topology.ebs.csi.aws.com/zone solves a different problem.

I've tested the ignore label with v1.19.x and v1.20.0 of the controller and it has no effect on the scheduling (I still get volume affinity errors for my StatefulSet after its ASG reduces to 0 and I scale up again). I reviewed the source and this makes sense: the flag controls which labels are ignored when comparing the autoscaling groups under consideration, not whether affinities/selectors on resources such as PVs in the cluster are respected.

So the autoscaler is intelligent enough to consider nodeAffinity on a PV, but it is unable to resolve that information from the autoscaling group. In the case of the EBS CSI provisioner, the affinity is to a node label of topology.ebs.csi.aws.com/zone, but there's no reason this label couldn't be arbitrary.

Looking through the code (I haven't seen documentation for this), the autoscaler appears to treat any ASG tag prefixed with 'k8s.io/cluster-autoscaler/node-template/label/' as a custom node label to resolve.

I'm going to test this out, as it appears to answer my original question: "That would save having to add the label by-hand to the per-AZ ASGs needed when using EBS".
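Something like this, presumably (a sketch; the ASG name and zone value are placeholders):

```sh
# Sketch: expose the label to the autoscaler for scale-from-zero by
# tagging the ASG. CA reads the tag from the ASG itself, so it does not
# need to propagate to launched instances.
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-per-az-asg,ResourceType=auto-scaling-group,PropagateAtLaunch=false,Key=k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone,Value=us-east-1a"
```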

@westernspion

k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone works like a charm for my needs.

It would be nice to have this label automatically generated from the ASG zone list.

@TBBle
Author

TBBle commented Feb 6, 2021

@westernspion Oh, I'm sorry. I totally misread your original question, and also had forgotten the buildGenericLabels part of my original request. >_<

The documentation for the scale-from-zero ASG Tags -> Node Labels mapping is in the AWS-specific documentation, and is also talked about in the scale to 0 FAQ entry.

You are correct, that part of this feature request is not user-doable using #2493, and I see now that I had noted that in the original request: "It would still need to be manually added to the ASG tags for the scale-from-zero case."

It's been a while, but as best I recall, I was not using managed node groups, but using eksctl 'unmanaged' node groups, and so had manually added that tag to the nodegroup definition, since I was defining a nodegroup per-AZ anyway.

It is a little problematic to include in the buildGenericLabels code: if the EBS CSI DaemonSet is only deployed to a subset of nodes for some reason, then a Pod that required that label (i.e. with a PVC for an existing EBS volume) could trigger a spin-up from 0 of a nodegroup that was excluded from the EBS CSI DaemonSet, and would not be able to schedule until another node was spun up.

The complexity here is because, for whatever reason, the EBS CSI DaemonSet didn't just use the standard k8s topology labels, although I guess in non-EKS setups, they might not be appropriate?

So adding it to the default-ignored list seems pretty safe, but can also be done by the user as --balancing-ignore-label=topology.ebs.csi.aws.com/zone.

And I'm not sure of a good way to ensure this EBS CSI-specific label is visible to CA on scaled-to-zero setups, without either some significant assumptions being made, or direct administrator action.

Edit: I opened kubernetes-sigs/aws-ebs-csi-driver#729 in case we can solve this completely by (optionally) making EBS CSI use a standard node label instead of its own label.

@TBBle
Author

TBBle commented Mar 29, 2021

Good news. Per kubernetes-sigs/aws-ebs-csi-driver#729 (comment), the AWS EBS CSI driver 0.10 release will include changes that fix this use-case, along with the use of --balancing-ignore-label=topology.ebs.csi.aws.com/zone.

Basically, the CSI driver will prefer the well-known zone label to the above driver-specific label, and ensure both are applied identically in its CSI API; this means that although topology.ebs.csi.aws.com/zone will still exist, and be honoured by the csi-translation-lib (pending it being updated), the rest of the system can use the well-known zone label that Cluster Autoscaler already knows about.
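Once that release is rolled out, this should be easy to confirm: both zone labels should report the same value on every node running the driver (a sketch):

```sh
# Sketch: show both zone labels side by side for all nodes.
kubectl get nodes \
  -L topology.kubernetes.io/zone \
  -L topology.ebs.csi.aws.com/zone
```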

Once topology.ebs.csi.aws.com/zone is dropped completely, the --balancing-ignore-label=topology.ebs.csi.aws.com/zone can be dropped too.

This means that this proposed change (or the local config, if CA maintainers so prefer) is sufficient to allow correct behaviour with the AWS EBS CSI driver, i.e. no overly-ambitious balanced node groups which scale up but cannot schedule a Pod with an EBS volume, and no falsely-distinguished node groups when an EBS volume is not present.

That doesn't close this ticket, because I still think it would be nice if this label was automatically ignored on AWS, as originally suggested. The old label may be around for a while, and the AWS EBS CSI driver is going to become much more widespread once AWS starts deploying it by default on EKS and k8s removes the in-tree driver (removal targeted for 1.21, as of the 1.17 release timeframe).

@dlaidlaw

dlaidlaw commented May 5, 2021

In my not-so-humble opinion, both the eks.amazonaws.com/sourceLaunchTemplateId and topology.ebs.csi.aws.com/zone labels should always be ignored by the AWS CloudProvider portion of the cluster-autoscaler code. The solution of using an additional CLI arg is not ideal, since it requires adding settings to the cluster-autoscaler's command-line args that should be defaults.

AWS customers are very likely to discover the hard way that running the AWS CSI Driver breaks the cluster-autoscaler's --balance-similar-node-groups feature. And if any of those AWS customers are running a cluster older than 1.19, they are completely out of luck. Most will probably be like me: adopt the AWS CSI Driver, then discover that the cluster-autoscaler no longer balances similar nodegroups (hopefully without serious production issues), and then start searching for why. If they are lucky, they will end up here.

It is far better to just do the right thing at the outset, and have the cluster-autoscaler ignore the labels that break it. Or maybe even use a whitelist of labels instead of blacklisting some. A whitelist would probably solve this issue and the ones that would arise in the future as other software adds labels to nodes. A good start for labels to whitelist would be:

  • kubernetes.io/arch
  • topology.kubernetes.io/region
  • kubernetes.io/os
  • node.kubernetes.io/instance-type

The eks.amazonaws.com/sourceLaunchTemplateId label is added to nodes by AWS managed node groups. It too breaks the cluster-autoscaler's --balance-similar-node-groups if the nodegroups do not all use the exact same launch template, regardless of whether the templates are similar.
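Concretely, until the defaults change, every affected cluster has to carry something like this in its cluster-autoscaler invocation (a sketch; the flags are the real CA flags, and --balancing-ignore-label can be repeated, but everything else is illustrative):

```sh
# Sketch: the per-cluster workaround that this comment argues should be
# the default behaviour on AWS.
cluster-autoscaler \
  --cloud-provider=aws \
  --balance-similar-node-groups \
  --balancing-ignore-label=topology.ebs.csi.aws.com/zone \
  --balancing-ignore-label=eks.amazonaws.com/sourceLaunchTemplateId
```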

@TBBle
Author

TBBle commented May 5, 2021

Just so I understand: your suggestion is that instead of having a list of labels to ignore when considering nodegroups for balancing (as we have now, a combination of generic and AWS-specific), there should be a list of labels to consider for differentiation, and other labels would not be considered unless explicitly added, to say "the value of this label must be the same for the nodegroups to be considered the same"?

I ask because your example list there seems to mix "must differentiate" (e.g., node.kubernetes.io/instance-type) with "never differentiate" (e.g., topology.kubernetes.io/zone), and also "may differentiate, depending on the cluster admin" (e.g., kubernetes.io/arch).


I'm somewhat wary of ignoring eks.amazonaws.com/sourceLaunchTemplateId by default, because there are plenty of things you could do in a Launch Template that would make the nodegroups non-interchangeable, but I haven't actually played with custom launch templates in EKS, so I'm unaware if there's a reason you cannot use the same launch template for different nodegroups.

Adding a label to the default list is basically saying "no value we find here will affect whether this node is the same as some other node", whereas "different launch templates produce interchangeable nodes" seems more like a cluster-admin commitment than a generically true statement. But I am aware that I may be wrong about this.

@dlaidlaw

dlaidlaw commented May 6, 2021

Yes, that is it exactly. Sorry about the topology.kubernetes.io/zone in the list; you are correct that that one should not be in a whitelist of labels. I will edit my list above to keep that clear.

What this approach is designed to do is to keep the cluster-autoscaler working even as new labels are added to nodes. The new labels may be added manually, or via controllers, operators, or deployment automation for any number of reasons. I would prefer if that did not break the cluster-autoscaler when it happens.

I could live with eks.amazonaws.com/sourceLaunchTemplateId being in a whitelist; I can see how that would be useful in some circumstances. For our use cases, the ID of the launch template is not a differentiator for autoscaling. Similarly, I can see how kubernetes.io/arch could be a problem for others. So it would definitely be good to be able to tweak the list, with options to add labels and also to remove some of the default labels.

@TBBle
Author

TBBle commented May 6, 2021

It's an interesting idea. I don't expect it'll see much traction here, so might be worth opening as a feature request/change suggestion if you want to pursue it further.

@k8s-triage-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 4, 2021
@TBBle
Author

TBBle commented Aug 4, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 4, 2021
@ArchiFleKs

ArchiFleKs commented Sep 2, 2021

When using a custom launch template, more labels are added.

I guess these should also be ignored for this to work?

I see there is some work done here: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/nodegroupset/aws_nodegroups.go#L30

Is this where we should add the ignored label set?

For example, here is an EKS managed node group with a custom launch template:

Labels:         arch=arm64
                beta.kubernetes.io/arch=arm64
                beta.kubernetes.io/instance-type=t4g.medium
                beta.kubernetes.io/os=linux
                eks.amazonaws.com/capacityType=ON_DEMAND
                eks.amazonaws.com/nodegroup=default-c-20210828222002582300000005
                eks.amazonaws.com/nodegroup-image=ami-037d1e5f9638c6c6f
                eks.amazonaws.com/sourceLaunchTemplateId=lt-0d60b9506f387d2d6
                eks.amazonaws.com/sourceLaunchTemplateVersion=1
                failure-domain.beta.kubernetes.io/region=eu-west-1
                failure-domain.beta.kubernetes.io/zone=eu-west-1c
                kubernetes.io/arch=arm64
                kubernetes.io/hostname=ip-10-0-84-153.eu-west-1.compute.internal
                kubernetes.io/os=linux
                network=private
                node.kubernetes.io/instance-type=t4g.medium
                size=medium
                topology.ebs.csi.aws.com/zone=eu-west-1c
                topology.kubernetes.io/region=eu-west-1
                topology.kubernetes.io/zone=eu-west-1c

I have 3 similar node pools like this, but in addition to the CSI label, these might also differ:

                eks.amazonaws.com/nodegroup-image=ami-037d1e5f9638c6c6f
                eks.amazonaws.com/sourceLaunchTemplateId=lt-0d60b9506f387d2d6
                eks.amazonaws.com/sourceLaunchTemplateVersion=1

@TBBle
Author

TBBle commented Sep 3, 2021

That would be up to the local administrator to use with --balancing-ignore-label, as they are best placed to know whether their different launch templates lead to materially different nodes, as far as balancing is concerned.

The defaults should be very conservative, and only include labels that can never be material to the equivalence of the nodes for balancing purposes.

Particularly because I don't think the command-line API for this lets you override an existing default, so if eks.amazonaws.com/nodegroup-image were added to the defaults, it'd be impossible for the local administrator to specify that this label does matter for balancing.

@elmiko
Contributor

elmiko commented Nov 3, 2021

fwiw, this is something we are running into with the Cluster API provider, as AWS is one of the possible infrastructure providers for CAPI. given the discussion that has happened in kubernetes-sigs/aws-ebs-csi-driver#729, i wonder if we couldn't make a patch to ignore this label as part of the "well-known labels" until the aws-ebs-csi-driver has completed deprecating the old value in favor of the common one (topology.kubernetes.io/zone)?

edit:
this breaks all scale-ups on CAPI using AWS with --balance-similar-node-groups, as the nodes do not properly balance between the groups.

elmiko added a commit to elmiko/cluster-autoscaler-operator that referenced this issue Nov 4, 2021
when balancing similar nodes, the autoscaler will look for differences
in labels as one of its checks for similarity. by default it will ignore
the well known kubernetes zone label (topology.kubernetes.io/zone), but
the aws-ebs-csi-driver will add a unique label for its zone,
`topology.ebs.csi.aws.com/zone`. this label will eventually be
deprecated in favor of the common well known topology label. for more
information about this see the references.

ref: kubernetes/autoscaler#3230
ref: kubernetes-sigs/aws-ebs-csi-driver#729
elmiko added a commit to elmiko/kubernetes-autoscaler that referenced this issue Nov 10, 2021
This change adds the aforementioned label to the list of ignored labels
in the AWS nodegroupset processor. This change is being made in response
to the addition of this label by the aws-ebs-csi-driver. This label will
eventually be deprecated by the driver, but its use will prevent AWS
users from properly balancing similar nodes. Also adds unit test for the
AWS processor.

ref: kubernetes#3230
ref: kubernetes-sigs/aws-ebs-csi-driver#729
@TBBle
Author

TBBle commented Nov 16, 2021

I'm glad to see #4458, and it addresses the title of the issue, but there's still the issue in the second paragraph, which can't be worked around via the CA command line:

> It probably should be added to buildGenericLabels like failure-domain.beta.kubernetes.io/zone, so that scale-from-zero works correctly, assuming this won't break systems that don't have the AWS EBS CSI driver installed. That would save having to add the label by-hand to the per-AZ ASGs needed when using EBS. I guess it might behave oddly if only some of the node groups run the CSI DaemonSet, but I'm not sure that's a rational/expected setup?

However, unlike the label-ignore change, I'm not as certain it's the right fix, and it's probably not as problematic anyway, since it only affects scale-from-zero; after that, balancing should now work correctly. (And it can be worked around by adding AWS tags to the ASG if needed anyway.)

Perhaps it needs to be a new issue with more consideration. It's also possible I just misdiagnosed it. I honestly haven't thought about it since #3230 (comment), when it was shown useful.

@elmiko I'm not sure if CAPI has the same problem or not, but a quick look suggests CAPI doesn't yet support scale-from-zero anyway.

@elmiko
Contributor

elmiko commented Nov 16, 2021

@TBBle i think you are correct that there could be an underlying issue with scale from zero; i haven't tested it thoroughly yet. that said...

> I'm not sure if CAPI has the same problem or not, but a quick look suggests CAPI doesn't yet support scale-from-zero anyway.

it does not yet, but i am currently working on a patch for it. i will try to play around with the zone labels when i am testing it, perhaps i can find the issue you are talking about.

@TBBle
Author

TBBle commented Nov 17, 2021

Sounds good. To avoid confusion, that should probably be a separate ticket to isolate just that behaviour.

@elmiko
Contributor

elmiko commented Nov 17, 2021

> To avoid confusion, that should probably be a separate ticket to isolate just that behaviour.

agreed

piotrnosek pushed a commit to piotrnosek/autoscaler that referenced this issue Nov 30, 2021
This change adds the aforementioned label to the list of ignored labels
in the AWS nodegroupset processor. This change is being made in response
to the addition of this label by the aws-ebs-csi-driver. This label will
eventually be deprecated by the driver, but its use will prevent AWS
users from properly balancing similar nodes. Also adds unit test for the
AWS processor.

ref: kubernetes#3230
ref: kubernetes-sigs/aws-ebs-csi-driver#729
MaxRink pushed a commit to MaxRink/autoscaler that referenced this issue Feb 14, 2022
This change adds the aforementioned label to the list of ignored labels
in the AWS nodegroupset processor. This change is being made in response
to the addition of this label by the aws-ebs-csi-driver. This label will
eventually be deprecated by the driver, but its use will prevent AWS
users from properly balancing similar nodes. Also adds unit test for the
AWS processor.

ref: kubernetes#3230
ref: kubernetes-sigs/aws-ebs-csi-driver#729