
Fix capacityType label in AWS ManagedNodeGroup #6261

Merged 1 commit into kubernetes:master on Dec 4, 2023

Conversation

@sibucan (Contributor) commented Nov 8, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

When using a ManagedNodeGroup (MNG), CAS is supposed to use the EKS DescribeNodegroup API call to learn what the nodes in the NodeGroup will look like, so it can decide whether a Pod could fit there when the MNG currently has no nodes. This is because there's currently no native way of setting tags on an MNG's underlying ASG through the MNG API, and CAS traditionally depended on the ASG's tags to tell when it should scale up from 0 because a Pod could potentially fit there.
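
For reference, everything CAS has to work with in that zero-node case comes back from a single DescribeNodegroup call. Here is a minimal sketch of that call using aws-sdk-go v1 (the cluster and nodegroup names are the ones from my logs below; the surrounding code is illustrative only, not the actual CAS code):

package main

import (
    "fmt"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/eks"
)

func main() {
    client := eks.New(session.Must(session.NewSession()))

    out, err := client.DescribeNodegroup(&eks.DescribeNodegroupInput{
        ClusterName:   aws.String("eks01"),
        NodegroupName: aws.String("bottlerocket_x86_spot-20231003194631949400000003"),
    })
    if err != nil {
        panic(err)
    }

    ng := out.Nodegroup
    // The API reports the capacity type as a bare "SPOT"/"ON_DEMAND" string;
    // it's up to the caller to decide which node label key that maps to.
    fmt.Println("capacityType:", aws.StringValue(ng.CapacityType))
    fmt.Println("amiType:", aws.StringValue(ng.AmiType))
    fmt.Println("k8s labels:", aws.StringValueMap(ng.Labels))
}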

However, the capacityType label that CAS gets from the DescribeNodegroup API call is missing a crucial prefix. A Spot instance node will have the eks.amazonaws.com/capacityType label placed on it, but when CAS gets the node label list from the API, it'll look like this:

I1005 17:40:24.332839       1 aws_manager.go:310] node.Labels : map[amiType:BOTTLEROCKET_x86_64 capacityType:SPOT cluster-name:eks01 eks.amazonaws.com/nodegroup:bottlerocket_x86_spot-20231003194631949400000003 k8sVersion:1.28 kubernetes.io/arch:amd64 kubernetes.io/hostname:eks-bottlerocket_x86_spot-20231003194631949400000003-bcc57b89-f4ef-9c63-7c70-b4ab5325eb78-asg-6921437539965377742 kubernetes.io/os:linux node.kubernetes.io/instance-type:m5.xlarge nodegroup-name:bottlerocket_x86_spot-20231003194631949400000003 topology.kubernetes.io/region:us-east-1 topology.kubernetes.io/zone:us-east-1a]

Note that the label CAS determined from the API call was:

capacityType:SPOT

On a real node, the labels look like this (taken from kubectl describe node):

Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m6i.xlarge
                    beta.kubernetes.io/os=linux
                    eks.amazonaws.com/capacityType=SPOT
                    eks.amazonaws.com/nodegroup=bottlerocket_x86_spot-20231003194631949400000003
                    eks.amazonaws.com/nodegroup-image=ami-09c9ab15237fd5960
                    eks.amazonaws.com/sourceLaunchTemplateId=lt-0d333d9b550f53aad
                    eks.amazonaws.com/sourceLaunchTemplateVersion=3
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-250-242-179.ec2.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=m6i.xlarge
                    topology.ebs.csi.aws.com/zone=us-east-1a
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1a

So if I tried to schedule a Pod using the following nodeAffinity:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: "eks.amazonaws.com/capacityType"
                  operator: In
                  values: ["SPOT"]

CAS will see that my Spot MNG (with the discovered label capacityType: SPOT) can "never" fit my Pod, which requests eks.amazonaws.com/capacityType: SPOT, and reports that the Pod can't be scheduled anywhere in the cluster:

I1005 17:56:50.997360       1 orchestrator.go:546] Pod autoscaler-test/test-59f845b4f8-qvtll can't be scheduled on eks-bottlerocket_x86_spot-20231003194631949400000003-bcc57b89-f4ef-9c63-7c70-b4ab5325eb78, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=

However, if I use the "wrong" label capacityType: SPOT instead, CAS will autoscale the correct MNG, but my Pod then stays Pending, because the label on the actual node has the eks.amazonaws.com prefix, so my Pod's nodeAffinity no longer matches and it can't actually be scheduled onto that node. 😢

The fix here makes the autodiscovered label identical to the one that's placed on the nodes once the MNG scales up for the first time.
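
Conceptually, the change comes down to which key the discovered capacity type is stored under when CAS builds the template node's labels. Here is a hedged sketch of the idea; the constant and function names below are illustrative, not the exact identifiers in the AWS provider code:

package main

import (
    "fmt"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/service/eks"
)

// The key EKS actually puts on the node, and the one CAS should advertise.
const capacityTypeLabel = "eks.amazonaws.com/capacityType"

// templateLabelsFromNodegroup builds the labels advertised for an empty
// ManagedNodeGroup from a DescribeNodegroup response.
func templateLabelsFromNodegroup(ng *eks.Nodegroup) map[string]string {
    labels := map[string]string{}
    if ct := aws.StringValue(ng.CapacityType); ct != "" {
        // Previously the bare key "capacityType" was used here, which never
        // matches the eks.amazonaws.com/capacityType label on the real node.
        labels[capacityTypeLabel] = ct
    }
    for k, v := range ng.Labels {
        labels[k] = aws.StringValue(v)
    }
    return labels
}

func main() {
    ng := &eks.Nodegroup{CapacityType: aws.String("SPOT")}
    fmt.Println(templateLabelsFromNodegroup(ng))
    // Output: map[eks.amazonaws.com/capacityType:SPOT]
}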

Note: this does not happen when there's at least one node in the MNG, because CAS then caches the correct set of labels from a real node and no longer needs the EKS DescribeNodegroup API.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fixed an issue where the capacityType label inferred from an empty AWS ManagedNodeGroup did not match the same label on the nodes after it scales from 0 -> 1.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/provider/aws Issues or PRs related to aws provider labels Nov 8, 2023
@k8s-ci-robot k8s-ci-robot added area/cluster-autoscaler area/vertical-pod-autoscaler size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 8, 2023
@sibucan sibucan force-pushed the aws-capacity-type-fix branch from 7cd73bc to 17e99a2 on November 8, 2023 16:55
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 8, 2023
Fixes an issue where the capacityType label inferred from an empty
EKS ManagedNodeGroup does not match the same label on the node after it
is created and joins the cluster
@sibucan sibucan force-pushed the aws-capacity-type-fix branch from 17e99a2 to 7ffb662 on November 8, 2023 17:39
@vadasambar (Member)

/assign @vadasambar

@vadasambar (Member)

A nit, LGTM otherwise. Description is easy to understand (thank you)

@vadasambar (Member)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 15, 2023
@vadasambar (Member) commented Nov 15, 2023

@drmorr0, @gjtempleton need your approval.

@gjtempleton (Member)

Thanks for this!
/lgtm
/approve

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gjtempleton, sibucan, vadasambar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 4, 2023
@k8s-ci-robot k8s-ci-robot merged commit c0661b5 into kubernetes:master Dec 4, 2023
5 checks passed