[Bug] Karpenter v0.32.4 does not work when deployed via eksctl #7454

Closed
iankouls-aws opened this issue Jan 6, 2024 · 10 comments · Fixed by #7778

Comments

@iankouls-aws

Summary:
The Karpenter deployment succeeds, but Karpenter fails to create new nodes.

What were you trying to accomplish?

An EKS cluster was created with eksctl version 0.167.0 from the following manifest:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: do-eks-yaml-karpenter
  version: "1.28"
  region: us-west-2
  tags:
    karpenter.sh/discovery: do-eks-yaml-karpenter

iam:
  withOIDC: true

addons:
  - name: aws-ebs-csi-driver
    version: v1.26.0-eksbuild.1
    wellKnownPolicies:
      ebsCSIController: true

karpenter:
  version: 'v0.32.4'
  createServiceAccount: true
  #  defaultInstanceProfile: 'KarpenterInstanceProfile'
  withSpotInterruptionQueue: true

managedNodeGroups:
  - name: c5-xl-do-eks-karpenter-ng
    instanceType: c5.xlarge
    instancePrefix: c5-xl
    privateNetworking: true
    minSize: 0
    desiredCapacity: 2
    maxSize: 10
    volumeSize: 300
    iam:
      withAddonPolicies:
        cloudWatch: true
        ebs: true

eksctl create cluster -f ./eks-karpenter.yaml

The cluster creation finishes successfully. See logs below.

Next, a NodePool and an EC2NodeClass are applied, and a deployment that requires a GPU is created. The pod enters the Pending state. Karpenter is expected to add a GPU node to the cluster.

What happened?

No nodes get added to the cluster
Karpenter pods are in the Running state
Karpenter pod logs show errors:

[karpenter-84bf6fff97-v5v2k] {"level":"ERROR","time":"2024-01-06T09:15:56.457Z","logger":"controller","message":"Reconciler error","commit":"fdf67d0","controller":"nodeclass","controllerGroup":"karpenter.k8s.aws","controllerKind":"EC2NodeClass","EC2NodeClass":{"name":"default"},"namespace":"","name":"default","reconcileID":"fe2de351-d378-4d82-aff7-556160f4d128","error":"creating instance profile, getting instance profile "do-eks-yaml-karpenter_4067990795380418201", AccessDenied: User: arn:aws:sts::<account_id>:assumed-role/eksctl-do-eks-yaml-karpenter-iamservice-role/1704531887056119458 is not authorized to perform: iam:GetInstanceProfile on resource: instance profile do-eks-yaml-karpenter_4067990795380418201 because no identity-based policy allows the iam:GetInstanceProfile action\n\tstatus code: 403, request id: f3a80d84-31cc-44ad-a6a4-91b4d3e56de3"}
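
To pull the denied IAM action out of log lines like the one above, a small filter can help. The following is a minimal sketch and not part of the original report: `denied_actions` is a hypothetical helper name, and the pattern assumes the AccessDenied wording shown in the log.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read log lines on stdin and print each distinct
# action that an AccessDenied message reports as "not authorized to perform".
denied_actions() {
  grep -o 'not authorized to perform: [A-Za-z0-9]*:[A-Za-z0-9]*' \
    | sed 's/^not authorized to perform: //' \
    | sort -u
}

# Fragment of the error message quoted in this issue.
sample='AccessDenied: User: arn:aws:sts::<account_id>:assumed-role/eksctl-do-eks-yaml-karpenter-iamservice-role/1704531887056119458 is not authorized to perform: iam:GetInstanceProfile on resource: instance profile do-eks-yaml-karpenter_4067990795380418201'

printf '%s\n' "$sample" | denied_actions
# prints: iam:GetInstanceProfile
```

The output of a `kubectl -n karpenter logs` command could be piped into `denied_actions` in the same way to list every action the controller role is being denied.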

How to reproduce it?

  1. Create cluster using the manifest shared above and command eksctl create cluster -f ./eks-karpenter.yaml
  2. Create NodePool and EC2NodeClass by cloning project https://github.com/aws-samples/aws-do-eks, and executing script Container-Root/eks/deployment/karpenter/provisioner-deploy-v1beta1.sh
  3. Create a deployment with requests and limits of 1 nvidia.com/gpu by running the script https://github.com/aws-samples/aws-do-eks/blob/main/Container-Root/eks/deployment/horizontal-pod-autoscaler/hpa-example/run.sh
  4. Tail karpenter pod logs: kubectl -n karpenter logs -f $(kubectl -n karpenter get pod | grep karpenter | head -n 1 | cut -d ' ' -f 1)

Logs
Cluster creation log:

eksctl create cluster -f /aws-do-eks/Container-Root/eks/conf/eksctl/yaml/eks-karpenter.yaml

2024-01-06 05:42:46 [ℹ]  eksctl version 0.167.0
2024-01-06 05:42:46 [ℹ]  using region us-west-2
2024-01-06 05:42:47 [ℹ]  setting availability zones to [us-west-2b us-west-2a us-west-2c]
2024-01-06 05:42:47 [ℹ]  subnets for us-west-2b - public:192.168.0.0/19 private:192.168.96.0/19
2024-01-06 05:42:47 [ℹ]  subnets for us-west-2a - public:192.168.32.0/19 private:192.168.128.0/19
2024-01-06 05:42:47 [ℹ]  subnets for us-west-2c - public:192.168.64.0/19 private:192.168.160.0/19
2024-01-06 05:42:47 [ℹ]  nodegroup "c5-xl-do-eks-karpenter-ng" will use "" [AmazonLinux2/1.28]
2024-01-06 05:42:47 [ℹ]  using Kubernetes version 1.28
2024-01-06 05:42:47 [ℹ]  creating EKS cluster "do-eks-yaml-karpenter" in "us-west-2" region with managed nodes
2024-01-06 05:42:47 [ℹ]  1 nodegroup (c5-xl-do-eks-karpenter-ng) was included (based on the include/exclude rules)
2024-01-06 05:42:47 [ℹ]  will create a CloudFormation stack for cluster itself and 0 nodegroup stack(s)
2024-01-06 05:42:47 [ℹ]  will create a CloudFormation stack for cluster itself and 1 managed nodegroup stack(s)
2024-01-06 05:42:47 [ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-west-2 --cluster=do-eks-yaml-karpenter'
2024-01-06 05:42:47 [ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "do-eks-yaml-karpenter" in "us-west-2"
2024-01-06 05:42:47 [ℹ]  CloudWatch logging will not be enabled for cluster "do-eks-yaml-karpenter" in "us-west-2"
2024-01-06 05:42:47 [ℹ]  you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-west-2 --cluster=do-eks-yaml-karpenter'
2024-01-06 05:42:47 [ℹ]  
2 sequential tasks: { create cluster control plane "do-eks-yaml-karpenter", 
    2 sequential sub-tasks: { 
        5 sequential sub-tasks: { 
            wait for control plane to become ready,
            associate IAM OIDC provider,
            2 sequential sub-tasks: { 
                create IAM role for serviceaccount "kube-system/aws-node",
                create serviceaccount "kube-system/aws-node",
            },
            restart daemonset "kube-system/aws-node",
            1 task: { create addons },
        },
        create managed nodegroup "c5-xl-do-eks-karpenter-ng",
    } 
}
2024-01-06 05:42:47 [ℹ]  building cluster stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:42:47 [ℹ]  deploying stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:43:17 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:43:47 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:44:47 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:45:47 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:46:47 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:47:47 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:48:47 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:49:47 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:50:47 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:51:47 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:52:47 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:54:48 [ℹ]  building iamserviceaccount stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-kube-system-aws-node"
2024-01-06 05:54:48 [ℹ]  deploying stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-kube-system-aws-node"
2024-01-06 05:54:48 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-kube-system-aws-node"
2024-01-06 05:55:19 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-kube-system-aws-node"
2024-01-06 05:55:19 [ℹ]  serviceaccount "kube-system/aws-node" already exists
2024-01-06 05:55:19 [ℹ]  updated serviceaccount "kube-system/aws-node"
2024-01-06 05:55:19 [ℹ]  daemonset "kube-system/aws-node" restarted
2024-01-06 05:55:19 [ℹ]  building managed nodegroup stack "eksctl-do-eks-yaml-karpenter-nodegroup-c5-xl-do-eks-karpenter-ng"
2024-01-06 05:55:19 [ℹ]  deploying stack "eksctl-do-eks-yaml-karpenter-nodegroup-c5-xl-do-eks-karpenter-ng"
2024-01-06 05:55:19 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-nodegroup-c5-xl-do-eks-karpenter-ng"
2024-01-06 05:55:49 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-nodegroup-c5-xl-do-eks-karpenter-ng"
2024-01-06 05:56:45 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-nodegroup-c5-xl-do-eks-karpenter-ng"
2024-01-06 05:58:03 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-nodegroup-c5-xl-do-eks-karpenter-ng"
2024-01-06 05:58:03 [ℹ]  waiting for the control plane to become ready
2024-01-06 05:58:04 [✔]  saved kubeconfig as "/root/.kube/config"
2024-01-06 05:58:04 [ℹ]  no tasks
2024-01-06 05:58:04 [✔]  all EKS cluster resources for "do-eks-yaml-karpenter" have been created
2024-01-06 05:58:04 [ℹ]  creating role using provided well known policies
2024-01-06 05:58:05 [ℹ]  deploying stack "eksctl-do-eks-yaml-karpenter-addon-aws-ebs-csi-driver"
2024-01-06 05:58:05 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-addon-aws-ebs-csi-driver"
2024-01-06 05:58:35 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-addon-aws-ebs-csi-driver"
2024-01-06 05:59:34 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-addon-aws-ebs-csi-driver"
2024-01-06 05:59:34 [ℹ]  creating addon
2024-01-06 06:00:23 [ℹ]  addon "aws-ebs-csi-driver" active
2024-01-06 06:00:24 [ℹ]  1 task: { create karpenter for stack "do-eks-yaml-karpenter" }
2024-01-06 06:00:24 [ℹ]  building nodegroup stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:00:24 [ℹ]  deploying stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:00:24 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:00:54 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:01:44 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:02:16 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:02:52 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:04:04 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:04:04 [ℹ]  1 task: { create IAM role for serviceaccount "karpenter/karpenter" }
2024-01-06 06:04:04 [ℹ]  1 task: { create IAM role for serviceaccount "karpenter/karpenter" }
2024-01-06 06:04:04 [ℹ]  building iamserviceaccount stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-karpenter-karpenter"
2024-01-06 06:04:04 [ℹ]  deploying stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-karpenter-karpenter"
2024-01-06 06:04:04 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-karpenter-karpenter"
2024-01-06 06:04:34 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-karpenter-karpenter"
2024-01-06 06:04:34 [ℹ]  adding identity "arn:aws:iam::159553542841:role/eksctl-KarpenterNodeRole-do-eks-yaml-karpenter" to auth ConfigMap
2024-01-06 06:04:34 [ℹ]  adding Karpenter to cluster do-eks-yaml-karpenter
E0106 06:04:35.661520    1614 memcache.go:206] couldn't get resource list for karpenter.k8s.aws/v1alpha1: the server could not find the requested resource
E0106 06:04:35.732564    1614 memcache.go:206] couldn't get resource list for karpenter.k8s.aws/v1beta1: the server could not find the requested resource
E0106 06:04:35.821871    1614 memcache.go:206] couldn't get resource list for karpenter.sh/v1beta1: the server could not find the requested resource
2024-01-06 06:04:50 [ℹ]  kubectl command should work with "/root/.kube/config", try 'kubectl get nodes'
2024-01-06 06:04:50 [✔]  EKS cluster "do-eks-yaml-karpenter" in "us-west-2" region is ready

Sat Jan  6 06:04:50 UTC 2024
Done creating cluster using /aws-do-eks/Container-Root/eks/conf/eksctl/yaml/eks-karpenter.yaml
/aws-do-eks/Container-Root/eks

Karpenter pod log:

[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:07.220Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:08.221Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:09.221Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:10.222Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:11.222Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:12.222Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:13.223Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:14.224Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"ERROR","time":"2024-01-06T09:51:14.749Z","logger":"controller","message":"Reconciler error","commit":"fdf67d0","controller":"nodeclass","controllerGroup":"karpenter.k8s.aws","controllerKind":"EC2NodeClass","EC2NodeClass":{"name":"default"},"namespace":"","name":"default","reconcileID":"4020d4ad-afda-4687-9f05-6ed98b8506f3","error":"creating instance profile, getting instance profile \"do-eks-yaml-karpenter_4067990795380418201\", AccessDenied: User: arn:aws:sts::159553542841:assumed-role/eksctl-do-eks-yaml-karpenter-iamservice-role/1704531887056119458 is not authorized to perform: iam:GetInstanceProfile on resource: instance profile do-eks-yaml-karpenter_4067990795380418201 because no identity-based policy allows the iam:GetInstanceProfile action\n\tstatus code: 403, request id: 63aaf857-2b29-45d2-a60e-e6e1fe9889a2"}

Anything else we need to know?
To build the container image for the deployment, from the cloned project directory, execute the following commands:
cd Container-Root/eks/deployment/horizontal-pod-autoscaler/hpa-example
./build.sh
./push.sh

Older versions of Karpenter (e.g. 0.29.0) used with Provisioner and AWSNodeTemplate work as expected.
In this case the v1alpha5 API is used: https://github.com/aws-samples/aws-do-eks/blob/main/Container-Root/eks/deployment/karpenter/provisioner-deploy-v1alpha5.sh

Karpenter also works as expected when the cluster is created without Karpenter and Karpenter v0.32.4 is then deployed by following the instructions here: https://karpenter.sh/v0.32/getting-started/getting-started-with-karpenter/#4-install-karpenter

It appears that eksctl lacks support for the Karpenter versions that use the v1beta1 API.

Versions

$ eksctl info
eksctl version: 0.167.0
kubectl version: v1.28.2
OS: linux
Contributor

github-actions bot commented Jan 6, 2024

Hello iankouls-aws 👋 Thank you for opening an issue in the eksctl project. The team will review the issue and aim to respond within 1-5 business days. Meanwhile, please read about the Contribution and Code of Conduct guidelines here. You can find out more information about eksctl on our website.

@yuxiang-zhang
Member

ec2CreateFleet,
ec2CreateLaunchTemplate,
ec2CreateTags,
ec2DescribeAvailabilityZones,
ec2DescribeInstanceTypeOfferings,
ec2DescribeInstanceTypes,
ec2DescribeInstances,
ec2DescribeLaunchTemplates,
ec2DescribeSecurityGroups,
ec2DescribeSubnets,
ec2DeleteLaunchTemplate,
ec2RunInstances,
ec2TerminateInstances,
ec2DescribeImages,
ec2DescribeSpotPriceHistory,
iamPassRole,
iamCreateServiceLinkedRole,
ssmGetParameter,
pricingGetProducts,

Looks like the controller policy is missing the AllowPassingInstanceRole defined here:

https://github.com/aws/karpenter-provider-aws/blob/daeb5da355fce14f718f51c4956ca8f9319103dd/website/content/en/docs/getting-started/getting-started-with-karpenter/cloudformation.yaml#L186-L196

Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@ibnjunaid
Contributor

Hi @yuxiang-zhang. Can you please assign this issue to me?

@pstast

pstast commented Mar 28, 2024

I fixed this issue in my cluster by manually adding these permissions to the eksctl-KarpenterControllerPolicy-CLUSTERNAME policy:

  • iam:GetInstanceProfile
  • iam:CreateInstanceProfile
  • iam:TagInstanceProfile
  • iam:AddRoleToInstanceProfile

These appear to be missing when Karpenter is configured by eksctl.
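
As a rough illustration of the workaround above, the four permissions could be written out as an IAM policy document. This is only a sketch under stated assumptions: the helper name `write_instance_profile_policy`, the file name, the `Sid`, the policy name in the commented command and the `<karpenter-controller-role>` placeholder are all hypothetical, and `"Resource": "*"` is a simplification that a real setup may want to scope more tightly.

```shell
#!/usr/bin/env bash

# Hypothetical helper: write a policy document that allows the four
# instance-profile actions listed above to the file given as $1.
# "Resource": "*" is an assumption made for brevity, not a recommendation.
write_instance_profile_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowInstanceProfileActions",
      "Effect": "Allow",
      "Action": [
        "iam:GetInstanceProfile",
        "iam:CreateInstanceProfile",
        "iam:TagInstanceProfile",
        "iam:AddRoleToInstanceProfile"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

write_instance_profile_policy ./karpenter-instance-profile-policy.json

# One possible way to apply it (not run here): attach the document as an
# inline policy on the role that the Karpenter controller assumes.
# The role and policy names below are placeholders.
# aws iam put-role-policy \
#   --role-name "<karpenter-controller-role>" \
#   --policy-name KarpenterInstanceProfileActions \
#   --policy-document file://karpenter-instance-profile-policy.json
```

An inline role policy is used in the commented command only because it avoids editing the eksctl-managed policy by hand; adding the same actions to the existing policy, as described above, achieves the same effect.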

@ibnjunaid
Copy link
Contributor

Thanks @pstast for the policies. I have yet to raise a PR for the issue.

Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Apr 28, 2024
Contributor

github-actions bot commented May 4, 2024

This issue was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this as not planned May 4, 2024

@siennathesane
Contributor

I ran into this issue with 0.36.2 as well, and @pstast's recommended fix resolved it for me.

@piotrblasiak

The issue is still present with Karpenter 0.37.0 and eksctl 0.183.0.

@cPu1 cPu1 closed this as completed in #7778 Jul 4, 2024