v0.33.2 Karpenter won't scale nodes on 1.28 EKS version: #5594
Comments
Can you share the status from your EC2NodeClass? Typically, you will see this error when Karpenter isn't able to discover your subnets and you don't have any zones that Karpenter can leverage for scheduling pods to instance types.
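For readers following along, here is a minimal sketch of what the EC2NodeClass status roughly looks like once subnet and security-group discovery succeeds (v1beta1 API; the resource name, IDs, and security-group name below are illustrative). If status.subnets comes back empty, the selector tags did not match anything and pods will fail to schedule with the error discussed in this issue:

# kubectl get ec2nodeclass np-244-node-class -o yaml   (name is illustrative)
status:
  subnets:
    - id: subnet-0a1b2c3d4e5f67890   # placeholder ID
      zone: us-east-1a
    - id: subnet-0a1b2c3d4e5f67891   # placeholder ID
      zone: us-east-1b
  securityGroups:
    - id: sg-0a1b2c3d4e5f67890       # placeholder ID
      name: np-244-node-sg           # placeholder name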
Hi @jonathan-innis Thanks for responding! I figured out the issue. The core issue was that the initial error message ("Could not schedule pod, incompatible with nodepool ... no instance type has enough resources") was misleading! The problem was that the EC2NodeClass manifest was referring to a role rather than an instance profile.
It is mentioned in the documentation that specifying the role should be enough, but it only worked for me with an instance profile.
@jonathan-innis Another issue that I noticed, when I create
@VladFCarsDevops are you using a private cluster?
Is this when you are using the role field in the EC2NodeClass?
@engedaam My EKS cluster endpoints are both Public and Private. Yes! When I switched to using an instance profile instead of the role, it started working.
@VladFCarsDevops would you be willing to make a PR for the documentation update?
@engedaam Sure, can you point me to the right location?
It would be here: https://github.com/aws/karpenter-provider-aws/tree/main/website/content/en. You will need to make the same changes to v0.32, v0.33, and v0.34.
@VladFCarsDevops This is surprising to me. From what I know about the current state of the code, we shouldn't return a different response during scheduling when using an instance profile vs. using a role. It's a bit hard to parse the Terraform manifests that you pasted above (also, unfortunately, none of the maintainers on the Karpenter team are TF experts). Do you have direct access to the cluster, and if so, could you post the YAML version of the EC2NodeClass and NodePool when you have the instance profile vs. when you have the role? Also, as for surfacing this information better: we're currently talking about how we can improve observability for Karpenter using status conditions across all of our resources. This is talked about here: kubernetes-sigs/karpenter#493. I'd imagine that surfacing a condition like this directly on the resources would help.
@jonathan-innis Oh, I tried creating the EC2NodeClass and NodePool with plain YAMLs and had the same errors up until I changed from role to an InstanceProfile; a friend of mine working at another company faced the same issue. I think updating the instructions in the docs will save a ton of time for people debugging a misleading log output when it has nothing to do with resources.
I don't disagree if this is really what is happening, but what I am trying to say is that these issues seem potentially unrelated to me. From looking over the code and reasoning about where we evaluate the instance profile and the role when it comes to making scheduling decisions, they don't affect scheduling decisions at all, which is why I think it's odd that you are seeing "Could not schedule pod, incompatible with nodepool" and pointing back to the fact that you were using a role vs. an instance profile as the reason. Do you know if the EC2NodeClass that you were referencing was properly resolving the subnets or security groups that you were specifying, by checking the status? One common problem that we see is that subnets don't get resolved; therefore, the instance types aren't able to produce zones, and so you will see the error that you pasted here when you are scheduling pods.
@VladFCarsDevops Can you please share your final ec2nc, and where do you get the right instance profile from?
@ahoehma You have to create the instance profile separately, give it the necessary permissions, and reference it in your ec2nc.
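For reference, a minimal sketch of the workaround described above, assuming the v1beta1 EC2NodeClass API: the instance profile is created separately (for example with the aws_iam_instance_profile resource in the reproduction below) and referenced through spec.instanceProfile instead of spec.role, so Karpenter uses it as-is rather than managing one from the role. The name and tag values are illustrative:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: np-244-node-class                      # illustrative name
spec:
  amiFamily: AL2
  # Reference the pre-created instance profile directly (instead of spec.role)
  instanceProfile: np-244-KarpenterNodeInstanceProfile
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: np-244         # illustrative tag value
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: np-244         # illustrative tag value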
Yes, SGs and subnets were properly set up.
Description
Observed Behavior:
Karpenter pod logs:
Could not schedule pod, incompatible with nodepool "np-244-nodepool", daemonset overhead={"cpu":"780m","memory":"1120Mi","pods":"6"}, no instance type satisfied resources {"cpu":"6780m","memory":"11360Mi","pods":"7"} and requirements karpenter.k8s.aws/instance-category In [c m r], karpenter.k8s.aws/instance-cpu In [16 32 36 4 48 and 1 others], karpenter.sh/capacity-type In [on-demand], karpenter.sh/nodepool In [np-244-nodepool], kubernetes.io/arch In [amd64], topology.kubernetes.io/zone In [us-east-1a us-east-1b us-east-1c] (no instance type has enough resources)
I tried to remove everything from the `requirements` to make the NodePool as flexible as possible, but I got the same error: Could not schedule pod, incompatible with nodepool "np-244-nodepool", daemonset overhead={"cpu":"780m","memory":"1120Mi","pods":"6"}, no instance type satisfied resources (no instance type has enough resources)
Expected Behavior:
Karpenter scales nodes dynamically regardless of the workload.
Reproduction Steps (Please include YAML):
resource "helm_release" "karpenter" {
  namespace        = "karpenter"
  create_namespace = true
  name             = "karpenter"
  repository       = "oci://public.ecr.aws/karpenter"
  chart            = "karpenter"
  version          = "v0.33.2"
  wait             = true

  set {
    # Dots in the annotation key must be escaped so Helm treats them as part of the key
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = var.karpenter_controller_arn
  }
  set {
    name  = "settings.clusterName"
    value = var.eks_name
  }
  set {
    name  = "settings.clusterEndpoint"
    value = var.cluster_endpoint
  }
  set {
    name  = "settings.defaultInstanceProfile"
    value = "np-244-KarpenterNodeInstanceProfile"
  }
  set {
    name  = "logLevel"
    value = "debug"
  }
}
resource "kubectl_manifest" "nodepool" {
  yaml_body = <<-YAML
    apiVersion: karpenter.sh/v1beta1
    kind: NodePool
    metadata:
      name: ${var.workspace}-nodepool
    spec:
      template:
        spec:
          requirements:
            - key: "karpenter.k8s.aws/instance-category"
              operator: "In"
              values: ["c", "m", "r"]
            - key: "karpenter.k8s.aws/instance-cpu"
              operator: "In"
              values: ["4", "8", "16", "32", "36", "48"]
            - key: "topology.kubernetes.io/zone"
              operator: "In"
              values: ["us-east-1a", "us-east-1b", "us-east-1c"]
            - key: "kubernetes.io/arch"
              operator: "In"
              values: ["amd64"]
            - key: "karpenter.sh/capacity-type"
              operator: "In"
              values: ["on-demand"]
          nodeClassRef:
            name: ${var.workspace}-node-class
      limits:
        cpu: "1000"
      disruption:
        consolidationPolicy: "WhenUnderutilized"
        expireAfter: "720h"
  YAML
}
resource "kubectl_manifest" "ec2nodeclass" {
  yaml_body = <<-YAML
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    metadata:
      name: ${var.workspace}-node-class
    spec:
      amiFamily: "AL2"
      role: "${var.workspace}-karpenter-controller"
      subnetSelectorTerms:
        - tags:
            "karpenter.sh/discovery": "${var.workspace}"
      securityGroupSelectorTerms:
        - tags:
            "karpenter.sh/discovery": "${var.workspace}"
      blockDeviceMappings:
        - deviceName: "/dev/xvda"
          ebs:
            volumeSize: 100
            volumeType: "gp2"
            encrypted: true
            deleteOnTermination: true
  YAML
}
IAM configurations:
data "aws_iam_policy_document" "karpenter_controller_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"
  }
}
resource "aws_iam_policy" "karpenter_policy" {
  name        = "${var.workspace}-KarpenterPolicy"
  path        = "/"
  description = "Policy for Karpenter"
  policy      = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "KarpenterInstanceProfileManagement",
      "Effect": "Allow",
      "Action": [
        "iam:CreateInstanceProfile",
        "iam:AddRoleToInstanceProfile",
        "iam:RemoveRoleFromInstanceProfile",
        "iam:PassRole",
        "iam:GetInstanceProfile",
        "iam:TagInstanceProfile"
      ],
      "Resource": "*"
    },
    {
      "Sid": "KarpenterEC2Actions",
      "Effect": "Allow",
      "Action": [
        "ec2:RunInstances",
        "ec2:DescribeSubnets",
        "ec2:DescribeSpotPriceHistory",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeImages",
        "ec2:DeleteLaunchTemplate",
        "ec2:CreateTags",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateFleet",
        "ssm:GetParameter",
        "pricing:GetProducts"
      ],
      "Resource": "*"
    },
    {
      "Sid": "ConditionalEC2Termination",
      "Effect": "Allow",
      "Action": "ec2:TerminateInstances",
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "ec2:ResourceTag/Name": "karpenter"
        }
      }
    }
  ]
}
EOF
}
resource "aws_iam_role" "karpenter_controller" {
  assume_role_policy = data.aws_iam_policy_document.karpenter_controller_assume_role_policy.json
  name               = "${var.workspace}-karpenter-controller"
}

resource "aws_iam_policy" "karpenter_controller" {
  policy = aws_iam_policy.karpenter_policy.policy
  name   = "${var.workspace}-karpenter-controller"
}

resource "aws_iam_role_policy_attachment" "karpenter_controller_attach" {
  role       = aws_iam_role.karpenter_controller.name
  policy_arn = aws_iam_policy.karpenter_controller.arn
}

resource "aws_iam_instance_profile" "karpenter" {
  name = "${var.workspace}-KarpenterNodeInstanceProfile"
  role = aws_iam_role.kubernetes-worker-role.name
}
Versions:
- Chart Version: v0.33.2
- Kubernetes Version (kubectl version): 1.28