Karpenter launches more EC2 instances than required #1291
Comments
Thank you very much for bringing up this problem. Can you also share the Karpenter logs? Meanwhile, I will try to recreate it.
I tried your configuration and the launch template. In my environment, Karpenter launches 11 * 8xlarge instances. I am wondering whether the 5 * 4xlarge instances and the 11 * 8xlarge instances were launched at the same time. Just thinking out loud here: some constraint may be preventing pods from being scheduled by the k8s scheduler even when there are nodes with enough spare resources. That would make the pods unschedulable and trigger Karpenter to launch new nodes. This also seems to be what happened, briefly, in your second case. Do you run any DaemonSets in your cluster?
Thanks for looking into this problem! It seems you are right. It looks like the reason is:
Then, after a short time, spark created the second executor. It could not be assigned to instance#1 because this instance was not yet ready. In the events of the second pod, I saw:
And Karpenter launched a new instance for it:
Next, Spark almost simultaneously created the remaining pods. They also could not be assigned to the first 2 instances due to their 'not-ready' status. But this time, Karpenter processed them in a batch and did not create redundant instances.
I was able to remove this overprovisioning by adding to the pod spec:
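The actual snippet was not captured on this page. One plausible addition consistent with the "not-ready" scheduling failures described above (my assumption, not confirmed by the reporter) is tolerating the not-ready taints so the scheduler can bind pods to freshly launched nodes before they report Ready:

```yaml
# Hypothetical reconstruction -- the original pod-spec addition was not captured.
# Tolerating the not-ready taints lets the scheduler bind pods to nodes
# that Karpenter has launched but that have not yet become Ready.
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoSchedule
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
```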
But this doesn't seem to be the best solution.
Yes, there are a couple of DaemonSets, but they request few resources and, in theory, should not prevent the executor pods from being scheduled.
After discussion, I think this is a duplicate of #1044.
Yes, we are tracking this feature request in #1044. Closing this since it is a duplicate. Please reopen if it is not.
Version
Karpenter: v0.5.6
Kubernetes: v1.19
Expected Behavior
Karpenter launches exactly as many EC2 instances as required by the existing pods.
Actual Behavior
Karpenter was used to run Spark applications on AWS EKS. I ran 3 Spark applications simultaneously; each of them required 14 executor pods (42 executor pods in total).
As a result, 5 * 4xlarge and 11 * 8xlarge instances were launched, which is significantly more than these pods required.
Thus, the total capacity of the instances was 5 * 2 + 11 * 4 = 54 pods, but the apps together required only 42 executor pods (overprovisioning ≈ 29%).
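The slot arithmetic above can be checked directly (a trivial sketch; the per-instance pod counts are the ones stated in the report):

```python
# Slot capacity from the report: a 4xlarge holds 2 executor pods,
# an 8xlarge holds 4; three apps need 14 executors each.
capacity = 5 * 2 + 11 * 4        # 54 executor slots
required = 3 * 14                # 42 executor pods
overprovisioning = (capacity - required) / required
print(capacity, required, f"{overprovisioning:.0%}")  # 54 42 29%
```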
As a result, some of the instances were only half filled with pods.
Resource Specs and Logs
The provisioner was created specifically for executors of Spark applications:
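The spec itself was not captured on this page. A sketch of what a dedicated executor provisioner could look like under the v1alpha5 API used by Karpenter v0.5.x (names, labels, and values here are my assumptions, not the reporter's actual spec):

```yaml
# Hypothetical reconstruction -- the actual Provisioner spec was not captured.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spark-executors          # hypothetical name
spec:
  ttlSecondsAfterEmpty: 30       # empty nodes are reclaimed, as described later
  labels:
    role: spark-executor         # matched by the executors' nodeSelector (assumption)
  requirements:
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["r5.4xlarge", "r5.8xlarge"]
```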
All Spark applications have identical settings for their executor pods, such that a 4xlarge instance can hold 2 executor pods and an 8xlarge can hold 4.
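The exact requests were not included, but a sizing consistent with the stated packing (2 pods per 4xlarge, 4 per 8xlarge) would be roughly the following; the concrete numbers are my assumption:

```yaml
# Hypothetical executor resources -- the actual values were not captured.
# An r5.4xlarge (16 vCPU / 128 GiB) fits two such pods and an
# r5.8xlarge (32 vCPU / 256 GiB) fits four, leaving headroom for
# DaemonSets and system reservations.
resources:
  requests:
    cpu: "7"
    memory: 55Gi
  limits:
    cpu: "7"
    memory: 55Gi
```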
EC2 instances are created using a custom launchTemplate to increase the disk size:
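The template itself was not captured. Since the only customization described is a larger disk, the relevant part of the EC2 launch template would be a block-device mapping along these lines (the volume size and device name are my assumptions):

```yaml
# Hypothetical launch-template data -- the actual template was not captured.
# Only the root-volume override matters for the described setup.
LaunchTemplateData:
  BlockDeviceMappings:
    - DeviceName: /dev/xvda
      Ebs:
        VolumeSize: 200        # GiB; hypothetical value
        VolumeType: gp3
```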
Another similar case
The provisioner's instance-type was defined as ["r5dn.2xlarge", "r5.2xlarge", "r5.4xlarge", "r5.8xlarge"].
Pods additionally had in the spec:
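The snippet was not captured on this page. One hypothetical addition that would explain the observed one-executor-per-instance placement (purely my guess, not confirmed by the reporter) is hard pod anti-affinity between executors:

```yaml
# Hypothetical reconstruction -- the actual pod-spec addition was not captured.
# Required anti-affinity of this shape forces one executor per node, which
# matches the "10 executors -> 10 instances" outcome described below.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            spark-role: executor   # label Spark-on-k8s puts on executor pods
        topologyKey: kubernetes.io/hostname
```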
One Spark app was started requiring 10 executors. Karpenter launched 10 EC2 instances (r5dn.2xlarge) for them.
Then I killed one of the executor pods; a new executor was created in place of the killed one, on the same instance.
But at the same time, Karpenter launched a new EC2 instance, which was left unused and was deleted after ttlSecondsAfterEmpty.
Expected behavior: Karpenter should not create unused instances.