Add flexibility in pod density for non-VPC CNI nodes #1032
Conversation
✔️ Deploy Preview for karpenter-docs-prod canceled. Explore the source changes: e359983. Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/61e893336590c20007dd8830
This doesn't really solve your problem. If you set maxPods on the kubelet, Karpenter will still try to over/under allocate. This is the bit I mentioned in #954 (comment) that is actually causing problems. Using your approach, you need to get the maxPods value into the code above, overriding the logic in
But doesn't Karpenter only do scheduling if a pod is unscheduled? It's not trying to take the place of the kube-scheduler, right? I'm currently running into the problem that we have underutilized nodes due to this setting being in place. I guess I'm not understanding where/when this is being used.
It replaces kube-scheduler in the case of Pending pods.
Ok. That's what I thought. The problem I'm having is that I'm getting into that Pending state earlier than expected because I can't pack more pods onto a node (because I can't set this flag). So, to me, it sounds like we need to both have the setting and make it available down in the calculation you're referring to. If the maxPods value is set, we should use that. Otherwise, we assume (possibly wrongly) that the AWS VPC CNI is being used and fall back to whatever value makes sense for that instance type. Does that make sense?
Many are not using AMIs that assume the VPC CNI (e.g., I just use the Ubuntu default), so they don't run into your specific part of the problem. But yes, to solve your case, you need both.
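As an aside on "what value makes sense for that instance type": the value the VPC CNI path falls back to is the standard AWS formula, ENIs * (IPv4 addresses per ENI - 1) + 2. Below is a minimal Go sketch of that calculation; the type and field names are hypothetical stand-ins for the EC2 instance-type data, not Karpenter's actual code.

```go
package main

import "fmt"

// instanceTypeInfo is a hypothetical stand-in for the per-instance-type data
// returned by EC2; only the two fields the formula needs are included.
type instanceTypeInfo struct {
	MaxENIs    int64 // maximum ENIs attachable to the instance
	IPv4PerENI int64 // IPv4 addresses per ENI
}

// eniLimitedPods applies the AWS VPC CNI limit: each ENI's primary address is
// reserved for the ENI itself, and 2 is added for host-networking pods.
func eniLimitedPods(info instanceTypeInfo) int64 {
	return info.MaxENIs*(info.IPv4PerENI-1) + 2
}

func main() {
	// m5.large: 3 ENIs x 10 IPv4 addresses each -> 29 pods
	fmt.Println(eniLimitedPods(instanceTypeInfo{MaxENIs: 3, IPv4PerENI: 10}))
}
```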
I amended your commit here: main...olemarkus:445-specify-max-pods-setting
I like your updates @olemarkus and I'll fold them into my PR (if that's cool with you). |
Please do. That was what I hoped for 😊 |
Updating this PR comment thread based on conversations on Slack and in the working group... We've decided not to introduce any API changes at this point, but to add a CLI flag to indicate we are not running in "AWS ENI mode". The flag,
Alrighty @ellistarn and @olemarkus... Finally got around to adding the CLI flag, setting the correct user-data config, and updating the InstanceType.Pods. A couple of notes/thoughts:
@@ -259,6 +259,12 @@ exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
	nodeTaintsArgs := p.getNodeTaintArgs(constraints)
	kubeletExtraArgs := strings.Trim(strings.Join([]string{nodeLabelArgs, nodeTaintsArgs.String()}, " "), " ")

	if !injection.GetOptions(ctx).AWSENILimitedPodDensity {
		userData.WriteString(` \
Do you mind moving towards string concatenation, now that you're in here?
Can do. 👍
Reminder
Ah... now I see why it was a bytes.Buffer instead of string concatenation. The userData has to be base64 encoded, which needs a []byte.
Gotcha
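For what it's worth, plain string building still works here; base64 only needs a []byte at the very end. A small sketch of the idea (the script contents and cluster name are illustrative, not the PR's actual user data):

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

func main() {
	// Build the user data as a plain string (via strings.Builder) rather than
	// a bytes.Buffer; the content here is made up for illustration.
	var userData strings.Builder
	userData.WriteString("#!/bin/bash\n")
	userData.WriteString("/etc/eks/bootstrap.sh example-cluster \\\n")
	userData.WriteString("    --use-max-pods=false\n")

	// Convert to []byte only at the point of base64 encoding; the EC2 launch
	// template UserData field expects the base64-encoded string.
	encoded := base64.StdEncoding.EncodeToString([]byte(userData.String()))
	fmt.Println(encoded)
}
```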
@@ -70,6 +72,10 @@ func (i *InstanceType) Memory() *resource.Quantity {
}

func (i *InstanceType) Pods() *resource.Quantity {
I think we need to update the Overhead logic to always use the ENI-limited MaxPods value, even if we're not ENI limited. We're gonna need a nice verbose comment on this, including the fact that we want to make it more accurate XD.
Ok. So... are you proposing we don't make any changes (minus a comment update) in this func then since that's what it's currently doing/using?
Sorry if this is a bit confusing.
The Pods() function should return:
- 110 if not ENI limited
- the ENI limit if ENI limited
The Overhead() function should return:
- the overhead calculated from the ENI-limited pod count, regardless of the result of Pods()
I think you're minimally going to need a helper to extract it out. I think this was outlined in Slack, but it's pretty confusing. The idea was to just follow the EKS-optimized AMI convention. @suket22 may be the best person for this.
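To make the split concrete, here is a minimal sketch of what's being described. It is not the PR's actual code: the struct fields are stand-ins (the real option comes from the injected options and the ENI-limited count from EC2 data), and the 255Mi + 11Mi-per-pod kube-reserved figure mirrors the EKS-optimized AMI convention mentioned above.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// InstanceType is a pared-down stand-in for the cloud provider's real type.
type InstanceType struct {
	AWSENILimitedPodDensity bool
	ENILimitedPods          int64 // e.g. 29 for m5.large per the VPC CNI formula
}

// Pods returns 110 (the kubelet default) when not ENI limited, and the
// ENI-limited count otherwise.
func (i *InstanceType) Pods() *resource.Quantity {
	if !i.AWSENILimitedPodDensity {
		return resource.NewQuantity(110, resource.DecimalSI)
	}
	return resource.NewQuantity(i.ENILimitedPods, resource.DecimalSI)
}

// Overhead always keys off the ENI-limited pod count, regardless of what
// Pods() reports, following the EKS-optimized AMI's kube-reserved memory
// convention of 255Mi plus 11Mi per pod.
func (i *InstanceType) Overhead() v1.ResourceList {
	return v1.ResourceList{
		v1.ResourceMemory: resource.MustParse(
			fmt.Sprintf("%dMi", 255+11*i.ENILimitedPods)),
	}
}

func main() {
	it := &InstanceType{AWSENILimitedPodDensity: false, ENILimitedPods: 29}
	mem := it.Overhead()[v1.ResourceMemory]
	fmt.Println(it.Pods().String(), mem.String()) // 110 574Mi
}
```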
Ah! Gotcha... I haven't spent enough time in the scheduling part, so I had totally missed the Overhead() function. I can take a stab at it and get feedback on it.
Alrighty... ready for another look. I added some comments, but nothing too crazy. The blame can take you back to this PR with more history too. Feel free to add more if you'd like 👍
LGTM. Nice work. Make sure you get an approval from @suket22.
Thanks @ellistarn for the input and help! Not sure about the failing Netlify deploy tasks, but it looks like we're good to go now that the CI builds are passing. Will pass it on to @suket22 👍
I think you might need a rebase to fix Netlify.
This reverts commit 6b52220.
When set to false, it currently sets the maxPods value to 110, the default for kubelet. Instead of relying on the defaults, explicitly setting it lets us use that value in calculations for instance sizing, etc.
Ok. Thought I had rebased. Anywho... did it again and it looks like we're good to go now 👍
nice work!
1. Issue, if available:
Resolves #445; associated with #954
2. Description of changes:
This adds the ability to specify spec.kubeletConfiguration.maxPods. While the kubelet type only supports non-negative values, there are cases in which other providers (such as the AL2 EKS-optimized AMI) use other flags. In the AL2 case, it uses --use-max-pods=false to indicate it shouldn't refer to a max ENI listing for the number of pods (which assumes you're using the AWS VPC CNI). So, if a negative number is provided, the provider is expected not to use the negative number itself, but to treat it as "no value should be specified; use the defaults". In the case of the AL2 provider, we'd add the --use-max-pods=false flag. As we only have the AL2 provider right now, I updated the user data generation to specify either the --use-max-pods=false flag or a --max-pods flag via --kubelet-extra-args (see the sketch after this list).
3. Does this change impact docs?
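The sketch below illustrates the negative-value convention described in item 2. The helper name and argument shapes are hypothetical and simplified; they are not Karpenter's actual user data generation code.

```go
package main

import (
	"fmt"
	"strings"
)

// buildBootstrapArgs is a hypothetical helper: a negative maxPods means
// "don't pin a value; let the provider's defaults apply", which for the AL2
// provider becomes --use-max-pods=false, while a non-negative value is passed
// to the kubelet via --kubelet-extra-args.
func buildBootstrapArgs(maxPods int, kubeletExtraArgs []string) string {
	args := []string{}
	if maxPods < 0 {
		args = append(args, "--use-max-pods=false")
	} else {
		kubeletExtraArgs = append(kubeletExtraArgs, fmt.Sprintf("--max-pods=%d", maxPods))
	}
	if len(kubeletExtraArgs) > 0 {
		args = append(args, fmt.Sprintf("--kubelet-extra-args '%s'", strings.Join(kubeletExtraArgs, " ")))
	}
	return strings.Join(args, " ")
}

func main() {
	fmt.Println(buildBootstrapArgs(-1, nil))
	// --use-max-pods=false
	fmt.Println(buildBootstrapArgs(110, []string{"--node-labels=team=a"}))
	// --kubelet-extra-args '--node-labels=team=a --max-pods=110'
}
```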
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.