
Add flexibility in pod density for non-VPC CNI nodes #1032

Merged · 8 commits · Jan 19, 2022
Conversation

mikesir87
Contributor

1. Issue, if available:
Resolves #445; also associated with #954

2. Description of changes:
This adds the ability to specify spec.kubeletConfiguration.maxPods.

While the kubelet type only supports non-negative values, there are cases in which other providers (such as the AL2 EKS-optimized AMI) use other flags. In the AL2 case, the bootstrap script uses --use-max-pods=false to indicate that it shouldn't look up the ENI-based pod limit for the instance type (a lookup that assumes the AWS VPC CNI is in use). So, if a negative number is provided, the provider is expected not to pass that number along, but to treat it as "don't set a value; use the defaults." For the AL2 provider, that means adding the --use-max-pods=false flag.

As we only have the AL2 provider right now, I updated the user-data generation to either pass the --use-max-pods=false flag or add a --max-pods flag via --kubelet-extra-args (see the sketch after this template).

3. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
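For readers skimming the convention from item 2 above, here is a rough sketch in Go (the function name and shape are hypothetical, not the PR's code; the flag spellings follow the AL2 EKS-optimized AMI bootstrap script as described in this thread):

package main

import "fmt"

// buildAL2BootstrapArgs sketches the convention described above: a negative
// maxPods means "don't pin a value", which for the AL2 provider translates
// to --use-max-pods=false so the bootstrap script skips the ENI-based lookup.
func buildAL2BootstrapArgs(maxPods int) string {
    if maxPods < 0 {
        return "--use-max-pods=false"
    }
    return fmt.Sprintf("--kubelet-extra-args '--max-pods=%d'", maxPods)
}

func main() {
    fmt.Println(buildAL2BootstrapArgs(-1))  // --use-max-pods=false
    fmt.Println(buildAL2BootstrapArgs(110)) // --kubelet-extra-args '--max-pods=110'
}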

@netlify

netlify bot commented Dec 21, 2021

✔️ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: e359983

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/61e893336590c20007dd8830

@olemarkus
Contributor

This doesn't really solve your problem. If you set maxPods on the kubelet, Karpenter will still try to over- or under-allocate.
Look at https://github.com/aws/karpenter/blob/main/pkg/cloudprovider/aws/instance.go#L253 (and the similar call some lines below) and the algorithm it uses here https://github.com/aws/karpenter/blob/12cbb2d3b2a32e5f467749750da779a0164ca12d/pkg/cloudprovider/aws/instancetype.go#L72-L77

This is the bit I mentioned in #954 (comment) that is actually causing the problem.

Using your approach, you need to get the maxPods value into the code above, overriding the logic in InstanceType.Pods().

@mikesir87
Contributor Author

But doesn't Karpenter only do scheduling if a pod is unscheduled? It's not trying to take the place of the kube-scheduler, right? I'm currently running into the problem that we have underutilized nodes due to this setting being in place. I guess I'm not understanding where/when this is being used.

@olemarkus
Contributor

It replaces kube-scheduler in the case of Pending pods.
So, e.g., if you have 20 pods unscheduled, it will disregard a t3.medium because it thinks the pods cannot fit. Similarly, if the kubelet is configured with the default maxPods of 110, it will think it can fit 700 pods onto a t3.16xlarge.
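For context, the ENI-based ceiling olemarkus describes comes from the AWS VPC CNI formula: max pods = ENIs × (IPv4 addresses per ENI − 1) + 2. A minimal sketch in Go (the instance figures in main are illustrative):

package main

import "fmt"

// eniLimitedPods computes the VPC CNI pod ceiling for an instance type:
// each ENI keeps one IP for itself, and two host-network pods are added on top.
func eniLimitedPods(enis, ipsPerENI int) int {
    return enis*(ipsPerENI-1) + 2
}

func main() {
    // A t3.medium supports 3 ENIs with 6 IPv4 addresses each,
    // giving the familiar ceiling of 17 pods.
    fmt.Println(eniLimitedPods(3, 6)) // 17
}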

@mikesir87
Contributor Author

Ok. That's what I thought. The problem I'm having is that I'm hitting that Pending state earlier than expected because I can't pack more pods onto a node (because I can't set this flag).

So, to me, it sounds like we need both: the setting, plus making it available down in the calculation you're referring to. If the maxPods value is set, we should use that. Otherwise, we assume (possibly wrongly) that the AWS VPC CNI is being used and fall back to whatever value makes sense for that instance type. Does that make sense?

@olemarkus
Contributor

Many are not using AMIs that assume the VPC CNI (e.g. I just use the default Ubuntu image), so they don't run into your specific part of the problem. But yes, to solve your case, you need both.

@olemarkus
Contributor

I amended your commit here: main...olemarkus:445-specify-max-pods-setting

  • The setting also propagates into Pods()
  • Also added support for --pods-per-core, should anyone want the more dynamic limit (see the sketch below for how the two settings interact)
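For readers unfamiliar with --pods-per-core: the kubelet caps the pod count at the smaller of max-pods and pods-per-core × CPU cores. A rough sketch of that interaction (the helper name is made up for illustration):

package main

import "fmt"

// effectivePodLimit approximates how the kubelet combines the two settings:
// when pods-per-core is set, the lower of max-pods and pods-per-core * cores wins.
func effectivePodLimit(maxPods, podsPerCore, cores int) int {
    limit := maxPods
    if podsPerCore > 0 && podsPerCore*cores < limit {
        limit = podsPerCore * cores
    }
    return limit
}

func main() {
    fmt.Println(effectivePodLimit(110, 10, 4)) // 40
    fmt.Println(effectivePodLimit(30, 10, 4))  // 30
}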

@mikesir87
Contributor Author

I like your updates @olemarkus and I'll fold them into my PR (if that's cool with you).

@olemarkus
Contributor

Please do. That was what I hoped for 😊

@mikesir87
Contributor Author

Updating this PR comment thread based on conversations on Slack and in the working group...

We've decided not to introduce any API changes at this point, but instead to add a CLI flag indicating we are not running in "AWS ENI mode". The flag, --aws-eni-limited-pod-density, will be a boolean defaulting to true. When set to true, the current behavior applies. Otherwise, we will set maxPods to 110. We all recognize that this still isn't configurable and isn't a long-term answer; it's a stopgap while we perform additional analysis on overhead and how this can/should be solved longer term.
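A minimal sketch of how a boolean flag like this could be wired up and consumed (the registration style, variable names, and the podsFor helper are assumptions, not the PR's actual options plumbing, which injects options through the context):

package main

import (
    "flag"
    "fmt"
)

// Hypothetical flag registration; Karpenter's real options struct differs.
var awsENILimitedPodDensity = flag.Bool(
    "aws-eni-limited-pod-density", true,
    "use ENI-based pod density; when false, assume the kubelet default of 110 pods")

// podsFor picks the pod count Karpenter plans against for an instance type.
func podsFor(eniLimit int64) int64 {
    if *awsENILimitedPodDensity {
        return eniLimit // current behavior
    }
    return 110 // kubelet default when another CNI is in use
}

func main() {
    flag.Parse()
    fmt.Println(podsFor(58))
}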

@mikesir87 mikesir87 changed the title from "Add ability to specify maxPods" to "Add flexibility in pod density for non-VPC CNI nodes" on Jan 10, 2022
@mikesir87
Contributor Author

Alrighty @ellistarn and @olemarkus... Finally got around to adding the CLI flag, setting the correct user-data config, and updating InstanceType.Pods(). A couple of notes/thoughts:

  • I'm not a fan of adding the options.Options to the InstanceType, but I figured I could start there and adjust based on input.
  • I didn't do anything related to memory overhead calculations. Where do you expect those to be applied?

@@ -259,6 +259,12 @@ exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
nodeTaintsArgs := p.getNodeTaintArgs(constraints)
kubeletExtraArgs := strings.Trim(strings.Join([]string{nodeLabelArgs, nodeTaintsArgs.String()}, " "), " ")

if !injection.GetOptions(ctx).AWSENILimitedPodDensity {
userData.WriteString(` \
Contributor

Do you mind moving towards string concatenation, now that you're in here?

Contributor Author

Can do. 👍

Contributor

Reminder

Contributor Author

Ah... now I see why it was a bytes.Buffer instead of string concatenation. The userData has to be base64 encoded, which needs a []byte.
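For anyone following along, the constraint is roughly this (a sketch, not the actual provider code; the cluster name and kubelet args are placeholders): the assembled user data gets base64-encoded, and the encoder takes a []byte, which bytes.Buffer provides directly via Bytes(); a plain string would also work with a []byte(s) conversion.

package main

import (
    "bytes"
    "encoding/base64"
    "fmt"
)

func main() {
    // Build the user-data script incrementally, as the provider does.
    var userData bytes.Buffer
    userData.WriteString("#!/bin/bash\n")
    userData.WriteString("/etc/eks/bootstrap.sh my-cluster")
    userData.WriteString(" --kubelet-extra-args '--max-pods=110'")

    // EC2 launch templates expect base64-encoded user data, and the
    // encoder wants a []byte, which Buffer.Bytes() hands over directly.
    encoded := base64.StdEncoding.EncodeToString(userData.Bytes())
    fmt.Println(encoded)
}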

Contributor

Gotcha

@@ -70,6 +72,10 @@ func (i *InstanceType) Memory() *resource.Quantity {
}

func (i *InstanceType) Pods() *resource.Quantity {
Contributor

I think we need to update the Overhead logic to always use the ENI-limited maxPods value, even when we're not using ENI-limited pod density for Pods(). We're going to need a nice verbose comment on this, including the fact that we want to make it more accurate XD.

Contributor Author

Ok. So... are you proposing we don't make any changes (minus a comment update) in this func then since that's what it's currently doing/using?

Contributor

Sorry if this is a bit confusing.

The Pods() function should return

  • 110 if not ENI limited
  • the ENI limit if ENI limited

The Overhead() function should return

  • overhead calculated from the ENI-limited pod count, regardless of the result of Pods().

I think you're minimally going to need a helper to extract it out. I think this was outlined in Slack, but it's pretty confusing. The idea was to just follow the EKS-optimized AMI convention. @suket22 may be the best person for this.
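A sketch of the split being described (the struct fields, the per-pod overhead heuristic, and returning plain int64 instead of resource.Quantity are all simplifications or assumptions):

package main

import "fmt"

// instanceType is a stand-in for Karpenter's real InstanceType.
type instanceType struct {
    eniLimitedPods    int64 // ENI-based ceiling for this instance type
    eniLimitedDensity bool  // the --aws-eni-limited-pod-density flag
}

// Pods is the count the scheduler plans against.
func (i instanceType) Pods() int64 {
    if !i.eniLimitedDensity {
        return 110 // kubelet default when another CNI is in use
    }
    return i.eniLimitedPods
}

// OverheadMiB always derives memory overhead from the ENI-limited count,
// following the EKS-optimized AMI convention (roughly 255 MiB plus 11 MiB
// per pod), regardless of what Pods() returns.
func (i instanceType) OverheadMiB() int64 {
    return 255 + 11*i.eniLimitedPods
}

func main() {
    it := instanceType{eniLimitedPods: 58, eniLimitedDensity: false}
    fmt.Println(it.Pods())        // 110
    fmt.Println(it.OverheadMiB()) // 893
}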

Contributor Author

Ah! Gotcha... I haven't spent enough time in the scheduling part, so had totally missed the Overhead() function. I can take a stab at it and get feedback on it.

Contributor Author

Alrighty... ready for another look. I added some comments, but nothing too crazy. The blame can take you back to this PR with more history too. Feel free to add more if you'd like 👍

@ellistarn
Contributor

LGTM. Nice work. Make sure you get an approval from @suket22

@mikesir87
Contributor Author

Thanks @ellistarn for the input and help! Not sure about the failing netlify deploy tasks, but it looks like we're good to go now that the CI builds are passing. Will pass it on to @suket22 👍

@ellistarn
Contributor

I think you might need a rebase to fix netlify.

mikesir87 and others added 7 commits January 19, 2022 13:21
Signed-off-by: Michael Irwin <[email protected]>

When set to false, it currently sets the maxPods value to 110, the
kubelet default. Instead of relying on the defaults, setting it
explicitly lets us use that value in calculations for instance
sizing, etc.
Signed-off-by: Michael Irwin <[email protected]>
@mikesir87
Contributor Author

Ok. Thought I had rebased. Anywho... did it again and it looks like we're good to go now 👍

Signed-off-by: Michael Irwin <[email protected]>
Contributor

@ellistarn ellistarn left a comment

nice work!

Development

Successfully merging this pull request may close these issues.

Specify max pods on provisioner resource
3 participants