
Parallelizes capacity creation #518

Merged
merged 8 commits into from Jul 21, 2021

Conversation

@ellistarn ellistarn (Contributor) commented Jul 19, 2021

Description of changes:
Previously, the capacity API took multiple instances to allow cloud providers to batch create/get/delete API calls. However, this had a linearizing side effect: all instances had to be launched before Karpenter was able to bind pods. This increased the number of cases where the scheduler would schedule pods onto capacity (launched earlier in the linear loop) before Karpenter could execute the binds. Parallelizing also has the obvious side effect of making launches a lot faster (unmeasured).

There are some long term considerations to this change:

  1. Cloud providers will no longer (easily) be able to batch capacity API calls.
  2. Cloud providers will no longer be able to see multiple packings at once, which could allow for re-packing.

The primary benefits are:

  1. The Capacity API is now simpler and more natural.
  2. Launching capacity for constraint groups is now parallelized in vendor neutral code.
  3. A slow cloud provider implementation can no longer hang on linear capacity creation.
  4. More code (node object construction, parallelization) is moved out of the AWS Cloud Provider and into Karpenter's core.

Additionally, I swapped the verbosity of a few log statements.
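
For illustration, a minimal sketch of the parallel create-and-bind flow described above; the names (Packing, CloudProvider, createAndBind, bind) are illustrative stand-ins, not the actual Karpenter code.

package provisioning

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/util/workqueue"
)

// Packing and CloudProvider stand in for Karpenter's real types (sketch only).
type Packing struct{ Pods []*v1.Pod }

type CloudProvider interface {
	Create(ctx context.Context, packing *Packing) (*v1.Node, error)
}

// createAndBind launches one node per packing in parallel and binds each
// packing's pods as soon as its node is created, so a slow launch no longer
// delays binds for the other packings.
func createAndBind(ctx context.Context, cloudProvider CloudProvider, packings []*Packing,
	bind func(context.Context, *v1.Node, []*v1.Pod) error) []error {
	errs := make([]error, len(packings))
	workqueue.ParallelizeUntil(ctx, len(packings), len(packings), func(i int) {
		node, err := cloudProvider.Create(ctx, packings[i])
		if err != nil {
			errs[i] = err
			return
		}
		errs[i] = bind(ctx, node, packings[i].Pods)
	})
	return errs
}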

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ellistarn ellistarn changed the title Improve bind behavior at larger scales and simplified cloud provider API Parallelizes capacity creation to improve bind behavior at larger scales Jul 19, 2021
@ellistarn ellistarn force-pushed the tuning branch 2 times, most recently from 7e0262d to e060836 Compare July 19, 2021 01:14
@ellistarn ellistarn changed the title Parallelizes capacity creation to improve bind behavior at larger scales Parallelizes capacity creation Jul 19, 2021
}
}
return true
Contributor

Why would this function always be returning true? Wouldn't we want it to only return true if it does contain the string?

Contributor Author

The Range function continues if you return true. We want to process the entire list, so we always return true. The check for adding the LT to the list is on line 103.
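
For context, a minimal, generic illustration of the Range semantics described here, assuming a sync.Map; this is not the actual launch-template code.

package main

import (
	"fmt"
	"sync"
)

func main() {
	var cache sync.Map
	cache.Store("lt-1", "active")
	cache.Store("lt-2", "pending")

	matches := []string{}
	// Range stops iterating as soon as the callback returns false, so the
	// callback returns true unconditionally to visit every entry; any
	// filtering happens inside the callback body instead.
	cache.Range(func(key, value interface{}) bool {
		if value == "active" {
			matches = append(matches, key.(string))
		}
		return true
	})
	fmt.Println(matches) // [lt-1]
}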

return nil, fmt.Errorf("expected a single instance, got %d", len(describeInstancesOutput.Reservations[0].Instances))
}
instance := *describeInstancesOutput.Reservations[0].Instances[0]
zap.S().Infof("Launched instance: %s, type: %s, zone: %s, hostname: %s",
Contributor

Is it possible to include this outside of the cloudprovider, so that the logging logic remains the same across cloud providers?

Contributor Author

Note that this code was just moved, but unfortunately, it's a little difficult to wire this up with our current return type being a node, which doesn't hold this specific information until it actually comes online.

@@ -68,7 +68,7 @@ func (u *Utilization) markUnderutilized(ctx context.Context, provisioner *v1alph
if err := u.kubeClient.Patch(ctx, node, client.MergeFrom(persisted)); err != nil {
return fmt.Errorf("patching node %s, %w", node.Name, err)
}
zap.S().Debugf("Added TTL and label to underutilized node %s", node.Name)
zap.S().Infof("Added TTL and label to underutilized node %s", node.Name)
Contributor

I think the two log lines in this file should be Debug. I think that adding TTL and removing TTL could happen a lot, and don't think it's as important as creating and terminating an instance.

Contributor Author

Happy to discuss offline. I'm not sure this should be happening a lot, and if it is, I think it makes sense to be visible.

packedNodes, err := c.CloudProvider.Create(ctx, provisioner, packings)
// 6. Create capacity
errs := make([]error, len(packings))
workqueue.ParallelizeUntil(ctx, len(packings), len(packings), func(index int) {
Contributor

maybe we should limit the workers so that we don't get throttled immediately, 5 might be a good starting point based on https://docs.aws.amazon.com/AWSEC2/latest/APIReference/throttling.html#throttling-limits

Contributor Author

I think this limiting should happen on the cloud provider side. I don't want to artificially limit things on the karp/core side now that we have the async interface. WDYT?

Contributor

yeah definitely on CP side 👍
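
For reference, the worker cap suggested above could look something like the following on the core side; the cap of 5 comes from the reviewer's comment and is not what was ultimately merged (the throttling moved to the cloud provider instead).

// boundedWorkers caps the concurrency passed to ParallelizeUntil so that at
// most maxWorkers Create calls run at once (5 per the suggestion above).
func boundedWorkers(pieces int) int {
	const maxWorkers = 5
	if pieces < maxWorkers {
		return pieces
	}
	return maxWorkers
}

// Usage: workqueue.ParallelizeUntil(ctx, boundedWorkers(len(packings)), len(packings), ...)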

@@ -27,7 +27,7 @@ import (
// CloudProvider contains the methods necessary in a cloud provider
type CloudProvider interface {
// Create a set of nodes for each of the given constraints.
Create(context.Context, *v1alpha3.Provisioner, []*Packing) ([]*PackedNode, error)
Create(context.Context, *v1alpha3.Provisioner, *Packing) (*v1.Node, error)
Contributor

I'm concerned this limits the cloud providers' ability to do any optimization. We're kind of brute-forcing our way out of it at the controller level. At the expense of making the Create interface more complicated, would it be worth making it asynchronous with a richer response than just a v1.Node? The controller could track the actual provisioning status and drive the packings down until the desired state is reached.

Contributor

One optimization that can probably happen pretty easily in the cloud provider currently is on the usual scale up case where Create will be launching a few or a bunch of the largest nodes possible. We can probably get significantly better scaling speeds for this case if we do one request for multiple instances. We should even be able to maintain a diverse capacity request since the pod density should be the same in this situation.

Contributor

I don't want us to limit that capability as we find more cases or more capabilities are introduced to improve the cloud provider.

Contributor Author

I completely agree with you here and played with the idea of using a callback model.

Create(ctx, provisioner, packing, func(node v1.Node) error {})

In the callback, we'd execute all of the bind logic.

Can we punt on this until we're ready to make more than one request at a time? I can also play with it here.

Contributor Author

Implemented this.

Contributor

ha love it!
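
A minimal sketch of the callback-style Create interface discussed in this thread; the exact signature is an assumption based on the comment above, not necessarily what was merged.

package cloudprovider

import (
	"context"

	v1 "k8s.io/api/core/v1"
)

// Provisioner and Packing stand in for Karpenter's real types (sketch only).
type Provisioner struct{}
type Packing struct{ Pods []*v1.Pod }

// CloudProvider's Create accepts a bind callback instead of returning nodes,
// so the provider can invoke bind as each node comes up and is still free to
// batch the underlying API calls, invoking the callback once per node.
type CloudProvider interface {
	Create(ctx context.Context, provisioner *Provisioner, packing *Packing,
		bind func(*v1.Node) error) error
}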

@ellistarn ellistarn force-pushed the tuning branch 3 times, most recently from 159bd71 to 7b18a4a Compare July 21, 2021 00:29
bwagner5 previously approved these changes Jul 21, 2021
@bwagner5 bwagner5 (Contributor) left a comment

Very nice work on this!

"github.com/awslabs/karpenter/pkg/utils/project"
"go.uber.org/zap"
v1 "k8s.io/api/core/v1"
"knative.dev/pkg/apis"
)

const (
// CreationQPS limits the number of requests per second to CreateFleet
// https://docs.aws.amazon.com/AWSEC2/latest/APIReference/throttling.html#throttling-limits
CreationQPS = 2
Contributor

maybe configurable.. but we can wait and maybe figure out how to do that.. maybe

}

// NewWorkQueue instantiates a new WorkQueue
func NewWorkQueue(qps int, burst int) *WorkQueue {
Contributor

nice!
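
For illustration, one way a QPS/burst-limited work queue like this could be built on golang.org/x/time/rate; this is a sketch under that assumption, not necessarily the merged implementation.

package utils

import (
	"context"

	"golang.org/x/time/rate"
)

// WorkQueue gates work items behind a token-bucket rate limiter so that a
// burst of CreateFleet calls stays under the account's request-rate limits.
type WorkQueue struct {
	limiter *rate.Limiter
}

// NewWorkQueue instantiates a new WorkQueue allowing qps requests per second
// with the given burst size.
func NewWorkQueue(qps int, burst int) *WorkQueue {
	return &WorkQueue{limiter: rate.NewLimiter(rate.Limit(qps), burst)}
}

// Do blocks until the limiter grants a token, then runs the work item.
func (w *WorkQueue) Do(ctx context.Context, work func() error) error {
	if err := w.limiter.Wait(ctx); err != nil {
		return err
	}
	return work()
}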

@bwagner5 bwagner5 (Contributor) left a comment

lgtm

@ellistarn ellistarn merged commit cf2a869 into aws:main Jul 21, 2021
@ellistarn ellistarn deleted the tuning branch July 21, 2021 17:08