-
Notifications
You must be signed in to change notification settings - Fork 986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Offerings to InstanceType to hold zone and capacity type, remove OS from InstanceType, promote capacity-type label #780
Conversation
✔️ Deploy Preview for karpenter-docs-prod canceled. 🔨 Explore the source changes: 4652547 🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/618c0f66cd96c400085956c3 |
pkg/cloudprovider/aws/instance.go
Outdated
for _, offering := range instanceType.Offerings() { | ||
zone := offering.Zone | ||
// we can't assume that all zones will be available for all capacity types, hence this check | ||
// (we shouldn't ever have duplicate zones, just being paranoid to create a well-formed set of overrides) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type of check is like wearing kevlar in your own living room in case you accidentally shoot yourself. There's no customer exposed surface here and Offerings() is a set (right?). IMO we should skip the zone uniqueness check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Offerings() is not a set, it's a slice, so I fundamentally/philosophically disagree with you here. A duplicate is very much possible. A bug could easily be introduced where we have duplicate offerings. Because there's no constructor like Java, it's not like I can validate the initialization of the InstanceType struct (correct me if I'm wrong here), so there's nothing to maintain the invariant that the Offerings are all unique, unless I invent a set type specifically for Offering, which seems like way more work than adding 2 lines of code here.
The downside if I'm wrong and we never have a duplicate is we have an extra set that has a handful of elements in it and an extra conditional check. The downside if you're wrong is that users will fail to scale out because of invalid LT overrides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hear ya'. I don't think there's going to be a great solution to this type of thing until Go adds generics to the language, at which point it would be pretty easy to make Offerings() return a set. FWIW, I think that the "house style" of a lot of Go code (in Kubernetes land) is to assume validation has occurred elsewhere and not keep checking invariants. Personally, I would suggest you just make the call here whether to keep or not, although I would probably remove the check myself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ellis and I came up with a good compromise - I'm going to move the validation to the instance type provider. Update coming momentarily.
Team work makes the dream work 🌈
@@ -65,16 +65,15 @@ IDs.](https://docs.aws.amazon.com/ram/latest/userguide/working-with-az-ids.html) | |||
|
|||
### Capacity Type | |||
|
|||
- key: `node.k8s.aws/capacity-type` | |||
- key: `karpenter.sh/capacity-type` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to be a lil' careful on updating docs before releasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not following your comment here. Are you saying I shouldn't update the docs? Or is there some other issue with my doc update?
This change is essentially backwards incompatible so it's important IMO that the docs match the functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs are published immediately when the PR is merged, but we won't release a new version immediately when merged. So the docs will be updated with the new karpenter.sh/capacity-type
but our latest published container image will not include it. Anyone following the docs between that time will get errors.
It might be better to open a second PR w/ the docs changes and label it with the release to coordinate the docs release at the same time as the artifact releases? We don't have a great way of handling it today :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's a PITA. Should we have a doc branch then? I don't know what to do with this second PR exactly - what does approval of it mean, because we can't pull it until we're ready to cut a release, right? And then, if there are multiple doc changes from multiple PRs someone has to go through the fun exercise of resolving conflicts. Am I missing something?
Is there some special syntax with labeling the commit? Do you mean just putting v0.5 in the commit message? Or do you mean adding the label in the PR in the GitHib UI? If so, I also have no idea how to do this - on this very PR on the right side the labels say "None yet" and there's no edit button or anything.
I'll back this change out, but I don't know how we make sure everyone currently contributing to Karpenter knows about this in the interim because it is a backwards incompatible change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: labeling - am I supposed to do something like this?
git tag v0.4.2 <commit_sha>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant in the github UI, you should have labeling permissions. Maybe we should just cut a v0.4.3 release when this gets merged since we are in a state of rapid iteration here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure I don't have those permissions, how can I get them? I should already be in awslabs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eptiger I just made you an Admin on this repo, so hopefully that'll work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh Brandon gave me the permissions I needed, but I don't see a label for v0.4.3 yet - should I create one?
I appreciate you including docs changes, it's helping me keep the thread. |
func ExpectPodNotScheduled(c client.Client, name string, namespace string) { | ||
pod := &v1.Pod{} | ||
Expect(c.Get(context.Background(), client.ObjectKey{Name: name, Namespace: namespace}, pod)).To(Succeed()) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't had a chance to fully review this. Feel free to push forward w/ Jacob's review. I agree with the high level direction.
@@ -59,6 +60,15 @@ func ExpectNotFound(c client.Client, objects ...client.Object) { | |||
} | |||
} | |||
|
|||
func ExpectPodNotScheduled(c client.Client, name string, namespace string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scheduling only happens to pods, so you could simplify to ExpectNotScheduled
@@ -59,6 +60,15 @@ func ExpectNotFound(c client.Client, objects ...client.Object) { | |||
} | |||
} | |||
|
|||
func ExpectPodNotScheduled(c client.Client, name string, namespace string) { | |||
pod := &v1.Pod{} | |||
Expect(c.Get(context.Background(), client.ObjectKey{Name: name, Namespace: namespace}, pod)).To(Succeed()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could use pod = ExpectPodExists()
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
//TODO filter out possible zones and capacity types using an ICE cache https://github.com/awslabs/karpenter/issues/371 | ||
offerings := []cloudprovider.Offering{} | ||
for zone := range subnetZones.Intersection(instanceTypeZones[instanceType.Name()]) { | ||
for _, capacityType := range instanceType.SupportedUsageClasses { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sugg: may want to protect against uniqueness here
for capacityType := range sets.NewString(aws.StringValue(instanceType.SupportedUsageClasses)...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one final comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I noticed a couple of optional things. You should feel totally free to merge as is. Happy to approve again in case you want to tweak.
pkg/cloudprovider/aws/instance.go
Outdated
@@ -172,7 +172,12 @@ func (p *InstanceProvider) getLaunchTemplateConfigs(ctx context.Context, constra | |||
func (p *InstanceProvider) getOverrides(instanceTypeOptions []cloudprovider.InstanceType, subnets []*ec2.Subnet, capacityType string) []*ec2.FleetLaunchTemplateOverridesRequest { | |||
var overrides []*ec2.FleetLaunchTemplateOverridesRequest | |||
for i, instanceType := range instanceTypeOptions { | |||
for zone := range instanceType.Zones() { | |||
for _, offering := range instanceType.Offerings() { | |||
zone := offering.Zone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional: I might just inline this where its used. I don't think there's much of a perf angle, and it adds an extra concept to think about as you read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this was an artifact of the uniqueness check
if constraints.Requirements.CapacityTypes().Has(o.CapacityType) { | ||
if constraints.Requirements.Zones().Has(o.Zone) { | ||
zone = o.Zone | ||
capacityType = o.CapacityType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to break here? Does it matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh shoot yeah, good catch. I mean it ultimately doesn't matter, but yeah I was supposed to. I don't know how I missed that =P
amdGPUs resource.Quantity | ||
awsNeurons resource.Quantity | ||
name string | ||
zones sets.String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possible to remove zone/capacityTypes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Darn it, in the midst of test debugging I missed this - thanks!
if schedule.Requirements.OperatingSystems().Intersection(p.OperatingSystems()).Len() == 0 { | ||
return fmt.Errorf("operating system %s is not in %v", p.OperatingSystems(), schedule.Requirements.OperatingSystems().List()) | ||
func (p *Packable) validateZones(schedule *scheduling.Schedule) error { | ||
packableZones := sets.String{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional:
Not sure if I mentioned earlier, but go style would be to let the "packable" piece of this variable be inferred by context of it being within a Packable struct. This is known as "stuttering". Sometimes if you have multiple concepts that collide (i.e. two different sources of zone), you need to get a bit clever about naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is I think actually an artifact of me doing something stupidly complicated here where I had two sets of zones and had to differentiate them. Fixed.
…OS from InstanceType
…houldn't have been scheduled
1. Issue, if available:
Part of #371
2. Description of changes:
I'm not entirely sure if I need to add additional unit tests at this time - I think we're covered with what's there but open to feedback.
3. Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.