Implemented topology spread constraints for zone and hostname #619
Conversation
// getSchedules separates pods into a set of schedules. All pods in each group
// contain compatible scheduling constraints and can be deployed together on the
// same node, or multiple similar nodes if the pods exceed one node's capacity.
func (s *Scheduler) getSchedules(ctx context.Context, provisioner *v1alpha3.Provisioner, pods []*v1.Pod) ([]*Schedule, error) {
Keep in mind, this code is untouched and a direct port of constraints.go (now removed)
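For readers skimming the diff, here is a rough, illustrative sketch of what "separating pods into schedules of compatible constraints" can look like; the `Schedule` shape and the compatibility check below are assumptions for illustration, not the PR's actual types:

```go
package sketch

import v1 "k8s.io/api/core/v1"

// Schedule is an illustrative stand-in: a group of pods whose scheduling
// constraints are mutually compatible and can share a node (or similar nodes).
type Schedule struct {
	NodeSelector map[string]string
	Pods         []*v1.Pod
}

// compatible is a hypothetical check: two node selectors are compatible if
// they agree on every key they both set.
func compatible(a, b map[string]string) bool {
	for k, av := range a {
		if bv, ok := b[k]; ok && bv != av {
			return false
		}
	}
	return true
}

// getSchedulesSketch groups pods by compatible node selectors, loosely
// mirroring what the quoted getSchedules comment describes.
func getSchedulesSketch(pods []*v1.Pod) []*Schedule {
	var schedules []*Schedule
	for _, pod := range pods {
		placed := false
		for _, s := range schedules {
			if compatible(s.NodeSelector, pod.Spec.NodeSelector) {
				s.Pods = append(s.Pods, pod)
				placed = true
				break
			}
		}
		if !placed {
			schedules = append(schedules, &Schedule{
				NodeSelector: pod.Spec.NodeSelector,
				Pods:         []*v1.Pod{pod},
			})
		}
	}
	return schedules
}
```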
}

// computeZonalTopology for the topology group. Zones include viable zones for
// the {cloudprovider, provisioner, pod }. If these zones change over time,
nit: extra space in `pod }`?
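On the substance of the quoted comment: the viable zones read as the intersection of what the cloud provider offers, what the provisioner allows, and what the pod requests. A minimal sketch of that intersection, with illustrative function and parameter names (not the PR's API):

```go
// viableZones returns the zones allowed by all three sources; an empty
// constraint list is treated as "no restriction". Illustrative only.
func viableZones(cloudProviderZones, provisionerZones, podZones []string) []string {
	toSet := func(zones []string) map[string]bool {
		set := map[string]bool{}
		for _, z := range zones {
			set[z] = true
		}
		return set
	}
	prov, pod := toSet(provisionerZones), toSet(podZones)
	var zones []string
	for _, z := range cloudProviderZones {
		if (len(prov) == 0 || prov[z]) && (len(pod) == 0 || pod[z]) {
			zones = append(zones, z)
		}
	}
	return zones
}
```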
}
domain, ok := node.Labels[topologyGroup.Constraint.TopologyKey]
if !ok {
	continue // Don't include pods if node doesn't contain domain
nit: Could you change the comment to explain why not including the domain matters? As it is now, the comment basically just restates what the code already says.
Added a link to all the details
Just checking - I don't see the update but maybe I'm missing it?
type TopologyGroup struct {
	Constraint v1.TopologySpreadConstraint
	Pods       []*v1.Pod
	// spread is an internal field used to track current spread
but what is spread in the first place, conceptually?
Ok, maybe just clarify you mean the upstream sense of spread?
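For anyone following the thread: "spread" here is meant in the upstream topology-spread sense, i.e. how many matching pods sit in each topology domain, which is what skew and maxSkew are computed against. A hedged sketch of the idea (the type, field, and method names below are assumptions, not the PR's):

```go
// topologySpreadSketch illustrates the concept only.
type topologySpreadSketch struct {
	// spread counts matching pods per topology domain, e.g.
	// {"us-west-2a": 3, "us-west-2b": 2}.
	spread map[string]int
}

// skew is the difference between the most and least populated domains; the
// upstream constraint's maxSkew bounds this value.
func (t *topologySpreadSketch) skew() int {
	if len(t.spread) == 0 {
		return 0
	}
	min, max := int(^uint(0)>>1), 0
	for _, count := range t.spread {
		if count < min {
			min = count
		}
		if count > max {
			max = count
		}
	}
	return max - min
}
```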
	}
}

// Increment increments a domain if known
doesn't it increment the "spread" of a domain?
yep!
	}
}

// Spread chooses a domain that minimizes skew and increments its count
Maybe call this NextSpread() or something? Right now the name of the function sounds kinda stateless.
Haha I toyed with the idea of calling it Next() or Spread(). What do you think about NextDomain()?
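Building on the sketch earlier in the thread, Increment and the Spread/NextDomain function discussed here could look roughly like this; the NextDomain name follows the suggestion above and is an assumption, not necessarily the final code:

```go
// Increment bumps the spread count for a domain, ignoring unknown domains.
func (t *topologySpreadSketch) Increment(domain string) {
	if _, ok := t.spread[domain]; ok {
		t.spread[domain]++
	}
}

// NextDomain picks the least populated domain and increments it, which keeps
// skew as small as possible as pods are assigned one at a time.
func (t *topologySpreadSketch) NextDomain() string {
	chosen, best := "", int(^uint(0)>>1)
	for domain, count := range t.spread {
		if count < best {
			chosen, best = domain, count
		}
	}
	if chosen != "" {
		t.spread[chosen]++
	}
	return chosen
}
```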
Overall I didn't see any problems, just a few comments about documentation/comments.
@@ -46,15 +47,15 @@ import (

 const (
 	maxBatchWindow   = 10 * time.Second
-	batchIdleTimeout = 2 * time.Second
+	batchIdleTimeout = 1 * time.Second
🔥
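For context on why shrinking batchIdleTimeout matters: a batcher of this shape typically flushes when no new pod has arrived for batchIdleTimeout, or when maxBatchWindow elapses, whichever comes first, so a smaller idle timeout lets quiet batches close sooner. A rough, illustrative sketch of that pattern (not the PR's batcher):

```go
package sketch

import "time"

const (
	maxBatchWindow   = 10 * time.Second
	batchIdleTimeout = 1 * time.Second
)

// batch collects items until the stream goes idle for batchIdleTimeout or
// maxBatchWindow expires, whichever comes first. Illustrative only.
func batch(in <-chan string) []string {
	var items []string
	window := time.NewTimer(maxBatchWindow)
	defer window.Stop()
	for {
		idle := time.NewTimer(batchIdleTimeout)
		select {
		case item, ok := <-in:
			idle.Stop()
			if !ok {
				return items
			}
			items = append(items, item)
		case <-idle.C:
			return items
		case <-window.C:
			idle.Stop()
			return items
		}
	}
}
```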
// 2. Separate pods into schedules of compatible scheduling constraints.
schedules, err := s.getSchedules(ctx, provisioner, pods)
if err != nil {
If this errors for some reason, then the injected labels do not get removed. It might be fine if we're always retrying a reconciliation so we rebuild everything. If something was parallelized while solving, the pods would be getting additional labels added and removed. This just seems a bit leaky. Do you think copying the pods would be too much overhead? Or maybe deferring the label deletion?
Agreed. It felt a little bit dirty doing this in-memory mutation thing in general, but it really, really simplified the code. The extra annoying bit is that I can leave the injected zone because it'll end up being the same string, but I can't know the hostname (since I can't specify EC2 instance IDs), so I have to rip out the hostname after scheduling.
The key detail is that the injected NodeSelector is never deleted from the pod in this reconcile loop, but it is rebuilt on the next reconcile loop. I'm wary of the complexity it adds to remove these, but it might be tenable if I do it in this function and don't allow it to leak anywhere else. What do you think?
I think it's worthwhile to remove it in this function. I'm not entirely sure where a leak would cause issues, but if it does, it might be difficult to spot.
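One way to keep the injected hostname from leaking past this function, as suggested above, is to capture and restore the selector with a defer so the cleanup runs on error paths too. A hypothetical sketch (the surrounding function and the injected value are assumptions, not the PR's code):

```go
package sketch

import v1 "k8s.io/api/core/v1"

// scheduleWithInjectedHostname shows the defer-based cleanup idea: the
// injected hostname selector never escapes this function, even on error.
func scheduleWithInjectedHostname(pod *v1.Pod, hostname string) error {
	if pod.Spec.NodeSelector == nil {
		pod.Spec.NodeSelector = map[string]string{}
	}
	original, hadHostname := pod.Spec.NodeSelector[v1.LabelHostname]
	defer func() {
		// Rip the injected hostname back out (or restore the original value)
		// whether or not scheduling succeeded.
		if hadHostname {
			pod.Spec.NodeSelector[v1.LabelHostname] = original
		} else {
			delete(pod.Spec.NodeSelector, v1.LabelHostname)
		}
	}()
	pod.Spec.NodeSelector[v1.LabelHostname] = hostname // hypothetical injection
	// ... scheduling work that may return an error would go here ...
	return nil
}
```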
Had a small comment on the leakiness of the label injection to make the topology scheduling simpler, but overall, feel free to merge, looks great!
Issue, if available:
#481
Description of changes:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.