-
Notifications
You must be signed in to change notification settings - Fork 979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented volume topology aware scheduling #1015
Conversation
✔️ Deploy Preview for karpenter-docs-prod ready! 🔨 Explore the source changes: 3b6c7f7 🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/61d5d1546835410008e61ff9 😎 Browse the preview: https://deploy-preview-1015--karpenter-docs-prod.netlify.app |
ffd50ce
to
afa565c
Compare
@@ -55,6 +56,10 @@ const ( | |||
CacheCleanupInterval = 10 * time.Minute | |||
) | |||
|
|||
func init() { | |||
v1alpha5.NormalizedLabels = functional.UnionStringMaps(v1alpha5.NormalizedLabels, map[string]string{"topology.ebs.csi.aws.com/zone": v1.LabelTopologyZone}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea
if err := p.kubeClient.Get(ctx, types.NamespacedName{Name: volume.PersistentVolumeClaim.ClaimName, Namespace: pod.Namespace}, pvc); err != nil { | ||
return fmt.Errorf("getting persistent volume claim %s, %w", volume.PersistentVolumeClaim.ClaimName, err) | ||
} | ||
pvc.Annotations["volume.kubernetes.io/selected-node"] = node.Name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is necessary for some routine to wait for every PVC provision
to succeed/fail after setting this annotation to trigger PVC provision
. Step 3 here: https://github.com/kubernetes/design-proposals-archive/blob/main/storage/volume-topology-scheduling.md#bind.
Because if a PVC provision
fails then the CSI driver's provisioner controller signals failure by removing the selected-node annotation https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/a35bec6eb688ef7a0e31b340ed77bdb84010ace7/controller/controller.go#L1298. And some routine has to notice that + add the annotation again to trigger another PVC provision
.
The default scheduler prebind implementation of this wait routine is: https://github.com/kubernetes/kubernetes/blob/39c76ba2edeadb84a115cc3fbd9204a2177f1c28/pkg/scheduler/framework/plugins/volumebinding/binder.go#L462-L465
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I may just implement a PVC scheduler that watches PVCs and pods and makes sure that the annotation is set on the PVC for the node/pod. Thanks for pointing this out.
👀👏 |
@check-spelling-bot Report🔴 Please reviewSee the files view or the action log for details. Unrecognized words (5)ebs Previously acknowledged words that are now absentEbsAvailable dictionaries could cover words not in the dictionarycspell:cpp/cpp.txt (104293) covers 72 of them Consider adding them using:
To stop checking additional dictionaries, add:
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands... in a clone of the [email protected]:ellistarn/karpenter.git repository
If the flagged items do not appear to be textIf items relate to a ...
|
@check-spelling-bot Report🔴 Please reviewSee the files view or the action log for details. Unrecognized words (1)ebs Previously acknowledged words that are now absentEbsAvailable dictionaries could cover words not in the dictionarycspell:cpp/cpp.txt (104293) covers 72 of them Consider adding them using:
To stop checking additional dictionaries, add:
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands... in a clone of the [email protected]:ellistarn/karpenter.git repository
If the flagged items do not appear to be textIf items relate to a ...
|
@check-spelling-bot Report🔴 Please reviewSee the files view or the action log for details. Unrecognized words (1)ebs Previously acknowledged words that are now absentEbsAvailable dictionaries could cover words not in the dictionarycspell:cpp/cpp.txt (104293) covers 72 of them Consider adding them using:
To stop checking additional dictionaries, add:
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands... in a clone of the [email protected]:ellistarn/karpenter.git repository
If the flagged items do not appear to be textIf items relate to a ...
|
@check-spelling-bot Report🔴 Please reviewSee the files view or the action log for details. Unrecognized words (1)ebs Previously acknowledged words that are now absentEbsAvailable dictionaries could cover words not in the dictionarycspell:cpp/cpp.txt (104293) covers 72 of them Consider adding them using:
To stop checking additional dictionaries, add:
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands... in a clone of the [email protected]:ellistarn/karpenter.git repository
If the flagged items do not appear to be textIf items relate to a ...
|
@check-spelling-bot Report🔴 Please reviewSee the files view or the action log for details. Unrecognized words (2)ebs Previously acknowledged words that are now absentEbsAvailable dictionaries could cover words not in the dictionarycspell:cpp/cpp.txt (104293) covers 72 of them Consider adding them using:
To stop checking additional dictionaries, add:
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands... in a clone of the [email protected]:ellistarn/karpenter.git repository
If the flagged items do not appear to be textIf items relate to a ...
|
@check-spelling-bot Report🔴 Please reviewSee the files view or the action log for details. Unrecognized words (2)ebs Previously acknowledged words that are now absentEbsAvailable dictionaries could cover words not in the dictionarycspell:cpp/cpp.txt (104293) covers 72 of them Consider adding them using:
To stop checking additional dictionaries, add:
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands... in a clone of the [email protected]:ellistarn/karpenter.git repository
If the flagged items do not appear to be textIf items relate to a ...
|
@check-spelling-bot Report🔴 Please reviewSee the files view or the action log for details. Unrecognized words (3)ebs Previously acknowledged words that are now absentEbsAvailable dictionaries could cover words not in the dictionarycspell:cpp/cpp.txt (104293) covers 72 of them Consider adding them using:
To stop checking additional dictionaries, add:
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands... in a clone of the [email protected]:ellistarn/karpenter.git repository
If the flagged items do not appear to be textIf items relate to a ...
|
@check-spelling-bot Report🔴 Please reviewSee the files view or the action log for details. Unrecognized words (2)amazonvpc Previously acknowledged words that are now absentebsAvailable dictionaries could cover words not in the dictionarycspell:cpp/cpp.txt (104293) covers 55 of them Consider adding them using:
To stop checking additional dictionaries, add:
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands... in a clone of the [email protected]:ellistarn/karpenter.git repository
If the flagged items do not appear to be textIf items relate to a ...
|
@check-spelling-bot Report🔴 Please reviewSee the files view or the action log for details. Unrecognized words (2)amazonvpc Previously acknowledged words that are now absentebsAvailable dictionaries could cover words not in the dictionarycspell:cpp/cpp.txt (104293) covers 55 of them Consider adding them using:
To stop checking additional dictionaries, add:
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands... in a clone of the [email protected]:ellistarn/karpenter.git repository
If the flagged items do not appear to be textIf items relate to a ...
|
@check-spelling-bot Report🔴 Please reviewSee the files view or the action log for details. Unrecognized words (2)amazonvpc Previously acknowledged words that are now absentebsAvailable dictionaries could cover words not in the dictionarycspell:cpp/cpp.txt (104293) covers 55 of them Consider adding them using:
To stop checking additional dictionaries, add:
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands... in a clone of the [email protected]:ellistarn/karpenter.git repository
If the flagged items do not appear to be textIf items relate to a ...
|
@check-spelling-bot Report🔴 Please reviewSee the files view or the action log for details. Unrecognized words (5)amazonvpc Available dictionaries could cover words not in the dictionarycspell:cpp/cpp.txt (104293) covers 55 of them Consider adding them using:
To stop checking additional dictionaries, add:
To accept these unrecognized words as correct, run the following commands... in a clone of the [email protected]:ellistarn/karpenter.git repository
If the flagged items do not appear to be textIf items relate to a ...
|
} | ||
for _, volume := range o.(*v1.Pod).Spec.Volumes { | ||
if volume.PersistentVolumeClaim == nil { | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a pod has no PVCs at all, is there a way to short-circuit the reconcile? with eventFilters?
if err := p.coreV1Client.Pods(pods[i].Namespace).Bind(ctx, &v1.Binding{TypeMeta: pods[i].TypeMeta, ObjectMeta: pods[i].ObjectMeta, Target: v1.ObjectReference{Name: node.Name}}, metav1.CreateOptions{}); err != nil { | ||
logging.FromContext(ctx).Errorf("Failed to bind %s/%s to %s, %s", pods[i].Namespace, pods[i].Name, node.Name, err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that we inlined since binding
was used just once, but it did make this line tougher for me to parse IMO :)
pvc.Annotations = functional.UnionStringMaps(pvc.Annotations, map[string]string{SelectedNodeAnnotation: pod.Spec.NodeName}) | ||
if err := c.kubeClient.Update(ctx, pvc); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm missing something obvious but why is this necessary? Once we bind the pod with the PVC volume to that node, won't this happen anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No! I thought that the PVC controller should be responsible for this, but it turns out that it's a prebind implementation within the kube scheduler. Any scheduler implementation (i.e. karpenter) is responsible for also doing volume binding.
This is unintuitive to me, and I feel like the kvc or pvc controller (or kubescheduler, for that matter) should be responsible for this. This was brought up by @wongma7 in a previous comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL!
preferences: NewPreferences(), | ||
volumeTopology: NewVolumeTopology(kubeClient), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we wrap VolumeTopology into Preferences and not expose it to the Controller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these are fully separate concepts. Preferences is all about relaxing preferred node affinities. Volume topology is a fully separate scheduling plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, as we continue to build out scheduling features, I'm always game for a reimagining of the implementation to something more streamlined -- there just wasn't anything obvious to me as I've built it piece by piece. Along these lines, I'd love to figure out how to get all the scheduling logic up into the selection controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preferences is all about relaxing preferred node affinities.
Aaah okay, that makes sense - on first reading I thought Preferences was the way we're getting all the scheduling logic into the controller. I suppose it was just the first.
Agreed though, it might be nice to have some kind of uniform contract for all our scheduler plugins to implement and unifying all of them into a different wrapper maybe?
Name string | ||
Labels map[string]string | ||
Annotations map[string]string | ||
metav1.ObjectMeta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a nice refactor 💯
1. Issue, if available:
#622
2. Description of changes:
Tested With:
Dynamic Provisioning
Static Provisioning
3. Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.