
remove the specialness from GPU requests #1489

Merged (4 commits, Mar 21, 2022)

Conversation

@tzneal (Contributor) commented Mar 9, 2022

1. Issue, if available:

N/A

2. Description of changes:

This modifies Karpenter to treat all resource requests identically. It
removes the special-case logic for some AWS-specific GPU types and
allows CloudProvider implementers to provide their own custom
resource types, along with an ordering used as a hint for binpacking
about which instance types to prefer (see the sketch after this checklist).

3. How was this change tested?

Unit tests & deploying GPU/non-GPU workloads. Quota limits prevented GPU instances from being
created, but I could see the requests.

4. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No
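
To make the description concrete, here is a minimal sketch of the direction described above. All names in it (CustomResource, CloudProvider.CustomResources, exampleProvider, example.com/gpu) are hypothetical illustrations, not Karpenter's actual API: a provider advertises its vendor-specific resource types plus a relative weight the binpacker can use as a preference hint.

```go
package main

import "fmt"

// CustomResource is a hypothetical descriptor a cloud provider could return
// for a vendor-specific resource (e.g. a GPU count), together with a relative
// weight used purely as a binpacking hint, not a real price.
type CustomResource struct {
	Name   string  // resource name as it appears in pod requests
	Weight float64 // higher weight => prefer instance types without this resource
}

// CloudProvider is a hypothetical slice of the interface: implementers list
// the custom resources they understand so the core scheduler needs no
// vendor-specific knowledge.
type CloudProvider interface {
	CustomResources() []CustomResource
}

type exampleProvider struct{}

func (exampleProvider) CustomResources() []CustomResource {
	return []CustomResource{
		{Name: "example.com/gpu", Weight: 10},
		{Name: "example.com/inference-chip", Weight: 5},
	}
}

func main() {
	var cp CloudProvider = exampleProvider{}
	for _, r := range cp.CustomResources() {
		fmt.Printf("%s weight=%v\n", r.Name, r.Weight)
	}
}
```

The weight plays the same role as the "relative weighting" comment discussed later in the review: it orders instance types for binpacking, it is not a dollar amount.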

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@netlify (bot) commented Mar 9, 2022

✅ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: 908be5e

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/6238cd1e12a6f200081a3611

pkg/cloudprovider/types.go (outdated review thread, resolved)
@bwagner5 (Contributor) left a comment:

Nice! It is nice to get the AWS stuff out of the core code!

pkg/cloudprovider/aws/instancetype.go (outdated review thread, resolved)
pkg/cloudprovider/types.go (outdated review thread, resolved)
pkg/controllers/provisioning/scheduling/scheduler.go (outdated review thread, resolved)
pkg/utils/resources/resources.go (outdated review thread, resolved)
@tzneal force-pushed the make-gpu-requests-generic branch 6 times, most recently from 6560c2e to 1e148d7 on March 17, 2022 15:29
@@ -58,6 +60,9 @@ const (

func init() {
v1alpha5.NormalizedLabels = functional.UnionStringMaps(v1alpha5.NormalizedLabels, map[string]string{"topology.ebs.csi.aws.com/zone": v1.LabelTopologyZone})
A reviewer (Contributor) commented:

Man it would be sweet to register NormalizedLabels in the same way!
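
For readers skimming the diff above: the provider merges its own label aliases into the core NormalizedLabels map at package init. Below is a self-contained sketch of that union-at-init pattern using plain maps rather than Karpenter's functional.UnionStringMaps helper; the names normalizedLabels and registerNormalizedLabels are illustrative, not the project's actual identifiers.

```go
package main

import "fmt"

// normalizedLabels maps provider-specific label keys to their well-known
// Kubernetes equivalents; core code consults only this map.
var normalizedLabels = map[string]string{}

// registerNormalizedLabels merges provider-supplied aliases into the core map,
// mirroring the UnionStringMaps call in the diff above.
func registerNormalizedLabels(aliases map[string]string) {
	for k, v := range aliases {
		normalizedLabels[k] = v
	}
}

func init() {
	// The AWS provider normalizes the EBS CSI zone label to the standard
	// topology zone label, as in the diff.
	registerNormalizedLabels(map[string]string{
		"topology.ebs.csi.aws.com/zone": "topology.kubernetes.io/zone",
	})
}

func main() {
	fmt.Println(normalizedLabels)
}
```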

@tzneal force-pushed the make-gpu-requests-generic branch 2 times, most recently from 85cd89a to 2f3a0fe on March 18, 2022 17:38
@tzneal requested review from ellistarn and bwagner5 on March 18, 2022 17:41
@tzneal force-pushed the make-gpu-requests-generic branch 2 times, most recently from dfc4d26 to 0a91fdd on March 19, 2022 02:14
}
}

// These are meant to give some relative weighting
A reviewer (Contributor) commented:

This comment makes me uneasy as a reader. They're meant to do X, but do they? Is it accurate to be more concrete?

@tzneal (author) replied:

Maybe I should just remove that comment entirely. Trying to indicate that it's not a real "price", but people should be able to figure that out easily enough.

// creation unless a pod specifically requests this resource type. This is useful for preventing non-GPU workloads
// from possibly scheduling to more expensive GPU instance type, or from causing a GPU instance type to scale up to
// the next larger type due to a non-GPU workload
ResourceFlagMinimizeUsage
A reviewer (Contributor) commented:

I keep thinking about this complexity. I'm wondering if it's enough to purely delineate on WellKnown vs NotWellKnown resource types? If we know about it (e.g., CPU, Memory, EphemeralStorage) then we can use the ResourceFlagNone behavior. If we don't know about it, we can use ResourceFlagMinimizeUsage behavior. This will enable us to simply not configure any of this (for now).

My guess is that we will need to be more sophisticated with binpacking in the future, but I don't think we have enough use cases to have clarity on what the parameters are. Given the currently known resource types, the above suggestion is equivalent in implementation.

Happy to move forward in either direction.

@tzneal (author) replied:

I left a longer comment below, but I'm starting to think this complexity isn't worth it since kube-scheduler won't follow this logic and we're left with using taints/tolerations again which are already built-in.
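
For reference, here is a minimal sketch of the delineation suggested above: well-known resources (CPU, memory, ephemeral storage, pods) keep the default behavior, and anything else gets the minimize-usage behavior. The map, constants, and flagFor helper are simplified stand-ins, not the PR's actual code.

```go
package main

import "fmt"

// wellKnownResources are the resource names the scheduler binpacks normally.
var wellKnownResources = map[string]bool{
	"cpu":               true,
	"memory":            true,
	"ephemeral-storage": true,
	"pods":              true,
}

// Simplified stand-ins for the flags discussed in this thread.
const (
	resourceFlagNone = iota
	resourceFlagMinimizeUsage
)

// flagFor returns the binpacking behavior for a resource name: well-known
// resources get the default behavior, anything else (e.g. a vendor GPU
// resource) is only considered when a pod explicitly requests it.
func flagFor(resourceName string) int {
	if wellKnownResources[resourceName] {
		return resourceFlagNone
	}
	return resourceFlagMinimizeUsage
}

func main() {
	fmt.Println(flagFor("memory"))          // 0: binpack normally
	fmt.Println(flagFor("example.com/gpu")) // 1: minimize usage unless requested
}
```

Under this delineation no per-resource configuration is needed for now, which matches the suggestion that the behavior be inferred rather than declared.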
