Fix the insufficient memory instance type bug #1080

felix-zhe-huang · 2022-01-04T19:02:08Z

1. Issue, if available:
resolve issue #1034

2. Description of changes:

3. Does this change impact docs?

Yes, PR includes docs updates
Yes, issue opened: link to issue
No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

netlify · 2022-01-04T19:02:14Z

✔️ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: ec6ef9d

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/61d60fc157c7d8000767376c

ellistarn

Can you include a test to prevent this from regressing?

pkg/controllers/provisioning/binpacking/packable.go

pkg/controllers/provisioning/binpacking/packer.go

felix-zhe-huang · 2022-01-05T02:57:55Z

> Can you include a test to prevent this from regressing?

Happy to do so.

pkg/controllers/provisioning/suite_test.go

ellistarn · 2022-01-05T03:08:48Z

pkg/utils/injection/injection.go

@@ -79,3 +79,17 @@ func GetControllerName(ctx context.Context) string {
 	}
 	return name.(string)
 }
+
+type testInstanceKey struct{}


This injection pattern is useful to keep interfaces in production code clean. I'm wary of coupling test implementation details to this injection package. There are some other examples of injecting data into the EC2 fake. Happy to help offline if you need.

Yeah, this may not be ideal. I am removing it and implementing a function to switch those special instance types on and off instead.
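For readers unfamiliar with the injection pattern discussed in this thread, here is a minimal sketch of context-key injection in Go. It assumes a setter named WithControllerName and a demo helper roundTrip, both written for illustration; only the tail of GetControllerName is visible in the diff above, so the body shown here is an approximation and not the package's actual code.

```go
package main

import (
	"context"
	"fmt"
)

// controllerNameKey is an unexported key type, so no other package can
// collide with or overwrite the value stored under it.
type controllerNameKey struct{}

// WithControllerName returns a child context carrying the controller name.
// The name of this setter is an assumption made for this sketch.
func WithControllerName(ctx context.Context, name string) context.Context {
	return context.WithValue(ctx, controllerNameKey{}, name)
}

// GetControllerName reads the name back, returning "" when it was never set.
func GetControllerName(ctx context.Context) string {
	name := ctx.Value(controllerNameKey{})
	if name == nil {
		return ""
	}
	return name.(string)
}

// roundTrip is a hypothetical helper that stores and then reads a name.
func roundTrip(name string) string {
	return GetControllerName(WithControllerName(context.Background(), name))
}

func main() {
	fmt.Println(roundTrip("provisioning"))
}
```

Because the key type is private to the package, every injected value needs its own exported accessor pair. A test-only key such as testInstanceKey would therefore leave test concerns in production code, which is the coupling raised above.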

bwagner5 · 2022-01-05T21:09:34Z

pkg/controllers/provisioning/suite_test.go

+					test.PodOptions{
+						ResourceRequirements: v1.ResourceRequirements{Requests: v1.ResourceList{v1.ResourceCPU: resource.MustParse("3"), v1.ResourceMemory: resource.MustParse("3Gi")}},
+					},
+				))[0]


I'm not sure what happens if ExpectedProvisioned returns 0 pods, but I'm guessing it's a not-very-graceful crash. Would it be more informative to the user to add an explicit check?

Expect(len(pods)).To(Equal(1))
node := ExpectScheduled(ctx, env.Client, pods[0])
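The concern here is that indexing an empty slice in Go panics with an "index out of range" runtime error, which says little about what actually went wrong. A minimal, self-contained sketch of the guard follows. It uses plain strings in place of pod objects, and firstPod is a hypothetical helper, not one of Karpenter's test expectations.

```go
package main

import "fmt"

// firstPod is a hypothetical helper written for this sketch.
// It returns the single expected pod, or a descriptive error when the
// slice does not contain exactly one element, instead of letting
// pods[0] panic on an empty slice.
func firstPod(pods []string) (string, error) {
	if len(pods) != 1 {
		return "", fmt.Errorf("expected exactly 1 provisioned pod, got %d", len(pods))
	}
	return pods[0], nil
}

func main() {
	// Empty result: the caller gets a readable error rather than a panic.
	if _, err := firstPod(nil); err != nil {
		fmt.Println("failure:", err)
	}
	// Normal case: exactly one pod was provisioned.
	pod, err := firstPod([]string{"pod-a"})
	fmt.Println(pod, err)
}
```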

bwagner5 · 2022-01-05T21:18:34Z

pkg/cloudprovider/fake/cloudprovider.go

@@ -39,6 +39,17 @@ func (c *CloudProvider) Create(_ context.Context, constraints *v1alpha5.Constrai
 	for i := 0; i < quantity; i++ {
 		name := strings.ToLower(randomdata.SillyName())
 		instance := instanceTypes[0]
+		// To create error test cases, give preferences to the illy constructed instance types


Looking at just the tests, I'd be confused about how the bad instance types are guaranteed to cause a test failure. I think it would be more apparent if the fake cloudprovider were more generic and the bad set of instance types were set up in the test itself.

You could use the exported InstanceTypes field in the CloudProvider struct to set a specific set of bad instance types. That would make it apparent in the actual test that it is guaranteed to fail if the binpacking short-circuit logic is not working. Wdyt?
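One way to read this suggestion is sketched below. The fake stays generic and only falls back to built-in defaults, while each test supplies whatever instance types it needs through an exported field. The types FakeCloudProvider and InstanceType, the method GetInstanceTypes, and all values are simplified stand-ins assumed for illustration; they are not the real fake's definitions.

```go
package main

import "fmt"

// InstanceType is a simplified stand-in for an instance type.
type InstanceType struct {
	Name        string
	MemoryBytes int64
}

// FakeCloudProvider is a hypothetical, simplified fake. Tests may set
// InstanceTypes to override the defaults returned by GetInstanceTypes.
type FakeCloudProvider struct {
	InstanceTypes []InstanceType
}

// GetInstanceTypes returns the test-provided list when one is set,
// and a single default instance type otherwise.
func (c *FakeCloudProvider) GetInstanceTypes() []InstanceType {
	if c.InstanceTypes != nil {
		return c.InstanceTypes
	}
	return []InstanceType{{Name: "default-instance-type", MemoryBytes: 4 << 30}}
}

func main() {
	// The test itself declares the undersized instance type, so a reader
	// can see why the scenario exercises the binpacking check.
	provider := &FakeCloudProvider{
		InstanceTypes: []InstanceType{
			{Name: "too-small", MemoryBytes: 1 << 30},
			{Name: "large-enough", MemoryBytes: 8 << 30},
		},
	}
	for _, it := range provider.GetInstanceTypes() {
		fmt.Printf("%s: %d bytes\n", it.Name, it.MemoryBytes)
	}
}
```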

bwagner5 · 2022-01-05T21:19:23Z

pkg/controllers/provisioning/suite_test.go

+					},
+				))[0]
+				node := ExpectScheduled(ctx, env.Client, pod)
+				Expect(*node.Status.Allocatable.Cpu()).To(Equal(resource.MustParse("4")))


This seems a bit fragile. If we change the fake instance type data, the logic that avoids choosing an undersized instance type may still work properly, but the test could fail if the fake instance type doesn't have 4 vCPUs and 4 GB of allocatable RAM.
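A less brittle assertion would check the property the fix is meant to guarantee, namely that the chosen node can hold the pod's requests, rather than an exact size taken from the fake data. A minimal sketch follows, assuming a simplified Resources type and a hypothetical fits helper in place of Kubernetes resource quantities.

```go
package main

import "fmt"

// Resources is a simplified stand-in for a Kubernetes resource list.
type Resources struct {
	CPUMillis   int64
	MemoryBytes int64
}

// fits is a hypothetical helper that reports whether the allocatable
// capacity covers every requested resource.
func fits(allocatable, requested Resources) bool {
	return allocatable.CPUMillis >= requested.CPUMillis &&
		allocatable.MemoryBytes >= requested.MemoryBytes
}

func main() {
	// The pod from the test above requests 3 CPUs and 3Gi of memory.
	requested := Resources{CPUMillis: 3000, MemoryBytes: 3 << 30}

	// Any node at least this large passes, whatever the fake data says.
	fmt.Println(fits(Resources{CPUMillis: 4000, MemoryBytes: 4 << 30}, requested))
	// A node with enough CPU but too little memory is rejected.
	fmt.Println(fits(Resources{CPUMillis: 4000, MemoryBytes: 2 << 30}, requested))
}
```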

pkg/controllers/provisioning/binpacking/packer.go

ellistarn

Discussed offline. Testing in this way is fairly misleading, and is better done in real CI w/ chaos testing. Let's cut an issue for chaos testing of pod resource requests in a real CI environment and merge this with just the code change.

bwagner5

lgtm

felix-zhe-huang requested review from bwagner5 and ellistarn January 4, 2022 19:16

ellistarn reviewed Jan 4, 2022

pkg/controllers/provisioning/binpacking/packable.go (outdated)

ellistarn reviewed Jan 4, 2022

pkg/controllers/provisioning/binpacking/packer.go (outdated)

Fix the insufficient memory instance type bug

ddd310c

felix-zhe-huang force-pushed the issue1034 branch from 089c388 to ddd310c on January 4, 2022 23:11

felix-zhe-huang force-pushed the issue1034 branch from bca6c40 to a30dd08 on January 5, 2022 02:59

Add pod number check, add unit test case

4ade204

felix-zhe-huang force-pushed the issue1034 branch from a30dd08 to 4ade204 on January 5, 2022 03:03

ellistarn reviewed Jan 5, 2022

felix-zhe-huang force-pushed the issue1034 branch 2 times, most recently from 1b8e510 to ce043cd on January 5, 2022 20:32

bwagner5 reviewed Jan 5, 2022

pkg/controllers/provisioning/binpacking/packer.go

ellistarn reviewed Jan 5, 2022

Implement suggested changes

ec6ef9d

felix-zhe-huang force-pushed the issue1034 branch from ce043cd to ec6ef9d on January 5, 2022 21:38

bwagner5 approved these changes Jan 5, 2022

ellistarn approved these changes Jan 5, 2022

felix-zhe-huang merged commit dd34ef7 into aws:main Jan 6, 2022

felix-zhe-huang mentioned this pull request Jan 6, 2022

Binpacking algorithm incorrectly selects instance type with insufficient allocatable memory #1034

Closed

felix-zhe-huang deleted the issue1034 branch January 11, 2022 16:19