Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the insufficient memory instance type bug #1080

Merged
merged 3 commits into from
Jan 6, 2022

Conversation

felix-zhe-huang
Copy link
Contributor

1. Issue, if available:
resolve issue #1034

2. Description of changes:

3. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@netlify
Copy link

netlify bot commented Jan 4, 2022

✔️ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: ec6ef9d

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/61d60fc157c7d8000767376c

Copy link
Contributor

@ellistarn ellistarn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include a test to prevent this from regressing?

@felix-zhe-huang
Copy link
Contributor Author

Can you include a test to prevent this from regressing?

Happy to do so.

@@ -79,3 +79,17 @@ func GetControllerName(ctx context.Context) string {
}
return name.(string)
}

type testInstanceKey struct{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This injection pattern is useful to keep interfaces in production code clean. I'm wary of coupling test implementation details to this injection package. There are some other examples of injecting data into the EC2 fake. Happy to help offline if you need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this may not be ideal. I am removing it and implements a function to switch on and off those special instance types instead.

@felix-zhe-huang felix-zhe-huang force-pushed the issue1034 branch 2 times, most recently from 1b8e510 to ce043cd Compare January 5, 2022 20:32
test.PodOptions{
ResourceRequirements: v1.ResourceRequirements{Requests: v1.ResourceList{v1.ResourceCPU: resource.MustParse("3"), v1.ResourceMemory: resource.MustParse("3Gi")}},
},
))[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the output is if ExpectedProvisioned returns 0 pods, but I'm guessing it's a not very graceful crash. Would it be more informative to the user to add an explicit:

Expect(len(pods)).To(Equal(1))
node := ExpectScheduled(ctx, env.Client, pod[0])

@@ -39,6 +39,17 @@ func (c *CloudProvider) Create(_ context.Context, constraints *v1alpha5.Constrai
for i := 0; i < quantity; i++ {
name := strings.ToLower(randomdata.SillyName())
instance := instanceTypes[0]
// To create error test cases, give preferences to the illy constructed instance types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be confused looking at just the tests on how the bad instance types are guaranteed to cause a test failure. I think it would be more apparent if the fake cloudprovider was more generic and the setup of a bad set of instance types is set in the test.

You could use the exported InstanceTypes field in the CloudProvider struct to set a specific set of bad instance types that would be apparent in the actual test that it is guaranteed to fail if the binpacking short circuit logic is not working. Wdyt?

},
))[0]
node := ExpectScheduled(ctx, env.Client, pod)
Expect(*node.Status.Allocatable.Cpu()).To(Equal(resource.MustParse("4")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit fragile if we change the fake instance type data, the functionality of not choosing an instance type may still work properly but the test could fail if the fake instance type doesn't have 4 vcpus and 4 GBs of allocatable RAM.

Copy link
Contributor

@ellistarn ellistarn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. Testing in this way is fairly misleading, and is better done in real CI w/ chaos testing. Let's cut an issue for chaos testing of pod resource requests in a real CI environment and merge this with just the code change.

Copy link
Contributor

@bwagner5 bwagner5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants