Ignores stuck pods rather than deleting them to avoid stateful set edge cases #678
Branch updated from cc31e47 to 6121faf.
@@ -108,7 +109,7 @@ var _ = Describe("Controller", func() {
     Expect(n.DeletionTimestamp.IsZero()).To(BeTrue())

     // Simulate time passing
-    node.Now = func() time.Time {
+    injectabletime.Now = func() time.Time {
       return time.Now().Add(time.Duration(*provisioner.Spec.TTLSecondsUntilExpired) * time.Second)
This does not seem accurate: it will potentially add more than that amount of time (granted, this may not matter for this particular test), since presumably plain time.Now() was called earlier in the test, then some time passed while other steps executed, and only then is this called?
I'm not sure I understand. This effectively forces time to be 30 seconds into the future (which is the explicit behavior we're trying to test).
What if the previous steps take a few seconds to execute? Won't it then be (say) 32 seconds in the future, since we are calling the "live" time.Now?
@@ -66,7 +67,7 @@ var _ = Describe("Controller", func() {
   })

   AfterEach(func() {
     node.Now = time.Now
Do we really want to use plain time.Now here, or something more deterministic (for example, always returning the same value)?
IMO time should work normally unless we explicitly need to control it.
I doubt it's causing a problem right now, so this isn't blocking, but in general it adds variability. Say the unit test process itself takes longer to get scheduled (for example, on a heavily loaded machine running many GitHub Actions jobs); then some of these times might stretch longer. Probably not a big deal here, since the code should generally do the right thing if things take "at least N seconds", but it could lead to flakiness.
Nice work and nice niche catches!
Branch updated from 6121faf to 87ce0e4.
1. Issue, if available:
2. Description of changes:
@anguslees pointed out that we must allow the kubelet to delete terminating pods; deleting them ourselves could violate StatefulSet guarantees if a kubelet is partitioned. Instead of deleting them, we now simply ignore pods that are past their grace window and delete the node. This still meets the guarantee, since the pod will be deleted by KCM (kube-controller-manager) once the node no longer exists.
Testing
3. Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.