Limit "provisioner does not exist" logging and fix startup reconciliation bug #517

Merged
merged 5 commits into from
Jul 19, 2021

Conversation

bwagner5
Contributor

Issue, if available:
N/A

Description of changes:

  • Limits the number of times the error is logged when a provisioner is not found
  • Fixes a bug where pods that were already pending when karpenter started up would be reconciled to any provisioner (the previous racy behavior). The matchesProvisioner func was not completely correct when checking the default provisioner.
  • Moves the default provisioner to a reference in the apis pkg.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Previous Log:

karpenter-controller-86cf989bcd-jkqfm manager 2021-07-16T15:41:13.188Z	ERROR	Retrieving provisioner, create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name
karpenter-controller-86cf989bcd-jkqfm manager 2021-07-16T15:41:13.200Z	ERROR	Retrieving provisioner, create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name
karpenter-controller-86cf989bcd-jkqfm manager 2021-07-16T15:41:13.568Z	ERROR	Retrieving provisioner, create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name
karpenter-controller-86cf989bcd-jkqfm manager 2021-07-16T15:41:13.568Z	ERROR	Retrieving provisioner, create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name
karpenter-controller-86cf989bcd-jkqfm manager 2021-07-16T15:41:13.571Z	ERROR	Retrieving provisioner, create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name
karpenter-controller-86cf989bcd-jkqfm manager 2021-07-16T15:41:13.571Z	ERROR	Retrieving provisioner, create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name
... 
... # number of pods + 1

New Log (50 pod scale-up w/ no provisioner - prints 2 times, one per pod batch + 1):

karpenter-controller-6769b9899b-cr99p manager 2021-07-16T18:18:56.455Z	ERROR	No provisioner found. Create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name
karpenter-controller-6769b9899b-cr99p manager 2021-07-16T18:18:59.455Z	ERROR	No provisioner found. Create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name

New Log (1000 pod scale-up w/ no provisioner - prints 7 times, one per pod batch + 1):

karpenter-controller-6769b9899b-r66pp manager 2021-07-16T18:13:35.542Z	ERROR	No provisioner found. Create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name
karpenter-controller-6769b9899b-r66pp manager 2021-07-16T18:13:45.542Z	ERROR	No provisioner found. Create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name
karpenter-controller-6769b9899b-r66pp manager 2021-07-16T18:13:56.542Z	ERROR	No provisioner found. Create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name
karpenter-controller-6769b9899b-r66pp manager 2021-07-16T18:14:07.542Z	ERROR	No provisioner found. Create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name
karpenter-controller-6769b9899b-r66pp manager 2021-07-16T18:14:17.542Z	ERROR	No provisioner found. Create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name
karpenter-controller-6769b9899b-r66pp manager 2021-07-16T18:14:22.541Z	ERROR	No provisioner found. Create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name
karpenter-controller-6769b9899b-r66pp manager 2021-07-16T18:14:24.542Z	ERROR	No provisioner found. Create a default provisioner, or specify an alternative using the nodeSelector karpenter.sh/provisioner-name

Why is it "per pod batch +1" you ask?

It comes from how the batcher and the work queue interact. Karpenter watches for pod updates and, as each one comes in, adds the associated provisioner to the reconcile request work queue, so a Deployment scale-up enqueues the same provisioner many times. Reconcile waits for a batch before starting, then reconciles cluster state for the provisioner in the reconcile request. Because Reconcile(..) starts almost immediately (depending on the number of unique provisioners ahead in the queue and the number of configured reconcile workers, currently 4 for karpenter), some pods are added to the work queue after reconcile has already been triggered. The work queue implementation can't know whether that reconcile already covered the cluster state behind those later requests, so it dequeues one more request after the batching ends. Since no provisioner requests are enqueued after that second reconcile kicks off, nothing further is dequeued.
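To make the "+1" concrete, here is a minimal single-goroutine sketch. It assumes the work queue deduplicates keys the way the client-go workqueue does: a key that is re-added while it is being processed is marked dirty and re-queued exactly once when processing finishes. The `dedupQueue` type and the `simulate` helper are hypothetical names for illustration only, not Karpenter code.

```go
package main

import "fmt"

// dedupQueue is a simplified stand-in for a deduplicating work queue.
// It is NOT the real implementation, only a model of the assumed semantics.
type dedupQueue struct {
	queue      []string        // keys waiting to be processed, in order
	dirty      map[string]bool // keys that still need processing
	processing map[string]bool // keys currently being processed
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{dirty: map[string]bool{}, processing: map[string]bool{}}
}

// Add enqueues key unless it is already waiting. If the key is currently
// being processed, it is only marked dirty and re-queued later by Done.
func (q *dedupQueue) Add(key string) {
	if q.dirty[key] {
		return
	}
	q.dirty[key] = true
	if q.processing[key] {
		return
	}
	q.queue = append(q.queue, key)
}

// Get pops the next key and marks it as being processed.
func (q *dedupQueue) Get() (string, bool) {
	if len(q.queue) == 0 {
		return "", false
	}
	key := q.queue[0]
	q.queue = q.queue[1:]
	q.processing[key] = true
	delete(q.dirty, key)
	return key, true
}

// Done finishes processing and re-queues the key if it was re-added meanwhile.
func (q *dedupQueue) Done(key string) {
	delete(q.processing, key)
	if q.dirty[key] {
		q.queue = append(q.queue, key)
	}
}

// simulate counts reconciles when `before` pod events arrive before the first
// reconcile starts and `during` pod events arrive while it is running.
func simulate(before, during int) int {
	q := newDedupQueue()
	for i := 0; i < before; i++ {
		q.Add("default")
	}
	reconciles, first := 0, true
	for {
		key, ok := q.Get()
		if !ok {
			return reconciles
		}
		reconciles++
		if first {
			for i := 0; i < during; i++ {
				q.Add(key) // late pod events for the same provisioner
			}
			first = false
		}
		q.Done(key)
	}
}

func main() {
	fmt.Println(simulate(10, 40)) // late events collapse into one extra reconcile: 2
	fmt.Println(simulate(50, 0))  // no late events, no extra reconcile: 1
}
```

However many pod events land during the first reconcile, they collapse into a single extra dequeue, which is the "+1" in the log counts above.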

@bwagner5 bwagner5 requested a review from ellistarn July 16, 2021 18:24
@bwagner5 bwagner5 changed the title limit no provisioner logging and fix startup reconciliation bug [WIP] limit no provisioner logging and fix startup reconciliation bug Jul 16, 2021
if errors.IsNotFound(err) {
	// Queue and batch a reconcile request for a non-existent, empty provisioner
	// This will reduce the number of repeated error messages about a provisioner not existing
	c.Batcher.Add(&v1alpha3.Provisioner{})
Contributor

This feels like a nasty hack, but I don't have a better suggestion.

Contributor Author

:)

	return nil
}
if name == provisioner.Name {
if !ok && v1alpha3.DefaultProvisioner.Name == provisioner.Name {
Contributor

If we invert the predicate it reads a bit more smoothly

if ok && provisioner.Name == name
if !ok && provisioner.Name == v1alpha3.DefaultProvisioner.Name

If we could somehow assume that provisioner.Name was always populated, we could even remove the ok bit.

Contributor Author

I'm not sure it's worth assuming that. Maybe we could, but feels like something that could be forgotten at some point. IMO the ok isn't too dirty and we should probably just leave it.
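For readers following this thread, the two-branch predicate being discussed could look roughly like the standalone sketch below. The label key comes from the log message in the description; the default provisioner name `"default"` and the function signature are assumptions for illustration, and the real matchesProvisioner in the PR may differ.

```go
package main

import "fmt"

const (
	// Label key taken from the error message in the PR description.
	provisionerNameLabel = "karpenter.sh/provisioner-name"
	// Assumed default name; the PR only says a default provisioner exists.
	defaultProvisionerName = "default"
)

// matchesProvisioner reports whether a pod with the given nodeSelector should
// be handled by the provisioner named provisionerName. Hypothetical signature.
func matchesProvisioner(nodeSelector map[string]string, provisionerName string) bool {
	name, ok := nodeSelector[provisionerNameLabel]
	// The pod names a provisioner explicitly: only that provisioner matches.
	if ok && provisionerName == name {
		return true
	}
	// The pod names no provisioner: only the default provisioner matches.
	if !ok && provisionerName == defaultProvisionerName {
		return true
	}
	return false
}

func main() {
	explicit := map[string]string{provisionerNameLabel: "gpu"}
	fmt.Println(matchesProvisioner(explicit, "gpu"))     // true
	fmt.Println(matchesProvisioner(explicit, "default")) // false
	fmt.Println(matchesProvisioner(nil, "default"))      // true
	fmt.Println(matchesProvisioner(nil, "gpu"))          // false
}
```

Keeping the `ok` check means a pod without the label can only fall through to the default provisioner, never to an arbitrary one, which is the startup behavior this PR fixes.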

@bwagner5 bwagner5 changed the title [WIP] limit no provisioner logging and fix startup reconciliation bug Limit "provisioner does not exist" logging and fix startup reconciliation bug Jul 16, 2021
@bwagner5 bwagner5 requested a review from ellistarn July 16, 2021 22:31
@bwagner5 bwagner5 requested a review from ellistarn July 19, 2021 16:32
Contributor

@ellistarn ellistarn left a comment

Nice job going red!

@ellistarn ellistarn merged commit d02f133 into aws:main Jul 19, 2021
@bwagner5 bwagner5 deleted the clean-allocation-logs branch July 19, 2021 17:13