
Account for updated pods when waiting on DaemonSet #102

Merged
gravesm merged 2 commits into ansible-collections:main from daemonset-wait on May 19, 2021

Conversation

gravesm (Member) commented May 11, 2021

SUMMARY

We have been seeing infrequent CI failures recently during the DaemonSet tests in the waiter test suite. The problem can be reliably reproduced by using a container that ignores SIGTERM, forcing the replacement cycle to run out the termination grace period. The logic we're using to determine when a DaemonSet is ready checks whether the number of nodes that should be running the daemon pod equals the number of nodes currently running a daemon pod. In this case, however, the pod is the old pod that just hasn't been replaced yet. This change makes sure the number of nodes that should be running the daemon pod equals the number of nodes currently running the updated daemon pod.
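For reference, a container along these lines reproduces the failure (a hypothetical example, not the collection's actual test fixture): the shell traps and discards SIGTERM, so the kubelet has to wait out the full grace period before it can SIGKILL the old pod.

```python
# Hypothetical pod template fragment for reproducing the race; the image
# and names are illustrative, not taken from the test suite.
container = {
    "name": "ignores-sigterm",
    "image": "busybox",
    # trap '' TERM makes the shell ignore SIGTERM, so a RollingUpdate must
    # run out terminationGracePeriodSeconds before the old pod actually dies.
    "command": ["sh", "-c", "trap '' TERM; while true; do sleep 1; done"],
}
```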

From what I can tell, this follows the same logic used by kubectl when waiting on a rollout:
https://github.com/kubernetes/kubectl/blob/ac49920c0ccb0dd0899d5300fc43713ee2dfcdc9/pkg/polymorphichelpers/rollout_status.go#L107-L115.
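In Python terms the fixed check looks roughly like this (a minimal sketch assuming the Python kubernetes client's V1DaemonSet field names, not the collection's exact implementation):

```python
def daemonset_ready(daemonset):
    """Mirror kubectl's rollout logic: every node that should run the
    daemon pod must be running the *updated* pod, and that pod must be
    available."""
    status = daemonset.status
    # Wait until the controller has observed the latest spec.
    if daemonset.metadata.generation > (status.observed_generation or 0):
        return False
    desired = status.desired_number_scheduled
    # Checking updated_number_scheduled filters out old pods that are
    # still running out their termination grace period.
    return (
        (status.updated_number_scheduled or 0) == desired
        and (status.number_available or 0) == desired
    )
```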

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

k8s

ADDITIONAL INFORMATION

codecov bot commented May 11, 2021

Codecov Report

Merging #102 (b98fbe8) into main (5856948) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##             main     #102   +/-   ##
=======================================
  Coverage   24.02%   24.02%           
=======================================
  Files           1        1           
  Lines         154      154           
  Branches       29       29           
=======================================
  Hits           37       37           
  Misses        112      112           
  Partials        5        5           


Akasurde (Member) left a comment

LGTM

abikouo (Contributor) left a comment

LGTM

gravesm added 2 commits May 18, 2021 15:36
The existing logic that's used to determine when a DaemonSet is ready
fails to account for the fact that a RollingUpdate first kills the pod
and then creates a new one. Simply checking whether the
desiredNumberScheduled equals the numberReady will succeed in cases
where the old pod takes time to shut down, reporting that the DaemonSet
is ready even though the old pod has not been replaced yet.
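Expressed against the status fields, the change amounts to roughly the following (a sketch of the before/after condition, not the literal diff):

```python
def old_check(status):
    # Before: an old pod still draining its grace period counts as ready,
    # so this can pass mid-rollout.
    return status.number_ready == status.desired_number_scheduled

def new_check(status):
    # After: every scheduled pod must also come from the updated template.
    return (
        status.number_ready == status.desired_number_scheduled
        and status.updated_number_scheduled == status.desired_number_scheduled
    )
```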
@gravesm gravesm merged commit 0bbc9ca into ansible-collections:main May 19, 2021
@gravesm gravesm deleted the daemonset-wait branch May 19, 2021 13:29