
Account for updated pods when waiting on DaemonSet #102

Merged
gravesm merged 2 commits into ansible-collections:main from daemonset-wait on May 19, 2021

Conversation

gravesm (Member) commented May 11, 2021

SUMMARY

We have been seeing infrequent CI failures recently during the DaemonSet tests in the waiter test suite. The problem can be reliably reproduced by using a container that ignores SIGTERM, forcing the replacement cycle to run out the termination grace period. The logic we're using to determine when a DaemonSet is ready checks whether the number of nodes that should be running the daemon pod equals the number of nodes currently running a daemon pod. In this case, however, the pod is the old pod that just hasn't been replaced yet. This change makes sure the number of nodes that should be running the daemon pod equals the number of nodes currently running the updated daemon pod.
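For reference, a container along these lines reproduces the failure (a hypothetical example, not the collection's actual test fixture): the shell traps and discards SIGTERM, so the kubelet has to wait out the full grace period before it can SIGKILL the old pod.

```python
# Hypothetical pod template fragment for reproducing the race; the image
# and names are illustrative, not taken from the test suite.
container = {
    "name": "ignores-sigterm",
    "image": "busybox",
    # trap '' TERM makes the shell ignore SIGTERM, so a RollingUpdate must
    # run out terminationGracePeriodSeconds before the old pod actually dies.
    "command": ["sh", "-c", "trap '' TERM; while true; do sleep 1; done"],
}
```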

From what I can tell, this follows the same logic used by kubectl when waiting on a rollout:
https://github.com/kubernetes/kubectl/blob/ac49920c0ccb0dd0899d5300fc43713ee2dfcdc9/pkg/polymorphichelpers/rollout_status.go#L107-L115.
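In Python terms the fixed check looks roughly like this (a minimal sketch assuming the Python kubernetes client's V1DaemonSet field names, not the collection's exact implementation):

```python
def daemonset_ready(daemonset):
    """Mirror kubectl's rollout logic: every node that should run the
    daemon pod must be running the *updated* pod, and that pod must be
    available."""
    status = daemonset.status
    # Wait until the controller has observed the latest spec.
    if daemonset.metadata.generation > (status.observed_generation or 0):
        return False
    desired = status.desired_number_scheduled
    # Checking updated_number_scheduled filters out old pods that are
    # still running out their termination grace period.
    return (
        (status.updated_number_scheduled or 0) == desired
        and (status.number_available or 0) == desired
    )
```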

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

k8s

ADDITIONAL INFORMATION

codecov bot commented May 11, 2021

Codecov Report

Merging #102 (b98fbe8) into main (5856948) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##             main     #102   +/-   ##
=======================================
  Coverage   24.02%   24.02%           
=======================================
  Files           1        1           
  Lines         154      154           
  Branches       29       29           
=======================================
  Hits           37       37           
  Misses        112      112           
  Partials        5        5           


Akasurde (Member) left a comment

LGTM

abikouo (Contributor) left a comment

LGTM

gravesm added 2 commits May 18, 2021 15:36
The existing logic that's used to determine when a DaemonSet is ready
fails to account for the fact that a RollingUpdate first kills the pod
and then creates a new one. Simply checking whether the
desiredNumberScheduled equals the numberReady will succeed in cases
where the old pod takes time to shut down, reporting that the DaemonSet
is ready even though the old pod has not been replaced yet.
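Expressed against the status fields, the change amounts to roughly the following (a sketch of the before/after condition, not the literal diff):

```python
def old_check(status):
    # Before: an old pod still draining its grace period counts as ready,
    # so this can pass mid-rollout.
    return status.number_ready == status.desired_number_scheduled

def new_check(status):
    # After: every scheduled pod must also come from the updated template.
    return (
        status.number_ready == status.desired_number_scheduled
        and status.updated_number_scheduled == status.desired_number_scheduled
    )
```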
@gravesm gravesm merged commit 0bbc9ca into ansible-collections:main May 19, 2021
@gravesm gravesm deleted the daemonset-wait branch May 19, 2021 13:29