Reduce default status wait timeout #575

tklauser · 2021-10-07T10:18:15Z

First two commits are cleanup/refactoring.

The third commit fixes the Cilium status check before restarting unmanaged pods. It also avoids checking the status a second time when `--wait` (the default) is set and Cilium is already known to be ready.

Commits four and five unify and reduce the default timeout for the status wait operations, following up on review comments in #564 (comment)

See individual commit messages for details.

gandro

Looks good overall! One minor (non-blocking) nit

install/install.go

…naged pods

In case the `--wait` option is set, we already wait for Cilium to become ready. Re-use that status check for the case where this is needed before restarting unmanaged pods. Note that this will also wait for the operator to become ready, which wasn't the case previously when invoked with `--wait=false --restart-unmanaged-pods=true`. As Sebastian points out however, the agent will likely implicitly wait for the operator to become ready before becoming ready itself.

Signed-off-by: Tobias Klauser <[email protected]>
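The reuse of the readiness wait described in this commit message can be pictured with a minimal Go sketch. The `installer` type, its fields, and the `waitForReady` and `finish` methods are hypothetical names chosen for illustration; they are not taken from `install/install.go`, and the cluster state is simulated with a boolean.

```go
package main

import (
	"errors"
	"fmt"
)

// installer is a hypothetical stand-in for the CLI's installer.
// None of these names are cilium-cli's actual API.
type installer struct {
	wait                 bool // mirrors --wait (default true)
	restartUnmanagedPods bool // mirrors --restart-unmanaged-pods
	ready                bool // what the simulated cluster reports
	statusChecks         int  // counts how often the status wait ran
}

// waitForReady simulates blocking until agent and operator are ready.
func (i *installer) waitForReady() error {
	i.statusChecks++
	if !i.ready {
		return errors.New("cilium is not ready")
	}
	return nil
}

// finish runs the status wait at most once. The same wait serves both
// --wait and the precondition for restarting unmanaged pods.
func (i *installer) finish() error {
	if i.wait || i.restartUnmanagedPods {
		if err := i.waitForReady(); err != nil {
			return err
		}
	}
	if i.restartUnmanagedPods {
		fmt.Println("restarting unmanaged pods")
	}
	return nil
}

func main() {
	i := &installer{wait: true, restartUnmanagedPods: true, ready: true}
	if err := i.finish(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("status checks performed:", i.statusChecks)
}
```

With both options enabled the simulated wait runs once rather than twice, which is the behavior the commit message describes.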

Checking recent CI runs [1], `cilium install` and `cilium hubble enable` both took no longer than ~1min. As suggested by Joe [2], reduce the default status timeout to 5 minutes which should still be plenty for the normal case and if something gets stuck e.g. in CI we're not randomly waiting 15 minutes until eventually timing out.

[1] https://github.com/cilium/cilium-cli/actions?query=branch%3Amaster+event%3Aschedule++
[2] #564 (comment)

Suggested-by: Joe Stringer <[email protected]>
Signed-off-by: Tobias Klauser <[email protected]>

tklauser added 2 commits October 7, 2021 12:12

status: remove unused type ClusterConnectivityInfo

6f48311

This was introduced in commit 8c6b059 ("clustermesh: Extend status command with connectivity status") but never used.

Signed-off-by: Tobias Klauser <[email protected]>

install, status: use defaults.WaitRetryInterval

eb57526

Use the existing defaults.WaitRetryInterval (2 seconds) instead of locally duplicating it.

Signed-off-by: Tobias Klauser <[email protected]>

tklauser requested a review from a team as a code owner October 7, 2021 10:18

tklauser requested review from a team October 7, 2021 10:18

tklauser temporarily deployed to ci October 7, 2021 10:18 Inactive

tklauser requested review from ldelossa and gandro October 7, 2021 10:18

maintainer-s-little-helper bot assigned ldelossa Oct 7, 2021

tklauser requested a review from nathanjsweet October 7, 2021 10:18

maintainer-s-little-helper bot assigned gandro and nathanjsweet Oct 7, 2021

tklauser mentioned this pull request Oct 7, 2021

Wait correctly for Cilium to be ready before deploying Hubble Relay #564

Merged

gandro approved these changes Oct 7, 2021

install/install.go (outdated, resolved)

maintainer-s-little-helper bot unassigned gandro Oct 7, 2021

tklauser added 3 commits October 7, 2021 14:37

defaults: introduce StatusWaitDuration

2328fc4

Unify the maximum duration to wait for status in a single constant and use it across the commands. This is in preparation for adjusting the wait duration.

Signed-off-by: Tobias Klauser <[email protected]>
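A small Go sketch of the idea behind a shared constant follows. The constant names mirror those mentioned in the commit titles, but the `waitParams` type and `defaultWaitParams` function are hypothetical, and everything is placed in one file here for simplicity; in the real project the constants would sit in the `defaults` package. The 5 minute value is the one the follow-up commit message settles on.

```go
package main

import (
	"fmt"
	"time"
)

// Shared defaults, declared once. Names follow the commit titles; the
// single-file layout is an assumption made for this sketch.
const (
	WaitRetryInterval  = 2 * time.Second
	StatusWaitDuration = 5 * time.Minute
)

// waitParams is a hypothetical per-command parameter struct.
type waitParams struct {
	RetryInterval time.Duration
	WaitDuration  time.Duration
}

// String renders the parameters for display.
func (p waitParams) String() string {
	return fmt.Sprintf("retry every %s, give up after %s", p.RetryInterval, p.WaitDuration)
}

// defaultWaitParams is what each command would call instead of keeping
// its own local copy of the two values.
func defaultWaitParams() waitParams {
	return waitParams{
		RetryInterval: WaitRetryInterval,
		WaitDuration:  StatusWaitDuration,
	}
}

func main() {
	install := defaultWaitParams() // e.g. for an install command
	hubble := defaultWaitParams()  // e.g. for a hubble enable command
	fmt.Println("install:", install)
	fmt.Println("hubble: ", hubble)
	fmt.Println("identical defaults:", install == hubble)
}
```

Because every command reads the same constant, a later change to the timeout touches one line instead of several call sites.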

tklauser force-pushed the pr/tklauser/wait-timeout-reduce branch from a0ca958 to 365b165 Compare October 7, 2021 12:41

tklauser temporarily deployed to ci October 7, 2021 12:41 Inactive

joestringer approved these changes Oct 7, 2021

tklauser merged commit bcc992b into master Oct 8, 2021

tklauser deleted the pr/tklauser/wait-timeout-reduce branch October 8, 2021 11:56
