Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce default status wait timeout #575

Merged
merged 5 commits into from
Oct 8, 2021
Merged

Conversation

tklauser
Copy link
Member

@tklauser tklauser commented Oct 7, 2021

First two commits are cleanup/refactoring.

The third commit fixes the Cilium status check before restarting unmanaged pods and also avoids checking the status again in case --wait (this is the default) was specified and Cilium is known to be ready.

Commits four and five unify and reduce the default timeout for the status wait operations, following up on review comments in #564 (comment)

See individual commit messages for details.

This was introduced in commit 8c6b059 ("clustermesh: Extend status
command with connectivity status") but never used.

Signed-off-by: Tobias Klauser <[email protected]>
Use the existing defaults.WaitRetryInterval (2 seconds) instead of
locally duplicating it.

Signed-off-by: Tobias Klauser <[email protected]>
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall! One minor (non-blocking) nit

install/install.go Outdated Show resolved Hide resolved
…naged pods

In case the `--wait` option is set, we already wait for Cilium to become
ready. Re-use that status check for the case where this is needed before
restarting unmanaged pods. Note that this will also wait for the
operator to become ready, which wasn't the case previously when invoked
with `--wait=false --restart-unmanaged-pods=true`. As Sebastian points
out however, the agent will likely implicitly wait for the operator to
become ready before becoming ready itself.

Signed-off-by: Tobias Klauser <[email protected]>
Unify the maximum duration to wait for status in a single constant and
use it across the commands. This is in preparation for adjusting the
wait duration.

Signed-off-by: Tobias Klauser <[email protected]>
Checking recent CI runs [1], `cilium install` and `cilium hubble enable`
both took no longer than ~1min. As suggested by Joe [2], reduce the
default status timeout to 5 minutes which should still be plenty for the
normal case and if something gets stuck e.g. in CI we're not randomly
waiting 15 minutes until eventually timing out.

[1] https://github.com/cilium/cilium-cli/actions?query=branch%3Amaster+event%3Aschedule++
[2] #564 (comment)

Suggested-by: Joe Stringer <[email protected]>
Signed-off-by: Tobias Klauser <[email protected]>
@tklauser tklauser force-pushed the pr/tklauser/wait-timeout-reduce branch from a0ca958 to 365b165 Compare October 7, 2021 12:41
@tklauser tklauser temporarily deployed to ci October 7, 2021 12:41 Inactive
@tklauser tklauser merged commit bcc992b into master Oct 8, 2021
@tklauser tklauser deleted the pr/tklauser/wait-timeout-reduce branch October 8, 2021 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants