Wait correctly for Cilium to be ready before deploying Hubble Relay #564
This is not yet enough to drop |
Fixed in what now is the third commit:
LGTM overall, just one specific nit around user-facing flags / wait timers below.
Overall looks good to me! Thanks! One minor question
Handle the nil check in (*Status).Format instead of having to check it at every call site. Also make sure to not print an empty line in case (*Status).Format returns an empty result string. Signed-off-by: Tobias Klauser <[email protected]>
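The nil handling described in this commit can be illustrated with a minimal Go sketch. The `Status` type and its `Errors` field below are hypothetical stand-ins, not the actual type from the repository; only the pattern (a nil-safe method on a pointer receiver, plus skipping empty output) follows the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// Status is a hypothetical stand-in for the status type touched by the commit.
type Status struct {
	Errors []string
}

// Format returns a printable summary. The nil check lives here, so callers
// can invoke it on a nil *Status without guarding every call site.
func (s *Status) Format() string {
	if s == nil {
		return ""
	}
	return strings.Join(s.Errors, "\n")
}

// printStatus prints the formatted status and reports whether anything was
// printed. An empty result is skipped so that no blank line is emitted.
func printStatus(s *Status) bool {
	out := s.Format()
	if out == "" {
		return false
	}
	fmt.Println(out)
	return true
}

func main() {
	var missing *Status
	printStatus(missing) // nil receiver: nothing is printed
	printStatus(&Status{Errors: []string{"cilium: 1 pod not ready"}})
}
```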
… Relay Like on install and upgrade, use a K8sStatusCollector to wait for Cilium to be fully ready before deploying Hubble Relay. Moreover, report the status in case Cilium didn't become ready within the given timeout. Fixes #389 Signed-off-by: Tobias Klauser <[email protected]>
By default, wait for the Relay (and UI, if `--ui` is specified) deployments to be ready as part of enabling Hubble. This behavior can be disabled by setting `--wait=false`. In combination with the previous commit, this makes it possible to avoid invoking `cilium status --wait` after enabling Hubble. Fixes #164 Signed-off-by: Tobias Klauser <[email protected]>
With the introduction of the `--wait-duration` flag in the previous commit, we now have two flags specifying a timeout to wait for a particular aspect of the `hubble enable` command. As Joe points out, this makes it increasingly difficult for users to understand or use properly. To simplify the user experience, deprecate `cilium-ready-timeout` and make `wait-duration` the total timeout to wait for Hubble to be enabled. For backwards compatibility `cilium-ready-timeout` will be retained until release 0.9.3 as an alias for `wait-duration`. Signed-off-by: Tobias Klauser <[email protected]>
Now that `cilium hubble enable` correctly waits for Cilium and Hubble to be ready, there is no need anymore for separate `cilium status --wait` calls after enabling Hubble. Signed-off-by: Tobias Klauser <[email protected]>
All CI passed: GKE: https://github.com/cilium/cilium-cli/actions/runs/1303339705 Removing |
@@ -318,7 +318,7 @@ func (k *K8sHubble) Enable(ctx context.Context) error {
 	collector, err := status.NewK8sStatusCollector(ctx, k.client, status.K8sStatusParameters{
 		Namespace:    k.params.Namespace,
 		Wait:         true,
-		WaitDuration: k.params.WaitDuration,
+		WaitDuration: k.params.WaitDuration - dur,
We may want to think about whether it makes sense to calculate & pass down timelines like this or just configure the contexts such that the timeouts are implicitly handed down & account for the entire processing time, but I'm fine with this solution for now 👍
Good point. It looks like `(*K8sStatusCollector).Status` configures the timeout using `context.WithTimeout`, so we would probably need to refactor a bit such that the context with timeout is passed at each call site and omit the `WaitDuration` parameter altogether.
I'll look into that and send a follow-up PR.
Signed-off-by: Tobias Klauser <[email protected]>
Checking recent CI runs [1], `cilium install` and `cilium hubble enable` both took no longer than ~1min. As suggested by Joe [2], reduce the default status timeout to 5 minutes which should still be plenty for the normal case and if something gets stuck e.g. in CI we're not randomly waiting 15 minutes until eventually timing out. [1] https://github.com/cilium/cilium-cli/actions?query=branch%3Amaster+event%3Aschedule++ [2] #564 (comment) Suggested-by: Joe Stringer <[email protected]> Signed-off-by: Tobias Klauser <[email protected]>
Like on install and upgrade, use a `K8sStatusCollector` to wait for Cilium to be fully ready before deploying Hubble Relay. Moreover, report the status in case Cilium didn't become ready within the given timeout.
The first commit refactors some common code in preparation for the second commit, which implements the actual functionality. The third commit changes `cilium hubble enable` to wait, by default, for the Hubble deployments to be ready (this behavior can be disabled using `--wait=false` as with other commands). The fourth commit drops the now superfluous `cilium status --wait` commands after `cilium hubble enable` from CI.
See individual commit messages for details.
Fixes #164
Fixes #389