Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait correctly for Cilium to be ready before deploying Hubble Relay #564

Merged
merged 6 commits into from
Oct 5, 2021

Conversation

tklauser
Copy link
Member

@tklauser tklauser commented Oct 1, 2021

Like on install and upgrade, use a K8sStatusCollector to wait for Cilium
to be fully ready before deploying Hubble Relay. Moreover, report the
status in case Cilium didn't become ready within the given timeout.

The first commit refactors some common code in preparation for the second commit, which implements the actual functionality. The third commit changes cilium hubble enable to - by default - wait for the Hubble deployments to be ready (this behavior can be disabled using --wait=false as with other commands). The fourth commit drops the now superfluous cilium status --wait commands after cilium hubble enable from CI.

See individual commit messages for details.

Fixes #164
Fixes #389

@tklauser tklauser requested a review from gandro October 1, 2021 13:53
@tklauser tklauser requested review from a team as code owners October 1, 2021 13:53
@tklauser tklauser requested review from a team, nebril and michi-covalent October 1, 2021 13:53
@tklauser tklauser requested a review from joestringer October 1, 2021 13:53
@tklauser tklauser temporarily deployed to ci October 1, 2021 13:54 Inactive
@tklauser
Copy link
Member Author

tklauser commented Oct 1, 2021

Like on install and upgrade, use a K8sStatusCollector to wait for Cilium to be fully ready before deploying Hubble Relay. Moreover, report the status in case Cilium didn't become ready within the given timeout.

The first commit refactors some common code in preparation for the second commit, which implements the actual functionality. The third commit drops the now superfluous cilium status --wait commands after cilium hubble enable from CI.

This is not yet enough to drop cilium status --wait from CI (3rd commit). We also need to wait for the Hubble Relay Deployment to be ready after cilium hubble enable. I'll add that check.

@tklauser tklauser force-pushed the pr/tklauser/hubble-enable-wait-for-cilium branch from 1957c15 to d78fea8 Compare October 1, 2021 14:21
@tklauser tklauser temporarily deployed to ci October 1, 2021 14:21 Inactive
@tklauser
Copy link
Member Author

tklauser commented Oct 1, 2021

Like on install and upgrade, use a K8sStatusCollector to wait for Cilium to be fully ready before deploying Hubble Relay. Moreover, report the status in case Cilium didn't become ready within the given timeout.
The first commit refactors some common code in preparation for the second commit, which implements the actual functionality. The third commit drops the now superfluous cilium status --wait commands after cilium hubble enable from CI.

This is not yet enough to drop cilium status --wait from CI (3rd commit). We also need to wait for the Hubble Relay Deployment to be ready after cilium hubble enable. I'll add that check.

Fixed in what now is the third commit:

hubble: wait for deployments to be ready after enabling them

By default, wait for the Relay (and UI, if `--ui` is specified)
deployments to be ready as part of enabling Hubble. This behavior can be
disabled by setting `--wait=false`.

In combination with the previous commit this allows to avoid invoking
`cilium status --wait` after enabling Hubble.

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just one specific nit around user-facing flags / wait timers below.

internal/cli/cmd/hubble.go Show resolved Hide resolved
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me! Thanks! One minor question

install/install.go Outdated Show resolved Hide resolved
Handle the nil check in (*Status).Format instead of having to check it
at every call site. Also make sure to not print an empty line in case
(*Status).Format returns an empty result string.

Signed-off-by: Tobias Klauser <[email protected]>
… Relay

Like on install and upgrade, use a K8sStatusCollector to wait for Cilium
to be fully ready before deploying Hubble Relay. Moreover, report the
status in case Cilium didn't become ready within the given timeout.

Fixes #389

Signed-off-by: Tobias Klauser <[email protected]>
@tklauser tklauser force-pushed the pr/tklauser/hubble-enable-wait-for-cilium branch from d78fea8 to 5faaf80 Compare October 4, 2021 11:22
@tklauser tklauser temporarily deployed to ci October 4, 2021 11:23 Inactive
@tklauser tklauser requested a review from joestringer October 4, 2021 11:23
By default, wait for the Relay (and UI, if `--ui` is specified)
deployments to be ready as part of enabling Hubble. This behavior can be
disabled by setting `--wait=false`.

In combination with the previous commit this allows to avoid invoking
`cilium status --wait` after enabling Hubble.

Fixes #164

Signed-off-by: Tobias Klauser <[email protected]>
With the introduction of the `--wait-duration` flag in the previous
commit, we now have two flags specifying a timeout to wait for a
particular aspect of the `hubble enable` command. As Joe points out,
this makes it increasingly difficult for users to understand or use
properly. To simplify the user experience, deprecate
`cilium-ready-timeout` and make `wait-duration` the total timeout to
wait for Hubble to be enabled. For backwards compatibility
`cilium-ready-timeout` will be retained until release 0.9.3 as an alias
for `wait-duration`.

Signed-off-by: Tobias Klauser <[email protected]>
Now that `cilium hubble enable` correctly waits for Cilium and Hubble to
be ready, there is no need anymore for separate `cilium status --wait`
calls after enabling Hubble.

Signed-off-by: Tobias Klauser <[email protected]>
@tklauser tklauser force-pushed the pr/tklauser/hubble-enable-wait-for-cilium branch from 3bae2d5 to 51f2101 Compare October 4, 2021 12:26
@tklauser tklauser temporarily deployed to ci October 4, 2021 12:26 Inactive
@@ -318,7 +318,7 @@ func (k *K8sHubble) Enable(ctx context.Context) error {
collector, err := status.NewK8sStatusCollector(ctx, k.client, status.K8sStatusParameters{
Namespace: k.params.Namespace,
Wait: true,
WaitDuration: k.params.WaitDuration,
WaitDuration: k.params.WaitDuration - dur,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to think about whether it makes sense to calculate & pass down timelines like this or just configure the contexts such that the timeouts are implicitly handed down & account for the entire processing time, but I'm fine with this solution for now 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It looks like the (*K8sStatusCollector).Status configures the timeout using context.WithTimeout, so we would probably need to refactor a bit such that the context with timeout is passed at each call site and omit the WaitDuration parameter altogether.

I'll look into that and send a follow-up PR.

internal/cli/cmd/hubble.go Show resolved Hide resolved
@tklauser tklauser temporarily deployed to ci October 5, 2021 07:46 Inactive
@tklauser tklauser force-pushed the pr/tklauser/hubble-enable-wait-for-cilium branch from 580009b to bd56d99 Compare October 5, 2021 07:50
@tklauser tklauser temporarily deployed to ci October 5, 2021 07:50 Inactive
@tklauser tklauser merged commit 49a3064 into master Oct 5, 2021
@tklauser tklauser deleted the pr/tklauser/hubble-enable-wait-for-cilium branch October 5, 2021 08:33
tklauser added a commit that referenced this pull request Oct 7, 2021
Checking recent CI runs [1], `cilium install` and `cilium hubble enable`
both took no longer than ~1min. As suggested by Joe [2], reduce the
default status timeout to 5 minutes which should still be plenty for the
normal case and if something gets stuck e.g. in CI we're not randomly
waiting 15 minutes until eventually timing out.

[1] https://github.com/cilium/cilium-cli/actions?query=branch%3Amaster+event%3Aschedule++
[2] #564 (comment)

Suggested-by: Joe Stringer <[email protected]>
Signed-off-by: Tobias Klauser <[email protected]>
tklauser added a commit that referenced this pull request Oct 7, 2021
Checking recent CI runs [1], `cilium install` and `cilium hubble enable`
both took no longer than ~1min. As suggested by Joe [2], reduce the
default status timeout to 5 minutes which should still be plenty for the
normal case and if something gets stuck e.g. in CI we're not randomly
waiting 15 minutes until eventually timing out.

[1] https://github.com/cilium/cilium-cli/actions?query=branch%3Amaster+event%3Aschedule++
[2] #564 (comment)

Suggested-by: Joe Stringer <[email protected]>
Signed-off-by: Tobias Klauser <[email protected]>
tklauser added a commit that referenced this pull request Oct 8, 2021
Checking recent CI runs [1], `cilium install` and `cilium hubble enable`
both took no longer than ~1min. As suggested by Joe [2], reduce the
default status timeout to 5 minutes which should still be plenty for the
normal case and if something gets stuck e.g. in CI we're not randomly
waiting 15 minutes until eventually timing out.

[1] https://github.com/cilium/cilium-cli/actions?query=branch%3Amaster+event%3Aschedule++
[2] #564 (comment)

Suggested-by: Joe Stringer <[email protected]>
Signed-off-by: Tobias Klauser <[email protected]>
aditighag pushed a commit to aditighag/cilium-cli that referenced this pull request Apr 21, 2023
Checking recent CI runs [1], `cilium install` and `cilium hubble enable`
both took no longer than ~1min. As suggested by Joe [2], reduce the
default status timeout to 5 minutes which should still be plenty for the
normal case and if something gets stuck e.g. in CI we're not randomly
waiting 15 minutes until eventually timing out.

[1] https://github.com/cilium/cilium-cli/actions?query=branch%3Amaster+event%3Aschedule++
[2] cilium#564 (comment)

Suggested-by: Joe Stringer <[email protected]>
Signed-off-by: Tobias Klauser <[email protected]>
michi-covalent pushed a commit to michi-covalent/cilium that referenced this pull request May 30, 2023
Checking recent CI runs [1], `cilium install` and `cilium hubble enable`
both took no longer than ~1min. As suggested by Joe [2], reduce the
default status timeout to 5 minutes which should still be plenty for the
normal case and if something gets stuck e.g. in CI we're not randomly
waiting 15 minutes until eventually timing out.

[1] https://github.com/cilium/cilium-cli/actions?query=branch%3Amaster+event%3Aschedule++
[2] cilium/cilium-cli#564 (comment)

Suggested-by: Joe Stringer <[email protected]>
Signed-off-by: Tobias Klauser <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants