Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw startup error if TeleportReadyEvent is not emitted #11725

Merged
merged 9 commits into from
Apr 6, 2022

Conversation

vitorenesduarte
Copy link
Contributor

@vitorenesduarte vitorenesduarte commented Apr 5, 2022

Fixes #11724.

Before this PR, the TeleportReadyEvent was only waited for when a process reload occurred. Thus, if a bug exists in the code that emits this event (as it's currently the case since the MetricsReady and WindowsDesktopReady events are never emitted - #11724), such a bug may go unnoticed for a while.

This PR ensures that the TeleportReadyEvent is always waited for on startup, and throws an error if the event is not emitted (after some timeout).

This PR also:

  • removes the MetricsReady event (as this is not produced by a component that sends heartbeats, which is the case of every other event required by the TeleportReadyEvent event mapping)
  • ensures that WindowsDesktopReady event is emitted
  • refactors some of the code in lib/service/supervisor.go
  • moves the event mapping registration to a new registerTeleportReadyEvent function

Testing

None.

Before this commit, the `TeleportReadyEvent` was only waited for when a
process reload occurred. Thus, if a bug exists in the code that emits
this event (as it's currently the case since the `MetricsReady` and
`WindowsDesktopReady` events are never emitted), such a bug may go
unnoticed for a while.

This commit ensures that the `TeleportReadyEvent` is always waited for
on startup, and throws an error if the event is not emitted (after some
timeout).

This commit also:
- removes the `MetricsReady` event (as this is not produced by a
  component that sends heartbeats, which is the case of every other
  event required by the `TeleportReadyEvent` event mapping)
- ensures that `WindowsDesktopReady` event is emitted
- refactors some of the code in `lib/service/supervisor.go`
@vitorenesduarte vitorenesduarte force-pushed the vitor/teleport-ready-event branch from 72f4452 to 1c76b73 Compare April 5, 2022 13:04
Copy link
Contributor

@espadolini espadolini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to find objections to not waiting for the new process to emit TeleportReady before shutting down the old one but I couldn't find any

lib/service/supervisor.go Outdated Show resolved Hide resolved
lib/service/supervisor.go Outdated Show resolved Hide resolved
Comment on lines 513 to 524
// Wait for the service to report that it has started.
startTimeoutCtx, startCancel := context.WithTimeout(ctx, signalPipeTimeout)
defer startCancel()
eventC := make(chan Event, 1)
srv.WaitForEvent(startTimeoutCtx, TeleportReadyEvent, eventC)
select {
case <-eventC:
cfg.Log.Infof("Service has started successfully.")
case <-startTimeoutCtx.Done():
warnOnErr(srv.Close(), cfg.Log)
return trace.BadParameter("service has failed to start")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks that waitAndReload logic already waits for TeleportReadyEvent so after exiting waitAndReload the logic triggers second wait for TeleportReadyEvent event but it the TeleportReadyEvent event was ready broadcasted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. In a way this is ok since an already broadcast event will still be emitted if waited on, but there might be a better alternative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this PR remove that WaitForEvent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think that a similar situation would already occur if two process reloads were triggered.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this PR remove that WaitForEvent?

You're right. Sorry @smallinsky @espadolini, I misunderstood what waitAndReload was doing.

I pushed a new change (f73af18) to ensure that we always for the the TeleportReadyEvent after calling LocalSupervisor.Start() (both on the initial startup, and after a process reload), reverting the changes to waitAndReload.

espadolini
espadolini previously approved these changes Apr 5, 2022
@espadolini espadolini self-requested a review April 6, 2022 13:23
@espadolini espadolini dismissed their stale review April 6, 2022 13:24

significant changes

@vitorenesduarte
Copy link
Contributor Author

CI seems to be passing now. Could you please take another look @espadolini?

@vitorenesduarte vitorenesduarte merged commit 933e247 into master Apr 6, 2022
@vitorenesduarte vitorenesduarte deleted the vitor/teleport-ready-event branch April 6, 2022 15:10
vitorenesduarte pushed a commit that referenced this pull request Apr 7, 2022
…#11249)

Fixes #11065.

This commit:
- ensures  that `TeleportReadyEvent` is only produced when all components that send heartbeats (i.e. call [`process.onHeartbeat`](https://github.com/gravitational/teleport/blob/16bf416556f337b045b66dc9c3f5a3e16f8cc988/lib/service/service.go#L358-L366)) are ready
- changes `TeleportProcess.registerTeleportReadyEvent` so that it returns a count of these components (let's call it `componentCount`)
- uses `componentCount` to also ensure that `stateOK` is only reported when all the components have sent their heartbeat, thus fixing #11065

Since it seems difficult to know when `TeleportProcess.registerTeleportReadyEvent` should be updated, with the goal of quickly detecting a bug when it's introduced we have that:
1. if `componentCount` is lower than it should, then the service fails to start (due to #11725)
2. if `componentCount` is higher than it should, then an error is logged in function `processState.getStateLocked`.
vitorenesduarte pushed a commit that referenced this pull request Apr 7, 2022
* Throw startup error if `TeleportReadyEvent` is not emitted

Before this commit, the `TeleportReadyEvent` was only waited for when a
process reload occurred. Thus, if a bug exists in the code that emits
this event (as it's currently the case since the `MetricsReady` and
`WindowsDesktopReady` events are never emitted), such a bug may go
unnoticed for a while.

This commit ensures that the `TeleportReadyEvent` is always waited for
on startup, and throws an error if the event is not emitted (after some
timeout).

This commit also:
- removes the `MetricsReady` event (as this is not produced by a
  component that sends heartbeats, which is the case of every other
  event required by the `TeleportReadyEvent` event mapping)
- ensures that `WindowsDesktopReady` event is emitted
- refactors some of the code in `lib/service/supervisor.go`
- moves the event mapping registration to a new `registerTeleportReadyEvent` function
vitorenesduarte pushed a commit that referenced this pull request Apr 7, 2022
…#11249)

Fixes #11065.

This commit:
- ensures  that `TeleportReadyEvent` is only produced when all components that send heartbeats (i.e. call [`process.onHeartbeat`](https://github.com/gravitational/teleport/blob/16bf416556f337b045b66dc9c3f5a3e16f8cc988/lib/service/service.go#L358-L366)) are ready
- changes `TeleportProcess.registerTeleportReadyEvent` so that it returns a count of these components (let's call it `componentCount`)
- uses `componentCount` to also ensure that `stateOK` is only reported when all the components have sent their heartbeat, thus fixing #11065

Since it seems difficult to know when `TeleportProcess.registerTeleportReadyEvent` should be updated, with the goal of quickly detecting a bug when it's introduced we have that:
1. if `componentCount` is lower than it should, then the service fails to start (due to #11725)
2. if `componentCount` is higher than it should, then an error is logged in function `processState.getStateLocked`.
vitorenesduarte pushed a commit that referenced this pull request Apr 7, 2022
* Throw startup error if `TeleportReadyEvent` is not emitted

Before this commit, the `TeleportReadyEvent` was only waited for when a
process reload occurred. Thus, if a bug exists in the code that emits
this event (as it's currently the case since the `MetricsReady` and
`WindowsDesktopReady` events are never emitted), such a bug may go
unnoticed for a while.

This commit ensures that the `TeleportReadyEvent` is always waited for
on startup, and throws an error if the event is not emitted (after some
timeout).

This commit also:
- removes the `MetricsReady` event (as this is not produced by a
  component that sends heartbeats, which is the case of every other
  event required by the `TeleportReadyEvent` event mapping)
- ensures that `WindowsDesktopReady` event is emitted
- refactors some of the code in `lib/service/supervisor.go`
- moves the event mapping registration to a new `registerTeleportReadyEvent` function
vitorenesduarte pushed a commit that referenced this pull request Apr 7, 2022
…#11249)

Fixes #11065.

This commit:
- ensures  that `TeleportReadyEvent` is only produced when all components that send heartbeats (i.e. call [`process.onHeartbeat`](https://github.com/gravitational/teleport/blob/16bf416556f337b045b66dc9c3f5a3e16f8cc988/lib/service/service.go#L358-L366)) are ready
- changes `TeleportProcess.registerTeleportReadyEvent` so that it returns a count of these components (let's call it `componentCount`)
- uses `componentCount` to also ensure that `stateOK` is only reported when all the components have sent their heartbeat, thus fixing #11065

Since it seems difficult to know when `TeleportProcess.registerTeleportReadyEvent` should be updated, with the goal of quickly detecting a bug when it's introduced we have that:
1. if `componentCount` is lower than it should, then the service fails to start (due to #11725)
2. if `componentCount` is higher than it should, then an error is logged in function `processState.getStateLocked`.
vitorenesduarte pushed a commit that referenced this pull request Apr 8, 2022
* Throw startup error if `TeleportReadyEvent` is not emitted (#11725)

* Throw startup error if `TeleportReadyEvent` is not emitted

Before this commit, the `TeleportReadyEvent` was only waited for when a
process reload occurred. Thus, if a bug exists in the code that emits
this event (as it's currently the case since the `MetricsReady` and
`WindowsDesktopReady` events are never emitted), such a bug may go
unnoticed for a while.

This commit ensures that the `TeleportReadyEvent` is always waited for
on startup, and throws an error if the event is not emitted (after some
timeout).

This commit also:
- removes the `MetricsReady` event (as this is not produced by a
  component that sends heartbeats, which is the case of every other
  event required by the `TeleportReadyEvent` event mapping)
- ensures that `WindowsDesktopReady` event is emitted
- refactors some of the code in `lib/service/supervisor.go`
- moves the event mapping registration to a new `registerTeleportReadyEvent` function

* Ensure stateOK is reported only when all components have sent updates (#11249)

Fixes #11065.

This commit:
- ensures  that `TeleportReadyEvent` is only produced when all components that send heartbeats (i.e. call [`process.onHeartbeat`](https://github.com/gravitational/teleport/blob/16bf416556f337b045b66dc9c3f5a3e16f8cc988/lib/service/service.go#L358-L366)) are ready
- changes `TeleportProcess.registerTeleportReadyEvent` so that it returns a count of these components (let's call it `componentCount`)
- uses `componentCount` to also ensure that `stateOK` is only reported when all the components have sent their heartbeat, thus fixing #11065

Since it seems difficult to know when `TeleportProcess.registerTeleportReadyEvent` should be updated, with the goal of quickly detecting a bug when it's introduced we have that:
1. if `componentCount` is lower than it should, then the service fails to start (due to #11725)
2. if `componentCount` is higher than it should, then an error is logged in function `processState.getStateLocked`.

* Make `PortList.Pop()` thread-safe (#11799)
vitorenesduarte pushed a commit that referenced this pull request Apr 8, 2022
* Throw startup error if `TeleportReadyEvent` is not emitted (#11725)

* Throw startup error if `TeleportReadyEvent` is not emitted

Before this commit, the `TeleportReadyEvent` was only waited for when a
process reload occurred. Thus, if a bug exists in the code that emits
this event (as it's currently the case since the `MetricsReady` and
`WindowsDesktopReady` events are never emitted), such a bug may go
unnoticed for a while.

This commit ensures that the `TeleportReadyEvent` is always waited for
on startup, and throws an error if the event is not emitted (after some
timeout).

This commit also:
- removes the `MetricsReady` event (as this is not produced by a
  component that sends heartbeats, which is the case of every other
  event required by the `TeleportReadyEvent` event mapping)
- ensures that `WindowsDesktopReady` event is emitted
- refactors some of the code in `lib/service/supervisor.go`
- moves the event mapping registration to a new `registerTeleportReadyEvent` function

* Ensure stateOK is reported only when all components have sent updates (#11249)

Fixes #11065.

This commit:
- ensures  that `TeleportReadyEvent` is only produced when all components that send heartbeats (i.e. call [`process.onHeartbeat`](https://github.com/gravitational/teleport/blob/16bf416556f337b045b66dc9c3f5a3e16f8cc988/lib/service/service.go#L358-L366)) are ready
- changes `TeleportProcess.registerTeleportReadyEvent` so that it returns a count of these components (let's call it `componentCount`)
- uses `componentCount` to also ensure that `stateOK` is only reported when all the components have sent their heartbeat, thus fixing #11065

Since it seems difficult to know when `TeleportProcess.registerTeleportReadyEvent` should be updated, with the goal of quickly detecting a bug when it's introduced we have that:
1. if `componentCount` is lower than it should, then the service fails to start (due to #11725)
2. if `componentCount` is higher than it should, then an error is logged in function `processState.getStateLocked`.

* Make `PortList.Pop()` thread-safe (#11799)
@webvictim webvictim mentioned this pull request Apr 19, 2022
r0mant added a commit that referenced this pull request Apr 26, 2022
r0mant added a commit that referenced this pull request Apr 26, 2022
r0mant added a commit that referenced this pull request Apr 26, 2022
@r0mant r0mant mentioned this pull request Apr 26, 2022
r0mant added a commit that referenced this pull request Apr 26, 2022
…2242)

* Revert "Backport #11725 #11249 #11799 to branch/v8 (#11794)"

This reverts commit 48eedcd.

* Revert "Fix ProxyKube not reporting its readiness (#12153)"

This reverts commit b924b40.
r0mant added a commit that referenced this pull request Apr 26, 2022
* Revert "Make `PortList.Pop()` thread-safe (#11799)"

This reverts commit a17337d.

* Revert "Ensure stateOK is reported only when all components have sent updates (#11249)"

This reverts commit b749302.

* Revert "Throw startup error if `TeleportReadyEvent` is not emitted (#11725)"

This reverts commit 933e247.

* Revert "Fix ProxyKube not reporting its readiness (#12150)"

This reverts commit 6cdcfe7.
r0mant added a commit that referenced this pull request Apr 26, 2022
* Revert "Backport #11725 #11249 #11799 to branch/v9 (#11795)"

This reverts commit ea8ee94.

* Revert "Fix ProxyKube not reporting its readiness (#12152)"

This reverts commit ce301f0.
@webvictim webvictim mentioned this pull request Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MetricsReady and WindowsDesktopReady events are never emitted
4 participants