`stateOK` reported in `processState.getState()` before all components have sent updates #11065
Comments
Started working on a fix in 35806dd but couldn't quickly figure out how to get the number of components given a |
vitorenesduarte pushed a commit that referenced this issue on Apr 7, 2022:

…#11249)

Fixes #11065.

This commit:
- ensures that `TeleportReadyEvent` is only produced when all components that send heartbeats (i.e. call [`process.onHeartbeat`](https://github.com/gravitational/teleport/blob/16bf416556f337b045b66dc9c3f5a3e16f8cc988/lib/service/service.go#L358-L366)) are ready
- changes `TeleportProcess.registerTeleportReadyEvent` so that it returns a count of these components (let's call it `componentCount`)
- uses `componentCount` to also ensure that `stateOK` is only reported when all the components have sent their heartbeat, thus fixing #11065

Since it seems difficult to know when `TeleportProcess.registerTeleportReadyEvent` should be updated, the following behavior is meant to surface a bug quickly when one is introduced:
1. if `componentCount` is lower than it should be, then the service fails to start (due to #11725)
2. if `componentCount` is higher than it should be, then an error is logged in function `processState.getStateLocked`.
vitorenesduarte pushed a commit that referenced this issue on Apr 8, 2022:

* Throw startup error if `TeleportReadyEvent` is not emitted (#11725)

  Before this commit, the `TeleportReadyEvent` was only waited for when a process reload occurred. Thus, if a bug exists in the code that emits this event (as is currently the case, since the `MetricsReady` and `WindowsDesktopReady` events are never emitted), such a bug may go unnoticed for a while.

  This commit ensures that the `TeleportReadyEvent` is always waited for on startup, and throws an error if the event is not emitted (after some timeout).

  This commit also:
  - removes the `MetricsReady` event (as this is not produced by a component that sends heartbeats, which is the case of every other event required by the `TeleportReadyEvent` event mapping)
  - ensures that the `WindowsDesktopReady` event is emitted
  - refactors some of the code in `lib/service/supervisor.go`
  - moves the event mapping registration to a new `registerTeleportReadyEvent` function

* Ensure stateOK is reported only when all components have sent updates (#11249)

  Fixes #11065.

  This commit:
  - ensures that `TeleportReadyEvent` is only produced when all components that send heartbeats (i.e. call [`process.onHeartbeat`](https://github.com/gravitational/teleport/blob/16bf416556f337b045b66dc9c3f5a3e16f8cc988/lib/service/service.go#L358-L366)) are ready
  - changes `TeleportProcess.registerTeleportReadyEvent` so that it returns a count of these components (let's call it `componentCount`)
  - uses `componentCount` to also ensure that `stateOK` is only reported when all the components have sent their heartbeat, thus fixing #11065

  Since it seems difficult to know when `TeleportProcess.registerTeleportReadyEvent` should be updated, the following behavior is meant to surface a bug quickly when one is introduced:
  1. if `componentCount` is lower than it should be, then the service fails to start (due to #11725)
  2. if `componentCount` is higher than it should be, then an error is logged in function `processState.getStateLocked`.

* Make `PortList.Pop()` thread-safe (#11799)
Description

I think that `processState.getState()` can return `stateOK` before it should.

In the following piece of code it seems that `stateOK` can be returned as soon as a single component reports its status as OK, not all of them:

teleport/lib/service/state.go, lines 153 to 155 in ccbe161

Replacing `len(f.states) > 0` with something like `len(f.states) == componentsCount` should fix it.

This was initially noticed in https://github.com/gravitational/cloud/issues/1432.
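To make the proposed change concrete, here is a minimal sketch of the idea, not the actual code in `lib/service/state.go`. The type `componentState`, the fields `states` and `componentCount`, and the `update` method are simplified, hypothetical stand-ins; only the `len(f.states) > 0` versus `len(f.states) == componentCount` comparison comes from this issue.

```go
package main

import "fmt"

// componentState is a simplified, hypothetical stand-in for the state
// values used in lib/service/state.go.
type componentState int

const (
	stateStarting componentState = iota
	stateOK
	stateDegraded
)

// processState is a stripped-down illustration, not the real struct.
type processState struct {
	states         map[string]componentState // latest state per component
	componentCount int                       // components expected to send heartbeats
}

// update records the latest state reported by a component.
func (f *processState) update(component string, s componentState) {
	f.states[component] = s
}

// getState reports stateOK only when every expected component has
// reported in and all of them are OK.
func (f *processState) getState() componentState {
	allOK := true
	for _, s := range f.states {
		if s == stateDegraded {
			return stateDegraded
		}
		if s != stateOK {
			allOK = false
		}
	}
	// The check discussed in the issue was len(f.states) > 0, which is
	// already satisfied after the first component reports.
	if allOK && len(f.states) == f.componentCount {
		return stateOK
	}
	return stateStarting
}

func main() {
	f := &processState{states: map[string]componentState{}, componentCount: 3}

	f.update("auth", stateOK)
	fmt.Println(f.getState() == stateOK) // false: only 1 of 3 components reported

	f.update("proxy", stateOK)
	f.update("node", stateOK)
	fmt.Println(f.getState() == stateOK) // true: all 3 components reported OK
}
```

The open question raised in the comments, how to obtain the number of components, is what `componentCount` stands for here; in this sketch it is simply passed in by the caller.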