docs: clarify /healthz and /readyz #11085

Merged
merged 5 commits into from
Mar 17, 2022

zmb3 (Collaborator) commented Mar 11, 2022

Backports required:

  • v7
  • v8
  • v9

Updates #10799

@@ -15,27 +15,9 @@ run with verbose logging enabled by passing it `-d` flag.
It is not recommended to run Teleport in production with verbose logging as it generates a substantial amount of data.
</Admonition>

Sometimes you may want to reset [`teleport`](../reference/cli.mdx#teleport) to a clean
Collaborator Author

Removed this section, as it was a duplicate of what's in metrics.mdx.

IMO, the troubleshooting page should be focused on "things aren't working, what steps can I take to fix them" and metrics is more focused on "how can I monitor to ensure things are operating correctly and detect when things start to go wrong."

@zmb3 zmb3 removed the request for review from fheinecke March 11, 2022 22:42
@zmb3 zmb3 force-pushed the zmb3/docs-readyz branch 2 times, most recently from 5765b2b to 0a199b1 Compare March 11, 2022 22:50
docs/pages/setup/reference/metrics.mdx (outdated, resolved)
@zmb3 zmb3 force-pushed the zmb3/docs-readyz branch from 0a199b1 to b1a54c6 Compare March 14, 2022 21:00
ptgott (Contributor) left a comment

Since this is the only docs page that mentions the heartbeat, I'd recommend separating the heartbeat discussion into a subsection below the "/readyz" section to provide the authoritative description of it.

Is the heartbeat used for more than diagnostics/telemetry (e.g., load balancing)? If it is, it might make sense to talk about the heartbeat in its own page of the docs (e.g., in the Architecture section).

What do you think?


- HTTP 200 OK: Teleport is operating normally
- HTTP 503 Service Unavailable: Teleport has encountered a connection error and
is running in a degraded state. This happens when a Teleport heartbeat fails.
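
As an illustration of how a monitoring script might consume these status codes, here is a minimal Python sketch. It assumes the diagnostics listener is reachable at `127.0.0.1:3000` (for example, when Teleport is started with `--diag-addr=127.0.0.1:3000`). The URL, the function names, and the `unknown` and `unreachable` labels are placeholders chosen for this example, not part of Teleport. Only the two codes listed above are mapped; any other code is reported as `unknown` rather than guessed at.

```python
import urllib.error
import urllib.request

# Assumption: the diagnostics endpoint listens here; adjust to your deployment.
READYZ_URL = "http://127.0.0.1:3000/readyz"


def describe_readyz_status(code: int) -> str:
    """Map a /readyz HTTP status code to a short label (labels are illustrative)."""
    if code == 200:
        return "ok"        # Teleport is operating normally
    if code == 503:
        return "degraded"  # a heartbeat failed; running in a degraded state
    return "unknown"       # any code not covered in the excerpt above


def probe_readyz(url: str = READYZ_URL, timeout: float = 2.0) -> str:
    """Request /readyz once and return a label describing the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return describe_readyz_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for non-2xx responses such as 503.
        return describe_readyz_status(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return "unreachable"
```

Calling `probe_readyz()` from a cron job or a load balancer health check script would then yield one of the four labels.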
Contributor

Since (as far as I can tell) this is the first time we discuss the heartbeat in the docs, should we introduce the heartbeat at the beginning of the "/readyz" section? E.g., which component sends heartbeat traffic to other components and what port and protocol the heartbeat uses.

We cover some of this in the "Recovery" Admonition, but I think it makes sense to introduce this before we first mention the status object.

The same state information is also available via the `process_state` metric
under the `/metrics` endpoint.
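
For readers who want to read that metric programmatically, the following Python sketch pulls a single sample out of Prometheus text-format output. The `SAMPLE` payload and the helper name are hypothetical, written only for this example. The excerpt above does not say what the numeric values of `process_state` mean, so the helper returns the raw number without interpreting it.

```python
from typing import Optional


def parse_metric_value(exposition: str, name: str) -> Optional[float]:
    """Return the first sample value for `name` in Prometheus text format, or None."""
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        head, brace, rest = line.partition("{")
        if brace:
            # Labeled sample: name{labels} value [timestamp]
            metric = head.strip()
            fields = rest.rpartition("}")[2].split()
        else:
            # Unlabeled sample: name value [timestamp]
            parts = line.split()
            metric, fields = parts[0], parts[1:]
        if metric == name and fields:
            return float(fields[0])
    return None


# Hypothetical payload for illustration; real /metrics output contains many more series.
SAMPLE = """\
# TYPE process_state gauge
process_state 0
"""

state = parse_metric_value(SAMPLE, "process_state")
```

In practice the `exposition` string would come from an HTTP GET against the `/metrics` endpoint on the same diagnostics address.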

<Admonition type="note" title="Recovery">
Contributor

I think we can add an H4 section called "Heartbeat" (or similar) to the top of the "/readyz" section, then move this text to that H4 section. Then we can add another H4, "Status object" (or similar), which contains the information in "/readyz" that isn't in this Admonition.

I think it would be easier this way for a reader to get familiar with the heartbeat before reading about the status object.

zmb3 (Collaborator Author) commented Mar 15, 2022

> Since this is the only docs page that mentions the heartbeat, I'd recommend separating the heartbeat discussion into a subsection below the "/readyz" section to provide the authoritative description of it.
>
> Is the heartbeat used for more than diagnostics/telemetry (e.g., load balancing)? If it is, it might make sense to talk about the heartbeat in its own page of the docs (e.g., in the Architecture section).
>
> What do you think?

If we want to introduce heartbeats separately, it should probably be a separate page, as they're somewhat orthogonal to metrics.

Do you think there's a minimal edit or two we can make here to get this merged and create a separate issue for documenting heartbeats? I'm not a fan of holding up a PR that may provide value now, even if there is still some confusion we need to address.

ptgott (Contributor) commented Mar 15, 2022

@zmb3 Suggested edits in this PR: #11165

zmb3 and others added 3 commits March 16, 2022 09:35
- Rename the page, since it's about diagnostics rather than metrics
  alone

- Change major section headings to H2s so they appear in the table of
  contents

- Move information about heartbeats and recovery to an H3 so it's
  more visible

- Move information about status codes to a separate H3 below the
  H3 re: heartbeats, both for visibility and to help ensure the reader
  knows the relevant information about heartbeats first
@zmb3 zmb3 force-pushed the zmb3/docs-readyz branch from fd82d1b to 52ec087 Compare March 16, 2022 15:35
zmb3 (Collaborator Author) commented Mar 16, 2022

Ok @ptgott - merged your suggestions, please take another look.

@zmb3 zmb3 enabled auto-merge (squash) March 16, 2022 15:57
@zmb3 zmb3 merged commit 072956e into master Mar 17, 2022
@zmb3 zmb3 deleted the zmb3/docs-readyz branch March 17, 2022 16:46
@zmb3 zmb3 mentioned this pull request Mar 17, 2022
zmb3 added a commit that referenced this pull request Mar 17, 2022
- Rename the page, since it's about diagnostics rather than metrics
  alone
- Change major section headings to H2s so they appear in the table of
  contents
- Move information about heartbeats and recovery to an H3 so it's
  more visible

Updates #10799

Co-authored-by: Paul Gottschling <[email protected]>
zmb3 added a commit that referenced this pull request Mar 17, 2022
@zmb3 zmb3 mentioned this pull request Mar 17, 2022
zmb3 added a commit that referenced this pull request Mar 17, 2022
@zmb3 zmb3 mentioned this pull request Mar 17, 2022
zmb3 added a commit that referenced this pull request Mar 18, 2022
ptgott added a commit that referenced this pull request Mar 18, 2022
zmb3 added a commit that referenced this pull request Mar 18, 2022
zmb3 added a commit that referenced this pull request Mar 18, 2022
@webvictim webvictim mentioned this pull request Apr 19, 2022
@webvictim webvictim mentioned this pull request Jun 8, 2022