-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-23.1: schematelemetry: emit metrics and logs about invalid objects #109733
Conversation
a104323
to
475494e
Compare
40d5841
to
eba281a
Compare
eba281a
to
2098fee
Compare
blathers backport 23.1.9-rc |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error setting assignees, but backport branch blathers/backport-release-23.1.9-rc-109733 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/issues/109739/assignees: 404 Not Found [] Backport to branch 23.1.9-rc failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
|
Short of continuously polling `crdb_internal.invalid_objects`, there was not a convenient way to monitor a cluster for descriptor corruption. Having such an indicator would allow customers to perform preflight checks ahead of upgrades to avoid being stuck in a mixed version state. It would also allow CRL to more easily monitor cloud clusters for corruptions in the wild. This commit updates the schematelemetry job to additionally update the `sql.schema.invalid_objects` gauge and emit logs for any encountered corruptions. Informs: #104266 Epic: CRDB-28665 Release note (ops change): Added a new sql.schema.invalid_objects gauge metric. This gauge is periodically updated based on the schedule set by the sql.schema.telemetry.recurrence cluster setting. When it is updated, it counts the number of schema objects (tables, types, schemas, databases, and functions) that are in an invalid state according to CockroachDB’s internal validation checks. This metric is expected to be zero in a healthy cluster, and if it is not, it indicates that there is a problem that must be repaired.
2098fee
to
a49fe07
Compare
Backport 1/1 commits from #108559 on behalf of @chrisseto.
/cc @cockroachdb/release
Short of continuously polling
crdb_internal.invalid_objects
, there was not a convenient way to monitor a cluster for descriptor corruption.Having such an indicator would allow customers to perform preflight checks ahead of upgrades to avoid being stuck in a mixed version state. It would also allow CRL to more easily monitor cloud clusters for corruptions in the wild.
This commit updates the schematelemetry job to additionally update the
sql.schema.invalid_objects
gauge and emit logs for any encountered corruptions.Informs: #104266
Epic: CRDB-28665
Release note (ops change): Added a new sql.schema.invalid_objects gauge
metric. This gauge is periodically updated based on the schedule set by
the sql.schema.telemetry.recurrence cluster setting. When it is updated,
it counts the number of schema objects (tables, types, schemas, databases,
and functions) that are in an invalid state according to CockroachDB’s
internal validation checks. This metric is expected to be zero in a healthy
cluster, and if it is not, it indicates that there is a problem that must
be repaired.
Release justification: low risk addition to metrics