Persist Insights to help users correlate problematic signals over time #104582

Open
koorosh opened this issue Jun 8, 2023 · 1 comment · Fixed by #104714
Labels
A-cluster-observability Related to cluster observability A-observability-inf C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-escalation-improvement Having this feature would have made an escalation easier O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-3 Issues/test failures with no fix SLA T-observability

Comments

@koorosh
Collaborator

koorosh commented Jun 8, 2023

Is your feature request related to a problem? Please describe.
Currently, insights for SQL statements and transactions are stored in memory only, which means that:

  • historical insight data can't be tracked, because an eviction policy periodically cleans up the cache;
  • after a cluster restarts (relevant to tenant clusters), insights are no longer available;
  • querying insights on large clusters is expensive, because the request must fan out to every node in the cluster.

Describe the solution you'd like

  • Periodically persist insights to a system table, using the same logic for flushing data to system tables as is used for the
    system.transaction_statistics and system.statement_activity tables;
  • Fall back to the in-memory store when the requested insights haven't been flushed to the system table yet.

Jira issue: CRDB-28612

Epic CC-28354

@koorosh koorosh added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-cluster-observability A-cluster-observability Related to cluster observability labels Jun 8, 2023
@koorosh koorosh self-assigned this Jun 8, 2023
@blathers-crl

blathers-crl bot commented Jun 8, 2023

Hello, I am Blathers. I am here to help you get the issue triaged.

I have CC'd a few people who may be able to assist you:

  • @cockroachdb/sql-foundations (found keywords: SQL statement)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try to find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Jun 8, 2023
@koorosh koorosh removed O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Jun 8, 2023
@maryliag maryliag added C-escalation-improvement Having this feature would have made an escalation easier O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs labels Jul 6, 2023
zachlite pushed a commit to koorosh/cockroach that referenced this issue Oct 5, 2023
craig bot pushed a commit that referenced this issue Oct 6, 2023
104714: sql: create new system observability tables and update job  r=koorosh a=koorosh

This commit is the combination of two separate work streams,
brought together for resolving logic test fallout simultaneously.

The first, authored by `@koorosh` is the creation of
system.transaction_exec_insights and system.statement_exec_insights.

The second, authored by `@zachlite` in #111365 is the creation of
system.mvcc_statistics and the MVCCStatisticsUpdate job.

Regarding persisted insights:
Previously, this data was kept in memory only, and only a limited
number of the latest insights was tracked. These tables will be used
to persist this data periodically.

The tables store the same information as the in-memory insights,
without aggregation.

To control the amount of data stored in the tables, a follow-up PR
will add a GC job that prunes old records.

To keep the tables flexible in case some columns become obsolete,
most of the columns are defined as nullable.

Regarding persisted MVCC Statistics:
The system.mvcc_statistics table stores historical MVCC data
for a tenant's SQL objects. Its purpose is to serve MVCC data for a
SQL object quickly; the span stats API is too slow to use in a hot path.
Storing data over time unlocks new use cases, such as showing a table's or
an index's accumulated garbage over time.

The MVCCStatisticsUpdate job is responsible for managing the contents of
the table, decoupled from the read hot path.

Both the table and the job are baked in when a cluster bootstraps itself or upgrades
itself from a previous version.

This PR supersedes #111365 with the following changes:
- Descriptor fixes to the mvcc_statistics table. No logical changes,
just housekeeping to make sure that the create table schema and descriptors
produce the same table.
- Fixes to the job to make sure the job system can wind down.

Partially resolves: #104582
Epic: [CRDB-25491](https://cockroachlabs.atlassian.net/browse/CRDB-25491)
Release note: None

111660: sctest: Add a test that runs comparator testing from logictest stmts r=Xiang-Gu a=Xiang-Gu

This commit introduces a new test that takes as input a path to a corpus file (of statements collected from logic tests) and feeds them into the comparator testing framework.

It also edits the existing nightly so that we now collect the corpus and immediately feed it into the comparator testing framework, without having to store it in the cloud in between.

Fixes #108183
Epic [CRDB-30346](https://cockroachlabs.atlassian.net/browse/CRDB-30346)
Release note: None

Co-authored-by: Zach Lite
Co-authored-by: Xiang Gu
@craig craig bot closed this as completed in a87f0c7 Oct 6, 2023
@exalate-issue-sync exalate-issue-sync bot changed the title sql: persist SQL insights to system table Persist Insights to help users correlate problematic signals over time Feb 7, 2024
@exalate-issue-sync exalate-issue-sync bot reopened this Feb 7, 2024
@nkodali nkodali added the P-3 Issues/test failures with no fix SLA label Feb 21, 2024