104714: sql: create new system observability tables and update job  r=koorosh a=koorosh

This commit combines two separate work streams, brought together so
that the logic test fallout from both can be resolved at once.

The first, authored by `@koorosh`, is the creation of
system.transaction_exec_insights and system.statement_exec_insights.

The second, authored by `@zachlite` in #111365, is the creation of
system.mvcc_statistics and the MVCCStatisticsUpdate job.

Regarding persisted insights:
Previously, this data was kept in memory only, and only a limited
number of the latest insights were tracked. These tables will be used
to persist this data periodically.

The tables store the same information as the in-memory insights,
without aggregation.

To control the amount of data stored in the tables, a follow-up PR
will add a GC job that prunes old records.

To keep the tables flexible when some columns become obsolete, most
of the columns are defined as nullable.
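
As a rough illustration of the persistence model described above, the following Go sketch keeps a bounded in-memory buffer of the latest insights and hands its contents to a persistence callback on flush. Every name in it (`Insight`, `Buffer`, `Add`, `Flush`) is hypothetical and not taken from this PR, and whether the real code clears its in-memory data after persisting is an assumption. Pointer fields stand in for nullable columns.

```go
package main

import "fmt"

// Insight is a hypothetical, simplified insight record. Pointer fields model
// nullable columns: nil means the value is absent (NULL).
type Insight struct {
	StatementID string
	Query       *string
	Contention  *float64 // seconds; nil when not recorded
}

// Buffer keeps only the most recent `limit` insights in memory.
type Buffer struct {
	limit int
	items []Insight
}

func NewBuffer(limit int) *Buffer { return &Buffer{limit: limit} }

// Add appends an insight, evicting the oldest one when the buffer is full.
func (b *Buffer) Add(in Insight) {
	if b.limit <= 0 {
		return
	}
	if len(b.items) >= b.limit {
		b.items = b.items[1:]
	}
	b.items = append(b.items, in)
}

// Flush hands all buffered insights to persist and clears the buffer.
// A real system would call this periodically and write rows to a table.
func (b *Buffer) Flush(persist func([]Insight) error) error {
	if len(b.items) == 0 {
		return nil
	}
	if err := persist(b.items); err != nil {
		return err // keep the items so a later flush can retry
	}
	b.items = nil
	return nil
}

func main() {
	buf := NewBuffer(2)
	q := "SELECT 1"
	buf.Add(Insight{StatementID: "a", Query: &q})
	buf.Add(Insight{StatementID: "b"})
	buf.Add(Insight{StatementID: "c"}) // evicts "a"

	var stored []Insight
	_ = buf.Flush(func(batch []Insight) error {
		stored = append(stored, batch...)
		return nil
	})
	fmt.Println(len(stored), stored[0].StatementID, stored[0].Query == nil) // 2 b true
}
```

Modeling optional values as pointers means a column can later be left unset without changing the record's shape, which is the flexibility the nullable columns are meant to provide.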

Regarding persisted MVCC Statistics:
The system.mvcc_statistics table stores historical MVCC data
for a tenant's SQL objects. Its purpose is to serve MVCC data for a
SQL object quickly, because the span stats API is too slow to use in a
hot path. Storing data over time unlocks new use cases, such as showing
a table's or index's accumulated garbage over time.

The MVCCStatisticsUpdate job is responsible for managing the contents of
the table, decoupled from the read hot path.

Both the table and the job are created when a cluster bootstraps itself or
upgrades itself from a previous version.
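
To make the "data over time" idea concrete, here is a minimal Go sketch, under stated assumptions, of a history of MVCC samples that a background updater appends to and a reader queries. The type and field names (`Sample`, `History`, `TotalBytes`, `LiveBytes`) are hypothetical and do not reflect the actual schema of system.mvcc_statistics; garbage is approximated here as total bytes minus live bytes.

```go
package main

import (
	"fmt"
	"time"
)

// Sample is a hypothetical row: one MVCC snapshot for one SQL object.
type Sample struct {
	ObjectID   int64
	At         time.Time
	TotalBytes int64
	LiveBytes  int64
}

// GarbageBytes is the portion of the data that is no longer live
// (an assumption of this sketch, not the real column definition).
func (s Sample) GarbageBytes() int64 { return s.TotalBytes - s.LiveBytes }

// History stands in for the table: samples are appended by a background
// updater and read back without touching the slow stats source.
type History struct{ rows []Sample }

// Record is what a periodic update job would call after collecting stats.
func (h *History) Record(s Sample) { h.rows = append(h.rows, s) }

// GarbageOverTime returns the garbage bytes of one object, in insertion order.
func (h *History) GarbageOverTime(objectID int64) []int64 {
	var out []int64
	for _, r := range h.rows {
		if r.ObjectID == objectID {
			out = append(out, r.GarbageBytes())
		}
	}
	return out
}

func main() {
	start := time.Date(2023, 10, 6, 0, 0, 0, 0, time.UTC)
	h := &History{}
	h.Record(Sample{ObjectID: 106, At: start, TotalBytes: 100, LiveBytes: 90})
	h.Record(Sample{ObjectID: 107, At: start, TotalBytes: 50, LiveBytes: 50})
	h.Record(Sample{ObjectID: 106, At: start.Add(time.Hour), TotalBytes: 160, LiveBytes: 100})
	fmt.Println(h.GarbageOverTime(106)) // [10 60]
}
```

Reads only scan stored samples, so their cost does not depend on how expensive it was to collect the statistics in the first place.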

This PR supersedes #111365 with the following changes:
- Descriptor fixes to the mvcc_statistics table. No logical changes,
just housekeeping to make sure that the create table schema and descriptors
produce the same table.
- Fixes to the job to make sure the job system can wind down.

Partially resolves: #104582
Epic: [CRDB-25491](https://cockroachlabs.atlassian.net/browse/CRDB-25491)
Release note: None

111660: sctest: Add a test that runs comparator testing from logictest stmts r=Xiang-Gu a=Xiang-Gu

This commit introduces a new test that takes as input a path to a corpus file (of statements collected from logic tests) and feeds the statements into the comparator testing framework.

It also edits the existing nightly so that we now collect the corpus and immediately feed it into the comparator testing framework, without having to store it in the cloud in between.

Fixes #108183
Epic: [CRDB-30346](https://cockroachlabs.atlassian.net/browse/CRDB-30346)
Release note: None

Co-authored-by: Zach Lite
Co-authored-by: Xiang Gu
3 people committed Oct 6, 2023
3 parents 44c96e9 + a87f0c7 + f26c1bf commit 3b438b4
Showing 124 changed files with 2,192 additions and 458 deletions.
@@ -7,7 +7,7 @@ dir="$(dirname $(dirname $(dirname $(dirname "${0}"))))"
source "$dir/teamcity-support.sh" # For $root
source "$dir/teamcity-bazel-support.sh" # For run_bazel

-tc_start_block "Collect SQL Logic Tests Statements"
+tc_start_block "Schema Changer Comparator Testing"
BAZEL_SUPPORT_EXTRA_DOCKER_ARGS="-e TC_BUILD_BRANCH -e GITHUB_API_TOKEN -e GOOGLE_EPHEMERAL_CREDENTIALS -e BUILD_VCS_NUMBER -e TC_BUILD_ID -e TC_SERVER_URL -e TC_BUILDTYPE_ID -e GITHUB_REPO" \
-run_bazel build/teamcity/cockroach/nightlies/sqllogic_statements_corpus_nightly_impl.sh
-tc_end_block "Collect SQL Logic Tests Statements"
+run_bazel build/teamcity/cockroach/nightlies/schema_changer_comparator_nightly_impl.sh
+tc_end_block "Schema Changer Comparator Testing"
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
#!/usr/bin/env bash

set -xeuo pipefail

dir="$(dirname $(dirname $(dirname $(dirname "${0}"))))"
source "$dir/teamcity-support.sh"

CORPUS_DIR=/artifacts/logictest-stmts-corpus-dir # dir to store all collected corpus file(s)
exit_status=0

# Collect sql logic tests statements corpus.
bazel run -- //pkg/cmd/generate-logictest-corpus:generate-logictest-corpus \
  -out-dir=$CORPUS_DIR \
  || exit_status=$?

bazel build //pkg/cmd/bazci --config=ci
BAZEL_BIN=$(bazel info bazel-bin --config=ci)

# Run schema changer comparator test with statements from the collected corpus file(s).
for CORPUS_FILE in "$CORPUS_DIR"/*
do
  $BAZEL_BIN/pkg/cmd/bazci/bazci_/bazci test -- --config=ci \
    //pkg/sql/schemachanger:schemachanger_test \
    --test_arg=--logictest-stmt-corpus-path="$CORPUS_FILE" \
    --test_filter='^TestComparatorFromLogicTests$' \
    --test_env=GO_TEST_WRAP_TESTV=1 \
    --test_env=GO_TEST_WRAP=1 \
    --test_timeout=7200 \
    || exit_status=$?
done

exit $exit_status

This file was deleted.

12 changes: 12 additions & 0 deletions docs/generated/metrics/metrics.html
@@ -1077,6 +1077,18 @@
<tr><td>APPLICATION</td><td>jobs.migration.resume_completed</td><td>Number of migration jobs which successfully resumed to completion</td><td>jobs</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>APPLICATION</td><td>jobs.migration.resume_failed</td><td>Number of migration jobs which failed with a non-retriable error</td><td>jobs</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>APPLICATION</td><td>jobs.migration.resume_retry_error</td><td>Number of migration jobs which failed with a retriable error</td><td>jobs</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>APPLICATION</td><td>jobs.mvcc_statistics_update.currently_idle</td><td>Number of mvcc_statistics_update jobs currently considered Idle and can be freely shut down</td><td>jobs</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>jobs.mvcc_statistics_update.currently_paused</td><td>Number of mvcc_statistics_update jobs currently considered Paused</td><td>jobs</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>jobs.mvcc_statistics_update.currently_running</td><td>Number of mvcc_statistics_update jobs currently running in Resume or OnFailOrCancel state</td><td>jobs</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>jobs.mvcc_statistics_update.expired_pts_records</td><td>Number of expired protected timestamp records owned by mvcc_statistics_update jobs</td><td>records</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>APPLICATION</td><td>jobs.mvcc_statistics_update.fail_or_cancel_completed</td><td>Number of mvcc_statistics_update jobs which successfully completed their failure or cancelation process</td><td>jobs</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>APPLICATION</td><td>jobs.mvcc_statistics_update.fail_or_cancel_failed</td><td>Number of mvcc_statistics_update jobs which failed with a non-retriable error on their failure or cancelation process</td><td>jobs</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>APPLICATION</td><td>jobs.mvcc_statistics_update.fail_or_cancel_retry_error</td><td>Number of mvcc_statistics_update jobs which failed with a retriable error on their failure or cancelation process</td><td>jobs</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>APPLICATION</td><td>jobs.mvcc_statistics_update.protected_age_sec</td><td>The age of the oldest PTS record protected by mvcc_statistics_update jobs</td><td>seconds</td><td>GAUGE</td><td>SECONDS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>jobs.mvcc_statistics_update.protected_record_count</td><td>Number of protected timestamp records held by mvcc_statistics_update jobs</td><td>records</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>jobs.mvcc_statistics_update.resume_completed</td><td>Number of mvcc_statistics_update jobs which successfully resumed to completion</td><td>jobs</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>APPLICATION</td><td>jobs.mvcc_statistics_update.resume_failed</td><td>Number of mvcc_statistics_update jobs which failed with a non-retriable error</td><td>jobs</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>APPLICATION</td><td>jobs.mvcc_statistics_update.resume_retry_error</td><td>Number of mvcc_statistics_update jobs which failed with a retriable error</td><td>jobs</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>APPLICATION</td><td>jobs.new_schema_change.currently_idle</td><td>Number of new_schema_change jobs currently considered Idle and can be freely shut down</td><td>jobs</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>jobs.new_schema_change.currently_paused</td><td>Number of new_schema_change jobs currently considered Paused</td><td>jobs</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>jobs.new_schema_change.currently_running</td><td>Number of new_schema_change jobs currently running in Resume or OnFailOrCancel state</td><td>jobs</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
2 changes: 1 addition & 1 deletion docs/generated/settings/settings-for-tenants.txt
@@ -315,4 +315,4 @@ trace.snapshot.rate duration 0s if non-zero, interval at which background trace
trace.span_registry.enabled boolean true if set, ongoing traces can be seen at https://<ui>/#/debug/tracez application
trace.zipkin.collector string the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used. application
ui.display_timezone enumeration etc/utc the timezone used to format timestamps in the ui [etc/utc = 0, america/new_york = 1] application
-version version 1000023.1-28 set the active cluster version in the format '<major>.<minor>' application
+version version 1000023.1-32 set the active cluster version in the format '<major>.<minor>' application
2 changes: 1 addition & 1 deletion docs/generated/settings/settings.html
@@ -269,6 +269,6 @@
<tr><td><div id="setting-trace-span-registry-enabled" class="anchored"><code>trace.span_registry.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>if set, ongoing traces can be seen at https://&lt;ui&gt;/#/debug/tracez</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-trace-zipkin-collector" class="anchored"><code>trace.zipkin.collector</code></div></td><td>string</td><td><code></code></td><td>the address of a Zipkin instance to receive traces, as &lt;host&gt;:&lt;port&gt;. If no port is specified, 9411 will be used.</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-ui-display-timezone" class="anchored"><code>ui.display_timezone</code></div></td><td>enumeration</td><td><code>etc/utc</code></td><td>the timezone used to format timestamps in the ui [etc/utc = 0, america/new_york = 1]</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
-<tr><td><div id="setting-version" class="anchored"><code>version</code></div></td><td>version</td><td><code>1000023.1-28</code></td><td>set the active cluster version in the format &#39;&lt;major&gt;.&lt;minor&gt;&#39;</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
+<tr><td><div id="setting-version" class="anchored"><code>version</code></div></td><td>version</td><td><code>1000023.1-32</code></td><td>set the active cluster version in the format &#39;&lt;major&gt;.&lt;minor&gt;&#39;</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
</tbody>
</table>
9 changes: 9 additions & 0 deletions pkg/ccl/backupccl/system_schema.go
@@ -818,6 +818,15 @@ var systemTableBackupConfiguration = map[string]systemBackupConfiguration{
	systemschema.RegionLivenessTable.GetName(): {
		shouldIncludeInClusterBackup: optOutOfClusterBackup,
	},
	systemschema.SystemMVCCStatisticsTable.GetName(): {
		shouldIncludeInClusterBackup: optOutOfClusterBackup,
	},
	systemschema.StatementExecInsightsTable.GetName(): {
		shouldIncludeInClusterBackup: optOutOfClusterBackup,
	},
	systemschema.TransactionExecInsightsTable.GetName(): {
		shouldIncludeInClusterBackup: optOutOfClusterBackup,
	},
}

func rekeySystemTable(
4 changes: 2 additions & 2 deletions pkg/ccl/logictestccl/testdata/logic_test/crdb_internal_tenant
@@ -351,12 +351,12 @@ txn_id txn_fingerprint_id query implicit_txn session_id start_time end_tim
query ITTI
SELECT range_id, start_pretty, end_pretty, lease_holder FROM crdb_internal.ranges
----
-65 /Tenant/10 /Tenant/11 1
+68 /Tenant/10 /Tenant/11 1

query ITT
SELECT range_id, start_pretty, end_pretty FROM crdb_internal.ranges_no_leases
----
-65 /Tenant/10 /Tenant/11
+68 /Tenant/10 /Tenant/11

query IT
SELECT zone_id, target FROM crdb_internal.zones ORDER BY 1
Original file line number Diff line number Diff line change
@@ -463,7 +463,7 @@ skipif config multiregion-9node-3region-3azs-vec-off
query I retry
SELECT DISTINCT range_id FROM [SHOW RANGES FROM TABLE messages_rbr]
----
-67
+70

# Update does not fail when accessing all rows in messages_rbr because lookup
# join does not error out the lookup table in phase 1.
Original file line number Diff line number Diff line change
@@ -264,7 +264,7 @@ ap-southeast-2 23
query TT
SELECT start_key, end_key FROM [SHOW RANGE FROM TABLE regional_by_row_table FOR ROW ('ap-southeast-2', 1)]
----
-<before:/Table/62> …
+<before:/Table/65> …

query TIIII
SELECT crdb_region, pk, pk2, a, b FROM regional_by_row_table
@@ -402,7 +402,7 @@ SELECT start_key, end_key, replicas, lease_holder FROM [SHOW RANGES FROM INDEX r
ORDER BY 1
----
start_key end_key replicas lease_holder
-<before:/Table/62> …/"\x80"/0 {1} 1
+<before:/Table/65> …/"\x80"/0 {1} 1
…/"\x80"/0 …/"\xc0"/0 {4} 4
…/"\xc0"/0 <after:/Table/110/5> {7} 7

12 changes: 12 additions & 0 deletions pkg/ccl/spanconfigccl/spanconfigreconcilerccl/testdata/basic
@@ -70,6 +70,9 @@ upsert /Table/{59-60} database system (host)
upsert /Table/6{0-1} ttl_seconds=3600 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
upsert /Table/6{1-2} ttl_seconds=3600 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
upsert /Table/6{2-3} database system (host)
upsert /Table/6{3-4} database system (host)
upsert /Table/6{4-5} database system (host)
upsert /Table/6{5-6} database system (host)

exec-sql
CREATE DATABASE db;
@@ -122,6 +125,9 @@ state offset=47
/Table/6{0-1} ttl_seconds=3600 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
/Table/6{1-2} ttl_seconds=3600 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
/Table/6{2-3} database system (host)
/Table/6{3-4} database system (host)
/Table/6{4-5} database system (host)
/Table/6{5-6} database system (host)
/Table/10{6-7} num_replicas=7 num_voters=5
/Table/10{7-8} num_replicas=7
/Table/11{2-3} num_replicas=7
@@ -233,6 +239,12 @@ delete /Table/{59-60}
upsert /Table/{59-60} ttl_seconds=100 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
delete /Table/6{2-3}
upsert /Table/6{2-3} ttl_seconds=100 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
delete /Table/6{3-4}
upsert /Table/6{3-4} ttl_seconds=100 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
delete /Table/6{4-5}
upsert /Table/6{4-5} ttl_seconds=100 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
delete /Table/6{5-6}
upsert /Table/6{5-6} ttl_seconds=100 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true

state offset=5 limit=42
----
15 changes: 15 additions & 0 deletions pkg/ccl/spanconfigccl/spanconfigreconcilerccl/testdata/indexes
@@ -37,6 +37,9 @@ state offset=47
/Table/6{0-1} ttl_seconds=3600 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
/Table/6{1-2} ttl_seconds=3600 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
/Table/6{2-3} database system (host)
/Table/6{3-4} database system (host)
/Table/6{4-5} database system (host)
/Table/6{5-6} database system (host)
/Table/10{6-7} range default

exec-sql
@@ -74,6 +77,9 @@ state offset=47
/Table/6{0-1} ttl_seconds=3600 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
/Table/6{1-2} ttl_seconds=3600 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
/Table/6{2-3} database system (host)
/Table/6{3-4} database system (host)
/Table/6{4-5} database system (host)
/Table/6{5-6} database system (host)
/Table/106{-/2} num_replicas=7
/Table/106/{2-3} num_replicas=7 num_voters=5
/Table/10{6/3-7} num_replicas=7
@@ -113,6 +119,9 @@ state offset=47
/Table/6{0-1} ttl_seconds=3600 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
/Table/6{1-2} ttl_seconds=3600 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
/Table/6{2-3} database system (host)
/Table/6{3-4} database system (host)
/Table/6{4-5} database system (host)
/Table/6{5-6} database system (host)
/Table/106{-/2} ttl_seconds=3600 num_replicas=7
/Table/106/{2-3} ttl_seconds=25 num_replicas=7 num_voters=5
/Table/10{6/3-7} ttl_seconds=3600 num_replicas=7
@@ -142,6 +151,9 @@ state offset=47
/Table/6{0-1} ttl_seconds=3600 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
/Table/6{1-2} ttl_seconds=3600 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
/Table/6{2-3} database system (host)
/Table/6{3-4} database system (host)
/Table/6{4-5} database system (host)
/Table/6{5-6} database system (host)
/Table/106{-/2} ttl_seconds=3600 num_replicas=9
/Table/106/{2-3} ttl_seconds=25 num_replicas=9 num_voters=5
/Table/10{6/3-7} ttl_seconds=3600 num_replicas=9
@@ -185,3 +197,6 @@ state offset=46
/Table/6{0-1} ttl_seconds=3600 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
/Table/6{1-2} ttl_seconds=3600 ignore_strict_gc=true num_replicas=5 rangefeed_enabled=true
/Table/6{2-3} database system (host)
/Table/6{3-4} database system (host)
/Table/6{4-5} database system (host)
/Table/6{5-6} database system (host)
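
The span keys in the test data above use a compact notation in which the prefix shared by the start and end keys is written once and the differing suffixes appear inside braces, so `/Table/6{3-4}` reads as the span from `/Table/63` up to `/Table/64`. The Go helper below is a small illustrative sketch of that reading; `expandSpan` is a hypothetical name, not a CockroachDB API, and it assumes that the suffixes themselves contain no `-`.

```go
package main

import (
	"fmt"
	"strings"
)

// expandSpan turns "prefix{start-end}" into its full start and end keys.
// It assumes a single brace group and no '-' inside either suffix.
func expandSpan(s string) (start, end string, err error) {
	openIdx := strings.Index(s, "{")
	closeIdx := strings.LastIndex(s, "}")
	if openIdx < 0 || closeIdx < openIdx {
		return "", "", fmt.Errorf("no brace group in %q", s)
	}
	prefix := s[:openIdx]
	body := s[openIdx+1 : closeIdx]
	dash := strings.Index(body, "-")
	if dash < 0 {
		return "", "", fmt.Errorf("no '-' separator in %q", s)
	}
	return prefix + body[:dash], prefix + body[dash+1:], nil
}

func main() {
	for _, s := range []string{
		"/Table/6{3-4}",
		"/Table/{59-60}",
		"/Table/106{-/2}",
		"/Table/10{6/3-7}",
	} {
		start, end, err := expandSpan(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-18s => [%s, %s)\n", s, start, end)
	}
}
```

Read this way, the three spans added throughout these files, `/Table/6{3-4}`, `/Table/6{4-5}`, and `/Table/6{5-6}`, cover table IDs 63 through 65, which lines up with the expected range and table IDs elsewhere in this diff shifting by three.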