-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-22.2: sql,opt: improve SHOW TABLES and virtual tables with many databases #91054
release-22.2: sql,opt: improve SHOW TABLES and virtual tables with many databases #91054
Conversation
58abb40
to
5936a7e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r1, 1 of 1 files at r2.
Reviewable status:complete! 1 of 0 LGTMs obtained
@rafiss can I get a look at this and can you help me decide whether I should default this to off on release-22.2? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this lgtm, with some minor nits that maybe should be addressed in master
as well. as far as the default setting value, i feel fine about leaving it as true. one thing i wonder though is how do we envision that someone would know to disable it if they need to? it doesn't seem like something we want to spend time training TSE on
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @rytaft)
pkg/sql/crdb_internal.go
line 488 at r4 (raw file):
indexes: []virtualIndex{ { populate: func(ctx context.Context, unwrappedConstraint tree.Datum, p *planner, _ catalog.DatabaseDescriptor, addRow func(...tree.Datum) error) (matched bool, err error) {
nit: add a comment to each populate
to make it clear which index is which
pkg/sql/resolver.go
line 988 at r3 (raw file):
} var metamorphicDeadultUseIndexLookupForDescriptorsInDatabase = util.ConstantWithMetamorphicTestBool(
nit: typo
pkg/sql/delegate/show_tables.go
line 113 at r4 (raw file):
estimatedRowCount, estimatedRowCountJoin, tree.NewDString(name.CatalogName.String()),
nit: did you test this query with a database named like "Mixed-Case"
? i believe CatalogName.String() will add those quotes, but for this query we don't want that, so maybe it should just be string(name.CatalogName)
. also, the NewDString
is not needed.
Release note: None
information_schema and pg_catalog are always scoped to a database. Before this change, we would always look up all descriptors to materialize virtual tables, even if we only care about descriptors in the current database. This PR adds a new cluster setting (so that backports can be made kosher) to enable a new approach: scan the namespace table to find the relevant descriptors and then only resolve those. On my local machine where I loaded up a few megabytes of descriptors into 6k databases, a simple query like `SELECT * FROM information_schema.sequences` takes 1.2s without the setting enabled and 1ms with it. Release note (performance improvement): Tables in `pg_catalog` and `information_schema` (when not explicitly referenced as `"".information_schema`) may now be much faster if the current database has a small number of relations relative to the total number in the cluster.
This adds some virtual indexes to `crdb_internal.tables` so that it performs an index join when looking up the tables in a database. Release note: None
This commit modifies the query to use database-scoped virtual tables. In doing this, we now avoid reading all descriptors in the cluster. On a local cluster which has a large number of small databases, this query was using lots of the CPU. Release note: None
A partial index already means something. This means something different. Release note: None
The index would never be used. This eliminates a footgun. Release note: None
Release note: None
A recent commit introduced a bug which would have resulted in the localities of multi-region tables always being set to `NULL` due to bad quoting of the database name in a join expression. This is now fixed. No release note because it was never shipped. Release note: None
Release note: None
5936a7e
to
e918085
Compare
89584: cloud: add read and write metrics r=adityamaru a=adityamaru This change adds cloud.read_bytes and cloud.write_bytes metrics. These are updated on all read/write operations to external storage endpoints. These metrics while shared by all external storage interactions will provide an immediate signal into whether we're seeing an abnormal rate of reads and writes errors during support investigations. Fixes: #89242 Release note: None 91205: descidgen: use system.descriptor_id_seq for all tenants r=postamar a=postamar Previously, the system tenant used a legacy non-SQL key in the meta range to generate descriptor IDs, while non-system tenants used the system.descriptor_id_seq sequence. This commit introduces a cluster upgrade step which creates this sequence in the the system tenant's system database and migrates the legacy counter value to it. While this migration is underway, the descriptor ID generator is not available. This upgrade consists of a relatively brief transaction and descriptor ID generation is a relatively infrequent event, as a result the disruption is deemed tolerable. Fixes #80294. Release note (sql change): upgrading a cluster to a major release containing this patch will endow the system tenant's system database with a descriptor_id_seq sequence which will henceforth be used to generate new descriptor IDs, as is already the case for non-system tenants. 91225: rttanalysis: skip slow test case r=ajwerner a=rafiss see also #88885 Release note: None 91231: storage: remove legacy SST iterator r=nicktrav a=jbowens The legacy sstable iterator is no longer in use. Removing the legacy sstable iterator now also avoids updating it to accommodate the new Pebble internal iterator interface with lazy value fetching. Epic: None Release note: None 91245: sql: include all schemas into schema.sql in the bundle r=yuzefovich a=yuzefovich This commit makes it so that all schemas are added into the `schema.sql` file of the statement bundle using `SHOW CREATE ALL SCHEMAS;` statement (the only exception is the public schema that always exists in CRDB clusters). This will make it more likely that `debug statement-bundle recreate` succeeds on the bundles. Fixes: #91099. Epic: None Release note: None 91255: ui: fixes on sql activity filter r=maryliag a=maryliag This commit fixes the label of Transaction filter and add the proper color for the filter styles (previously was being loaded as a different color on the Sessions page because it was being inherit from another component) Fixes #91206 Transactions Before <img width="457" alt="Screen Shot 2022-11-03 at 7 07 13 PM" src="https://user-images.githubusercontent.com/1017486/199851829-b747102a-ac70-4720-97b5-7911b742345c.png"> Transactions After <img width="477" alt="Screen Shot 2022-11-03 at 6 55 14 PM" src="https://user-images.githubusercontent.com/1017486/199851704-71641bb6-8247-4d92-8f2d-3efd27846594.png"> Sessions Before <img width="317" alt="Screen Shot 2022-11-03 at 6 36 51 PM" src="https://user-images.githubusercontent.com/1017486/199851596-313646b9-0188-48aa-9392-7dd7f3eded99.png"> Sessions After <img width="309" alt="Screen Shot 2022-11-03 at 6 55 06 PM" src="https://user-images.githubusercontent.com/1017486/199851671-02d4db08-4f2b-4ac3-95ff-e1d98e2d6ede.png"> Before/After of Statement page continues the same <img width="604" alt="Screen Shot 2022-11-03 at 6 55 28 PM" src="https://user-images.githubusercontent.com/1017486/199851734-06a8a131-3528-4d17-a856-9d39d67e7017.png"> Release note (ui change): Fix Transaction filter label on SQL Activity page. 91260: sql: add crdb_internal.descriptor_with_post_deserialization_changes r=postamar a=postamar This builtin applies post-deserialization changes to descriptors. This is useful during descriptor surgery as well as, potentially, during cluster upgrade steps. Fixes #90269. Release note: None 91289: tracing: use timeutil for Now() and Since() r=yuzefovich a=yuzefovich Previously, for performance reasons (as well as a few others) we used `time.Now()` and `time.Since` in the tracing library. But the work in f782e45 alleviated those concerns, so we can now switch to using `timeutil` library in two places. ``` Tracer_StartSpanCtx/opts=none-24 508ns ± 0% 512ns ± 0% +0.80% (p=0.000 n=10+10) Tracer_StartSpanCtx/opts=real-24 510ns ± 1% 515ns ± 1% +0.96% (p=0.000 n=10+10) Tracer_StartSpanCtx/opts=real,logtag-24 539ns ± 1% 544ns ± 1% +0.90% (p=0.000 n=10+10) Tracer_StartSpanCtx/opts=real,autoparent-24 480ns ± 0% 486ns ± 1% +1.26% (p=0.000 n=10+10) Tracer_StartSpanCtx/opts=real,manualparent-24 523ns ± 1% 527ns ± 1% +0.91% (p=0.004 n=10+10) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener-24 755ns ± 0% 765ns ± 0% +1.40% (p=0.000 n=10+10) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener-24 547ns ± 1% 552ns ± 1% +0.89% (p=0.000 n=10+10) Span_GetRecording/root-only-24 7.36ns ± 0% 6.75ns ± 0% -8.26% (p=0.000 n=10+10) Span_GetRecording/child-only-24 7.36ns ± 0% 6.75ns ± 0% -8.30% (p=0.000 n=10+10) Span_GetRecording/root-child-24 7.36ns ± 0% 6.74ns ± 0% -8.32% (p=0.000 n=10+10) RecordingWithStructuredEvent/with-event-listener-24 3.47µs ± 1% 3.47µs ± 1% ~ (p=0.541 n=10+10) RecordingWithStructuredEvent/without-event-listener-24 3.31µs ± 0% 3.32µs ± 1% ~ (p=0.271 n=10+10) SpanCreation/detached-child=false-24 1.07µs ± 1% 1.07µs ± 1% ~ (p=0.955 n=10+10) SpanCreation/detached-child=true-24 1.84µs ± 1% 1.82µs ± 1% -1.30% (p=0.000 n=10+10) ``` And no changes in allocations. Epic: None Release note: None 91292: sql,delegate: cleanup from PR to improve performance of vtables with many databases r=ajwerner a=ajwerner This is fallout from #91054 (review). See individual commits. Epic: None Release note: None Co-authored-by: adityamaru <[email protected]> Co-authored-by: Marius Posta <[email protected]> Co-authored-by: Rafi Shamim <[email protected]> Co-authored-by: Jackson Owens <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: maryliag <[email protected]> Co-authored-by: Andrew Werner <[email protected]>
Backport 6/6 commits from #90116.
/cc @cockroachdb/release
information_schema and pg_catalog are always scoped to a database. Before this change, we would always look up all descriptors to materialize virtual tables, even if we only care about descriptors in the current database. This PR adds a new cluster setting (so that backports can be made kosher) to enable a new approach: scan the namespace table to find the relevant descriptors and then only resolve those.
On my local machine where I loaded up a few megabytes of descriptors into 6k databases, a simple query like
SELECT * FROM information_schema.sequences
takes 1.2s without the setting enabled and 1ms with it.Release justification: Fixes an important issue for using many databases.
Fixes #90115
Epic: CRDB-20865
Release note (performance improvement): Tables in
pg_catalog
andinformation_schema
(when not explicitly referenced as"".information_schema
) may now be much faster if the current database hasa small number of relations relative to the total number in the cluster.