add index for range scans on views #95770

maryliag · 2023-01-24T17:31:35Z

Add the ability to add/use index for range scans on virtual tables.

We have a feature where we display the list of fingerprints used per index, and to be possible we do a select such as
SELECT * FROM system.statement_statistics WHERE "1@1"::jsonb <@ indexes_usage
but we have to use the system table, since this is not possible on the crdb_internal.statement_statistics virtual table.
We don't let non-admin users to do selects on system tables, so using the crdb_internal table without the index is very inefficient.

Jira issue: CRDB-23726

The text was updated successfully, but these errors were encountered:

cucaroach · 2023-01-24T18:41:08Z

What's the level of urgency of this?

cucaroach · 2023-01-24T18:41:37Z

Possibly related: #74806

maryliag · 2023-01-24T20:57:35Z

What's the level of urgency of this?

The feature mentioned on the description (listing fingerprints used by index) will have to be hidden to non-admins until this task is completed, so the sooner the better.

Feel free to close this one in favour of the task you tagged on your message.

ajwerner · 2023-01-24T21:18:55Z

Another possible approach could be for us to:

Implement set-returning functions (we're planning to do this)
Implement the SECURITY DEFINER feature on functions so that the function can run as its owner

Then we could have the function do the nice query parameterized as we want it and then grant access to the function somehow via a role?

cc @mgartner and @chengxiong-ruan

mgartner · 2023-01-25T20:27:33Z

Can we not already do the exact same thing as @ajwerner describes but with views? Or is it impossible to create a role that can access a view but not the tables it references?

ajwerner · 2023-01-25T23:29:57Z

What is the privilege model here? I assumed it was complex, but if it's simple, views should do the trick.

The get the list of fingerprints used by an index, we use the `system.statement_statistics` directly, but non-admins don't have access to system table. Until cockroachdb#95756 or cockroachdb#95770 are done, we need to hide this feature for non-admins. Part Of cockroachdb#93087 Release note (ui change): Hide list of used fingerprint per index on Index Details page, for non-admin users.

94153: sql: Remove type annotations from `column_default` in `information_schema.columns` r=ZhouXing19 a=e-mbrown Informs: #87774 The type annotation on `column_default` caused confusion in postgres compatible apps. This change formats the `column_default` without the type annotation. It does not address the incorrect type annotation. Release note (bug fix): The content of `column_default` in `information_schema.columns` no longer have type annotations. 95855: backupccl: run restore/tpce/32tb/aws/nodes=15/cpus=16 once a week r=benbardin a=msbutler This patch adds the restore/tpce/32TB/aws/nodes=15/cpus=16 test to our weekly test suite. The backup fixture was generated by initing a tpce cluster with 2 Million customers, and running this workload while 48 incremental backups were taken at 15 minute increments. The roachtest restores from a system time captured in the 24th backup. The cluster used to restore the backup will run on aws with 15 nodes each with 16 vcpus. To verify this test exists, run: `roachtest list tag:weekly` Release note: None Epic: none 95861: kv: remove lastToReplica and lastFromReplica from raftMu r=nvanbenschoten a=nvanbenschoten Extracted from #94165 without modification. ---- In 410ef29, we moved these fields from under the Replica.mu to under the Replica.raftMu. This was done to avoid lock contention. In this commit, we move these fields under their own mutex so that they can be accessed without holding the raftMu. This allows us to send Raft messages from other goroutines. The commit also switches from calling RawNode.ReportUnreachable directly in sendRaftMessage to using the more flexible unreachablesMu set, which defers the call to ReportUnreachable until the next Raft tick. As a result, the commit closes #84246. Release note: None Epic: None 95876: sql: remove round-trips from common DISCARD ALL r=ajwerner a=ajwerner #### sql: avoid checking if a role exists when setting to the current role This is an uncached round-trip. It makes `DISCARD` expensive. In #86485 we implemented `SET SESSION AUTHORIZATION DEFAULT`, this added an extra sql query to `DISCARD ALL` to check if the role we'd become exists. This is almost certainly unintentional in the case where the role we'd become is the current session role. I think this just happened because of code consolidation. We now no longer check if the current session role exists. This removes the last round-trip from DISCARD ALL. #### sql: use in-memory session data to decide what to do in DISCARD In #86246 we introduced logic to discard schemas when running DISCARD ALL and DISCARD TEMP. This logic did expensive kv operations unconditionally; if the session knew it had never created a temporary schema, we'd still fetch all databases and proceed to search all databases for a temp schema. This is very expensive. Fixes: #95864 Release note (performance improvement): In 22.2, logic was added to make `SET SESSION AUTHORIZATION DEFAULT` not a no-op. This implementation used more general code for setting the role for a session which made sure that the role exists. The check for whether a role exists is currently uncached. We don't need to check if the role we already are exists. This improves the performance of `DISCARD ALL` in addition to `SET SESSION AUTHORIZATION DEFAULT`. Release note (performance improvement): In 22.2, we introduced support for `DISCARD TEMP` and made `DISCARD ALL` actually discard temp tables. This implementation ran expensive logic to discover temp schemas rather than consulting in-memory data structures. As a result, `DISCARD ALL`, which is issued regularly by connection pools, became an expensive operation when it should be cheap. This problem is now resolved. 95997: ui: hide list of statement for non-admins r=maryliag a=maryliag To get the list of fingerprints used by an index, we use the `system.statement_statistics` directly, but non-admins don't have access to system table. Until #95756 or #95770 are done, we need to hide this feature for non-admins. Part Of #93087 Release note (ui change): Hide list of used fingerprint per index on Index Details page, for non-admin users. 95998: roachtest: update activerecord blocklist r=ZhouXing19 a=andyyang890 This patch removes a few tests from the blocklist that do not exist in the test suite for rails 7.0.3, which were being run prior to the fix for version pinning. Informs #94211 Release note: None 96003: kv: add MVCC local timestamp details to ReadWithinUncertaintyIntervalError r=nvanbenschoten a=nvanbenschoten This PR adds details about MVCC local timestamps to ReadWithinUncertaintyIntervalErrors. This provides more information about the cause of uncertainty errors. The PR also includes a series of minor refactors to clean up this code. Release note: None Epic: None Co-authored-by: e-mbrown <[email protected]> Co-authored-by: Michael Butler <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: maryliag <[email protected]> Co-authored-by: Andy Yang <[email protected]>

michae2 · 2023-02-07T19:54:26Z

[sql queries triage meeting] @maryliag would using views work, as mentioned above?

maryliag · 2023-02-07T19:57:27Z

I don't understand how that works, for example, what I would need to do to make a select as
SELECT * FROM crdb_internal.statement_statistics WHERE "1@1"::jsonb <@ indexes_usage ?

ajwerner · 2023-02-07T20:16:18Z

Okay, so the summary of the problem as I understand it is:

We have a view which today combines the data from the system table and the in-memory data (crdb_internal.statement_statistics).
We have a good index on the system table for the query we'd like to write.
What we don't have is a way to push the query bounds into the in-memory data, so we end up materializing all of it.
Maybe we don't care about the in-memory data, and we only care about the durable data, so we'd like to just scan the system table
End users can't just scan the system table because they don't have sufficient privileges.

The proposed solution is:

Create a virtual view which wraps just querying the system table.
Query that view.

The advantage of this over the system table is that you can grant privileges to these virtual tables now. An example of such a a wrapping is this:

--- a/pkg/sql/crdb_internal.go
+++ b/pkg/sql/crdb_internal.go
@@ -182,6 +182,7 @@ var crdbInternal = virtualSchema{
                catconstants.CrdbInternalTablesTableID:                      crdbInternalTablesTable,
                catconstants.CrdbInternalClusterTxnStatsTableID:             crdbInternalClusterTxnStatsTable,
                catconstants.CrdbInternalTxnStatsTableID:                    crdbInternalTxnStatsView,
+               catconstants.CrdbInternalStatementStatisticsTableID:         crdbInternalStatementStatisticsView,
                catconstants.CrdbInternalTransactionStatsTableID:            crdbInternalTransactionStatisticsTable,
                catconstants.CrdbInternalZonesTableID:                       crdbInternalZonesTable,
                catconstants.CrdbInternalInvalidDescriptorsTableID:          crdbInternalInvalidDescriptorsTable,
@@ -6297,6 +6298,34 @@ CREATE TABLE crdb_internal.cluster_transaction_statistics (
        },
 }
 
+var crdbInternalStatementStatisticsView = virtualSchemaView{
+       schema: `CREATE VIEW crdb_internal.statement_statistics2 AS SELECT
+                       aggregated_ts,
+                       fingerprint_id,
+                       transaction_fingerprint_id,
+                       plan_hash,
+                       app_name,
+                       node_id,
+                       agg_interval,
+                       metadata,
+                       statistics,
+                       plan,
+                       index_recommendations FROM system.statement_statistics`,
+       resultColumns: colinfo.ResultColumns{
+               {Name: "aggregated_ts", Typ: types.TimestampTZ},
+               {Name: "fingerprint_id", Typ: types.Bytes},
+               {Name: "transaction_fingerprint_id", Typ: types.Bytes},
+               {Name: "plan_hash", Typ: types.Bytes},
+               {Name: "app_name", Typ: types.String},
+               {Name: "node_id", Typ: types.Int},
+               {Name: "agg_interval", Typ: types.Interval},
+               {Name: "metadata", Typ: types.Jsonb},
+               {Name: "statistics", Typ: types.Jsonb},
+               {Name: "plan", Typ: types.Jsonb},
+               {Name: "index_recommendations", Typ: types.StringArray},
+       },
+}
+

maryliag · 2023-02-07T22:15:28Z

To the point 4: I'm only using the system table because there is no way to do that on the combined view. My goal is to do that select on the combined view, because I do care with the latest data (that only exists in-memory)

The point number 5 can be done as a workaround, until I can make the call to the combined view.

ajwerner · 2023-02-07T23:09:04Z

Great. This issue does seem fitting as written for point 4. We need to have more advanced predicate push down into virtual tables and virtual indexes to realize the vision.

yuzefovich · 2023-02-14T19:54:30Z

In a recent escalation, we had a system.statement_statistics table with 2.9TiB of data, and queries that power the SQL Activity page were performing a full scan of that table. Fixing this issue would have (probably) helped there and perhaps made the escalation be resolved faster.

mgartner · 2023-03-30T15:20:28Z

@maryliag When we spoke IRL I believe you mentioned that you are not longer combining the in-memory data with persisted data because fanning out to each node to retrieve the in-memory data was too slow. I'm going to close this issue, because I don't think there's anything to be done here, but let me know if I'm mistaken.

maryliag added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team labels Jan 24, 2023

maryliag mentioned this issue Jan 26, 2023

ui: hide list of statement for non-admins #95997

Merged

michae2 mentioned this issue Mar 2, 2023

pass filter/limit/order to virtual tables #95766

Closed

mgartner closed this as completed Mar 30, 2023

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Done in SQL Queries Jul 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add index for range scans on views #95770

add index for range scans on views #95770

maryliag commented Jan 24, 2023 •

edited by rytaft

Loading

cucaroach commented Jan 24, 2023

cucaroach commented Jan 24, 2023

maryliag commented Jan 24, 2023

ajwerner commented Jan 24, 2023

mgartner commented Jan 25, 2023

ajwerner commented Jan 25, 2023

michae2 commented Feb 7, 2023

maryliag commented Feb 7, 2023

ajwerner commented Feb 7, 2023

maryliag commented Feb 7, 2023

ajwerner commented Feb 7, 2023

yuzefovich commented Feb 14, 2023

mgartner commented Mar 30, 2023

add index for range scans on views #95770

add index for range scans on views #95770

Comments

maryliag commented Jan 24, 2023 • edited by rytaft Loading

cucaroach commented Jan 24, 2023

cucaroach commented Jan 24, 2023

maryliag commented Jan 24, 2023

ajwerner commented Jan 24, 2023

mgartner commented Jan 25, 2023

ajwerner commented Jan 25, 2023

michae2 commented Feb 7, 2023

maryliag commented Feb 7, 2023

ajwerner commented Feb 7, 2023

maryliag commented Feb 7, 2023

ajwerner commented Feb 7, 2023

yuzefovich commented Feb 14, 2023

mgartner commented Mar 30, 2023

maryliag commented Jan 24, 2023 •

edited by rytaft

Loading