Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: SELECT FOR UPDATE not able to be optimized away #114282

Open
Tracked by #114737
michae2 opened this issue Nov 11, 2023 · 4 comments
Open
Tracked by #114737

sql: SELECT FOR UPDATE not able to be optimized away #114282

michae2 opened this issue Nov 11, 2023 · 4 comments
Labels
A-read-committed Related to the introduction of Read Committed A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-3 Issues/test failures with no fix SLA T-sql-queries SQL Queries Team

Comments

@michae2
Copy link
Collaborator

michae2 commented Nov 11, 2023

This exact combination of SELECT FOR UPDATE, EXISTS, and NULL parameter using a prepared statement is able to be optimized to a constant false in 22.2.16 but becomes a full table scan in 23.1.11:

CREATE TABLE a (a INT, INDEX (a));
PREPARE p AS SELECT EXISTS (SELECT NULL FROM a WHERE a = $1 FOR UPDATE);
EXPLAIN ANALYZE EXECUTE p (NULL);

Here's v22.2.16:

[email protected]:26257/defaultdb> CREATE TABLE a (a INT, INDEX (a));                                                                                                                                                                                                                                                                                                                                                                          PREPARE p AS SELECT EXISTS (SELECT NULL FROM a WHERE a = $1 FOR UPDATE);                                                                                                                                                                                                                                                                                                                                                                    EXPLAIN ANALYZE EXECUTE p (NULL);
CREATE TABLE


Time: 2ms total (execution 2ms / network 0ms)

PREPARE


Time: 5ms total (execution 4ms / network 0ms)

               info
-----------------------------------
  planning time: 406µs
  execution time: 476µs
  distribution: local
  vectorized: true
  maximum memory usage: 10 KiB
  network usage: 0 B (0 messages)
  regions: us-east1

  • values
    nodes: n1
    regions: us-east1
    actual row count: 1
    size: 1 column, 1 row
(13 rows)


Time: 1ms total (execution 1ms / network 0ms)

Here's v23.1.11:

[email protected]:26257/defaultdb> CREATE TABLE a (a INT, INDEX (a));
                             -> PREPARE p AS SELECT EXISTS (SELECT NULL FROM a WHERE a = $1 FOR UPDATE);
                             -> EXPLAIN ANALYZE EXECUTE p (NULL);
                             ->
CREATE TABLE

Time: 2ms total (execution 2ms / network 0ms)

PREPARE

Time: 8ms total (execution 8ms / network 0ms)

                                info
--------------------------------------------------------------------
  planning time: 124µs
  execution time: 372µs
  distribution: local
  vectorized: true
  cumulative time spent in KV: 101µs
  maximum memory usage: 30 KiB
  network usage: 0 B (0 messages)
  regions: us-east1
  sql cpu time: 39µs

  • root
  │
  ├── • values
  │     nodes: n1
  │     regions: us-east1
  │     actual row count: 1
  │     sql cpu time: 11µs
  │     size: 1 column, 1 row
  │
  └── • subquery
      │ id: @S1
      │ original sql: (SELECT NULL FROM a WHERE a = $1 FOR UPDATE)
      │ exec mode: one row
      │
      └── • render
          │
          └── • filter
              │ nodes: n1
              │ regions: us-east1
              │ actual row count: 0
              │ sql cpu time: 24µs
              │ estimated row count: 0
              │ filter: false
              │
              └── • scan
                    nodes: n1
                    regions: us-east1
                    actual row count: 0
                    KV time: 101µs
                    KV contention time: 0µs
                    KV rows read: 0
                    KV bytes read: 0 B
                    KV gRPC calls: 1
                    estimated max memory allocated: 20 KiB
                    sql cpu time: 4µs
                    missing stats
                    table: a@a_pkey
                    spans: FULL SCAN
                    locking strength: for update
(49 rows)

Time: 1ms total (execution 1ms / network 0ms)

In v23.2.0-alpha.6 it's better if we use the new SELECT FOR UPDATE behavior (optimizer_use_lock_op_for_serializable):

[email protected]:26257/system/defaultdb> SET optimizer_use_lock_op_for_serializable = true;
SET

Time: 0ms total (execution 0ms / network 0ms)

[email protected]:26257/system/defaultdb> EXPLAIN ANALYZE EXECUTE p (NULL);
                                info
--------------------------------------------------------------------
  planning time: 1ms
  execution time: 201µs
  distribution: local
  vectorized: true
  maximum memory usage: 10 KiB
  network usage: 0 B (0 messages)
  regions: us-east1
  isolation level: serializable
  priority: normal
  quality of service: regular

  • root
  │
  ├── • values
  │     nodes: n1
  │     regions: us-east1
  │     actual row count: 1
  │     size: 1 column, 1 row
  │
  └── • subquery
      │ id: @S1
      │ original sql: (SELECT NULL FROM a WHERE a = $1 FOR UPDATE)
      │ exec mode: one row
      │
      └── • render
          │
          └── • lookup join (semi)
              │ nodes: n1
              │ regions: us-east1
              │ actual row count: 0
              │ KV time: 0µs
              │ KV contention time: 0µs
              │ KV rows decoded: 0
              │ KV bytes read: 0 B
              │ KV gRPC calls: 0
              │ estimated max memory allocated: 0 B
              │ estimated row count: 0
              │ table: a@a_pkey
              │ equality: (rowid) = (rowid)
              │ equality cols are key
              │ locking strength: for update
              │
              └── • norows
                    nodes: n1
                    regions: us-east1
                    actual row count: 0
(46 rows)

Time: 2ms total (execution 2ms / network 0ms)

Jira issue: CRDB-33431

@michae2 michae2 added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-optimizer SQL logical planning and optimizations. T-sql-queries SQL Queries Team O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs labels Nov 11, 2023
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Nov 11, 2023
@DrewKimball
Copy link
Collaborator

We don't remove the scan with a normal SQL statement, either:

root@localhost:26257/system/defaultdb> explain (opt, verbose) SELECT NULL FROM a WHERE a = NULL FOR UPDATE;
                            info
-------------------------------------------------------------
  project
   ├── columns: "?column?":5
   ├── cardinality: [0 - 0]
   ├── volatile
   ├── stats: [rows=0]
   ├── cost: 1058.25
   ├── fd: ()-->(5)
   ├── prune: (5)
   ├── select
   │    ├── cardinality: [0 - 0]
   │    ├── volatile
   │    ├── stats: [rows=0]
   │    ├── cost: 1058.24
   │    ├── scan a
   │    │    ├── locking: for-update
   │    │    ├── volatile
   │    │    ├── stats: [rows=1000]
   │    │    └── cost: 1048.22
   │    └── filters
   │         └── false [constraints=(contradiction; tight)]
   └── projections
        └── NULL [as="?column?":5]
(22 rows)

Time: 11ms total (execution 11ms / network 0ms)

We don't fire the rule that removes a zero-cardinality group when the expression isn't leakproof. The scan isn't leakproof here because of the locking. I'm not sure what was removing the scan in 22.2 that's different - it must have been some other rule that doesn't check this.

@DrewKimball
Copy link
Collaborator

We probably shouldn't just remove locking (or any database state changes, like mutations), but in this case the SELECT FOR UPDATE only applies to rows that pass the filter, which is none. So this is potentially an instance of #75457.

Also, we should probably be removing that lookup join too even though it's locking, since it should be guaranteed not to do any lookups.

@DrewKimball
Copy link
Collaborator

Also, possible dupe of #73074

@michae2 michae2 moved this from Triage to 24.1 Release in SQL Queries Nov 20, 2023
@michae2 michae2 added A-read-committed Related to the introduction of Read Committed P-3 Issues/test failures with no fix SLA labels Nov 20, 2023
@michae2
Copy link
Collaborator Author

michae2 commented Nov 20, 2023

The lack of optimization of SFU does seem like a dupe of #73074, which is improved in 23.2 when using the new SFU implementation, so the only remaining question here is why we were optimizing at all in 22.2. I think there is some other minor difference due to the prepared statement. I"m going to keep this open because of that.

@mgartner mgartner moved this from 24.1 Release to New Backlog in SQL Queries Nov 28, 2023
@DrewKimball DrewKimball changed the title sql: prepared SELECT FOR UPDATE not able to be optimized away sql: SELECT FOR UPDATE not able to be optimized away Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-read-committed Related to the introduction of Read Committed A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-3 Issues/test failures with no fix SLA T-sql-queries SQL Queries Team
Projects
Status: Backlog
Development

No branches or pull requests

2 participants