sql: multiple CTEs with mutations on same row can cause inconsistency #70731

michae2 · 2021-09-24T23:08:19Z

In #44466 we found that an upsert statement modifying the same row twice could lead to inconsistencies, due to the upsert operator not reading its own writes. This was fixed in #45372 by checking that the input to the upsert is distinct on PKs.
But now @florence-crl and @erikgrinaker have discovered another way to upsert the same row multiple times in a single statement: using CTEs.

For example:

CREATE TABLE a (i INT PRIMARY KEY, j INT, INDEX (j));
INSERT INTO a VALUES (0, 0);
WITH x AS (UPSERT INTO a VALUES (0, 1) RETURNING j), y AS (UPSERT INTO a VALUES (0, 2) RETURNING j) SELECT * FROM x;

Now, another execution of the last statement fails with a dupe key error (confusingly on a non-unique secondary index, but this is because the PK values are also the same, so it is a duplicate key from KV's perspective):

WITH x AS (UPSERT INTO a VALUES (0, 1) RETURNING j), y AS (UPSERT INTO a VALUES (0, 2) RETURNING j) SELECT * FROM x;
ERROR: duplicate key value violates unique constraint "a_j_idx"
SQLSTATE: 23505
DETAIL: Key (j)=(1) already exists.
CONSTRAINT: a_j_idx

And furthermore we can see the inconsistency directly:

SELECT i, j FROM a@primary;
  i | j
----+----
  0 | 2
(1 row)


Time: 2ms total (execution 2ms / network 0ms)

SELECT i, j FROM a@a_j_idx;
  i | j
----+----
  0 | 1
  0 | 2
(2 rows)


Time: 2ms total (execution 1ms / network 0ms)
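A toy model helps show why two mutation CTEs computed from the same snapshot leave the secondary index out of sync. This is a minimal Go sketch, not CockroachDB code. It assumes the primary index is a map from `i` to `j`, that `a_j_idx` is a set of `(i, j)` entries, and that each CTE derives the index entry to delete from the statement's starting snapshot. The names `table`, `entry`, `upsert`, and `runCTE` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is an (i, j) pair; a non-unique index key includes the primary key,
// so a_j_idx holds one entry per row.
type entry struct{ i, j int }

// table is a toy model of a (i INT PRIMARY KEY, j INT, INDEX (j)).
type table struct {
	primary map[int]int    // i -> j
	jIdx    map[entry]bool // entries of a_j_idx
}

// upsert writes (i, newJ) but picks the index entry to delete from snapshot,
// i.e. from the data as of the start of the statement.
func (t *table) upsert(snapshot map[int]int, i, newJ int) {
	if oldJ, ok := snapshot[i]; ok {
		delete(t.jIdx, entry{i, oldJ})
	}
	t.primary[i] = newJ
	t.jIdx[entry{i, newJ}] = true
}

// runCTE replays the two UPSERT CTEs against a table holding the row (0, 0).
func runCTE() (map[int]int, []entry) {
	t := &table{primary: map[int]int{0: 0}, jIdx: map[entry]bool{{0, 0}: true}}
	snapshot := map[int]int{0: 0} // both CTEs read this snapshot
	t.upsert(snapshot, 0, 1)
	t.upsert(snapshot, 0, 2) // deletes (0, 0) again, never (0, 1)

	var idx []entry
	for e := range t.jIdx {
		idx = append(idx, e)
	}
	sort.Slice(idx, func(a, b int) bool { return idx[a].j < idx[b].j })
	return t.primary, idx
}

func main() {
	primary, idx := runCTE()
	fmt.Println("primary:", primary) // one row, j = 2
	fmt.Println("a_j_idx:", idx)     // two entries, j = 1 and j = 2
}
```

In this model the primary index ends with `j = 2` while `a_j_idx` keeps entries for both `j = 1` and `j = 2`, the same shape as the two queries above.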

Jira issue: CRDB-10192


erikgrinaker · 2021-09-25T12:00:55Z

Now, another execution of the last statement fails with a dupe key error (confusingly on a non-unique secondary index, but this is because the PK values are also the same, so it is a duplicate key from KV's perspective)

I'm not sure if this is the explanation for the bogus error. I noticed that writing to secondary indexes will use e.g. InitPut and other conditional write operations. These operations will return ConditionFailedError whenever the condition is false (e.g. in the case of InitPut, when some value already exists for the key), but SQL will simply convert all ConditionFailedError into UniquenessConstraintViolationError regardless of what the actual cause of the error was:

cockroach/pkg/sql/row/errors.go

Lines 59 to 72 in 20c81dd

    
case *roachpb.ConditionFailedError:
	if origPErr.Index == nil {
		break
	}
	j := origPErr.Index.Index
	if j >= int32(len(b.Results)) {
		return errors.AssertionFailedf("index %d outside of results: %+v", j, b.Results)
	}
	result := b.Results[j]
	if len(result.Rows) == 0 {
		break
	}
	key := result.Rows[0].Key
	return NewUniquenessConstraintViolationError(ctx, tableDesc, key, v.ActualValue)

It seems more likely to me that this was e.g. caused by an InitPut encountering an existing key where it didn't expect one, or something similar. Or maybe it just got confused about which index was violated. May be worth looking at a trace to find out what's going on.

michae2 · 2021-09-27T23:25:57Z

Here's the KV trace:

INSERT INTO a VALUES (0, 0);

 querying next range at /NamespaceTable/30/1/50/29/"a"/4/1
 r26: sending batch 1 Get to (n1,s1):1
 CPut /Table/52/1/0/0 -> /TUPLE/2:2:Int/0
 InitPut /Table/52/2/0/0/0 -> /BYTES/
 querying next range at /Table/52/1/0/0
 r44: sending batch 1 CPut, 1 EndTxn, 1 InitPut to (n1,s1):1
 fast path completed
 rows affected: 1

WITH x AS (UPSERT INTO a VALUES (0, 1) RETURNING j), y AS (UPSERT INTO a VALUES (0, 2) RETURNING j) SELECT * FROM x;

 Scan /Table/52/1/0/0
 querying next range at /Table/52/1/0/0
 r44: sending batch 1 Get to (n1,s1):1
 fetched: /a/primary/0/j -> /0
 Put /Table/52/1/0/0 -> /TUPLE/2:2:Int/1
 Del /Table/52/2/0/0/0
 CPut /Table/52/2/1/0/0 -> /BYTES/ (expecting does not exist)
 querying next range at /Table/52/1/0/0
 r44: sending batch 1 Put, 1 CPut, 1 Del to (n1,s1):1
 Scan /Table/52/1/0/0
 querying next range at /Table/52/1/0/0
 r44: sending batch 1 Get, 1 QueryIntent to (n1,s1):1
 fetched: /a/primary/0/j -> /0                            <<<<<<<<<<<<< failing to read our own write
 Put /Table/52/1/0/0 -> /TUPLE/2:2:Int/2
 Del /Table/52/2/0/0/0                                    <<<<<<<<<<<<< should be 52/2/1/0/0
 CPut /Table/52/2/2/0/0 -> /BYTES/ (expecting does not exist)
 querying next range at /Table/52/1/0/0
 r44: sending batch 1 Put, 1 CPut, 1 Del, 1 QueryIntent to (n1,s1):1
 rows affected: 1
 querying next range at /Table/52/1/0/0
 r44: sending batch 1 EndTxn to (n1,s1):1

WITH x AS (UPSERT INTO a VALUES (0, 1) RETURNING j), y AS (UPSERT INTO a VALUES (0, 2) RETURNING j) SELECT * FROM x;

 Scan /Table/52/1/0/0
 querying next range at /Table/52/1/0/0
 r44: sending batch 1 Get to (n1,s1):1
 fetched: /a/primary/0/j -> /2
 Put /Table/52/1/0/0 -> /TUPLE/2:2:Int/1
 Del /Table/52/2/2/0/0
 CPut /Table/52/2/1/0/0 -> /BYTES/ (expecting does not exist)         <<<<<<<<<<<<<< already exists
 querying next range at /Table/52/1/0/0
 r44: sending batch 1 Put, 1 CPut, 1 Del to (n1,s1):1
 execution failed after 0 rows: duplicate key value violates unique constraint "a_j_idx"
 querying next range at /Table/52/1/0/0
 r44: sending batch 1 EndTxn to (n1,s1):1

So I think you are right.

erikgrinaker · 2021-09-28T08:52:58Z

Thanks for checking! We may want to see if we can improve the error handling here while we're at it.

michae2 · 2021-09-28T18:52:49Z

@mgartner says this is related to a problem with INSERT ON CONFLICT DO UPDATE.

Edit: this issue

michae2 · 2021-09-29T00:34:09Z

This can also be caused with UPDATE / UPDATE:

CREATE TABLE a (i INT PRIMARY KEY, j INT, INDEX (j));
INSERT INTO a VALUES (0, 0);
WITH x AS (UPDATE a SET j = 1 WHERE i = 0 RETURNING j), y AS (UPDATE a SET j = 2 WHERE i = 0 RETURNING j) SELECT * FROM x;
SELECT i, j FROM a@primary;
SELECT i, j FROM a@a_j_idx;

as well as UPDATE / DELETE:

CREATE TABLE a (i INT PRIMARY KEY, j INT, INDEX (j));
INSERT INTO a VALUES (0, 0);
WITH x AS (UPDATE a SET j = 1 WHERE i = 0 RETURNING j), y AS (DELETE FROM a WHERE i = 0 RETURNING j) SELECT * FROM x;
SELECT i, j FROM a@primary;
SELECT i, j FROM a@a_j_idx;

and probably also INSERT ON CONFLICT and many other combinations thereof.
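The UPDATE / DELETE variant can be traced with the same kind of toy model (hypothetical names, Go maps standing in for the primary index and `a_j_idx`). Assuming both CTEs read the row as `(0, 0)`, the DELETE removes the primary row and the stale `j = 0` index entry, and the `j = 1` entry written by the UPDATE is left dangling. This is the model's prediction, not captured output.

```go
package main

import "fmt"

type entry struct{ i, j int }

// state is a toy model of table a: primary index i -> j plus a_j_idx.
type state struct {
	primary map[int]int
	jIdx    map[entry]bool
}

// runUpdateDelete applies both CTEs, each computing its index maintenance
// from the same snapshot {0: 0}.
func runUpdateDelete() state {
	s := state{primary: map[int]int{0: 0}, jIdx: map[entry]bool{{0, 0}: true}}
	snapshot := map[int]int{0: 0}

	// x: UPDATE a SET j = 1 WHERE i = 0
	delete(s.jIdx, entry{0, snapshot[0]})
	s.primary[0] = 1
	s.jIdx[entry{0, 1}] = true

	// y: DELETE FROM a WHERE i = 0, still believing j = 0
	delete(s.jIdx, entry{0, snapshot[0]}) // already gone; (0, 1) survives
	delete(s.primary, 0)
	return s
}

func main() {
	s := runUpdateDelete()
	fmt.Println("primary:", s.primary) // empty
	fmt.Println("a_j_idx:", s.jIdx)    // one dangling entry for (0, 1)
}
```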

jordanlewis · 2021-09-29T01:14:26Z

I don't understand the mechanics of this error. Reading the KV trace that Michael sent, we see that the second Scan we send, which comes after the first subquery completes and has sent its Put batch to KV, reads the initial value for the primary key:

 Scan /Table/52/1/0/0
 querying next range at /Table/52/1/0/0
 r44: sending batch 1 Get, 1 QueryIntent to (n1,s1):1
 fetched: /a/primary/0/j -> /0                            <<<<<<<<<<<<< failing to read our own write

But why is this happening? Isn't it true that we've already executed the previous Puts in KV? I would expect at this point, we'd get back our own writes when we scanned them, just like we would if we opened a transaction, wrote a row, and then read the row back. What is different here?

I'm guessing I'm missing something obvious, but I don't really understand this. This anomaly is different from #45372 - in that case, it makes sense that the operator might want to cache information - it doesn't read its own writes, but that's because it doesn't try to by flushing its batches and re-reading, not because it can't... or so I thought, anyway.

jordanlewis · 2021-09-29T01:20:42Z

In other words, why does this query fail to read its own write:

CREATE TABLE a (i INT PRIMARY KEY, j INT, INDEX (j));
INSERT INTO a VALUES (0, 0);
WITH x AS (UPDATE a SET j = 1 WHERE i = 0 RETURNING j) SELECT * FROM a;
  i | j
----+----
  0 | 0              -- OWN WRITE NOT READ! 0 0, not 0 1
(1 row)

But this query succeeds?

BEGIN;
UPDATE a SET j = 2 WHERE i = 0 RETURNING j;
  j
-----
  2
(1 row)
SELECT * FROM a;
  i | j
----+----
  0 | 2         -- OWN WRITE READ - 0 2 correctly returned, not 0 1
(1 row)

jordanlewis · 2021-09-29T01:50:24Z

I see, according to these comments, it's expected that mutations can't read their own writes, and we've deliberately set things up so that this is maintained.

https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/conn_executor_exec.go#L640-L646
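A minimal Go sketch of the sequence-number behaviour discussed here. It assumes a simplified model: every write gets the next sequence number, and reads observe only the transaction's own writes at or below a read sequence number that is advanced ("stepped") between statements. `txn`, `step`, and `demo` are hypothetical; the real transaction coordinator is considerably more involved.

```go
package main

import "fmt"

type version struct{ seq, value int }

// txn is a toy model of a transaction's own writes and its read seq.
type txn struct {
	committed map[string]int       // values visible when the txn began
	history   map[string][]version // this txn's writes, in seq order
	writeSeq  int                  // seq of the most recent write
	readSeq   int                  // reads see own writes with seq <= readSeq
}

func newTxn(committed map[string]int) *txn {
	return &txn{committed: committed, history: map[string][]version{}}
}

func (t *txn) write(key string, v int) {
	t.writeSeq++
	t.history[key] = append(t.history[key], version{t.writeSeq, v})
}

func (t *txn) read(key string) int {
	h := t.history[key]
	for i := len(h) - 1; i >= 0; i-- {
		if h[i].seq <= t.readSeq {
			return h[i].value
		}
	}
	return t.committed[key]
}

// step runs at statement boundaries, so later statements see earlier writes.
func (t *txn) step() { t.readSeq = t.writeSeq }

// demo reads a key inside the statement that wrote it, then in the next one.
func demo() (sameStmt, nextStmt int) {
	t := newTxn(map[string]int{"a/0": 0})
	t.step()          // statement start
	t.write("a/0", 1) // the UPDATE inside the CTE
	sameStmt = t.read("a/0")
	t.step() // next statement
	nextStmt = t.read("a/0")
	return sameStmt, nextStmt
}

func main() {
	fmt.Println(demo())
}
```

`demo` returns `0` for the read inside the writing statement and `1` for the read in the following statement, mirroring the difference between the two sessions above.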

jordanlewis · 2021-09-29T02:43:49Z

Maybe we need a special read mode for scans that returns a special flag or condition if we see a write to a key at the same sequence number that we're reading at?

1. In the optimizer, mark CTE clauses that modify a table that's already been modified in the same statement with a special flag.
2. In the optimizer and execbuilder, in each marked CTE clause, further mark any operators that scan the table that's going to get modified for a second time.
3. In the execution engine, for each marked scan operator, emit the scans with the special read mode. Then, we can error out if we notice that the keys have been written to already (or ignore the rows somehow if we want to get fancy).

@knz this intersects closely with related prior work you've done on halloween problem and sequence numbers, so I am curious for your thoughts on this problem and also this proposed strategy.

michae2 · 2021-09-29T05:01:44Z

Maybe we need a special read mode for scans that returns a special flag or condition if we see a write to a key at the same sequence number that we're reading at?

I wonder if this would return false positives for something like:

CREATE TABLE b (m INT PRIMARY KEY, n INT);
INSERT INTO b VALUES (10, 1), (20, 1), (30, -1), (40, -1);
WITH x AS (UPDATE b SET n = n + 1 WHERE n > 0 RETURNING n), y AS (UPDATE b SET n = n - 1 WHERE n < 0 RETURNING n) SELECT * FROM x;

These two UPDATE parts should modify different rows, but they will both have to scan all rows, so the second update will scan rows modified by the first.
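The false-positive concern can be quantified on the four rows of `b`. This sketch assumes both CTEs see the initial snapshot and that the second UPDATE scans every row before filtering. It counts the rows a scan-time check would flag against the rows both UPDATEs actually modify; `falsePositiveDemo` is a hypothetical name.

```go
package main

import "fmt"

// falsePositiveDemo models the two UPDATE CTEs on table b, both reading the
// initial snapshot. It returns how many rows a scan-time check would flag and
// how many rows both UPDATEs actually modify.
func falsePositiveDemo() (flaggedAtScan, realConflicts int) {
	snapshot := map[int]int{10: 1, 20: 1, 30: -1, 40: -1} // m -> n
	writtenByX := map[int]bool{}
	// x: UPDATE b SET n = n + 1 WHERE n > 0
	for m, n := range snapshot {
		if n > 0 {
			writtenByX[m] = true
		}
	}
	// y: UPDATE b SET n = n - 1 WHERE n < 0, which has to scan every row.
	for m, n := range snapshot {
		if writtenByX[m] {
			flaggedAtScan++ // a scan-time check fires before the filter runs
			if n < 0 {
				realConflicts++ // y would also modify this row
			}
		}
	}
	return flaggedAtScan, realConflicts
}

func main() {
	fmt.Println(falsePositiveDemo())
}
```

`falsePositiveDemo` returns `2` flagged rows and `0` real conflicts, which is why a check placed after the filter (or at write time) would be less prone to false positives than one placed at scan time.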

michae2 · 2021-09-29T05:10:31Z

Thinking along the same lines, what if we try to use a special mode to catch it during the modification instead of during the scan? Instead of Put into the primary index, we could use CPut with the value the scan returned. Would that work? Or another idea is to add a CDel that verifies we're deleting the kv we think we are.

knz · 2021-09-29T12:26:51Z

The semantics in the SQL standard (and what is implemented in pg) on this is pretty clear:
Mutation CTEs must be executed sequentially and the internal nested txn sequence number must be increased after each mutation CTE.

(edit: I was wrong - see jordan's answer below)

knz · 2021-09-29T12:27:30Z

It's the process of increasing the seqnum that makes subsequent CTEs / the main statement able to observe the previous writes.

knz · 2021-09-29T12:29:37Z

I'm also somewhat surprised -- when we implemented nested txns originally, I thought we tested this case? At that time at least, I think this used to work transparently, because CTEs were executed as separate statements and the planner reset for each statement was in charge of increasing the seqnum.

jordanlewis · 2021-09-29T12:34:08Z

The semantics in the SQL standard (and what is implemented in pg) on this is pretty clear:
Mutation CTEs must be executed sequentially and the internal nested txn sequence number must be increased after each mutation CTE.

This is what I thought too, but it's not the case. All CTEs in a single statement, in Postgres, observe a single snapshot of the database. They do not read their own writes. This is easily verifiable in Postgres:

postgres=# create table test (a int);
CREATE TABLE
postgres=# with x as (insert into test values(1)) select * from test;
 a
---
(0 rows)

Postgres also is documented this way (https://www.postgresql.org/docs/current/queries-with.html):

The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables. This alleviates the effects of the unpredictability of the actual order of row updates, and means that RETURNING data is the only way to communicate changes between different WITH sub-statements and the main query. An example of this is that in

knz · 2021-09-29T12:34:33Z

oh wow TIL

knz · 2021-09-29T12:35:37Z

how does pg ensure consistency in this case?

jordanlewis · 2021-09-29T12:38:38Z

Thinking along the same lines, what if we try to use a special mode to catch it during the modification instead of during the scan? Instead of Put into the primary index, we could use CPut with the value the scan returned. Would that work? Or another idea is to add a CDel that verifies we're deleting the kv we think we are.

Yes, I think this would work, and it's a lot more straightforward than what I proposed. The main complication would be plumbing.

how does pg ensure consistency in this case?

Postgres uses MVCC information to ignore rows that have already been written in the same statement. I don't fully understand the details, but you can see the effects by searching for TM_Invisible and TM_SelfModified in their codebase. Here is a relevant snippet:

https://github.com/postgres/postgres/blob/master/src/backend/executor/nodeModifyTable.c#L2024-L2040
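A rough sketch of the "ignore rows already modified by this statement" behaviour, in a toy model where each row version records which statement wrote it. It only illustrates the idea behind the `TM_SelfModified` handling; it is not how Postgres implements it, and `rows`, `update`, and `demo` are hypothetical. In this model the first update wins and the second is skipped.

```go
package main

import "fmt"

// rowVersion is the latest version of a row, tagged with the statement that
// wrote it (0 means it predates the transaction).
type rowVersion struct {
	value     int
	writtenBy int
}

type rows map[int]rowVersion

// update applies newValue to pk on behalf of statement stmt, unless stmt has
// already modified that row, in which case the update is skipped.
func (r rows) update(pk, newValue, stmt int) bool {
	if r[pk].writtenBy == stmt {
		return false // roughly analogous to the TM_SelfModified case
	}
	r[pk] = rowVersion{value: newValue, writtenBy: stmt}
	return true
}

// demo runs two updates of the same row from one statement.
func demo() (appliedX, appliedY bool, final int) {
	r := rows{0: {value: 0}}
	const stmt = 1 // both CTEs belong to the same statement
	appliedX = r.update(0, 1, stmt)
	appliedY = r.update(0, 2, stmt)
	return appliedX, appliedY, r[0].value
}

func main() {
	fmt.Println(demo())
}
```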

knz · 2021-09-29T12:41:10Z

this looks like writing at the next seqnum and reading at the current one.
Maybe we could implement this too.

jordanlewis · 2021-09-29T12:57:57Z

But I think the semantics that you are describing are the reason for the current anomaly. The issue is that if you read at seqnum X, and base your writes off of that data, but the data has been edited in seqnum X+1, you could be sending updates or dels that don't reflect reality - this situation is what causes the index corruption posted at the top of the issue.

knz · 2021-09-29T13:09:54Z

hm this will need some more thought on my side.

i do want to reflect on the solution you've proposed earlier:

In the execution engine, for each marked scan operator, emit the scans with the special read mode. Then, we can error out if we notice that the keys have been written to already (or ignore the rows somehow if we want to get fancy).

It's going to be particularly critical to ensure that any subsequent reader always executes on the gateway and does not get distributed. The KV layer does not support parallel operation using multiple nodes on a statement that interleaves reads and writes. (Unless @andreimatei can tell us this limitation has been lifted)

michae2 · 2022-07-12T23:19:31Z

I suppose the intent from the first Del /Table/108/2/444/555/333/0 could be seen by the second Del /Table/108/2/444/555/333/0?

I think there will always be some intersection of the intent sets of the two substatements. Otherwise it is "merely" a write-skew anomaly, rather than corruption.

To prevent index corruption described in cockroachdb#70731, optbuilder raises an error when a statement performs multiple mutations to the same table. This commit loosens this restriction for UDFs that perform mutations because it is overly strict.

---

The index corruption described in cockroachdb#70731 occurs when a statement performs multiple writes to the same table. Any reads performed by successive writes see the snapshot of data as of the beginning of the statement. They do not read values as of the most recent write within the same statement. Because these successive writes are based on stale data, they can write incorrect KVs and cause inconsistencies between primary and secondary indexes.

Each statement in a UDF body is essentially a child of the statement that is invoking the UDF. Mutations within UDFs are not as susceptible to the inconsistencies described above because a UDF with a mutation must be VOLATILE, and each statement in a VOLATILE UDF reads at the latest sequence number. In other words, statements within UDFs can see previous writes made by any outer statement. This prevents inconsistencies due to writes based on stale reads. Therefore, the restriction that prevents multiple writes to the same table can be lifted in some cases when the writes are performed in UDFs.

However, we cannot forgo restrictions for all writes in UDFs. A parent statement that calls a UDF cannot be allowed to mutate the same table that the UDF did. Unlike subsequent statements in the UDF after the write, the parent statement will not see the UDF's writes, and inconsistencies could occur.

To define acceptable mutations to the same table within UDFs, we define a statement tree that represents the hierarchy of statements and sub-statements in a query. A sub-statement `sub` is any statement within a UDF. `sub`'s parent is the statement invoking the UDF. Other statements in the same UDF as `sub` are `sub`'s siblings. Any statements in a UDF invoked by `sub` are `sub`'s children. For example, consider:

CREATE FUNCTION f1() RETURNS INT LANGUAGE SQL AS 'SELECT 1';
CREATE FUNCTION f2() RETURNS INT LANGUAGE SQL AS 'SELECT 2 + f3()';
CREATE FUNCTION f3() RETURNS INT LANGUAGE SQL AS 'SELECT 3';
SELECT f1(), f2(), f3();

The statement tree for this SELECT would be:

root: SELECT f1(), f2(), f3()
  ├── f1: SELECT 1
  ├── f2: SELECT 2 + f3()
  │     └── f3: SELECT 3
  └── f3: SELECT 3

We define multiple mutations to the same table as safe if, for every possible path from the root statement to a leaf statement, either of the following is true:

1. There is no more than one mutation to any table.
2. Or, any table with multiple mutations is modified only by simple INSERTs without ON CONFLICT clauses.

As a consequence of this definition, a UDF is now allowed to mutate the same table as long as it does so in different statements in its body. Such statements are siblings in the statement tree, and therefore do not share any path from root to leaf. For example, this is now allowed:

CREATE FUNCTION ups(a1 INT, a2 INT) RETURNS VOID LANGUAGE SQL AS $$
  UPSERT INTO a VALUES (a1);
  UPSERT INTO a VALUES (a2);
$$

Similarly, successive invocations of the same UDF that mutates a table are now allowed:

CREATE FUNCTION upd(k0 INT, v0 INT) RETURNS VOID LANGUAGE SQL AS $$
  UPDATE kv SET v = v0 WHERE k = k0;
$$;
SELECT upd(1, 2), upd(1, 3);

The `statementTree` data structure has been added to enforce this definition. See its documentation for more details.

Note: These restrictions will likely need to be revisited once we support recursive UDFs.

Epic: CRDB-25388
Informs cockroachdb#70731
Release note: None
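The path rule in this commit message can be sketched as a tree walk. This is a hypothetical simplification, not the actual `statementTree`: each node lists the mutations its statement performs directly, its children are the statements of the UDFs it invokes, and a path is rejected when a table is mutated more than once on it by anything other than simple INSERTs. `stmt`, `mutation`, `safe`, and `walk` are illustrative names.

```go
package main

import "fmt"

type mutationKind int

const (
	simpleInsert  mutationKind = iota // INSERT without ON CONFLICT
	otherMutation                     // UPSERT, UPDATE, DELETE, INSERT ... ON CONFLICT
)

type mutation struct {
	table string
	kind  mutationKind
}

// stmt is a node of a hypothetical statement tree.
type stmt struct {
	mutations []mutation
	children  []*stmt
}

// safe reports whether every root-to-leaf path obeys the rule: a table
// mutated more than once on a path may only be mutated by simple INSERTs.
func safe(root *stmt) bool { return walk(root, nil) }

func walk(s *stmt, path []mutation) bool {
	// Copy so that siblings never see each other's mutations.
	cur := append(append([]mutation(nil), path...), s.mutations...)
	count := map[string]int{}
	allInserts := map[string]bool{}
	for _, m := range cur {
		if count[m.table] == 0 {
			allInserts[m.table] = true
		}
		count[m.table]++
		allInserts[m.table] = allInserts[m.table] && m.kind == simpleInsert
	}
	for t, n := range count {
		if n > 1 && !allInserts[t] {
			return false
		}
	}
	for _, c := range s.children {
		if !walk(c, cur) {
			return false
		}
	}
	return true
}

func main() {
	up := mutation{"a", otherMutation}
	// WITH x AS (UPSERT INTO a ...), y AS (UPSERT INTO a ...) SELECT ...
	cte := &stmt{mutations: []mutation{up, up}}
	// SELECT ups(1, 2): the two UPSERTs are siblings inside the UDF body.
	udf := &stmt{children: []*stmt{{mutations: []mutation{up}}, {mutations: []mutation{up}}}}
	// A statement that upserts into a and calls a UDF that also upserts into a.
	parent := &stmt{mutations: []mutation{up}, children: []*stmt{{mutations: []mutation{up}}}}
	fmt.Println(safe(cte), safe(udf), safe(parent)) // false true false
}
```

The original CTE case and the parent-plus-UDF case are rejected, while sibling UPSERTs in one UDF body are allowed.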


103920: opt: loosen restriction on UDF mutations to the same table r=mgartner a=mgartner

106354: sql: report contention on writes in EXPLAIN ANALYZE r=yuzefovich a=yuzefovich This commit adds the contention events listener to `planNodeToRowSource` which allows us to add contention time information for mutation planNodes to be shown in EXPLAIN ANALYZE. Fixes: #106266. Release note (sql change): CockroachDB now reports contention time encountered while executing mutation statements (INSERT, UPSERT, UPDATE, DELETE) when run via EXPLAIN ANALYZE.

106398: ui: fix infinite re-render on the key visualizer page r=zachlite a=zachlite Since #101258, the TimeScaleDropdownWithSearchParams can cause infinite re-renders. The exact cause of the bug is not yet diagnosed, but it occurs on the key visualizer page, and seems to be related to the custom duration options. This is tracked in #106395. This commit removes the dependency on TimeScaleDropdownWithSearchParams from the key visualizer, and replaces it with the vanilla TimeScaleDropdown. The custom duration options are still present. Informs: #106395 Epic: none Release note: None

106409: workload: ignore QueryCanceled in random schema change test r=rafiss a=rafiss fixes #105299 Release note: None

Co-authored-by: Marcus Gartner
Co-authored-by: Yahor Yuzefovich
Co-authored-by: zachlite
Co-authored-by: Rafi Shamim

There are currently some situations where a query that modifies the same table in multiple locations may cause index corruption (cockroachdb#70731). To avoid this, we disallow query structures that may lead to a problematic combination of mutations.

Triggers require special handling to make this check work, because they can execute arbitrary SQL statements, which can mutate a table directly or through routines, FK cascades, or other triggers. BEFORE triggers on the main query "just work" because they are built as UDF invocations as part of the main query. AFTER triggers and BEFORE triggers fired on cascades are more difficult, because they are planned lazily only if the post-query has rows to process.

This commit adds logic to track invalid mutations for both types of triggers. We now propagate the "ancestor" mutated tables whenever planning a post-query, so that any triggers planned as part of the post-query can detect conflicting mutations. See the "After Triggers" section in `statement_tree.go` for additional explanation.

Informs cockroachdb#70731

Release note (bug fix): Previously, it was possible to cause index corruption using AFTER-triggers that fire within a routine. In order for the bug to manifest, both the AFTER-trigger and the statement that invokes the routine must mutate the same row of a table with a mutation other than `INSERT`.

There are currently some situations where a query that modifies the same table in multiple locations may cause index corruption (cockroachdb#70731). To avoid this, we disallow query structures that may lead to a problematic combination of mutations.

Triggers require special handling to make this check work, because they can execute arbitrary SQL statements, which can mutate a table directly or through routines, FK cascades, or other triggers. BEFORE triggers on the main query "just work" because they are built as UDF invocations as part of the main query. AFTER triggers and BEFORE triggers fired on cascades are more difficult, because they are planned lazily only if the post-query has rows to process.

This commit adds logic to track invalid mutations for both types of triggers. We now propagate the "ancestor" mutated tables whenever planning a post-query, so that any triggers planned as part of the post-query can detect conflicting mutations. See the "After Triggers" section in `statement_tree.go` for additional explanation.

Informs cockroachdb#70731

Release note (bug fix): Fixed possible index corruption caused by triggers that could occur when the following conditions were satisfied:
1. A query calls a UDF or stored procedure, and also performs a mutation on a table.
2. The UDF/SP contains a statement that either fires an AFTER trigger, or fires a cascade that itself fires a trigger.
3. The trigger modifies the same row as the outer statement.
4. Either the outer or inner mutation is something other than an INSERT without an `ON CONFLICT` clause.

136076: sql: check for multiple mutations to the same table by triggers r=DrewKimball a=DrewKimball #### sql: refactor some cascade/trigger logic This commit refactors some of the logic shared between cascades and AFTER triggers. This will make the following commit easier to understand. Epic: None Release note: None Co-authored-by: Drew Kimball

michae2 added C-bug, A-sql-execution, and T-sql-queries labels Sep 24, 2021

michae2 self-assigned this Sep 28, 2021

mgartner mentioned this issue Sep 1, 2022

sql: support mutations within UDFs #87289

Closed

exalate-issue-sync bot removed the T-sql-queries SQL Queries Team label Nov 3, 2022

rharding6373 mentioned this issue May 10, 2023

sql: enable some mutations in udfs #102773

Merged

mgartner mentioned this issue May 25, 2023

opt: loosen restriction on UDF mutations to the same table #103920

Merged

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to New Backlog in SQL Queries Jul 24, 2023

yuzefovich added the T-sql-queries SQL Queries Team label May 2, 2024

DrewKimball mentioned this issue Nov 24, 2024

sql: check for multiple mutations to the same table by triggers #136076

Merged

DrewKimball mentioned this issue Jan 7, 2025

release-24.3: sql: check for multiple mutations to the same table by triggers #138361

Open

michae2 commented Sep 24, 2021

erikgrinaker commented Sep 25, 2021

michae2 commented Sep 27, 2021

erikgrinaker commented Sep 28, 2021

michae2 commented Sep 28, 2021

michae2 commented Sep 29, 2021

jordanlewis commented Sep 29, 2021

jordanlewis commented Sep 29, 2021

jordanlewis commented Sep 29, 2021

jordanlewis commented Sep 29, 2021

michae2 commented Sep 29, 2021

michae2 commented Sep 29, 2021

knz commented Sep 29, 2021

knz commented Sep 29, 2021

knz commented Sep 29, 2021

jordanlewis commented Sep 29, 2021

knz commented Sep 29, 2021

knz commented Sep 29, 2021

jordanlewis commented Sep 29, 2021

knz commented Sep 29, 2021

jordanlewis commented Sep 29, 2021

knz commented Sep 29, 2021

michae2 commented Jul 12, 2022
