-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: multiple CTEs with mutations on same row can cause inconsistency #70731
Comments
I'm not sure if this is the explanation for the bogus error. I noticed that writing to secondary indexes will use e.g. cockroach/pkg/sql/row/errors.go Lines 59 to 72 in 20c81dd
It seems more likely to me that this was e.g. caused by an |
Here's the KV trace:
So I think you are right. |
Thanks for checking! We may want to see if we can improve the error handling here while we're at it. |
@mgartner says this is related to a problem with Edit: this issue |
This can also be caused with CREATE TABLE a (i INT PRIMARY KEY, j INT, INDEX (j));
INSERT INTO a VALUES (0, 0);
WITH x AS (UPDATE a SET j = 1 WHERE i = 0 RETURNING j), y AS (UPDATE a SET j = 2 WHERE i = 0 RETURNING j) SELECT * FROM x;
SELECT i, j FROM a@primary;
SELECT i, j FROM a@a_j_idx; as well as CREATE TABLE a (i INT PRIMARY KEY, j INT, INDEX (j));
INSERT INTO a VALUES (0, 0);
WITH x AS (UPDATE a SET j = 1 WHERE i = 0 RETURNING j), y AS (DELETE FROM a WHERE i = 0 RETURNING j) SELECT * FROM x;
SELECT i, j FROM a@primary;
SELECT i, j FROM a@a_j_idx; and probably also |
I don't understand the mechanics of this error. Reading the KV trace that Michael sent, we see that the second Scan we send, which comes after the first subquery completes and has sent its Put batch to KV, reads the initial value for the primary key:
But why is this happening? Isn't it true that we've already executed the previous Puts in KV? I would expect at this point, we'd get back our own writes when we scanned them, just like we would if we opened a transaction, wrote a row, and then read the row back. What is different here? I'm guessing I'm missing something obvious, but I don't really understand this. This anomaly is different from #45372 - in that case, it makes sense that the operator might want to cache information - it doesn't read its own writes, but that's because it doesn't try to by flushing its batches and re-reading, not because it can't... or so I thought, anyway. |
In other words, why does this query fail to read its own write:
But this query succeeds?
|
I see, according to these comments, it's expected that mutations can't read their own writes, and we've deliberately set things up so that this is maintained. https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/conn_executor_exec.go#L640-L646 |
Maybe we need a special read mode for scans that returns a special flag or condition if we see a write to a key at the same sequence number that we're reading at?
@knz this intersects closely with related prior work you've done on halloween problem and sequence numbers, so I am curious for your thoughts on this problem and also this proposed strategy. |
I wonder if this would return false positives for something like: CREATE TABLE b (m INT PRIMARY KEY, n INT);
INSERT INTO b VALUES (10, 1), (20, 1), (30, -1), (40, -1);
WITH x AS (UPDATE b SET n = n + 1 WHERE n > 0 RETURNING n), y AS (UPDATE b SET n = n - 1 WHERE n < 0 RETURNING n) SELECT * FROM x; These two |
Thinking along the same lines, what if we try to use a special mode to catch it during the modification instead of during the scan? Instead of |
(edit: I was wrong - see jordan's answer below) |
It's the process of increasing the seqnum tht makes subsequent CTEs / the main statement able to observe the previous writes. |
I'm also somewhat surprised -- when we implemented nested txns originally, I though we tested this case? At that time at least, I think this used to work transparently, because CTEs were executed as separate statements and the planner reset for each statement was in charge of increasing the seqnum. |
This is what I thought too, but it's not the case. All CTEs in a single statement, in Postgres, observe a single snapshot of the database. They do not read their own writes. This is easily verifiable in Postgres:
Postgres also is documented this way (https://www.postgresql.org/docs/current/queries-with.html):
|
oh wow TIL |
how does pg ensure consistency in this case? |
Yes, I think this would work, and it's a lot more straightforward than what I proposed. The main complication would be plumbing.
Postgres uses MVCC information to ignore rows that have already been written in the same statement. I don't fully understand the details, but you can see the effects by searching for |
this looks like writing at the next seqnum and reading at the current one. |
But I think the semantics that you are describing are the reason for the current anomaly. The issue is that if you read at seqnum X, and base your writes off of that data, but the data has been edited in seqnum X+1, you could be sending updates or dels that don't reflect reality - this situation is what causes the index corruption posted at the top of the issue. |
hm this will need some more thought on my side. i do want to reflect on the solution you've proposed earlier:
It's going to be particularly critical to ensure that any subsequent reader always executes on the gateway and does not get distributed. The KV layer does not support parallel operation using multiple nodes on a statement that interleaves reads and writes. (Unless @andreimatei can tell us this limitation has been lifted) |
I think there will always be some intersection of the intent sets of the two substatements. Otherwise it is "merely" a write-skew anomaly, rather than corruption. |
To prevent index corruption described in cockroachdb#70731, optbuilder raises an error when a statement performs multiple mutations to the same table. This commit loosens this restriction for UDFs that perform mutations because it is overly strict. --- The index corruption described in cockroachdb#70731 occurs when a statement performs multiple writes to the same table. Any reads performed by successive writes see the snapshot of data as of the beginning of the statement. They do not read values as of the most recent write within the same statement. Because these successive writes are based on stale data, they can write incorrect KVs and cause inconsistencies between primary and secondary indexes. Each statement in a UDF body is essentially a child of the statement that is invoking the UDF. Mutations within UDFs are not as susceptible to the inconsistencies described above because a UDF with a mutation must be VOLATILE, and each statement in a VOLATILE UDFs reads at the latest sequence number. In other words, statements within UDFs can see previous writes made by any outer statement. This prevents inconsistencies due to writes based on stale reads. Therefore, the restriction that prevents multiple writes to the same table can be lifted in some cases when the writes are performed in UDFs. However, we cannot forgo restrictions for all writes in UDFs. A parent statement that calls a UDF should cannot be allowed to mutate the same table that the UDF did. Unlike subsequent statements in the UDF after the write, the parent statement will not see the UDF's writes, and inconsistencies could occur. To define acceptable mutations to the same table within UDFs, we define a statement tree that represents the hierarchy of statements and sub-statements in a query. A sub-statement `sub` is any statement within a UDF. `sub`'s parent is the statement invoking the UDF. Other statements in the same UDF as `sub` are the `sub`'s siblings. Any statements in a UDF invoked by `sub` are `sub`'s children. For example, consider: CREATE FUNCTION f1() RETURNS INT LANGUAGE SQL AS 'SELECT 1'; CREATE FUNCTION f2() RETURNS INT LANGUAGE SQL AS 'SELECT 2 + f3()'; CREATE FUNCTION f3() RETURNS INT LANGUAGE SQL AS 'SELECT 3'; SELECT f1(), f2(), f3(); The statement tree for this SELECT would be: root: SELECT f1(), f2(), f3() ├── f1: SELECT 1 ├── f2: SELECT 2 + f3() │ └── f3: SELECT 3 └── f3: SELECT 3 We define multiple mutations to the same table as safe if, for every possible path from the root statement to a leaf statement, either of the following is true: 1. There is no more than one mutation to any table. 2. Or, any table with multiple mutations is modified only by simple INSERTs without ON CONFLICT clauses. As a consequence of this definition, a UDF is now allowed to mutate the same table as long as it does so in different statements in its body. Such statements are siblings in the statement tree, and therefore do not share any path from root to leaf. For example, this is now allowed: CREATE FUNCTION ups(a1 INT, a2 INT) RETURNS VOID LANGUAGE SQL AS $$ UPSERT INTO a VALUES (a1); UPSERT INTO a VALUES (a2); $$ Similarly, successive invocations of the same UDF that mutates a table are now allowed: CREATE FUNCTION upd(k0 INT, v0 INT) RETURNS VOID LANGUAGE SQL AS $$ UPDATE kv SET v = v0 WHERE k = k0; $$; SELECT upd(1, 2), upd(1, 3); The `statementTree` data structure has been added to enforce this definition. See its documentation for more details. Note: These restrictions will likely need to be revisited once we support recursive UDFs. Epic: CRDB-25388 Informs cockroachdb#70731 Release note: None
To prevent index corruption described in cockroachdb#70731, optbuilder raises an error when a statement performs multiple mutations to the same table. This commit loosens this restriction for UDFs that perform mutations because it is overly strict. --- The index corruption described in cockroachdb#70731 occurs when a statement performs multiple writes to the same table. Any reads performed by successive writes see the snapshot of data as of the beginning of the statement. They do not read values as of the most recent write within the same statement. Because these successive writes are based on stale data, they can write incorrect KVs and cause inconsistencies between primary and secondary indexes. Each statement in a UDF body is essentially a child of the statement that is invoking the UDF. Mutations within UDFs are not as susceptible to the inconsistencies described above because a UDF with a mutation must be VOLATILE, and each statement in a VOLATILE UDFs reads at the latest sequence number. In other words, statements within UDFs can see previous writes made by any outer statement. This prevents inconsistencies due to writes based on stale reads. Therefore, the restriction that prevents multiple writes to the same table can be lifted in some cases when the writes are performed in UDFs. However, we cannot forgo restrictions for all writes in UDFs. A parent statement that calls a UDF should cannot be allowed to mutate the same table that the UDF did. Unlike subsequent statements in the UDF after the write, the parent statement will not see the UDF's writes, and inconsistencies could occur. To define acceptable mutations to the same table within UDFs, we define a statement tree that represents the hierarchy of statements and sub-statements in a query. A sub-statement `sub` is any statement within a UDF. `sub`'s parent is the statement invoking the UDF. Other statements in the same UDF as `sub` are the `sub`'s siblings. Any statements in a UDF invoked by `sub` are `sub`'s children. For example, consider: CREATE FUNCTION f1() RETURNS INT LANGUAGE SQL AS 'SELECT 1'; CREATE FUNCTION f2() RETURNS INT LANGUAGE SQL AS 'SELECT 2 + f3()'; CREATE FUNCTION f3() RETURNS INT LANGUAGE SQL AS 'SELECT 3'; SELECT f1(), f2(), f3(); The statement tree for this SELECT would be: root: SELECT f1(), f2(), f3() ├── f1: SELECT 1 ├── f2: SELECT 2 + f3() │ └── f3: SELECT 3 └── f3: SELECT 3 We define multiple mutations to the same table as safe if, for every possible path from the root statement to a leaf statement, either of the following is true: 1. There is no more than one mutation to any table. 2. Or, any table with multiple mutations is modified only by simple INSERTs without ON CONFLICT clauses. As a consequence of this definition, a UDF is now allowed to mutate the same table as long as it does so in different statements in its body. Such statements are siblings in the statement tree, and therefore do not share any path from root to leaf. For example, this is now allowed: CREATE FUNCTION ups(a1 INT, a2 INT) RETURNS VOID LANGUAGE SQL AS $$ UPSERT INTO a VALUES (a1); UPSERT INTO a VALUES (a2); $$ Similarly, successive invocations of the same UDF that mutates a table are now allowed: CREATE FUNCTION upd(k0 INT, v0 INT) RETURNS VOID LANGUAGE SQL AS $$ UPDATE kv SET v = v0 WHERE k = k0; $$; SELECT upd(1, 2), upd(1, 3); The `statementTree` data structure has been added to enforce this definition. See its documentation for more details. Note: These restrictions will likely need to be revisited once we support recursive UDFs. Epic: CRDB-25388 Informs cockroachdb#70731 Release note: None
To prevent index corruption described in cockroachdb#70731, optbuilder raises an error when a statement performs multiple mutations to the same table. This commit loosens this restriction for UDFs that perform mutations because it is overly strict. --- The index corruption described in cockroachdb#70731 occurs when a statement performs multiple writes to the same table. Any reads performed by successive writes see the snapshot of data as of the beginning of the statement. They do not read values as of the most recent write within the same statement. Because these successive writes are based on stale data, they can write incorrect KVs and cause inconsistencies between primary and secondary indexes. Each statement in a UDF body is essentially a child of the statement that is invoking the UDF. Mutations within UDFs are not as susceptible to the inconsistencies described above because a UDF with a mutation must be VOLATILE, and each statement in a VOLATILE UDFs reads at the latest sequence number. In other words, statements within UDFs can see previous writes made by any outer statement. This prevents inconsistencies due to writes based on stale reads. Therefore, the restriction that prevents multiple writes to the same table can be lifted in some cases when the writes are performed in UDFs. However, we cannot forgo restrictions for all writes in UDFs. A parent statement that calls a UDF cannot be allowed to mutate the same table that the UDF did. Unlike subsequent statements in the UDF after the write, the parent statement will not see the UDF's writes, and inconsistencies could occur. To define acceptable mutations to the same table within UDFs, we define a statement tree that represents the hierarchy of statements and sub-statements in a query. A sub-statement `sub` is any statement within a UDF. `sub`'s parent is the statement invoking the UDF. Other statements in the same UDF as `sub` are the `sub`'s siblings. Any statements in a UDF invoked by `sub` are `sub`'s children. For example, consider: CREATE FUNCTION f1() RETURNS INT LANGUAGE SQL AS 'SELECT 1'; CREATE FUNCTION f2() RETURNS INT LANGUAGE SQL AS 'SELECT 2 + f3()'; CREATE FUNCTION f3() RETURNS INT LANGUAGE SQL AS 'SELECT 3'; SELECT f1(), f2(), f3(); The statement tree for this SELECT would be: root: SELECT f1(), f2(), f3() ├── f1: SELECT 1 ├── f2: SELECT 2 + f3() │ └── f3: SELECT 3 └── f3: SELECT 3 We define multiple mutations to the same table as safe if, for every possible path from the root statement to a leaf statement, either of the following is true: 1. There is no more than one mutation to any table. 2. Or, any table with multiple mutations is modified only by simple INSERTs without ON CONFLICT clauses. As a consequence of this definition, a UDF is now allowed to mutate the same table as long as it does so in different statements in its body. Such statements are siblings in the statement tree, and therefore do not share any path from root to leaf. For example, this is now allowed: CREATE FUNCTION ups(a1 INT, a2 INT) RETURNS VOID LANGUAGE SQL AS $$ UPSERT INTO a VALUES (a1); UPSERT INTO a VALUES (a2); $$ Similarly, successive invocations of the same UDF that mutates a table are now allowed: CREATE FUNCTION upd(k0 INT, v0 INT) RETURNS VOID LANGUAGE SQL AS $$ UPDATE kv SET v = v0 WHERE k = k0; $$; SELECT upd(1, 2), upd(1, 3); The `statementTree` data structure has been added to enforce this definition. See its documentation for more details. Note: These restrictions will likely need to be revisited once we support recursive UDFs. Epic: CRDB-25388 Informs cockroachdb#70731 Release note: None
To prevent index corruption described in cockroachdb#70731, optbuilder raises an error when a statement performs multiple mutations to the same table. This commit loosens this restriction for UDFs that perform mutations because it is overly strict. --- The index corruption described in cockroachdb#70731 occurs when a statement performs multiple writes to the same table. Any reads performed by successive writes see the snapshot of data as of the beginning of the statement. They do not read values as of the most recent write within the same statement. Because these successive writes are based on stale data, they can write incorrect KVs and cause inconsistencies between primary and secondary indexes. Each statement in a UDF body is essentially a child of the statement that is invoking the UDF. Mutations within UDFs are not as susceptible to the inconsistencies described above because a UDF with a mutation must be VOLATILE, and each statement in a VOLATILE UDFs reads at the latest sequence number. In other words, statements within UDFs can see previous writes made by any outer statement. This prevents inconsistencies due to writes based on stale reads. Therefore, the restriction that prevents multiple writes to the same table can be lifted in some cases when the writes are performed in UDFs. However, we cannot forgo restrictions for all writes in UDFs. A parent statement that calls a UDF cannot be allowed to mutate the same table that the UDF did. Unlike subsequent statements in the UDF after the write, the parent statement will not see the UDF's writes, and inconsistencies could occur. To define acceptable mutations to the same table within UDFs, we define a statement tree that represents the hierarchy of statements and sub-statements in a query. A sub-statement `sub` is any statement within a UDF. `sub`'s parent is the statement invoking the UDF. Other statements in the same UDF as `sub` are the `sub`'s siblings. Any statements in a UDF invoked by `sub` are `sub`'s children. For example, consider: CREATE FUNCTION f1() RETURNS INT LANGUAGE SQL AS 'SELECT 1'; CREATE FUNCTION f2() RETURNS INT LANGUAGE SQL AS 'SELECT 2 + f3()'; CREATE FUNCTION f3() RETURNS INT LANGUAGE SQL AS 'SELECT 3'; SELECT f1(), f2(), f3(); The statement tree for this SELECT would be: root: SELECT f1(), f2(), f3() ├── f1: SELECT 1 ├── f2: SELECT 2 + f3() │ └── f3: SELECT 3 └── f3: SELECT 3 We define multiple mutations to the same table as safe if, for every possible path from the root statement to a leaf statement, either of the following is true: 1. There is no more than one mutation to any table. 2. Or, any table with multiple mutations is modified only by simple INSERTs without ON CONFLICT clauses. As a consequence of this definition, a UDF is now allowed to mutate the same table as long as it does so in different statements in its body. Such statements are siblings in the statement tree, and therefore do not share any path from root to leaf. For example, this is now allowed: CREATE FUNCTION ups(a1 INT, a2 INT) RETURNS VOID LANGUAGE SQL AS $$ UPSERT INTO a VALUES (a1); UPSERT INTO a VALUES (a2); $$ Similarly, successive invocations of the same UDF that mutates a table are now allowed: CREATE FUNCTION upd(k0 INT, v0 INT) RETURNS VOID LANGUAGE SQL AS $$ UPDATE kv SET v = v0 WHERE k = k0; $$; SELECT upd(1, 2), upd(1, 3); The `statementTree` data structure has been added to enforce this definition. See its documentation for more details. Note: These restrictions will likely need to be revisited once we support recursive UDFs. Epic: CRDB-25388 Informs cockroachdb#70731 Release note: None
To prevent index corruption described in cockroachdb#70731, optbuilder raises an error when a statement performs multiple mutations to the same table. This commit loosens this restriction for UDFs that perform mutations because it is overly strict. --- The index corruption described in cockroachdb#70731 occurs when a statement performs multiple writes to the same table. Any reads performed by successive writes see the snapshot of data as of the beginning of the statement. They do not read values as of the most recent write within the same statement. Because these successive writes are based on stale data, they can write incorrect KVs and cause inconsistencies between primary and secondary indexes. Each statement in a UDF body is essentially a child of the statement that is invoking the UDF. Mutations within UDFs are not as susceptible to the inconsistencies described above because a UDF with a mutation must be VOLATILE, and each statement in a VOLATILE UDFs reads at the latest sequence number. In other words, statements within UDFs can see previous writes made by any outer statement. This prevents inconsistencies due to writes based on stale reads. Therefore, the restriction that prevents multiple writes to the same table can be lifted in some cases when the writes are performed in UDFs. However, we cannot forgo restrictions for all writes in UDFs. A parent statement that calls a UDF cannot be allowed to mutate the same table that the UDF did. Unlike subsequent statements in the UDF after the write, the parent statement will not see the UDF's writes, and inconsistencies could occur. To define acceptable mutations to the same table within UDFs, we define a statement tree that represents the hierarchy of statements and sub-statements in a query. A sub-statement `sub` is any statement within a UDF. `sub`'s parent is the statement invoking the UDF. Other statements in the same UDF as `sub` are the `sub`'s siblings. Any statements in a UDF invoked by `sub` are `sub`'s children. For example, consider: CREATE FUNCTION f1() RETURNS INT LANGUAGE SQL AS 'SELECT 1'; CREATE FUNCTION f2() RETURNS INT LANGUAGE SQL AS 'SELECT 2 + f3()'; CREATE FUNCTION f3() RETURNS INT LANGUAGE SQL AS 'SELECT 3'; SELECT f1(), f2(), f3(); The statement tree for this SELECT would be: root: SELECT f1(), f2(), f3() ├── f1: SELECT 1 ├── f2: SELECT 2 + f3() │ └── f3: SELECT 3 └── f3: SELECT 3 We define multiple mutations to the same table as safe if, for every possible path from the root statement to a leaf statement, either of the following is true: 1. There is no more than one mutation to any table. 2. Or, any table with multiple mutations is modified only by simple INSERTs without ON CONFLICT clauses. As a consequence of this definition, a UDF is now allowed to mutate the same table as long as it does so in different statements in its body. Such statements are siblings in the statement tree, and therefore do not share any path from root to leaf. For example, this is now allowed: CREATE FUNCTION ups(a1 INT, a2 INT) RETURNS VOID LANGUAGE SQL AS $$ UPSERT INTO a VALUES (a1); UPSERT INTO a VALUES (a2); $$ Similarly, successive invocations of the same UDF that mutates a table are now allowed: CREATE FUNCTION upd(k0 INT, v0 INT) RETURNS VOID LANGUAGE SQL AS $$ UPDATE kv SET v = v0 WHERE k = k0; $$; SELECT upd(1, 2), upd(1, 3); The `statementTree` data structure has been added to enforce this definition. See its documentation for more details. Note: These restrictions will likely need to be revisited once we support recursive UDFs. Epic: CRDB-25388 Informs cockroachdb#70731 Release note: None
To prevent index corruption described in cockroachdb#70731, optbuilder raises an error when a statement performs multiple mutations to the same table. This commit loosens this restriction for UDFs that perform mutations because it is overly strict. --- The index corruption described in cockroachdb#70731 occurs when a statement performs multiple writes to the same table. Any reads performed by successive writes see the snapshot of data as of the beginning of the statement. They do not read values as of the most recent write within the same statement. Because these successive writes are based on stale data, they can write incorrect KVs and cause inconsistencies between primary and secondary indexes. Each statement in a UDF body is essentially a child of the statement that is invoking the UDF. Mutations within UDFs are not as susceptible to the inconsistencies described above because a UDF with a mutation must be VOLATILE, and each statement in a VOLATILE UDFs reads at the latest sequence number. In other words, statements within UDFs can see previous writes made by any outer statement. This prevents inconsistencies due to writes based on stale reads. Therefore, the restriction that prevents multiple writes to the same table can be lifted in some cases when the writes are performed in UDFs. However, we cannot forgo restrictions for all writes in UDFs. A parent statement that calls a UDF cannot be allowed to mutate the same table that the UDF did. Unlike subsequent statements in the UDF after the write, the parent statement will not see the UDF's writes, and inconsistencies could occur. To define acceptable mutations to the same table within UDFs, we define a statement tree that represents the hierarchy of statements and sub-statements in a query. A sub-statement `sub` is any statement within a UDF. `sub`'s parent is the statement invoking the UDF. Other statements in the same UDF as `sub` are the `sub`'s siblings. Any statements in a UDF invoked by `sub` are `sub`'s children. For example, consider: CREATE FUNCTION f1() RETURNS INT LANGUAGE SQL AS 'SELECT 1'; CREATE FUNCTION f2() RETURNS INT LANGUAGE SQL AS 'SELECT 2 + f3()'; CREATE FUNCTION f3() RETURNS INT LANGUAGE SQL AS 'SELECT 3'; SELECT f1(), f2(), f3(); The statement tree for this SELECT would be: root: SELECT f1(), f2(), f3() ├── f1: SELECT 1 ├── f2: SELECT 2 + f3() │ └── f3: SELECT 3 └── f3: SELECT 3 We define multiple mutations to the same table as safe if, for every possible path from the root statement to a leaf statement, either of the following is true: 1. There is no more than one mutation to any table. 2. Or, any table with multiple mutations is modified only by simple INSERTs without ON CONFLICT clauses. As a consequence of this definition, a UDF is now allowed to mutate the same table as long as it does so in different statements in its body. Such statements are siblings in the statement tree, and therefore do not share any path from root to leaf. For example, this is now allowed: CREATE FUNCTION ups(a1 INT, a2 INT) RETURNS VOID LANGUAGE SQL AS $$ UPSERT INTO a VALUES (a1); UPSERT INTO a VALUES (a2); $$ Similarly, successive invocations of the same UDF that mutates a table are now allowed: CREATE FUNCTION upd(k0 INT, v0 INT) RETURNS VOID LANGUAGE SQL AS $$ UPDATE kv SET v = v0 WHERE k = k0; $$; SELECT upd(1, 2), upd(1, 3); The `statementTree` data structure has been added to enforce this definition. See its documentation for more details. Note: These restrictions will likely need to be revisited once we support recursive UDFs. Epic: CRDB-25388 Informs cockroachdb#70731 Release note: None
103920: opt: loosen restriction on UDF mutations to the same table r=mgartner a=mgartner #### opt: loosen restriction on UDF mutations to the same table To prevent index corruption described in #70731, optbuilder raises an error when a statement performs multiple mutations to the same table. This commit loosens this restriction for UDFs that perform mutations because it is overly strict. --- The index corruption described in #70731 occurs when a statement performs multiple writes to the same table. Any reads performed by successive writes see the snapshot of data as of the beginning of the statement. They do not read values as of the most recent write within the same statement. Because these successive writes are based on stale data, they can write incorrect KVs and cause inconsistencies between primary and secondary indexes. Each statement in a UDF body is essentially a child of the statement that is invoking the UDF. Mutations within UDFs are not as susceptible to the inconsistencies described above because a UDF with a mutation must be VOLATILE, and each statement in a VOLATILE UDFs reads at the latest sequence number. In other words, statements within UDFs can see previous writes made by any outer statement. This prevents inconsistencies due to writes based on stale reads. Therefore, the restriction that prevents multiple writes to the same table can be lifted in some cases when the writes are performed in UDFs. However, we cannot forgo restrictions for all writes in UDFs. A parent statement that calls a UDF cannot be allowed to mutate the same table that the UDF did. Unlike subsequent statements in the UDF after the write, the parent statement will not see the UDF's writes, and inconsistencies could occur. To define acceptable mutations to the same table within UDFs, we define a statement tree that represents the hierarchy of statements and sub-statements in a query. A sub-statement `sub` is any statement within a UDF. `sub`'s parent is the statement invoking the UDF. Other statements in the same UDF as `sub` are the `sub`'s siblings. Any statements in a UDF invoked by `sub` are `sub`'s children. For example, consider: CREATE FUNCTION f1() RETURNS INT LANGUAGE SQL AS 'SELECT 1'; CREATE FUNCTION f2() RETURNS INT LANGUAGE SQL AS 'SELECT 2 + f3()'; CREATE FUNCTION f3() RETURNS INT LANGUAGE SQL AS 'SELECT 3'; SELECT f1(), f2(), f3(); The statement tree for this SELECT would be: root: SELECT f1(), f2(), f3() ├── f1: SELECT 1 ├── f2: SELECT 2 + f3() │ └── f3: SELECT 3 └── f3: SELECT 3 We define multiple mutations to the same table as safe if, for every possible path from the root statement to a leaf statement, either of the following is true: 1. There is no more than one mutation to any table. 2. Or, any table with multiple mutations is modified only by simple INSERTs without ON CONFLICT clauses. As a consequence of this definition, a UDF is now allowed to mutate the same table as long as it does so in different statements in its body. Such statements are siblings in the statement tree, and therefore do not share any path from root to leaf. For example, this is now allowed: CREATE FUNCTION ups(a1 INT, a2 INT) RETURNS VOID LANGUAGE SQL AS $$ UPSERT INTO a VALUES (a1); UPSERT INTO a VALUES (a2); $$ Similarly, successive invocations of the same UDF that mutates a table are now allowed: CREATE FUNCTION upd(k0 INT, v0 INT) RETURNS VOID LANGUAGE SQL AS $$ UPDATE kv SET v = v0 WHERE k = k0; $$; SELECT upd(1, 2), upd(1, 3); The `statementTree` data structure has been added to enforce this definition. See its documentation for more details. Note: These restrictions will likely need to be revisited once we support recursive UDFs. Epic: CRDB-25388 Informs #70731 Release note: None 106354: sql: report contention on writes in EXPLAIN ANALYZE r=yuzefovich a=yuzefovich This commit adds the contention events listener to `planNodeToRowSource` which allows us to add contention time information for mutation planNodes to be shown in EXPLAIN ANALYZE. Fixes: #106266. Release note (sql change): CockroachDB now reports contention time encountered while executing mutation statements (INSERT, UPSERT, UPDATE, DELETE) when run via EXPLAIN ANALYZE. 106398: ui: fix infinite re-render on the key visualizer page r=zachlite a=zachlite Since #101258, the TimeScaleDropdownWithSearchParams can cause infinite re-renders. The exact cause of the bug is not yet diagnosed, but it occurs on the key visualizer page, and seems to be related to the custom duration options. This is tracked in #106395. This commit removes the dependency on TimeScaleDropdownWithSearchParams from the key visualizer, and replaces it with the vanilla TimeScaleDropdown. The custom duration options are still present. Informs: #106395 Epic: none Release note: None 106409: workload: ignore QueryCanceled in random schema change test r=rafiss a=rafiss fixes #105299 Release note: None Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: zachlite <[email protected]> Co-authored-by: Rafi Shamim <[email protected]>
There are currently some situations where a query that modifies the same table in multiple locations may cause index corruption (cockroachdb#70731). To avoid this, we disallow query structures that may lead to a problematic combination of mutations. Triggers require special handling to make this check work, because they can execute arbitrary SQL statements, which can mutate a table directly or through routines, FK cascades, or other triggers. BEFORE triggers on the main query "just work" because they are built as UDF invocations as part of the main query. AFTER triggers and BEFORE triggers fired on cascades are more difficult, because they are planned lazily only if the post-query has rows to process. This commit adds logic to track invalid mutations for both types of triggers. We now propagate the "ancestor" mutated tables whenever planning a post-query, so that any triggers planned as part of the post-query can detect conflicting mutations. See the "After Triggers" section in `statement_tree.go` for additional explanation. Informs cockroachdb#70731 Release note (bug fix): Previously, it was possible to cause index corruption using AFTER-triggers that fire within a routine. In order for the bug to manifest, both the AFTER-trigger and the statement that invokes the routine must mutate the same row of a table with a mutation other than `INSERT`.
There are currently some situations where a query that modifies the same table in multiple locations may cause index corruption (cockroachdb#70731). To avoid this, we disallow query structures that may lead to a problematic combination of mutations. Triggers require special handling to make this check work, because they can execute arbitrary SQL statements, which can mutate a table directly or through routines, FK cascades, or other triggers. BEFORE triggers on the main query "just work" because they are built as UDF invocations as part of the main query. AFTER triggers and BEFORE triggers fired on cascades are more difficult, because they are planned lazily only if the post-query has rows to process. This commit adds logic to track invalid mutations for both types of triggers. We now propagate the "ancestor" mutated tables whenever planning a post-query, so that any triggers planned as part of the post-query can detect conflicting mutations. See the "After Triggers" section in `statement_tree.go` for additional explanation. Informs cockroachdb#70731 Release note (bug fix): Fixed possible index corruption caused by triggers that could occur when the following conditions were satisfied: 1. A query calls a UDF or stored procedure, and also performs a mutation on a table. 2. The UDF/SP contains a statement that either fires an AFTER trigger, or fires a cascade that itself fires a trigger. 3. The trigger modifies the same row as the outer statement. 4. Either the outer or inner mutation is something other than an INSERT without an `ON CONFLICT` clause.
There are currently some situations where a query that modifies the same table in multiple locations may cause index corruption (cockroachdb#70731). To avoid this, we disallow query structures that may lead to a problematic combination of mutations. Triggers require special handling to make this check work, because they can execute arbitrary SQL statements, which can mutate a table directly or through routines, FK cascades, or other triggers. BEFORE triggers on the main query "just work" because they are built as UDF invocations as part of the main query. AFTER triggers and BEFORE triggers fired on cascades are more difficult, because they are planned lazily only if the post-query has rows to process. This commit adds logic to track invalid mutations for both types of triggers. We now propagate the "ancestor" mutated tables whenever planning a post-query, so that any triggers planned as part of the post-query can detect conflicting mutations. See the "After Triggers" section in `statement_tree.go` for additional explanation. Informs cockroachdb#70731 Release note (bug fix): Fixed possible index corruption caused by triggers that could occur when the following conditions were satisfied: 1. A query calls a UDF or stored procedure, and also performs a mutation on a table. 2. The UDF/SP contains a statement that either fires an AFTER trigger, or fires a cascade that itself fires a trigger. 3. The trigger modifies the same row as the outer statement. 4. Either the outer or inner mutation is something other than an INSERT without an `ON CONFLICT` clause.
136076: sql: check for multiple mutations to the same table by triggers r=DrewKimball a=DrewKimball #### sql: refactor some cascade/trigger logic This commit refactors some of the logic shared between cascades and AFTER triggers. This will make the following commit easier to understand. Epic: None Release note: None #### sql: check for multiple mutations to the same table by triggers There are currently some situations where a query that modifies the same table in multiple locations may cause index corruption (#70731). To avoid this, we disallow query structures that may lead to a problematic combination of mutations. Triggers require special handling to make this check work, because they can execute arbitrary SQL statements, which can mutate a table directly or through routines, FK cascades, or other triggers. BEFORE triggers on the main query "just work" because they are built as UDF invocations as part of the main query. AFTER triggers and BEFORE triggers fired on cascades are more difficult, because they are planned lazily only if the post-query has rows to process. This commit adds logic to track invalid mutations for both types of triggers. We now propagate the "ancestor" mutated tables whenever planning a post-query, so that any triggers planned as part of the post-query can detect conflicting mutations. See the "After Triggers" section in `statement_tree.go` for additional explanation. Informs #70731 Release note (bug fix): Previously, it was possible to cause index corruption using AFTER-triggers that fire within a routine. In order for the bug to manifest, both the AFTER-trigger and the statement that invokes the routine must mutate the same row of a table with a mutation other than `INSERT`. Co-authored-by: Drew Kimball <[email protected]>
There are currently some situations where a query that modifies the same table in multiple locations may cause index corruption (cockroachdb#70731). To avoid this, we disallow query structures that may lead to a problematic combination of mutations. Triggers require special handling to make this check work, because they can execute arbitrary SQL statements, which can mutate a table directly or through routines, FK cascades, or other triggers. BEFORE triggers on the main query "just work" because they are built as UDF invocations as part of the main query. AFTER triggers and BEFORE triggers fired on cascades are more difficult, because they are planned lazily only if the post-query has rows to process. This commit adds logic to track invalid mutations for both types of triggers. We now propagate the "ancestor" mutated tables whenever planning a post-query, so that any triggers planned as part of the post-query can detect conflicting mutations. See the "After Triggers" section in `statement_tree.go` for additional explanation. Informs cockroachdb#70731 Release note (bug fix): Fixed possible index corruption caused by triggers that could occur when the following conditions were satisfied: 1. A query calls a UDF or stored procedure, and also performs a mutation on a table. 2. The UDF/SP contains a statement that either fires an AFTER trigger, or fires a cascade that itself fires a trigger. 3. The trigger modifies the same row as the outer statement. 4. Either the outer or inner mutation is something other than an INSERT without an `ON CONFLICT` clause.
In #44466 we found that an upsert statement modifying the same row twice could lead to inconsistencies, due to the upsert operator not reading its own writes. This was fixed in #45372 by checking that the input to the upsert is distinct on PKs.
But now @florence-crl and @erikgrinaker have discovered another way to upsert the same row multiple times in a single statement: using CTEs.
For example:
Now, another execution of the last statement fails with a dupe key error (confusingly on a non-unique secondary index, but this is because the PK values are also the same, so it is a duplicate key from KV's perspective):
And furthermore we can see the inconsistency directly:
Jira issue: CRDB-10192
The text was updated successfully, but these errors were encountered: