Skip to content

Commit

Permalink
opt: loosen restriction on UDF mutations to the same table
Browse files Browse the repository at this point in the history
To prevent index corruption described in cockroachdb#70731, optbuilder raises an
error when a statement performs multiple mutations to the same table.
This commit loosens this restriction for UDFs that perform mutations
because it is overly strict.

---

The index corruption described in cockroachdb#70731 occurs when a statement
performs multiple writes to the same table. Any reads performed by
successive writes see the snapshot of data as of the beginning of the
statement. They do not read values as of the most recent write within
the same statement. Because these successive writes are based on stale
data, they can write incorrect KVs and cause inconsistencies between
primary and secondary indexes.

Each statement in a UDF body is essentially a child of the statement
that is invoking the UDF. Mutations within UDFs are not as susceptible
to the inconsistencies described above because a UDF with a mutation
must be VOLATILE, and each statement in a VOLATILE UDFs reads at the
latest sequence number. In other words, statements within UDFs can see
previous writes made by any outer statement. This prevents
inconsistencies due to writes based on stale reads. Therefore, the
restriction that prevents multiple writes to the same table can be
lifted in some cases when the writes are performed in UDFs.

However, we cannot forgo restrictions for all writes in UDFs. A parent
statement that calls a UDF cannot be allowed to mutate the same table
that the UDF did. Unlike subsequent statements in the UDF after the
write, the parent statement will not see the UDF's writes, and
inconsistencies could occur.

To define acceptable mutations to the same table within UDFs, we define
a statement tree that represents the hierarchy of statements and
sub-statements in a query. A sub-statement `sub` is any statement within
a UDF. `sub`'s parent is the statement invoking the UDF. Other
statements in the same UDF as `sub` are the `sub`'s siblings. Any
statements in a UDF invoked by `sub` are `sub`'s children. For example,
consider:

    CREATE FUNCTION f1() RETURNS INT LANGUAGE SQL AS 'SELECT 1';
    CREATE FUNCTION f2() RETURNS INT LANGUAGE SQL AS 'SELECT 2 + f3()';
    CREATE FUNCTION f3() RETURNS INT LANGUAGE SQL AS 'SELECT 3';

    SELECT f1(), f2(), f3();

The statement tree for this SELECT would be:

    root: SELECT f1(), f2(), f3()
      ├── f1: SELECT 1
      ├── f2: SELECT 2 + f3()
      │    └── f3: SELECT 3
      └── f3: SELECT 3

We define multiple mutations to the same table as safe if, for every
possible path from the root statement to a leaf statement, either of the
following is true:

  1. There is no more than one mutation to any table.
  2. Or, any table with multiple mutations is modified only by simple
     INSERTs without ON CONFLICT clauses.

As a consequence of this definition, a UDF is now allowed to mutate the
same table as long as it does so in different statements in its body.
Such statements are siblings in the statement tree, and therefore do not
share any path from root to leaf. For example, this is now allowed:

    CREATE FUNCTION ups(a1 INT, a2 INT) RETURNS VOID LANGUAGE SQL AS $$
      UPSERT INTO a VALUES (a1);
      UPSERT INTO a VALUES (a2);
    $$

Similarly, successive invocations of the same UDF that mutates a table
are now allowed:

    CREATE FUNCTION upd(k0 INT, v0 INT) RETURNS VOID LANGUAGE SQL AS $$
      UPDATE kv SET v = v0 WHERE k = k0;
    $$;
    SELECT upd(1, 2), upd(1, 3);

The `statementTree` data structure has been added to enforce this
definition. See its documentation for more details.

Note: These restrictions will likely need to be revisited once we support
recursive UDFs.

Epic: CRDB-25388

Informs cockroachdb#70731

Release note: None
  • Loading branch information
mgartner committed Jul 6, 2023
1 parent 7c2bc1d commit 028c516
Show file tree
Hide file tree
Showing 15 changed files with 782 additions and 56 deletions.
37 changes: 33 additions & 4 deletions pkg/sql/logictest/testdata/logic_test/udf_delete
Original file line number Diff line number Diff line change
Expand Up @@ -57,20 +57,49 @@ query II
SELECT * FROM kv
----

statement error pq: multiple modification subqueries of the same table "kv" are not supported
CREATE FUNCTION f_err(i INT, j INT) RETURNS SETOF RECORD AS
statement ok
CREATE FUNCTION f_kv2(i INT, j INT) RETURNS SETOF RECORD AS
$$
DELETE FROM kv WHERE k = i RETURNING k, v;
DELETE FROM kv WHERE v = j RETURNING k, v;
$$ LANGUAGE SQL;

statement error pq: multiple modification subqueries of the same table "kv" are not supported
CREATE FUNCTION f_err(i INT, j INT) RETURNS SETOF RECORD AS
statement ok
INSERT INTO kv VALUES (1, 2), (3, 4), (5, 6), (7, 8)

statement ok
SELECT f_kv2(1, 9)

query II rowsort
SELECT * FROM kv
----
3 4
5 6
7 8

statement ok
SELECT f_kv2(i, j) FROM (VALUES (3, 0), (0, 8)) v(i, j)

query II
SELECT * FROM kv
----
5 6

statement ok
CREATE FUNCTION f_no_op(i INT, j INT) RETURNS SETOF RECORD AS
$$
INSERT INTO kv VALUES (i, j);
DELETE FROM kv WHERE k=i AND v=j RETURNING k, v;
$$ LANGUAGE SQL;

statement ok
SELECT f_no_op(9, 10)

query II
SELECT * FROM kv
----
5 6

statement ok
CREATE TABLE unindexed (
k INT PRIMARY KEY,
Expand Down
12 changes: 7 additions & 5 deletions pkg/sql/logictest/testdata/logic_test/udf_update
Original file line number Diff line number Diff line change
Expand Up @@ -68,8 +68,10 @@ SELECT f2(5,32);
----
(5,32)

statement error pq: multiple modification subqueries of the same table \"t\" are not supported
query TT
SELECT f2(5,9), f2(7,11);
----
(5,9) (7,11)

query T nosort
SELECT f2(x,y) FROM (VALUES (1,16),(1,17)) v(x,y);
Expand All @@ -80,10 +82,10 @@ SELECT f2(x,y) FROM (VALUES (1,16),(1,17)) v(x,y);
query II rowsort
SELECT * FROM t;
----
1 17
3 4
5 32
7 8
1 17
3 4
5 9
7 11

statement ok
CREATE TABLE t2 (a INT, b INT, c INT);
Expand Down
55 changes: 40 additions & 15 deletions pkg/sql/logictest/testdata/logic_test/udf_upsert
Original file line number Diff line number Diff line change
Expand Up @@ -29,23 +29,27 @@ SELECT f_ocdn(1,2,1);
----
NULL

statement error pq: multiple modification subqueries of the same table \"t_ocdn\" are not supported
query TTTT
SELECT f_ocdn(1,1,1), f_ocdn(3,2,2), f_ocdn(6,6,2), f_ocdn(2,1,1);
----
NULL (3,2,2) (6,6,2) NULL

query T nosort
SELECT f_ocdn(x, y, z) FROM (VALUES (1, 1, 1), (2, 2, 1), (3, 3, 3), (3, 4, 4)) v(x, y, z)
SELECT f_ocdn(x, y, z) FROM (VALUES (1, 1, 1), (2, 2, 1), (3, 3, 3), (3, 4, 4), (5, 5, 5)) v(x, y, z)
----
NULL
(2,2,1)
(3,3,3)
NULL
NULL
NULL
(5,5,5)

query III rowsort
SELECT * FROM t_ocdn
----
1 1 1
2 2 1
3 3 3
1 1 1
3 2 2
5 5 5
6 6 2


statement ok
Expand All @@ -55,24 +59,45 @@ $$
$$ LANGUAGE SQL;

statement ok
SELECT f_ocdn_2vals(5,5,5,5,5,5);
SELECT f_ocdn_2vals(7,7,7,7,7,7);

query III rowsort
SELECT * FROM t_ocdn;
----
1 1 1
2 2 1
3 3 3
5 5 5
1 1 1
3 2 2
5 5 5
6 6 2
7 7 7

statement error pq: multiple modification subqueries of the same table \"t_ocdn\" are not supported
CREATE FUNCTION f_err(i INT, j INT, k INT, m INT, n INT, o INT) RETURNS RECORD AS
statement ok
CREATE FUNCTION f_multi_ins(i INT, j INT, k INT, m INT, n INT, o INT) RETURNS RECORD AS
$$
INSERT INTO t_ocdn VALUES (i, j, k) ON CONFLICT DO NOTHING;
INSERT INTO t_ocdn VALUES (m, n, o) ON CONFLICT DO NOTHING;
SELECT * FROM t_ocdn WHERE t.a=i OR t.a=m;
SELECT * FROM t_ocdn WHERE a=i OR a=m ORDER BY a;
$$ LANGUAGE SQL;

query TT
SELECT f_multi_ins(1, 1, 1, 1, 1, 1), f_multi_ins(1, 1, 1, 1, 1, 1)
----
(1,1,1) (1,1,1)

query TT
SELECT f_multi_ins(2, 2, 2, 3, 3, 3), f_multi_ins(3, 3, 3, 4, 4, 4)
----
(3,2,2) (3,2,2)

query III rowsort
SELECT * FROM t_ocdn
----
1 1 1
3 2 2
4 4 4
5 5 5
6 6 2
7 7 7

subtest on_conflict_do_update

statement ok
Expand Down
10 changes: 5 additions & 5 deletions pkg/sql/logictest/testdata/logic_test/with
Original file line number Diff line number Diff line change
Expand Up @@ -1052,7 +1052,7 @@ SELECT i, j FROM u@u_j_idx

# Multiple upserts of the same row. Might succeed after issue 70731 is fixed,
# depending on the implementation, but should not cause corruption.
query error pgcode 0A000 multiple modification subqueries of the same table
query error pgcode 0A000 multiple mutations of the same table
WITH
v AS (UPSERT INTO u VALUES (0, 1) RETURNING *),
w AS (UPSERT INTO u SELECT i, j + 1 FROM v RETURNING *)
Expand All @@ -1071,7 +1071,7 @@ SELECT i, j FROM u@u_j_idx
# Multiple updates of the same row. Might succeed after issue 70731 is fixed,
# depending on the implementation, but should not cause corruption. The order of
# CTE execution is not necessarily defined here.
query error pgcode 0A000 multiple modification subqueries of the same table
query error pgcode 0A000 multiple mutations of the same table
WITH
v AS (UPDATE u SET j = 3 WHERE i = 0 RETURNING *),
w AS (UPDATE u SET j = 4 WHERE i = 0 RETURNING *)
Expand All @@ -1090,7 +1090,7 @@ SELECT i, j FROM u@u_j_idx
# Multiple updates of the same row. Might succeed after issue 70731 is fixed,
# depending on the implementation, but should not cause corruption. The order of
# CTE execution should be deterministic.
query error pgcode 0A000 multiple modification subqueries of the same table
query error pgcode 0A000 multiple mutations of the same table
WITH
v AS (UPDATE u SET j = 5 WHERE i = 0 RETURNING *),
w AS (UPDATE u SET j = v.j + 1 FROM v WHERE u.i = v.i RETURNING *)
Expand All @@ -1108,7 +1108,7 @@ SELECT i, j FROM u@u_j_idx

# Multiple inserts of the same row into the same table, most should become
# updates due to conflicts. Might succeed after issue 70731 is fixed.
query error pgcode 0A000 multiple modification subqueries of the same table
query error pgcode 0A000 multiple mutations of the same table
WITH
v AS (INSERT INTO u VALUES (0, 42), (1, 42) ON CONFLICT (i) DO UPDATE SET j = 52 RETURNING *)
INSERT INTO u SELECT i, j + 1 FROM v ON CONFLICT (i) DO UPDATE SET j = v.j + 100 RETURNING *
Expand All @@ -1125,7 +1125,7 @@ SELECT i, j FROM u@u_j_idx

# Multiple deletes of the same row. Might succeed after issue 70731 is fixed,
# though the order of CTE execution is undefined.
query error pgcode 0A000 multiple modification subqueries of the same table
query error pgcode 0A000 multiple mutations of the same table
WITH
v AS (DELETE FROM u ORDER BY i LIMIT 1 RETURNING *),
w AS (DELETE FROM u ORDER BY i LIMIT 2 RETURNING *)
Expand Down
3 changes: 3 additions & 0 deletions pkg/sql/opt/optbuilder/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ go_library(
"show_trace.go",
"sql_fn.go",
"srfs.go",
"statement_tree.go",
"subquery.go",
"union.go",
"update.go",
Expand Down Expand Up @@ -105,6 +106,7 @@ go_test(
srcs = [
"builder_test.go",
"name_resolution_test.go",
"statement_tree_test.go",
"union_test.go",
],
args = ["-test.timeout=55s"],
Expand All @@ -114,6 +116,7 @@ go_test(
"//pkg/settings/cluster",
"//pkg/sql/catalog/colinfo",
"//pkg/sql/catalog/colinfo/colinfotestutils",
"//pkg/sql/opt/cat",
"//pkg/sql/opt/memo",
"//pkg/sql/opt/testutils",
"//pkg/sql/opt/testutils/opttester",
Expand Down
28 changes: 20 additions & 8 deletions pkg/sql/opt/optbuilder/builder.go
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,10 @@ type Builder struct {
catalog cat.Catalog
scopeAlloc []scope

// stmtTree tracks the hierarchy of statements to ensure that multiple
// modifications to the same table cannot corrupt indexes (see #70731).
stmtTree statementTree

// ctes stores CTEs which may need to be built at the top-level.
ctes cteSources

Expand Down Expand Up @@ -163,12 +167,6 @@ type Builder struct {
// query contains a correlated subquery.
isCorrelated bool

// areAllTableMutationsSimpleInserts maps from each table mutated by the
// statement to true if all mutations of that table are simple inserts
// (without ON CONFLICT) or false otherwise. All mutated tables will have an
// entry in the map.
areAllTableMutationsSimpleInserts map[cat.StableID]bool

// subqueryNameIdx helps generate unique subquery names during star
// expansion.
subqueryNameIdx int
Expand Down Expand Up @@ -261,11 +259,25 @@ func unimplementedWithIssueDetailf(issue int, detail, format string, args ...int
// "context". This is used at the top-level of every statement, and inside
// EXPLAIN, CREATE VIEW, CREATE TABLE AS.
func (b *Builder) buildStmtAtRoot(stmt tree.Statement, desiredTypes []*types.T) (outScope *scope) {
// A "root" statement cannot refer to anything from an enclosing query, so we
// always start with an empty scope.
// A "root" statement cannot refer to anything from an enclosing query, so
// we always start with an empty scope.
inScope := b.allocScope()
return b.buildStmtAtRootWithScope(stmt, desiredTypes, inScope)
}

// buildStmtAtRootWithScope is similar to buildStmtAtRoot, but allows a scope to
// be provided. This is used at the top-level of a statement, that has a new
// context but can refer to variables that are declared outside the statement,
// like a statement within a UDF body that can reference UDF parameters.
func (b *Builder) buildStmtAtRootWithScope(
stmt tree.Statement, desiredTypes []*types.T, inScope *scope,
) (outScope *scope) {
inScope.atRoot = true

// Push a new statement onto the statement tree.
b.stmtTree.Push()
defer b.stmtTree.Pop()

// Save any CTEs above the boundary.
prevCTEs := b.ctes
b.ctes = nil
Expand Down
2 changes: 1 addition & 1 deletion pkg/sql/opt/optbuilder/create_function.go
Original file line number Diff line number Diff line change
Expand Up @@ -204,7 +204,7 @@ func (b *Builder) buildCreateFunction(cf *tree.CreateFunction, inScope *scope) (
// volatility of stable functions. If folded, we only get a scalar and lose
// the volatility.
b.factory.FoldingControl().TemporarilyDisallowStableFolds(func() {
stmtScope = b.buildStmt(stmts[i].AST, nil /* desiredTypes */, bodyScope)
stmtScope = b.buildStmtAtRootWithScope(stmts[i].AST, nil /* desiredTypes */, bodyScope)
})
checkStmtVolatility(targetVolatility, stmtScope, stmt.AST)

Expand Down
2 changes: 1 addition & 1 deletion pkg/sql/opt/optbuilder/delete.go
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,7 @@ func (b *Builder) buildDelete(del *tree.Delete, inScope *scope) (outScope *scope
b.checkPrivilege(depName, tab, privilege.SELECT)

// Check if this table has already been mutated in another subquery.
b.checkMultipleMutations(tab, false /* simpleInsert */)
b.checkMultipleMutations(tab, generalMutation)

var mb mutationBuilder
mb.init(b, "delete", tab, alias)
Expand Down
6 changes: 5 additions & 1 deletion pkg/sql/opt/optbuilder/insert.go
Original file line number Diff line number Diff line change
Expand Up @@ -214,7 +214,11 @@ func (b *Builder) buildInsert(ins *tree.Insert, inScope *scope) (outScope *scope
}

// Check if this table has already been mutated in another subquery.
b.checkMultipleMutations(tab, ins.OnConflict == nil /* simpleInsert */)
mutType := generalMutation
if ins.OnConflict == nil {
mutType = simpleInsert
}
b.checkMultipleMutations(tab, mutType)

var mb mutationBuilder
if ins.OnConflict != nil && ins.OnConflict.IsUpsertAlias() {
Expand Down
2 changes: 1 addition & 1 deletion pkg/sql/opt/optbuilder/scalar.go
Original file line number Diff line number Diff line change
Expand Up @@ -732,7 +732,7 @@ func (b *Builder) buildUDF(
bodyProps = make([]*physical.Required, len(stmts))

for i := range stmts {
stmtScope := b.buildStmt(stmts[i].AST, nil /* desiredTypes */, bodyScope)
stmtScope := b.buildStmtAtRootWithScope(stmts[i].AST, nil /* desiredTypes */, bodyScope)
expr, physProps := stmtScope.expr, stmtScope.makePhysicalProps()

// The last statement produces the output of the UDF.
Expand Down
Loading

0 comments on commit 028c516

Please sign in to comment.