-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: lock all column families for new SELECT FOR UPDATE #116170
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @nvanbenschoten)
pkg/sql/opt/exec/execbuilder/mutation.go
line 1150 at r2 (raw file):
// In some simple cases we can push the locking down into the input operation // instead of adding what would be a redundant lookup join.
Does this optimization need to change to ensure that the input is scanning the same columns as are required by the lock?
pkg/sql/opt/exec/execbuilder/mutation.go
line 1160 at r2 (raw file):
switch input := lock.Input.(type) { case *memo.ScanExpr: if md.Table(input.Table) == md.Table(lock.Table) && input.Index == cat.PrimaryIndex {
[nit] I think this check is correct, but occasionally I've seen the catalog tables be different pointers even for the same table. I think it would be best to use md.Table(input.Table).ID() == md.Table(lock.Table).ID()
pkg/sql/opt/optbuilder/locking.go
line 398 at r2 (raw file):
oldColID := lb.table.ColumnID(i) newColID := newTabID.ColumnID(i) colMap.Set(int(oldColID), int(newColID))
[nit] ColMaps can have poor performance for queries with a lot of columns. Can we follow the example of DuplicateColumnIDs
instead? Or better yet, call into a shared method on the factory that returns the new table and column IDs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @nvanbenschoten)
pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed
line 403 at r2 (raw file):
statement ok CREATE TABLE abcde (a INT NOT NULL PRIMARY KEY, b INT NOT NULL, c INT NOT NULL, d INT NOT NULL, e INT NOT NULL, FAMILY (a, b), FAMILY (c), FAMILY (d), FAMILY (e))
A case with a nullable column might be interesting. I think attempting to lock (only) families with id > 0 and all nullable columns will also lock family 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @nvanbenschoten)
pkg/sql/opt/exec/execbuilder/mutation.go
line 1150 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Does this optimization need to change to ensure that the input is scanning the same columns as are required by the lock?
Or at least, should we check that the input is scanning at least the columns that need to be locked?
Previously, DrewKimball (Drew Kimball) wrote…
|
Previously, mgartner (Marcus Gartner) wrote…
I suppose that's what we want here, since this code was comparing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @nvanbenschoten)
pkg/sql/opt/optbuilder/locking.go
line 409 at r2 (raw file):
oldColID := inScope.cols[i].id if newColID, ok := colMap.Get(int(oldColID)); ok { cols.Add(opt.ColumnID(newColID))
Is cols
supposed to include columns from both inScope
and the new table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
Drew's questions brought up a number of problems that I have been working on. Hold off on reviewing for a bit, will push today with fixes.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @nvanbenschoten)
pkg/sql/opt/exec/execbuilder/mutation.go
line 1150 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Or at least, should we check that the input is scanning at least the columns that need to be locked?
Yes, great point, this optimization does now need to check columns.
pkg/sql/opt/exec/execbuilder/mutation.go
line 1160 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I suppose that's what we want here, since this code was comparing the
opt.TableID
s before?
Yes, exactly, a query that joins a table to itself now has a problem, which I am currently fixing.
I have realized there's a problem with this approach: statements like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I changed my mind, because locking individual column families seems really useful. I plan to make SELECT 1 FROM foo FOR UPDATE
lock the first column family, as it does now.
I think this new version will work, but hold off on reviewing until I add more tests.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)
41b362d
to
c025af6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)
pkg/sql/opt/optbuilder/locking.go
line 405 at r4 (raw file):
// empty. In this case we will only lock the first column family. var lockCols opt.ColSet for _, col := range inScope.cols {
Hmm, ran into another quandary. This method of picking the columns to lock from the projection scope means that a statement like:
SELECT a + b FROM foo FOR UPDATE;
won't actually lock the column families of a and b, but will instead only lock the first column family, just as I'm doing for SELECT 1 FROM foo FOR UPDATE
. Maybe even scarier is a statement like:
SELECT a::int FROM foo FOR UPDATE;
which really looks like it should lock column family a, but does not.
I think this is too subtle for people to work with safely. I'm going to go back to locking all column families.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to lock all column families.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)
pkg/sql/opt/exec/execbuilder/mutation.go
line 1150 at r2 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Yes, great point, this optimization does now need to check columns.
I made the optimization only apply to single-column-family tables. This saves us from having to compare the columns produced by the input to the columns locked, because there's only one column family in play.
pkg/sql/opt/exec/execbuilder/mutation.go
line 1160 at r2 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Yes, exactly, a query that joins a table to itself now has a problem, which I am currently fixing.
I switched back to comparing the exact opt.TableID
(by saving it in the KeySource
field of the LockPrivate
).
pkg/sql/opt/optbuilder/locking.go
line 398 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] ColMaps can have poor performance for queries with a lot of columns. Can we follow the example of
DuplicateColumnIDs
instead? Or better yet, call into a shared method on the factory that returns the new table and column IDs?
I ended up putting every column of the duplicated table into the set, instead of using the scope to pick, so the ColMap went away. (This is done to ensure we lock all column families.)
pkg/sql/opt/optbuilder/locking.go
line 409 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Is
cols
supposed to include columns from bothinScope
and the new table?
I added the LockCols
field to make this more clear: Cols
is essentially passthrough columns from inScope
and LockCols
has columns from the new table.
pkg/sql/opt/optbuilder/locking.go
line 405 at r4 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Hmm, ran into another quandary. This method of picking the columns to lock from the projection scope means that a statement like:
SELECT a + b FROM foo FOR UPDATE;won't actually lock the column families of a and b, but will instead only lock the first column family, just as I'm doing for
SELECT 1 FROM foo FOR UPDATE
. Maybe even scarier is a statement like:SELECT a::int FROM foo FOR UPDATE;which really looks like it should lock column family a, but does not.
I think this is too subtle for people to work with safely. I'm going to go back to locking all column families.
I think we should introduce a syntax something like SELECT ... FOR UPDATE OF foo (a, b)
to make it unambiguous which column families are being locked, instead of trying to rely on the select clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job figuring all this out!
I think there's one last problem - when the primary key is fully specified (e.g. not looking up with a prefix), and a column family is fully contained in the key column set, we can skip scanning that family and perform get
requests on the others. Example:
CREATE TABLE abc (a INT NOT NULL, b INT NOT NULL, c INT NOT NULL, PRIMARY KEY (a, b), FAMILY foo (a), FAMILY bar (b), FAMILY baz (c));
EXPLAIN ANALYZE (DEBUG) SELECT * FROM abc WHERE a = 5 AND b = 6;
--From plan.txt:
• scan
columns: (a int, b int, c int)
nodes: n1
actual row count: 0
vectorized batch count: 0
KV time: 1ms
KV contention time: 0µs
KV rows decoded: 0
KV pairs read: 0
KV bytes read: 0 B
KV gRPC calls: 1
estimated max memory allocated: 20 KiB
sql cpu time: 35µs
MVCC step count (ext/int): 0/0
MVCC seek count (ext/int): 1/1
estimated row count: 1 (missing stats)
table: abc@abc_pkey
spans: /5/6/2/1
Note from the spans that only the baz
column family was scanned. The span-splitting logic is called from DistSQLPlanner.createPlanForLookupJoin
. We could potentially make that a separate fix - whatever's easiest.
Reviewed 1 of 6 files at r3, 4 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @nvanbenschoten)
pkg/sql/opt/exec/execbuilder/mutation.go
line 1160 at r2 (raw file):
I switched back to comparing the exact
opt.TableID
(by saving it in theKeySource
field of theLockPrivate
).
SGTM. Seems best to limit the scope of the optimization for now.
pkg/sql/opt/optbuilder/locking.go
line 405 at r4 (raw file):
I think we should introduce a syntax something like
SELECT ... FOR UPDATE OF foo (a, b)
to make it unambiguous which column families are being locked, instead of trying to rely on the select clause.
I like this idea.
pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed
line 403 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
A case with a nullable column might be interesting. I think attempting to lock (only) families with id > 0 and all nullable columns will also lock family 0.
A nullable case might still be interesting, but seems less important now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @nvanbenschoten)
pkg/sql/opt/ops/mutation.opt
line 384 at r5 (raw file):
# KeySource identifies the source of the key columns. KeySource TableID
In a follow-up, consider adding some assertions about the interesting invariants of this expression in check_expr.go, like:
- Assert that all columns in KeyCols are from the KeySource table.
- Assert that all columns in LockCols are from Table.
- Assert that Cols contains KeyCols.
- Assert that Cols does not intersect LockCols.
- Assert the order of KeyCols.
pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed
line 400 at r5 (raw file):
statement ok CREATE TABLE abcde (a INT NOT NULL PRIMARY KEY, b INT NOT NULL, c INT NOT NULL, d INT NOT NULL, e INT NOT NULL, FAMILY (a, b), FAMILY (c), FAMILY (d), FAMILY (e))
nit: The table would be easier to read if split on multiple lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's one last problem - when the primary key is fully specified (e.g. not looking up with a prefix), and a column family is fully contained in the key column set, we can skip scanning that family and perform
get
requests on the others.
Wow, great catch!
I pushed a second commit fixing that behavior.
Interestingly, I was not able to come up with a scenario where this lack of a lock on family bar
would cause a problem, because any change to b
causes a delete and a put, which touch all column families. And we always have at least the first column family locked. But maybe someone else can come up with a scenario?
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)
pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed
line 403 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
A nullable case might still be interesting, but seems less important now.
Done. (column d)
pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed
line 400 at r5 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: The table would be easier to read if split on multiple lines.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My fix was a little too aggressive, and broke a test added by #42572 for FK checks on multi-column-family tables. Now I'm only modifying the behavior for lookup joins under read committed isolation.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had some suggestions/comments, but I'm happy with this as-is if we want to merge.
Interestingly, I was not able to come up with a scenario where this lack of a lock on family
bar
would cause a problem, because any change tob
causes a delete and a put, which touch all column families. And we always have at least the first column family locked. But maybe someone else can come up with a scenario?
That's a good point. Since the problem case requires the column to be part of the primary key, any mutations to that column involve all families automatically. So, maybe the only problematic case is two locking reads.
Reviewed 2 of 2 files at r6, 4 of 7 files at r7, 3 of 3 files at r8, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @michae2, and @nvanbenschoten)
pkg/sql/distsql_physical_planner.go
line 3120 at r8 (raw file):
if joinReaderSpec.LockingStrength != descpb.ScanLockingStrength_FOR_NONE && planCtx.ExtendedEvalCtx.TxnIsoLevel != isolation.Serializable { splitter = span.MakeSplitterForSideEffect(n.table.desc, n.table.index, fetchOrdinals)
I'm happy with the current implementation, so please feel free to ignore, but one thought: we could just check that this condition is false before calling MakeSplitter
, since a nil SplitFamilyIDs
indicates that no splitting will be performed and all column families should be scanned. Then, we could even just get rid of the entire first commit, since we would be guaranteed to scan all column families, no need for the LockCols
field or TableID
duplication.
pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed
line 484 at r6 (raw file):
SELECT d FROM abcde WHERE a = 45 AND b = 46 FOR UPDATE ---- NULL
BTW, one of the things that makes this case interesting is that AFAIK we don't actually write a KV entry for entirely NULL column families other than family 0. So, if and when we decide to do something other than locking all families, we'll want to make sure the persistent locks interact well with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, @michae2, and @nvanbenschoten)
pkg/sql/opt/ops/mutation.opt
line 378 at r8 (raw file):
[Private] define LockPrivate {
nit: This is all a bit confusing for someone without the context here. Can you add an example, maybe using the query above the Lock
definition, and specify what Table
, KeySource
, KeyCols
etc are in this case?
The new implementation of SELECT FOR UPDATE used for Read Committed isolation starts as a lock operator built from a table scan, which becomes a lookup join during execbuild. This lock operator was re-using the same table and column references as the original table scan, which caused the resulting lookup join to only lock the first column family of the table. Instead, we need to use fresh table and column references for the lock operator, so that the resulting lookup join is able to control which column families it locks. For now we choose to lock all column families, even if some column family would not normally be read by the select statement. We do this to more closely match Postgres behavior for statements like: ``` SELECT 1 FROM foo FOR UPDATE ``` Fixes: cockroachdb#115014 Release note (sql change): The new SELECT FOR UPDATE implementation used under Read Committed isolation (and under Serializable isolation when optimizer_use_lock_op_for_serializable is true) now locks all column families instead of only the first column family.
When determining which column families to read from an index for a set of requested columns, the span splitter has an optimization: it skips over the families for requested columns that are completely encoded in the index key, because these columns can be decoded from the key of *any* column family. For example, for the following query we do not need to read column families foo and bar: ``` CREATE TABLE abc ( a INT NOT NULL, b INT NOT NULL, c INT NOT NULL, PRIMARY KEY (a, b), FAMILY foo (a), FAMILY bar (b), FAMILY baz (c) ); SELECT * FROM abc WHERE a = 5 AND b = 6; ``` When the read has a side-effect, however (e.g. SELECT FOR UPDATE) we need to lock all of the requested column families. To do so, this commit adds a variant of `span.MakeSplitter` for locked reads, called `span.MakeSplitterForSideEffect`, which does not use this optimization. Fixes: cockroachdb#115014 Release note (sql change): Fix a bug where SELECT FOR UPDATE under Read Committed isolation on multi-column-family tables was not locking column families containing only key columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs! (Looks like @mgartner has drafts so I will wait.)
Also had a conversation with Nathan in slack about locking all column families.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)
pkg/sql/distsql_physical_planner.go
line 3120 at r8 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
I'm happy with the current implementation, so please feel free to ignore, but one thought: we could just check that this condition is false before calling
MakeSplitter
, since a nilSplitFamilyIDs
indicates that no splitting will be performed and all column families should be scanned. Then, we could even just get rid of the entire first commit, since we would be guaranteed to scan all column families, no need for theLockCols
field orTableID
duplication.
Interesting. This is a really good point, and not something I thought of.
I'm a little scared to do this, because by the time we're doing physical planning we don't know if the locking lookup join came from a SELECT FOR UPDATE statement or a FK check. I believe FK checks are still only locking column family 0, and at least back in the time of #42572 we were trying to preserve this behavior. I think using a nil SplitFamilyIDs
would cause all locking lookup joins, including those used for FK checks, to lock all column families.
pkg/sql/opt/ops/mutation.opt
line 378 at r8 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: This is all a bit confusing for someone without the context here. Can you add an example, maybe using the query above the
Lock
definition, and specify whatTable
,KeySource
,KeyCols
etc are in this case?
Done, lmk if that is helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r3, 2 of 2 files at r6, 4 of 7 files at r7, 3 of 3 files at r8, 7 of 7 files at r9, 6 of 6 files at r10, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @michae2, and @nvanbenschoten)
Thanks! bors r=DrewKimball,mgartner |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error getting backport branch release-23.1.12-rc.FROZEN: unexpected status code: 404 Not Found Backport to branch 23.1.12-rc.FROZEN failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale)
pkg/sql/opt/optbuilder/locking.go
line 405 at r4 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
I think we should introduce a syntax something like
SELECT ... FOR UPDATE OF foo (a, b)
to make it unambiguous which column families are being locked, instead of trying to rely on the select clause.I like this idea.
Filed #116838 about this.
pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed
line 484 at r6 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
BTW, one of the things that makes this case interesting is that AFAIK we don't actually write a KV entry for entirely NULL column families other than family 0. So, if and when we decide to do something other than locking all families, we'll want to make sure the persistent locks interact well with that.
Yes, and as it turns out this is still an issue. Filed #116836.
opt: lock all column families for new SELECT FOR UPDATE
The new implementation of SELECT FOR UPDATE used for Read Committed isolation starts as a lock operator built from a table scan, which becomes a lookup join during execbuild. This lock operator was re-using the same table and column references as the original table scan, which caused the resulting lookup join to only lock the first column family of the table.
Instead, we need to use fresh table and column references for the lock operator, so that the resulting lookup join is able to control which column families it locks.
For now we choose to lock all column families, even if some column family would not normally be read by the select statement. We do this to more closely match Postgres behavior for statements like:
Fixes: #115014
Release note (sql change): The new SELECT FOR UPDATE implementation used under Read Committed isolation (and under Serializable isolation when optimizer_use_lock_op_for_serializable is true) now locks all column families instead of only the first column family.
sql: do not skip key col families when performing a locked read
When determining which column families to read from an index for a set of requested columns, the span splitter has an optimization: it skips over the families for requested columns that are completely encoded in the index key, because these columns can be decoded from the key of any column family. For example, for the following query we do not need to read column families foo and bar:
When the read has a side-effect, however (e.g. SELECT FOR UPDATE) we need to lock all of the requested column families. To do so, this commit adds a variant of
span.MakeSplitter
for locked reads, calledspan.MakeSplitterForSideEffect
, which does not use this optimization.Fixes: #115014
Release note (sql change): Fix a bug where SELECT FOR UPDATE under Read Committed isolation on multi-column-family tables was not locking column families containing only key columns.