-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: remove redundant arbiter indexes #76961
opt: remove redundant arbiter indexes #76961
Conversation
742ad7f
to
b1e5494
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very pleased reading through the pr.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @msirek, @RaduBerinde, and @rytaft)
pkg/sql/opt/optbuilder/arbiter_set.go, line 179 at r1 (raw file):
// INDEX constraints that are synthesized for partitioned and hash-sharded // indexes, which the user has no control over. type minArbiterSet struct {
just a nit feel free to ignore, but to be honest, I'm not sure how much singleIndexConflictOrdsCache
gonna save us from memory footprint. But if there is a strong reason that's helpful, I think it worth mentioning in comments at struct def level that either singleIndexConflictOrdsCache
or indexConflictOrdsCache
is set, not both. This is also relative to how addIndex
and addUniqueConstraint
cooperate with each other, so worth mentioning that we always finish adding index with addIndex
always before logic adding unique constraint starts, that's why we always know if there is one single index or more when we initCache
which is required by addUniqueConstraint
...... Sorry it took me quite a few minutes to fully understand, but I think the code is a good read in general.
pkg/sql/opt/optbuilder/testdata/upsert, line 3630 at r1 (raw file):
nit: I think it worth to break up this table into smaller test, but feel free to ignore as well because I know there'd be annoying amount of combinations >_> |
pkg/sql/opt/optbuilder/arbiter_set.go, line 179 at r1 (raw file): Previously, chengxiong-ruan (Chengxiong Ruan) wrote…
Yes, after sleeping on it I agree this might be over-engineered. Even without the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @mgartner, @RaduBerinde, and @rytaft)
pkg/sql/opt/optbuilder/arbiter_set.go, line 221 at r1 (raw file):
m.as.removeIndex(idx) } }
I couldn't find any issues from looking at the code so far, but taking a testing approach, I found a test case I'm wondering should pass:
CREATE TABLE uniq_constraint_and_partial_index (
k INT PRIMARY KEY,
a INT,
b INT,
UNIQUE INDEX (a) WHERE b > 11,
UNIQUE WITHOUT INDEX (a) WHERE b < -1,
UNIQUE WITHOUT INDEX (a) WHERE b > 0
);
INSERT INTO uniq_constraint_and_partial_index VALUES (2, 2, 2)
ON CONFLICT (a) WHERE b > 10 DO NOTHING;
error (42P10): there is no unique or exclusion constraint matching the ON CONFLICT specificationerror (42P10): there is no unique or exclusion constraint matching the ON CONFLICT specification
If the table has only the last unique index, the insert works:
CREATE TABLE uniq_constraint_and_partial_index (
k INT PRIMARY KEY,
a INT,
b INT,
UNIQUE WITHOUT INDEX (a) WHERE b > 0
);
pkg/sql/opt/optbuilder/arbiter_set.go, line 270 at r1 (raw file):
// Do nothing if the cache has already been initialized. if m.singleIndexConflictOrdsCache.Empty() && m.indexConflictOrdsCache != nil { return
nit: Based on the comment, should the condition be if !m.singleIndexConflictOrdsCache.Empty() || m.indexConflictOrdsCache != nil {
?
pkg/sql/opt/optbuilder/mutation_builder_arbiter.go, line 237 at r1 (raw file):
// the unique constraint ordinals and determining set equality. if uniqueConstraint.ColumnCount() != conflictOrds.Len() { continue
nit: It's not part of this PR, but this if
is a duplicate of the lines of code above it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @mgartner, @RaduBerinde, and @rytaft)
pkg/sql/opt/optbuilder/arbiter_set.go, line 221 at r1 (raw file):
CREATE TABLE uniq_constraint_and_partial_index (
k INT PRIMARY KEY,
a INT,
b INT,
UNIQUE INDEX (a) WHERE b > 11,
UNIQUE WITHOUT INDEX (a) WHERE b < -1,
UNIQUE WITHOUT INDEX (a) WHERE b > 0
);
I found the test case fails as an optbuilder test run in Goland, but succeeds when run against a demo cluster, both on the master
and remove-unnecessary-arbiters
branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: complete! 3 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @mgartner, and @RaduBerinde)
-- commits, line 18 at r1:
nit: column b came out of nowhere (here and in the PR message)
pkg/sql/opt/optbuilder/arbiter_set.go, line 179 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Yes, after sleeping on it I agree this might be over-engineered. Even without the
singleIndexConflictOrdsCache
, we avoid allocations when the table has noUNIQUE WITHOUT INDEX
constraints, so that's good. I'll simplify this unless someone disagrees.
Go for it
pkg/sql/opt/optbuilder/arbiter_set.go, line 229 at r1 (raw file):
// findRedundantIndex returns the first arbiter index which is made redundant by // the unique constraint. An arbiter index is redundant if:
nit: it's not clear if this list is combined with AND or OR. You could say "... is redundant if both of the following hold:"
pkg/sql/opt/optbuilder/arbiter_set.go, line 280 at r1 (raw file):
return } // Otherwise, cache each indexes conflict columns in a map.
nit: indexes -> index's
pkg/sql/opt/optbuilder/arbiter_set.go, line 281 at r1 (raw file):
} // Otherwise, cache each indexes conflict columns in a map. m.indexConflictOrdsCache = make(map[cat.IndexOrdinal]util.FastIntSet)
nit: you can do make(map[cat.IndexOrdinal]util.FastIntSet, m.as.indexes.Len())
Arbiter indexes and unique constraints are now shown in optimizer trees, regardless of format flags. Release note: None
We select all unique indexes and unique constraints as arbiters for `INSERT .. ON CONFLICT` statements without explicit conflict columns or constraints. In some cases, unique constraint arbiters make unique index arbiters redundant, which results in query plans that perform unnecessary work to check for conflicts. Most notably, redundant unique indexes are chosen as arbiters for both partitioned and hash-sharded indexes which synthesize `UNIQUE WITHOUT INDEX` constraints. For example, consider the table and statement which mimics a simplified hash-sharded index: CREATE TABLE t ( a INT, shard_a INT, UNIQUE INDEX shard_a_a_key (shard_a, a), UNIQUE WITHOUT INDEX a_key (a) ) INSERT INTO t VALUES (1, 2) ON CONFLICT DO NOTHING There is no need to use both `shard_a_a_key` and `a_key` as arbiters for the `INSERT` statement because any conflict in `shard_a_a_key` will also be a conflict in `a_key`. Only `a_key` is required to be an arbiter. This commit removes these redundant arbiter indexes, which improves query plans. Fixes cockroachdb#75498 Release note (performance improvement): The optimizer produces more efficient query plans for `INSERT .. ON CONFLICT` statements that do not have explicit conflict columns or constraints and are performed on partitioned tables.
b1e5494
to
8122043
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the great feedback!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @chengxiong-ruan, @msirek, @RaduBerinde, and @rytaft)
Previously, rytaft (Rebecca Taft) wrote…
nit: column b came out of nowhere (here and in the PR message)
Fixed.
pkg/sql/opt/optbuilder/arbiter_set.go, line 179 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Go for it
Done.
pkg/sql/opt/optbuilder/arbiter_set.go, line 221 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
CREATE TABLE uniq_constraint_and_partial_index (
k INT PRIMARY KEY,
a INT,
b INT,
UNIQUE INDEX (a) WHERE b > 11,
UNIQUE WITHOUT INDEX (a) WHERE b < -1,
UNIQUE WITHOUT INDEX (a) WHERE b > 0
);I found the test case fails as an optbuilder test run in Goland, but succeeds when run against a demo cluster, both on the
master
andremove-unnecessary-arbiters
branches.
Thanks for finding this! I've created an issue to track it: #76994
pkg/sql/opt/optbuilder/arbiter_set.go, line 229 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: it's not clear if this list is combined with AND or OR. You could say "... is redundant if both of the following hold:"
Done.
pkg/sql/opt/optbuilder/arbiter_set.go, line 270 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
nit: Based on the comment, should the condition be
if !m.singleIndexConflictOrdsCache.Empty() || m.indexConflictOrdsCache != nil {
?
Good catch. I've removed singleIndexConflictOrdsCache
entirely.
pkg/sql/opt/optbuilder/arbiter_set.go, line 280 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: indexes -> index's
Done.
pkg/sql/opt/optbuilder/arbiter_set.go, line 281 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: you can do
make(map[cat.IndexOrdinal]util.FastIntSet, m.as.indexes.Len())
Good catch. Done.
pkg/sql/opt/optbuilder/mutation_builder_arbiter.go, line 237 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
nit: It's not part of this PR, but this
if
is a duplicate of the lines of code above it.
Good catch, fixed.
pkg/sql/opt/optbuilder/testdata/upsert, line 3630 at r1 (raw file):
Previously, chengxiong-ruan (Chengxiong Ruan) wrote…
nit: I think it worth to break up this table into smaller test, but feel free to ignore as well because I know there'd be annoying amount of combinations >_>
Good idea. I've split this up into multiple tests. I've also added a new commit that allows the arbiters to be shown with less-verbose opt trees, since we only really care about arbiter indexes
and arbiter constraints
in these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @chengxiong-ruan, @msirek, @RaduBerinde, and @rytaft)
pkg/sql/opt/optbuilder/testdata/upsert, line 3630 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Good idea. I've split this up into multiple tests. I've also added a new commit that allows the arbiters to be shown with less-verbose opt trees, since we only really care about
arbiter indexes
andarbiter constraints
in these tests.
nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 26 of 26 files at r2, 11 of 11 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @chengxiong-ruan, @msirek, and @RaduBerinde)
VALUES ('ca-central-1', 7, 7, 8, 9) ON CONFLICT DO NOTHING | ||
] OFFSET 2 | ||
---- | ||
· |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This plan is much better than before removing redundant arbiters. It was:
·
• insert
│ into: regional_by_row_table(pk, pk2, a, b, j, crdb_region)
│ arbiter indexes: regional_by_row_table_pkey, regional_by_row_table_b_key, uniq_idx
│ arbiter constraints: regional_by_row_table_pkey, regional_by_row_table_b_key, uniq_idx
│
└── • render
│
└── • distinct
│ distinct on: arbiter_uniq_idx_distinct
│ nulls are distinct
│
└── • render
│
└── • distinct
│ distinct on: arbiter_uniq_idx_distinct
│ nulls are distinct
│
└── • render
│
└── • lookup join (anti)
│ table: regional_by_row_table@uniq_idx (partial index)
│ lookup condition: (column4 = a) AND (crdb_region IN ('ap-southeast-2', 'ca-central-1', 'us-east-1'))
│ pred: column5 > 0
│
└── • lookup join (anti)
│ table: regional_by_row_table@regional_by_row_table_b_key
│ equality cols are key
│ lookup condition: (column5 = b) AND (crdb_region IN ('ap-southeast-2', 'ca-central-1', 'us-east-1'))
│
└── • lookup join (anti)
│ table: regional_by_row_table@regional_by_row_table_pkey
│ equality cols are key
│ lookup condition: (column2 = pk) AND (crdb_region IN ('ap-southeast-2', 'ca-central-1', 'us-east-1'))
│
└── • lookup join (anti)
│ table: regional_by_row_table@uniq_idx (partial index)
│ equality: (column1, column4) = (crdb_region,a)
│ pred: column5 > 0
│
└── • lookup join (anti)
│ table: regional_by_row_table@regional_by_row_table_b_key
│ equality: (column1, column5) = (crdb_region,b)
│ equality cols are key
│
└── • cross join (anti)
│
├── • values
│ size: 6 columns, 1 row
│
└── • scan
missing stats
table: regional_by_row_table@regional_by_row_table_pkey
spans: [/'ca-central-1'/7 - /'ca-central-1'/7]
TFTRs! bors r+ |
Build failed: |
Failed due to flake. bors r+ |
Build failed (retrying...): |
Build failed (retrying...): |
Build succeeded: |
We select all unique indexes and unique constraints as arbiters for
INSERT .. ON CONFLICT
statements without explicit conflict columns orconstraints. In some cases, unique constraint arbiters make unique index
arbiters redundant, which results in query plans that perform
unnecessary work to check for conflicts. Most notably, redundant unique
indexes are chosen as arbiters for both partitioned and hash-sharded
indexes which synthesize
UNIQUE WITHOUT INDEX
constraints.For example, consider the table and statement which mimics a
simplified hash-sharded index:
There is no need to use both
shard_a_a_key
anda_key
as arbiters forthe
INSERT
statement because any conflict inshard_a_a_key
will alsobe a conflict in
a_key
. Onlya_key
is required to be an arbiter.This commit removes these redundant arbiter indexes, which improves
query plans.
Fixes #75498
Release note (performance improvement): The optimizer produces more
efficient query plans for
INSERT .. ON CONFLICT
statements that do nothave explicit conflict columns or constraints and are performed on
partitioned tables.