Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: remove redundant arbiter indexes #76961

Merged
merged 2 commits into from
Feb 25, 2022

Conversation

mgartner
Copy link
Collaborator

@mgartner mgartner commented Feb 24, 2022

We select all unique indexes and unique constraints as arbiters for
INSERT .. ON CONFLICT statements without explicit conflict columns or
constraints. In some cases, unique constraint arbiters make unique index
arbiters redundant, which results in query plans that perform
unnecessary work to check for conflicts. Most notably, redundant unique
indexes are chosen as arbiters for both partitioned and hash-sharded
indexes which synthesize UNIQUE WITHOUT INDEX constraints.

For example, consider the table and statement which mimics a
simplified hash-sharded index:

CREATE TABLE t (
    a INT,
    shard_a INT,
    UNIQUE INDEX shard_a_a_key (shard_a, a),
    UNIQUE WITHOUT INDEX a_key (a)
)

INSERT INTO t VALUES (1, 2) ON CONFLICT DO NOTHING

There is no need to use both shard_a_a_key and a_key as arbiters for
the INSERT statement because any conflict in shard_a_a_key will also
be a conflict in a_key. Only a_key is required to be an arbiter.

This commit removes these redundant arbiter indexes, which improves
query plans.

Fixes #75498

Release note (performance improvement): The optimizer produces more
efficient query plans for INSERT .. ON CONFLICT statements that do not
have explicit conflict columns or constraints and are performed on
partitioned tables.

@mgartner mgartner requested a review from a team as a code owner February 24, 2022 00:50
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@mgartner mgartner force-pushed the remove-unnecessary-arbiters branch from 742ad7f to b1e5494 Compare February 24, 2022 00:50
Copy link
Contributor

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very pleased reading through the pr. :lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @msirek, @RaduBerinde, and @rytaft)


pkg/sql/opt/optbuilder/arbiter_set.go, line 179 at r1 (raw file):

// INDEX constraints that are synthesized for partitioned and hash-sharded
// indexes, which the user has no control over.
type minArbiterSet struct {

just a nit feel free to ignore, but to be honest, I'm not sure how much singleIndexConflictOrdsCache gonna save us from memory footprint. But if there is a strong reason that's helpful, I think it worth mentioning in comments at struct def level that either singleIndexConflictOrdsCache or indexConflictOrdsCache is set, not both. This is also relative to how addIndex and addUniqueConstraint cooperate with each other, so worth mentioning that we always finish adding index with addIndex always before logic adding unique constraint starts, that's why we always know if there is one single index or more when we initCache which is required by addUniqueConstraint ...... Sorry it took me quite a few minutes to fully understand, but I think the code is a good read in general.

@chengxiong-ruan
Copy link
Contributor


pkg/sql/opt/optbuilder/testdata/upsert, line 3630 at r1 (raw file):

# columns or constraints.
exec-ddl
CREATE TABLE min_arbiters (

nit: I think it worth to break up this table into smaller test, but feel free to ignore as well because I know there'd be annoying amount of combinations >_>

@mgartner
Copy link
Collaborator Author


pkg/sql/opt/optbuilder/arbiter_set.go, line 179 at r1 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

just a nit feel free to ignore, but to be honest, I'm not sure how much singleIndexConflictOrdsCache gonna save us from memory footprint. But if there is a strong reason that's helpful, I think it worth mentioning in comments at struct def level that either singleIndexConflictOrdsCache or indexConflictOrdsCache is set, not both. This is also relative to how addIndex and addUniqueConstraint cooperate with each other, so worth mentioning that we always finish adding index with addIndex always before logic adding unique constraint starts, that's why we always know if there is one single index or more when we initCache which is required by addUniqueConstraint ...... Sorry it took me quite a few minutes to fully understand, but I think the code is a good read in general.

Yes, after sleeping on it I agree this might be over-engineered. Even without the singleIndexConflictOrdsCache, we avoid allocations when the table has no UNIQUE WITHOUT INDEX constraints, so that's good. I'll simplify this unless someone disagrees.

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @mgartner, @RaduBerinde, and @rytaft)


pkg/sql/opt/optbuilder/arbiter_set.go, line 221 at r1 (raw file):

		m.as.removeIndex(idx)
	}
}

I couldn't find any issues from looking at the code so far, but taking a testing approach, I found a test case I'm wondering should pass:

CREATE TABLE uniq_constraint_and_partial_index (
  k INT PRIMARY KEY,
  a INT,
  b INT,
  UNIQUE INDEX (a) WHERE b > 11,
  UNIQUE WITHOUT INDEX (a) WHERE b < -1,
  UNIQUE WITHOUT INDEX (a) WHERE b > 0
);

INSERT INTO uniq_constraint_and_partial_index VALUES (2, 2, 2)
ON CONFLICT (a) WHERE b > 10 DO NOTHING;

error (42P10): there is no unique or exclusion constraint matching the ON CONFLICT specificationerror (42P10): there is no unique or exclusion constraint matching the ON CONFLICT specification

If the table has only the last unique index, the insert works:

CREATE TABLE uniq_constraint_and_partial_index (
  k INT PRIMARY KEY,
  a INT,
  b INT,
  UNIQUE WITHOUT INDEX (a) WHERE b > 0
);

pkg/sql/opt/optbuilder/arbiter_set.go, line 270 at r1 (raw file):

	// Do nothing if the cache has already been initialized.
	if m.singleIndexConflictOrdsCache.Empty() && m.indexConflictOrdsCache != nil {
		return

nit: Based on the comment, should the condition be if !m.singleIndexConflictOrdsCache.Empty() || m.indexConflictOrdsCache != nil { ?


pkg/sql/opt/optbuilder/mutation_builder_arbiter.go, line 237 at r1 (raw file):

		// the unique constraint ordinals and determining set equality.
		if uniqueConstraint.ColumnCount() != conflictOrds.Len() {
			continue

nit: It's not part of this PR, but this if is a duplicate of the lines of code above it.

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @mgartner, @RaduBerinde, and @rytaft)


pkg/sql/opt/optbuilder/arbiter_set.go, line 221 at r1 (raw file):

CREATE TABLE uniq_constraint_and_partial_index (
k INT PRIMARY KEY,
a INT,
b INT,
UNIQUE INDEX (a) WHERE b > 11,
UNIQUE WITHOUT INDEX (a) WHERE b < -1,
UNIQUE WITHOUT INDEX (a) WHERE b > 0
);

I found the test case fails as an optbuilder test run in Goland, but succeeds when run against a demo cluster, both on the master and remove-unnecessary-arbiters branches.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Awesome!

Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: :shipit: complete! 3 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @mgartner, and @RaduBerinde)


-- commits, line 18 at r1:
nit: column b came out of nowhere (here and in the PR message)


pkg/sql/opt/optbuilder/arbiter_set.go, line 179 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Yes, after sleeping on it I agree this might be over-engineered. Even without the singleIndexConflictOrdsCache, we avoid allocations when the table has no UNIQUE WITHOUT INDEX constraints, so that's good. I'll simplify this unless someone disagrees.

Go for it


pkg/sql/opt/optbuilder/arbiter_set.go, line 229 at r1 (raw file):

// findRedundantIndex returns the first arbiter index which is made redundant by
// the unique constraint. An arbiter index is redundant if:

nit: it's not clear if this list is combined with AND or OR. You could say "... is redundant if both of the following hold:"


pkg/sql/opt/optbuilder/arbiter_set.go, line 280 at r1 (raw file):

		return
	}
	// Otherwise, cache each indexes conflict columns in a map.

nit: indexes -> index's


pkg/sql/opt/optbuilder/arbiter_set.go, line 281 at r1 (raw file):

	}
	// Otherwise, cache each indexes conflict columns in a map.
	m.indexConflictOrdsCache = make(map[cat.IndexOrdinal]util.FastIntSet)

nit: you can do make(map[cat.IndexOrdinal]util.FastIntSet, m.as.indexes.Len())

Arbiter indexes and unique constraints are now shown in optimizer trees,
regardless of format flags.

Release note: None
We select all unique indexes and unique constraints as arbiters for
`INSERT .. ON CONFLICT` statements without explicit conflict columns or
constraints. In some cases, unique constraint arbiters make unique index
arbiters redundant, which results in query plans that perform
unnecessary work to check for conflicts. Most notably, redundant unique
indexes are chosen as arbiters for both partitioned and hash-sharded
indexes which synthesize `UNIQUE WITHOUT INDEX` constraints.

For example, consider the table and statement which mimics a
simplified hash-sharded index:

    CREATE TABLE t (
        a INT,
        shard_a INT,
        UNIQUE INDEX shard_a_a_key (shard_a, a),
        UNIQUE WITHOUT INDEX a_key (a)
    )

    INSERT INTO t VALUES (1, 2) ON CONFLICT DO NOTHING

There is no need to use both `shard_a_a_key` and `a_key` as arbiters for
the `INSERT` statement because any conflict in `shard_a_a_key` will also
be a conflict in `a_key`. Only `a_key` is required to be an arbiter.

This commit removes these redundant arbiter indexes, which improves
query plans.

Fixes cockroachdb#75498

Release note (performance improvement): The optimizer produces more
efficient query plans for `INSERT .. ON CONFLICT` statements that do not
have explicit conflict columns or constraints and are performed on
partitioned tables.
@mgartner mgartner force-pushed the remove-unnecessary-arbiters branch from b1e5494 to 8122043 Compare February 24, 2022 18:57
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the great feedback!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @chengxiong-ruan, @msirek, @RaduBerinde, and @rytaft)


-- commits, line 18 at r1:

Previously, rytaft (Rebecca Taft) wrote…

nit: column b came out of nowhere (here and in the PR message)

Fixed.


pkg/sql/opt/optbuilder/arbiter_set.go, line 179 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Go for it

Done.


pkg/sql/opt/optbuilder/arbiter_set.go, line 221 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

CREATE TABLE uniq_constraint_and_partial_index (
k INT PRIMARY KEY,
a INT,
b INT,
UNIQUE INDEX (a) WHERE b > 11,
UNIQUE WITHOUT INDEX (a) WHERE b < -1,
UNIQUE WITHOUT INDEX (a) WHERE b > 0
);

I found the test case fails as an optbuilder test run in Goland, but succeeds when run against a demo cluster, both on the master and remove-unnecessary-arbiters branches.

Thanks for finding this! I've created an issue to track it: #76994


pkg/sql/opt/optbuilder/arbiter_set.go, line 229 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: it's not clear if this list is combined with AND or OR. You could say "... is redundant if both of the following hold:"

Done.


pkg/sql/opt/optbuilder/arbiter_set.go, line 270 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

nit: Based on the comment, should the condition be if !m.singleIndexConflictOrdsCache.Empty() || m.indexConflictOrdsCache != nil { ?

Good catch. I've removed singleIndexConflictOrdsCache entirely.


pkg/sql/opt/optbuilder/arbiter_set.go, line 280 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: indexes -> index's

Done.


pkg/sql/opt/optbuilder/arbiter_set.go, line 281 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: you can do make(map[cat.IndexOrdinal]util.FastIntSet, m.as.indexes.Len())

Good catch. Done.


pkg/sql/opt/optbuilder/mutation_builder_arbiter.go, line 237 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

nit: It's not part of this PR, but this if is a duplicate of the lines of code above it.

Good catch, fixed.


pkg/sql/opt/optbuilder/testdata/upsert, line 3630 at r1 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

nit: I think it worth to break up this table into smaller test, but feel free to ignore as well because I know there'd be annoying amount of combinations >_>

Good idea. I've split this up into multiple tests. I've also added a new commit that allows the arbiters to be shown with less-verbose opt trees, since we only really care about arbiter indexes and arbiter constraints in these tests.

Copy link
Contributor

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @chengxiong-ruan, @msirek, @RaduBerinde, and @rytaft)


pkg/sql/opt/optbuilder/testdata/upsert, line 3630 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Good idea. I've split this up into multiple tests. I've also added a new commit that allows the arbiters to be shown with less-verbose opt trees, since we only really care about arbiter indexes and arbiter constraints in these tests.

nice!

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 26 of 26 files at r2, 11 of 11 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @chengxiong-ruan, @msirek, and @RaduBerinde)

VALUES ('ca-central-1', 7, 7, 8, 9) ON CONFLICT DO NOTHING
] OFFSET 2
----
·
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plan is much better than before removing redundant arbiters. It was:

·
• insert
│ into: regional_by_row_table(pk, pk2, a, b, j, crdb_region)
│ arbiter indexes: regional_by_row_table_pkey, regional_by_row_table_b_key, uniq_idx
│ arbiter constraints: regional_by_row_table_pkey, regional_by_row_table_b_key, uniq_idx
│
└── • render
    │
    └── • distinct
        │ distinct on: arbiter_uniq_idx_distinct
        │ nulls are distinct
        │
        └── • render
            │
            └── • distinct
                │ distinct on: arbiter_uniq_idx_distinct
                │ nulls are distinct
                │
                └── • render
                    │
                    └── • lookup join (anti)
                        │ table: regional_by_row_table@uniq_idx (partial index)
                        │ lookup condition: (column4 = a) AND (crdb_region IN ('ap-southeast-2', 'ca-central-1', 'us-east-1'))
                        │ pred: column5 > 0
                        │
                        └── • lookup join (anti)
                            │ table: regional_by_row_table@regional_by_row_table_b_key
                            │ equality cols are key
                            │ lookup condition: (column5 = b) AND (crdb_region IN ('ap-southeast-2', 'ca-central-1', 'us-east-1'))
                            │
                            └── • lookup join (anti)
                                │ table: regional_by_row_table@regional_by_row_table_pkey
                                │ equality cols are key
                                │ lookup condition: (column2 = pk) AND (crdb_region IN ('ap-southeast-2', 'ca-central-1', 'us-east-1'))
                                │
                                └── • lookup join (anti)
                                    │ table: regional_by_row_table@uniq_idx (partial index)
                                    │ equality: (column1, column4) = (crdb_region,a)
                                    │ pred: column5 > 0
                                    │
                                    └── • lookup join (anti)
                                        │ table: regional_by_row_table@regional_by_row_table_b_key
                                        │ equality: (column1, column5) = (crdb_region,b)
                                        │ equality cols are key
                                        │
                                        └── • cross join (anti)
                                            │
                                            ├── • values
                                            │     size: 6 columns, 1 row
                                            │
                                            └── • scan
                                                  missing stats
                                                  table: regional_by_row_table@regional_by_row_table_pkey
                                                  spans: [/'ca-central-1'/7 - /'ca-central-1'/7]

@mgartner
Copy link
Collaborator Author

TFTRs!

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 24, 2022

Build failed:

@mgartner
Copy link
Collaborator Author

Failed due to flake.

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 25, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 25, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 25, 2022

Build succeeded:

@craig craig bot merged commit 94e64ce into cockroachdb:master Feb 25, 2022
@mgartner mgartner deleted the remove-unnecessary-arbiters branch February 25, 2022 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hash Sharded Index doesn't need to be an arbiter if Unique Without Index is used as an arbiter.
5 participants