opt/props: use EquivGroups in FuncDepSet for tracking equivalences #137571

Merged: 4 commits, Jan 4, 2025


DrewKimball
Collaborator

props: don't leave unmerged equiv groups in EquivSet

This commit fixes a bug in EquivSet.tryMergeGroups, which could cause
an invariant (that equiv groups are non-intersecting) to be violated.
This would potentially cause re-ordered joins to have redundant equality
filters, and prevent join filter push-down in rare cases. This commit
also adds verification to the set for test builds, and a unit test for
the Add method.

Fixes #137381

Release note: None
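
To illustrate the invariant this commit protects, here is a minimal sketch, not the actual `EquivSet` code. It assumes a map-backed `colSet` in place of `opt.ColSet`, and the names `equivGroups`, `addEquiv`, and `verify` are hypothetical. Adding an equivalence merges every existing group it touches, so no two groups are left intersecting.

```go
package main

import "fmt"

// colSet is a hypothetical stand-in for opt.ColSet, backed by a map.
type colSet map[int]struct{}

// intersects reports whether s and o share at least one column.
func (s colSet) intersects(o colSet) bool {
	for c := range s {
		if _, ok := o[c]; ok {
			return true
		}
	}
	return false
}

// equivGroups is a simplified illustration of a set of disjoint equiv groups.
// It is not the real props type.
type equivGroups struct {
	groups []colSet
}

// addEquiv records that the given columns are equivalent. Every existing
// group that intersects the new columns is folded into a single merged group.
// Because existing groups are disjoint, one pass over them is sufficient.
func (eq *equivGroups) addEquiv(cols ...int) {
	merged := colSet{}
	for _, c := range cols {
		merged[c] = struct{}{}
	}
	kept := eq.groups[:0]
	for _, g := range eq.groups {
		if g.intersects(merged) {
			for c := range g {
				merged[c] = struct{}{}
			}
			continue
		}
		kept = append(kept, g)
	}
	eq.groups = append(kept, merged)
}

// verify returns an error if any two groups intersect.
func (eq *equivGroups) verify() error {
	for i := range eq.groups {
		for j := i + 1; j < len(eq.groups); j++ {
			if eq.groups[i].intersects(eq.groups[j]) {
				return fmt.Errorf("groups %d and %d intersect", i, j)
			}
		}
	}
	return nil
}

func main() {
	var eq equivGroups
	eq.addEquiv(1, 2)
	eq.addEquiv(3, 4)
	eq.addEquiv(2, 3) // bridges the two existing groups
	fmt.Println(len(eq.groups), eq.verify())
}
```

In this sketch `verify` is called explicitly; the commit message describes running the equivalent check only in test builds.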

props: rename EquivSet to EquivGroups

This commit renames EquivSet to EquivGroups to better reflect the
fact that it tracks groups of equivalent columns, rather than a single
set of equivalent columns.

Epic: None

Release note: None

props: remove initial buffer from EquivGroups

This commit removes the inline buffer from EquivGroups, as well as the
NewEquivGroups() that was previously used to set up the buffer. This
simplifies usage, and doesn't appear to affect performance when the
set is used in JoinOrderBuilder or in FuncDepSet.

Informs #83963

Release note: None

props: add methods to EquivGroups for use in FuncDepSet

This commit adds several new methods, along with unit tests, to EquivGroups
to prepare for its use in tracking equivalencies in FuncDepSet.

Informs #83963

Release note: None

opt/props: use EquivGroups in FuncDepSet for tracking equivalences

Previously, FuncDepSet was fairly wasteful in how it tracked sets of
equivalent columns: for each column in the equiv group, an FD was maintained
from that column to all other columns in the group. This meant that there
were 2n ColSets for each equiv group (where n is the number of columns
in the group).

This patch modifies FuncDepSet and its internals to use props.EquivGroups
instead, which keeps a single ColSet for each equiv group. This significantly
cuts down on allocations for queries with many columns and equalities, both
because fewer ColSets spill to heap, and because fewer FDs are added to
the deps slice.

Fixes #83963

Release note: None
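
The `2n` figure can be checked with a small sketch. This is an illustration only: the `fd` struct and `perColumnFDs` helper are hypothetical, and plain `[]int` slices stand in for `ColSet`. Each column gets one FD with a "from" set and a "to" set, so a group of `n` columns costs `2n` sets, compared with a single set per group in the new representation.

```go
package main

import "fmt"

// fd is a hypothetical, simplified functional dependency: from --> to.
type fd struct {
	from, to []int
}

// perColumnFDs builds the old-style representation described above: one FD
// per column in the group, from that column to all other columns.
func perColumnFDs(group []int) []fd {
	fds := make([]fd, 0, len(group))
	for i, col := range group {
		to := make([]int, 0, len(group)-1)
		to = append(to, group[:i]...)
		to = append(to, group[i+1:]...)
		fds = append(fds, fd{from: []int{col}, to: to})
	}
	return fds
}

// countSets counts the column sets held by the FDs (one "from" and one "to"
// set per FD).
func countSets(fds []fd) int {
	return 2 * len(fds)
}

func main() {
	group := []int{1, 2, 3, 4}
	fds := perColumnFDs(group)
	fmt.Printf("old: %d FDs, %d sets; new: 1 set\n", len(fds), countSets(fds))
}
```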

@DrewKimball DrewKimball requested review from mgartner and a team December 17, 2024 00:06
@DrewKimball DrewKimball requested a review from a team as a code owner December 17, 2024 00:06

blathers-crl bot commented Dec 17, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@DrewKimball
Collaborator Author

Continuation of #117272. First commit is #137558. Here's an updated slow-queries benchmark against master:

name                                       old time/op    new time/op    delta
SlowQueries/slow-query-1/reorder-join-0-8    1.03ms ± 1%    0.91ms ± 4%  -11.62%  (p=0.000 n=9+10)
SlowQueries/slow-query-1/reorder-join-8-8    13.5ms ± 0%    11.7ms ± 0%  -13.53%  (p=0.000 n=9+10)
SlowQueries/slow-query-2/reorder-join-0-8    1.97ms ± 1%    1.69ms ± 1%  -14.50%  (p=0.000 n=9+9)
SlowQueries/slow-query-2/reorder-join-8-8     217ms ± 2%     204ms ± 2%   -6.00%  (p=0.000 n=10+10)
SlowQueries/slow-query-3/reorder-join-0-8    43.0ms ± 1%    43.0ms ± 2%     ~     (p=1.000 n=10+10)
SlowQueries/slow-query-3/reorder-join-8-8    45.6ms ± 1%    45.1ms ± 2%   -1.10%  (p=0.023 n=10+10)
SlowQueries/slow-query-4/reorder-join-0-8    60.9ms ± 0%     6.4ms ± 1%  -89.45%  (p=0.000 n=8+9)
SlowQueries/slow-query-4/reorder-join-8-8    61.2ms ± 0%     6.4ms ± 0%  -89.50%  (p=0.000 n=8+8)
SlowQueries/slow-query-5/reorder-join-0-8    40.9ms ± 0%    34.9ms ± 1%  -14.80%  (p=0.000 n=8+9)
SlowQueries/slow-query-5/reorder-join-8-8     724ms ± 1%     668ms ± 0%   -7.77%  (p=0.000 n=10+8)
SlowQueries/slow-query-6/reorder-join-0-8    24.7ms ± 1%    14.7ms ± 1%  -40.66%  (p=0.000 n=9+10)
SlowQueries/slow-query-6/reorder-join-8-8     990ms ± 5%     776ms ± 2%  -21.60%  (p=0.000 n=10+10)
SlowQueries/slow-query-7/reorder-join-0-8    34.4ms ± 1%    19.4ms ± 0%  -43.62%  (p=0.000 n=9+10)
SlowQueries/slow-query-7/reorder-join-8-8     1.25s ± 1%     0.96s ± 1%  -23.67%  (p=0.000 n=9+8)

name                                       old alloc/op   new alloc/op   delta
SlowQueries/slow-query-1/reorder-join-0-8     690kB ± 0%     571kB ± 0%  -17.21%  (p=0.000 n=8+9)
SlowQueries/slow-query-1/reorder-join-8-8    6.66MB ± 0%    5.93MB ± 0%  -10.94%  (p=0.000 n=8+7)
SlowQueries/slow-query-2/reorder-join-0-8     957kB ± 0%     816kB ± 0%  -14.73%  (p=0.000 n=9+10)
SlowQueries/slow-query-2/reorder-join-8-8    51.1MB ± 0%    52.9MB ± 0%   +3.48%  (p=0.000 n=10+10)
SlowQueries/slow-query-3/reorder-join-0-8    39.9MB ± 0%    39.7MB ± 0%   -0.39%  (p=0.000 n=10+8)
SlowQueries/slow-query-3/reorder-join-8-8    42.0MB ± 0%    41.7MB ± 0%   -0.88%  (p=0.000 n=10+10)
SlowQueries/slow-query-4/reorder-join-0-8    30.3MB ± 0%     4.1MB ± 0%  -86.50%  (p=0.000 n=10+10)
SlowQueries/slow-query-4/reorder-join-8-8    30.3MB ± 0%     4.1MB ± 0%  -86.50%  (p=0.000 n=8+8)
SlowQueries/slow-query-5/reorder-join-0-8    33.4MB ± 0%    32.3MB ± 0%   -3.14%  (p=0.000 n=9+10)
SlowQueries/slow-query-5/reorder-join-8-8     256MB ± 0%     249MB ± 0%   -2.62%  (p=0.000 n=10+10)
SlowQueries/slow-query-6/reorder-join-0-8    10.5MB ± 0%     7.7MB ± 0%  -27.12%  (p=0.000 n=9+10)
SlowQueries/slow-query-6/reorder-join-8-8     424MB ± 1%     338MB ± 0%  -20.08%  (p=0.000 n=10+10)
SlowQueries/slow-query-7/reorder-join-0-8    14.0MB ± 0%    10.2MB ± 0%  -27.24%  (p=0.000 n=8+8)
SlowQueries/slow-query-7/reorder-join-8-8     534MB ± 0%     409MB ± 0%  -23.39%  (p=0.000 n=10+8)

name                                       old allocs/op  new allocs/op  delta
SlowQueries/slow-query-1/reorder-join-0-8     3.86k ± 0%     3.25k ± 0%  -15.79%  (p=0.000 n=8+9)
SlowQueries/slow-query-1/reorder-join-8-8     65.7k ± 0%     52.5k ± 0%  -20.06%  (p=0.000 n=8+7)
SlowQueries/slow-query-2/reorder-join-0-8     3.96k ± 0%     3.85k ± 0%   -2.63%  (p=0.000 n=9+10)
SlowQueries/slow-query-2/reorder-join-8-8      265k ± 0%      271k ± 0%   +2.36%  (p=0.000 n=10+10)
SlowQueries/slow-query-3/reorder-join-0-8      300k ± 0%      296k ± 0%   -1.07%  (p=0.000 n=10+8)
SlowQueries/slow-query-3/reorder-join-8-8      320k ± 0%      312k ± 0%   -2.41%  (p=0.000 n=10+10)
SlowQueries/slow-query-4/reorder-join-0-8      166k ± 0%       28k ± 0%  -83.02%  (p=0.000 n=10+10)
SlowQueries/slow-query-4/reorder-join-8-8      166k ± 0%       28k ± 0%  -83.02%  (p=0.000 n=8+8)
SlowQueries/slow-query-5/reorder-join-0-8      210k ± 0%      200k ± 0%   -5.00%  (p=0.000 n=9+10)
SlowQueries/slow-query-5/reorder-join-8-8     4.43M ± 0%     4.38M ± 0%   -1.17%  (p=0.000 n=10+10)
SlowQueries/slow-query-6/reorder-join-0-8      110k ± 0%       76k ± 0%  -31.18%  (p=0.000 n=8+9)
SlowQueries/slow-query-6/reorder-join-8-8     6.90M ± 1%     5.46M ± 0%  -20.84%  (p=0.000 n=10+9)
SlowQueries/slow-query-7/reorder-join-0-8      175k ± 0%      116k ± 0%  -33.65%  (p=0.000 n=8+8)
SlowQueries/slow-query-7/reorder-join-8-8     10.2M ± 0%      7.7M ± 0%  -24.02%  (p=0.000 n=10+8)

Collaborator

@mgartner mgartner left a comment


:lgtm: Very nice! 🚀

Reviewed 6 of 6 files at r2, 3 of 3 files at r3, 4 of 4 files at r4, 60 of 60 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)


pkg/sql/opt/props/equiv_set.go line 37 at r4 (raw file):

	if buildutil.CrdbTestBuild {
		defer eq.verify()
	}

nit: Methods that don't mutate the EquivGroup probably don't need to verify the set, do they? Or is this to catch cases where the user incorrectly mutates a ColSet returned from e.g. Group without copying it? In that case, it might be worth a comment mentioning that—Maybe a single comment on verify instead of comments throughout? Up to you.


pkg/sql/opt/props/equiv_set.go line 62 at r4 (raw file):

// GroupForCol returns the group of columns equivalent to the given column. It
// returns the empty set if no such group exists. The returned should not be

nit: "The returned should.." => "The returned ColSet should..."


pkg/sql/opt/props/equiv_set.go line 103 at r4 (raw file):

		}
		if eq.groups[i].Contains(right) {
			return eq.groups[i].Contains(left)

nit: This can be return false since we already checked if the group contains left.


pkg/sql/opt/props/equiv_set.go line 251 at r4 (raw file):

// * In the example, the (1-3) group is split into (1) and (2,3). Since the (1)
// group only has a single column, it is discarded as a trivial equivalence.
// * The (4-8) group is split into (4,8) and (5,6). Since both subsets have at

nit: (4,8) => (4,7,8)


pkg/sql/opt/props/equiv_set.go line 286 at r4 (raw file):

// AppendFromDisjoint unions the equiv groups from the given EquivGroups with
// this one, assuming the groups are disjoint.
func (eq *EquivGroups) AppendFromDisjoint(other *EquivGroups) {

nit: make the comment louder about the fact that the groups must be disjoint.


pkg/sql/opt/props/func_dep.go line 648 at r5 (raw file):

	for i, ok := cols.Next(0); ok; i, ok = cols.Next(i + 1) {
		// First check if the column is present in any "to" set of a dependency.
		// If not, then it is not redundant and must remain in the set. This is

nit: Update this comment to mention the equiv groups


pkg/sql/opt/props/func_dep.go line 1690 at r5 (raw file):

			needComma = true
			from := opt.MakeColSet(col)
			fmt.Fprintf(b, "%s==%s", from, group.Difference(from))

This verbose formatting of equiv sets is now vestigial. I think it's good to not create extra test output churn in this commit, but maybe we should consider a new format for equivalencies in the near future, like:

fd: (1)-->(2,3), ==(4,5), ==(9-11)

pkg/sql/opt/props/func_dep.go line 1821 at r5 (raw file):

	// Non-constant FDs are weaker than equivalence constraints.
	if f.equiv.AreAllColsEquiv(to.Union(from)) {

Something to consider if we see a lot of allocations here: it may be helpful to have a 2-set version of AreAllColsEquiv to avoid Union.


pkg/sql/opt/props/func_dep.go line 1889 at r5 (raw file):

			break
		}
	}

nit: In a subsequent PR/commit, consider returning the added equiv colset from AddNoCopy so we don't have to search for it here.


pkg/sql/opt/props/equiv_set_test.go line 73 at r3 (raw file):

	}

	var equivSet EquivGroups

nit: rename equivSet in the renaming commit.


pkg/sql/opt/props/equiv_set.go line 187 at r5 (raw file):

	}
	for i := range fdset.equiv.groups {
		// No copy is necessary because the equiv groups are immutable.

Is it really "necessary", or just "safe"?

Collaborator Author

@DrewKimball DrewKimball left a comment


TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/opt/props/equiv_set.go line 37 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: Methods that don't mutate the EquivGroup probably don't need to verify the set, do they? Or is this to catch cases where the user incorrectly mutates a ColSet returned from e.g. Group without copying it? In that case, it might be worth a comment mentioning that—Maybe a single comment on verify instead of comments throughout? Up to you.

Yes, that's exactly why. Done.


pkg/sql/opt/props/equiv_set.go line 62 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: "The returned should.." => "The returned ColSet should..."

Done.


pkg/sql/opt/props/equiv_set.go line 103 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: This can be return false since we already checked if the group contains left.

Oh, good point. I ended up restructuring it to check containsLeft || containsRight, and then just return containsLeft && containsRight.
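
A rough sketch of that restructured check, assuming disjoint groups stored as plain `[]int` slices rather than `ColSet`s. The function name `areColsEquiv`, the `contains` helper, and the `left == right` shortcut are assumptions for illustration, not the code from the PR.

```go
package main

import "fmt"

// contains reports whether col is present in group.
func contains(group []int, col int) bool {
	for _, c := range group {
		if c == col {
			return true
		}
	}
	return false
}

// areColsEquiv reports whether left and right belong to the same equiv
// group. Groups are assumed to be disjoint, so the first group containing
// either column decides the answer.
func areColsEquiv(groups [][]int, left, right int) bool {
	if left == right {
		// Assumption: a column is trivially equivalent to itself.
		return true
	}
	for _, g := range groups {
		containsLeft := contains(g, left)
		containsRight := contains(g, right)
		if containsLeft || containsRight {
			return containsLeft && containsRight
		}
	}
	return false
}

func main() {
	groups := [][]int{{1, 2, 3}, {4, 5}}
	fmt.Println(areColsEquiv(groups, 1, 3), areColsEquiv(groups, 3, 4))
}
```

The early return relies on the non-intersection invariant: once one column is found in a group, no later group can contain it.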


pkg/sql/opt/props/equiv_set.go line 251 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: (4,8) => (4,7,8)

Done. Good catch


pkg/sql/opt/props/equiv_set.go line 286 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: make the comment louder about the fact that the groups must be disjoint.

Done.


pkg/sql/opt/props/equiv_set.go line 187 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Is it really "necessary", or just "safe"?

Confusing phrasing. I just meant we don't have to copy in order to be safe. I changed the wording a bit.


pkg/sql/opt/props/func_dep.go line 648 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: Update this comment to mention the equiv groups

Done.


pkg/sql/opt/props/func_dep.go line 1690 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This verbose formatting of equiv sets is now vestigial. I think it's good to not create extra test output churn in this commit, but maybe we should consider a new format for equivalencies in the near future, like:

fd: (1)-->(2,3), ==(4,5), ==(9-11)

Yeah, I like that. Or maybe like this:

fd: (1)-->(2,3) equiv: (4,5), (9-11)
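
For illustration, a formatter producing the `equiv:` portion of that proposed output might look like the sketch below. Everything here is hypothetical: `formatCols` and `formatEquiv` are made-up names, groups are plain `[]int` slices, and the rule of collapsing runs of three or more consecutive columns into a range is inferred from the `(4,5)` and `(9-11)` examples above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatCols renders a column group like "(4,5)" or "(9-11)". Runs of three
// or more consecutive columns are collapsed into a range. The input is
// assumed to contain no duplicates.
func formatCols(cols []int) string {
	sorted := append([]int(nil), cols...)
	sort.Ints(sorted)
	var b strings.Builder
	b.WriteByte('(')
	for i := 0; i < len(sorted); {
		// Find the end of the run of consecutive columns starting at i.
		j := i
		for j+1 < len(sorted) && sorted[j+1] == sorted[j]+1 {
			j++
		}
		if b.Len() > 1 {
			b.WriteByte(',')
		}
		if j-i >= 2 {
			fmt.Fprintf(&b, "%d-%d", sorted[i], sorted[j])
			i = j + 1
		} else {
			fmt.Fprintf(&b, "%d", sorted[i])
			i++
		}
	}
	b.WriteByte(')')
	return b.String()
}

// formatEquiv renders all groups in the "equiv: (..), (..)" style.
func formatEquiv(groups [][]int) string {
	parts := make([]string, len(groups))
	for i, g := range groups {
		parts[i] = formatCols(g)
	}
	return "equiv: " + strings.Join(parts, ", ")
}

func main() {
	fmt.Println(formatEquiv([][]int{{4, 5}, {9, 10, 11}}))
}
```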

pkg/sql/opt/props/func_dep.go line 1821 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Something to consider if we see a lot of allocations here: it may be helpful to have a 2-set version of AreAllColsEquiv to avoid Union.

Good idea. I wonder if there are any other places where that could help. Do you think it's worth leaving a TODO?


pkg/sql/opt/props/func_dep.go line 1889 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: In a subsequent PR/commit, consider returning the added equiv colset from AddNoCopy so we don't have to search for it here.

Good idea. I added it to the commit that introduces the new methods and tests.


pkg/sql/opt/props/equiv_set_test.go line 73 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: rename equivSet in the renaming commit.

Done.

Collaborator

@mgartner mgartner left a comment


:lgtm:

Reviewed 1 of 3 files at r7, 3 of 4 files at r8, 63 of 63 files at r9, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)


pkg/sql/opt/props/func_dep.go line 1821 at r5 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Good idea. I wonder if there are any other places where that could help. Do you think it's worth leaving a TODO?

Probably not necessary.
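
For reference, the 2-set idea discussed in this thread could be sketched as follows. This is not code from the PR: `areAllColsEquiv2` is a hypothetical name, a map-backed `colSet` stands in for `opt.ColSet`, and treating an empty or single-column input as trivially equivalent is an assumption. The point is only that both sets can be checked against one group without first building their union.

```go
package main

import "fmt"

// colSet is a hypothetical stand-in for opt.ColSet, backed by a map.
type colSet map[int]struct{}

// subsetOf reports whether every column in s is also in o.
func (s colSet) subsetOf(o colSet) bool {
	for c := range s {
		if _, ok := o[c]; !ok {
			return false
		}
	}
	return true
}

// areAllColsEquiv2 reports whether all columns in a and b fall in a single
// equiv group, without materializing the union of a and b. Groups are
// assumed to be disjoint.
func areAllColsEquiv2(groups []colSet, a, b colSet) bool {
	// Pick any one column, and note whether more than one distinct column
	// appears across both sets.
	first, found, trivial := 0, false, true
	scan := func(s colSet) {
		for c := range s {
			if !found {
				first, found = c, true
			} else if c != first {
				trivial = false
			}
		}
	}
	scan(a)
	scan(b)
	if trivial {
		// Assumption: zero or one distinct column is trivially equivalent.
		return true
	}
	for _, g := range groups {
		if _, ok := g[first]; ok {
			// Disjointness means no other group can contain first.
			return a.subsetOf(g) && b.subsetOf(g)
		}
	}
	return false
}

func main() {
	groups := []colSet{{1: {}, 2: {}, 3: {}}, {4: {}, 5: {}}}
	fmt.Println(areAllColsEquiv2(groups, colSet{1: {}}, colSet{2: {}, 3: {}}))
	fmt.Println(areAllColsEquiv2(groups, colSet{1: {}}, colSet{4: {}}))
}
```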

@DrewKimball
Collaborator Author

TFTR!

bors r+

@craig craig bot merged commit bc6d6e0 into cockroachdb:master Jan 4, 2025
22 checks passed
@DrewKimball DrewKimball deleted the equiv-fd-2 branch January 4, 2025 03:30