-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt/props: use EquivSet in FuncDepSet for tracking equivalences #117272
Conversation
Previously, `FuncDepSet` was fairly wasteful in how it tracked sets of equivalent columns: for each column in the equiv group, an FD was maintained from that column to all other columns in the group. This meant that there were `2n` `ColSets` for each equiv group (where `n` is the number of columns in the group). This patch modifies `FuncDepSet` and its internals to use an `EquivSet` instead, which keeps a single `ColSet` for each equiv group. This significantly cuts down on allocations for queries with many columns and equalities, both because less `ColSets` spill to heap, and because less FDs are added to the `deps` slice. Fixes cockroachdb#83963 Release note: None
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
TODO: unit tests for the new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! I agree that we should have extensive unit tests for the new methods in EquivSet
.
Now that you've got a satisfying benchmark result, and you've mapped out all the functionality EquivSet
needs, I recommend trying to split this up into multiple commits. You could add methods to EquivSet
in a separate commit (or even multiple commits) along with unit tests, and then a final commit to use EquivSet
in FuncDepSet
. Even that last commit will be a bit large, and ideally it could be done incrementally, but I don't see any easy way to do that—it may be possible for the two types of equiv sets to live side-by-side and incrementally phase out the old one, but use your best judgement on that.
Reviewed 1 of 58 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)
pkg/sql/opt/props/equiv_set.go
line 21 at r1 (raw file):
// TODO(drewk): incorporate EquivSets into FuncDepSets. type EquivSet struct { buf [equalityBufferSize]opt.ColSet
Why did you remove buf
? Does it add too much complexity or not provide any benefit?
pkg/sql/opt/props/equiv_set.go
line 22 at r1 (raw file):
// queries about which columns are equivalent to one another. Equivalence groups // are always non-empty and disjoint. type EquivSet struct {
I had a thought that EquivGroups
might be a better name. What do you think?
pkg/sql/opt/props/equiv_set.go
line 55 at r1 (raw file):
func (eq *EquivSet) AreColsEquiv(left, right opt.ColumnID) bool { if buildutil.CrdbTestBuild { defer eq.verify()
Is this assertion needed here if this method doesn't mutate the set? I suppose it doesn't hurt to have and it gives us additional coverage.
pkg/sql/opt/props/equiv_set.go
line 72 at r1 (raw file):
// Empty returns true if the set stores no equalities. func (eq *EquivSet) empty() bool {
The external API would be better defined by exporting the methods that are used outside this file (besides test-only ones) and that are safe to use without intimate knowledge of the inner workings of EquivSet. So even though empty
and count
and get
are only used in the same package in func_dep.go
, naming them Empy
and Count
and Get
denotes that they are part of the "public" API for EquivSet.
pkg/sql/opt/props/equiv_set.go
line 77 at r1 (raw file):
// Count returns the number of equiv groups stored in the set. func (eq *EquivSet) count() int {
nit: Name this GroupCount()
pkg/sql/opt/props/equiv_set.go
line 83 at r1 (raw file):
// get returns the equiv group at the given index. The returned ColSet should be // considered immutable. func (eq *EquivSet) get(idx int) opt.ColSet {
nit: Name this Group(idx int)
pkg/sql/opt/props/equiv_set.go
line 166 at r1 (raw file):
if eq.groups[idx].Intersects(eq.groups[i]) { eq.groups[idx] = eq.groups[idx].Union(eq.groups[i]) eq.groups[i] = eq.groups[len(eq.groups)-1]
Could this be broken? After moving the last group, let's call it lg
, to position i
we'll increment i
and effectively skip over lg
. We'll never check if lg
intersects eq.groups[idx]
. That would mean that some groups might not be merged when they could be.
pkg/sql/opt/props/equiv_set.go
line 229 at r1 (raw file):
// makePartition divides the equiv groups according to the given columns. If an // equiv group intersects the given ColSet but is not a subset, it is split into // the intersection and difference with the given ColSet.
I think this deserves more explanation. An example would be helpful. Why the name partitionBy
be better?
pkg/sql/opt/props/equiv_set.go
line 236 at r1 (raw file):
for i := len(eq.groups) - 1; i >= 0; i-- { if eq.groups[i].Intersects(cols) && !eq.groups[i].SubsetOf(cols) { // This group references both sides of the join, so split it.
The term "join" seems out of place here, or is lacking some context.
pkg/sql/opt/props/func_dep.go
line 647 at r1 (raw file):
if !inToSet { for j := 0; j < f.equiv.count(); j++ { if f.equiv.get(j).Contains(i) {
nit: Adding a EquivSet.Contains(col opt.ColumnID)
method might be a nice addition to the API and simplify this a bit.
pkg/sql/opt/props/func_dep.go
line 1685 at r1 (raw file):
needComma = true from := opt.MakeColSet(col) fmt.Fprintf(b, "%s==%s", from, group.Difference(from))
nit: this can be part of a Format
or String
method of EquivSet
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mgartner I'll respond to the comments here, but open up a new PR elsewhere.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
pkg/sql/opt/props/equiv_set.go
line 21 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Why did you remove
buf
? Does it add too much complexity or not provide any benefit?
Yeah, I don't expect much benefit since it's (almost) always embedded in a struct on the heap that gets reused, and it requires us to worry about properly initializing the set. This way, using the zero value doesn't leave any memory unused.
pkg/sql/opt/props/equiv_set.go
line 22 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I had a thought that
EquivGroups
might be a better name. What do you think?
Sure, works for me.
pkg/sql/opt/props/equiv_set.go
line 55 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Is this assertion needed here if this method doesn't mutate the set? I suppose it doesn't hurt to have and it gives us additional coverage.
Having the assertion in non-mutating methods helps to ensure that no one is mutating any of the ColSets (not just the EquivGroups
). It actually caught a bug or two while I was writing tests.
pkg/sql/opt/props/equiv_set.go
line 72 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
The external API would be better defined by exporting the methods that are used outside this file (besides test-only ones) and that are safe to use without intimate knowledge of the inner workings of EquivSet. So even though
empty
andcount
andget
are only used in the same package infunc_dep.go
, naming themEmpy
andCount
andGet
denotes that they are part of the "public" API for EquivSet.
Good idea, done.
pkg/sql/opt/props/equiv_set.go
line 77 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: Name this
GroupCount()
Done.
pkg/sql/opt/props/equiv_set.go
line 83 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: Name this
Group(idx int)
Done.
pkg/sql/opt/props/equiv_set.go
line 166 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Could this be broken? After moving the last group, let's call it
lg
, to positioni
we'll incrementi
and effectively skip overlg
. We'll never check iflg
intersectseq.groups[idx]
. That would mean that some groups might not be merged when they could be.
You're right, this is broken. I think this means we would potentially have redundant filters in re-ordered joins, and we might fail to push down some join filters. I'll add a commit fixing it that we could backport.
pkg/sql/opt/props/equiv_set.go
line 229 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I think this deserves more explanation. An example would be helpful. Why the name
partitionBy
be better?
Added an example, and changed the name. LMK if you think anything more would be helpful.
pkg/sql/opt/props/equiv_set.go
line 236 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
The term "join" seems out of place here, or is lacking some context.
Good point, I'll remove it.
pkg/sql/opt/props/func_dep.go
line 647 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: Adding a
EquivSet.Contains(col opt.ColumnID)
method might be a nice addition to the API and simplify this a bit.
Nice idea, that does simplify things.
pkg/sql/opt/props/func_dep.go
line 1685 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: this can be part of a
Format
orString
method ofEquivSet
.
There's already a String method for debugging that uses a more condensed format. I'd prefer to keep this logic here, since it's "func-dep only", and doesn't really make sense to use in other contexts.
Previously,
FuncDepSet
was fairly wasteful in how it tracked sets of equivalent columns: for each column in the equiv group, an FD was maintained from that column to all other columns in the group. This meant that there were2n
ColSets
for each equiv group (wheren
is the number of columns in the group).This patch modifies
FuncDepSet
and its internals to use anEquivSet
instead, which keeps a singleColSet
for each equiv group. This significantly cuts down on allocations for queries with many columns and equalities, both because lessColSets
spill to heap, and because less FDs are added to thedeps
slice.Fixes #83963
Release note: None