-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: Add more efficient implementation of min, max, sum, avg aggregates when used as window functions. #26988
Conversation
@solongordon take a look if you're interested in what I've been working on :) |
26a1451
to
a2c99f2
Compare
Reviewed 2 of 4 files at r1. pkg/sql/window.go, line 772 at r1 (raw file):
Per my comment below, I think we should explore whether this entire block can be removed. pkg/sql/sem/builtins/window_frame_builtins.go, line 18 at r1 (raw file):
The spacing on these imports got a little messed up. pkg/sql/sem/builtins/window_frame_builtins.go, line 33 at r1 (raw file):
Is there a reason to ever not replace the On that note, I've gotten a little confused by pkg/sql/sem/builtins/window_frame_builtins.go, line 39 at r1 (raw file):
It would be nice if we only had to use pkg/sql/sem/builtins/window_frame_builtins.go, line 115 at r1 (raw file):
Use a pkg/sql/sem/builtins/window_frame_builtins_test.go, line 35 at r1 (raw file):
Nice testing! Comments from Reviewable |
Review status: complete! 0 of 0 LGTMs obtained pkg/sql/sem/builtins/window_frame_builtins.go, line 39 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
That will also allow you to remove Comments from Reviewable |
Review status: complete! 0 of 0 LGTMs obtained pkg/sql/window.go, line 772 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I added a field which indicated whether resetting behavior is required. Without it, pkg/sql/sem/builtins/window_frame_builtins.go, line 18 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Nice catch, thanks. pkg/sql/sem/builtins/window_frame_builtins.go, line 33 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think these were great suggestions, and thanks for the clarifications. pkg/sql/sem/builtins/window_frame_builtins.go, line 39 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Another great idea, thanks! Not sure what the appropriate initial buffer size should be. pkg/sql/sem/builtins/window_frame_builtins.go, line 115 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/sql/sem/builtins/window_frame_builtins_test.go, line 35 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Thanks :) Comments from Reviewable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 6 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/window.go, line 772 at r1 (raw file):
Previously, yuzefovich wrote…
I added a field which indicated whether resetting behavior is required. Without it,
framableAggregateWindowFunc
would always reset the aggregate, even when it is not necessary which would result in quadratic performance for aggregates (that don't have a sliding window implementation) when default frame is being used. Please let me know if you see a better way.
This is much better. I'd just make it clear in this comment that not resetting is an optimization.
pkg/sql/sem/builtins/window_frame_builtins.go, line 35 at r2 (raw file):
// ringBuffer is a deque of indexedValue's maintained over a ring buffer. type ringBuffer struct {
It's probably worth adding some tests for this type itself.
pkg/sql/sem/builtins/window_frame_builtins.go, line 40 at r2 (raw file):
tail int // the index of the first position right after the end of the deque. empty bool // indicates whether the deque is empty, necessary to distinguish
Make this a useful empty value by
- inverting this flag
- handling
cap(r.values) == 0
inringBuffer.add
and having some minimum allocation size.8
sounds good.
Useful empty values are a general idiom in Go that end up making code a lot nicer. You then don't need to initialize this at all in makeSlidingWindow
.
pkg/sql/sem/builtins/window_frame_builtins.go, line 63 at r2 (raw file):
if r.len() == cap(r.values) { newValues := make([]*indexedValue, 2*cap(r.values)) for pos := 0; pos < cap(r.values); pos++ {
Can we use two calls to copy
here? That would be faster because it could use a memcpy directly.
pkg/sql/sem/builtins/window_frame_builtins.go, line 144 at r2 (raw file):
func (sw *slidingWindow) add(iv *indexedValue) { var newEndIdx int for newEndIdx = sw.values.len() - 1; newEndIdx >= 0; newEndIdx-- {
nit: Delete newEndIdx
and replace it with i
, which is scoped only to this loop.
for i := sw.values.len() - 1; ...
pkg/sql/sem/builtins/window_frame_builtins.go, line 158 at r2 (raw file):
func (sw *slidingWindow) removeAllBefore(idx int) { var newStartIdx int for newStartIdx = 0; newStartIdx < sw.values.len() && newStartIdx < idx; newStartIdx++ {
Same thing here. No need for newStartIdx
to be scoped outside of this loop. Once it's a loop index variable, might as well call it i
to avoid noise.
pkg/sql/sem/builtins/window_frame_builtins.go, line 194 at r2 (raw file):
w.prevEnd = end if w.sw.values.empty {
.len() == 0
. Don't break the abstraction you fought so hard for!
pkg/sql/sem/builtins/window_frame_builtins.go, line 239 at r2 (raw file):
return w.agg.Add(ctx, tree.NewDFloat(-*v)) case *tree.DInterval: return w.agg.Add(ctx, &tree.DInterval{Duration: duration.Duration{}.Sub(v.Duration)})
-v.Duration
doesn't work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/window.go, line 772 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is much better. I'd just make it clear in this comment that not resetting is an optimization.
Done.
pkg/sql/sem/builtins/window_frame_builtins.go, line 35 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It's probably worth adding some tests for this type itself.
Done.
pkg/sql/sem/builtins/window_frame_builtins.go, line 40 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Make this a useful empty value by
- inverting this flag
- handling
cap(r.values) == 0
inringBuffer.add
and having some minimum allocation size.8
sounds good.Useful empty values are a general idiom in Go that end up making code a lot nicer. You then don't need to initialize this at all in
makeSlidingWindow
.
I agree, this way the code looks nicer.
pkg/sql/sem/builtins/window_frame_builtins.go, line 63 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can we use two calls to
copy
here? That would be faster because it could use a memcpy directly.
Yes, we can.
pkg/sql/sem/builtins/window_frame_builtins.go, line 144 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: Delete
newEndIdx
and replace it withi
, which is scoped only to this loop.
for i := sw.values.len() - 1; ...
Done. At some point I was using it outside of the loop, but then changed the code and forgot the restrict the scope.
pkg/sql/sem/builtins/window_frame_builtins.go, line 158 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Same thing here. No need for
newStartIdx
to be scoped outside of this loop. Once it's a loop index variable, might as well call iti
to avoid noise.
Done.
pkg/sql/sem/builtins/window_frame_builtins.go, line 194 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
.len() == 0
. Don't break the abstraction you fought so hard for!
Nice catch, thanks.
pkg/sql/sem/builtins/window_frame_builtins.go, line 239 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
-v.Duration
doesn't work?
As far as options I tried, it doesn't.
However, I'm not sure whether I should include changes related to Interval
type since supporting this type requires more work - for instance, offsets can currently be only integers, but it could also be something like '10 days' PRECEDING
. I plan to get back to it after finishing up window functions in distsql.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/sem/builtins/window_frame_builtins.go, line 239 at r2 (raw file):
As far as options I tried, it doesn't.
Oh, this is a duration.Duration
, not a time.Duration
. That makes sense.
However, I'm not sure whether I should include changes related to Interval type since supporting this type requires more work - for instance, offsets can currently be only integers, but it could also be something like '10 days' PRECEDING. I plan to get back to it after finishing up window functions in distsql.
Ok, please leave a TODO comment then.
pkg/sql/sem/builtins/window_frame_builtins.go, line 137 at r3 (raw file):
) *slidingWindow { return &slidingWindow{ values: ringBuffer{},
No need for this line.
Adds linear-time implementations of min, max, sum, and avg (using sliding window approach) instead of naive quadratic version. Addresses: cockroachdb#26464. Bonus: min and max are an order of magnitude faster than PG (when window frame doesn't include the whole partition). Release note (performance improvement): min, max, sum, avg now take linear time when used for aggregation as window functions for all supported window frame options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your reviews!
Reviewable status: complete! 1 of 0 LGTMs obtained
pkg/sql/sem/builtins/window_frame_builtins.go, line 239 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
As far as options I tried, it doesn't.
Oh, this is a
duration.Duration
, not atime.Duration
. That makes sense.However, I'm not sure whether I should include changes related to Interval type since supporting this type requires more work - for instance, offsets can currently be only integers, but it could also be something like '10 days' PRECEDING. I plan to get back to it after finishing up window functions in distsql.
Ok, please leave a TODO comment then.
Done.
pkg/sql/sem/builtins/window_frame_builtins.go, line 137 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
No need for this line.
Done.
bors r+ |
26988: sql: Add more efficient implementation of min, max, sum, avg aggregates when used as window functions. r=yuzefovich a=yuzefovich Adds linear-time implementations of min, max, sum, and avg (using sliding window approach) instead of naive quadratic version. Addresses: #26464. Bonus: min and max is order of magnitude faster that PG (when window frame doesn't include the whole partition). Release note (performance improvement): min, max, sum, avg now take linear time when used for aggregation as window functions. Co-authored-by: yuzefovich <[email protected]>
Build succeeded |
Adds linear-time implementations of min, max, sum, and avg
(using sliding window approach) instead of naive quadratic
version.
Addresses: #26464.
Bonus: min and max is order of magnitude faster that PG
(when window frame doesn't include the whole partition).
Release note (performance improvement): min, max, sum, avg
now take linear time when used for aggregation as window
functions.