-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exec: some cleanup and small perf wins #36972
Conversation
Rebased on top of @solongordon's patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 2 of 2 files at r5, 1 of 1 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)
pkg/sql/exec/bool_vec_to_sel.go, line 68 at r6 (raw file):
} else { batch.SetSelection(true) sel := batch.Selection()
Do we need sel = sel[:n]
here as well? And in general, how do you check for this kind of things? Using something like "go.godbolt.com" to compile the code?
pkg/sql/exec/distinct_tmpl.go, line 211 at r6 (raw file):
col = col[startIdx:n] outputCol = outputCol[startIdx:n] _ = outputCol[len(col)-1]
[question]: what is the goal of this instruction? (Sorry, if I asked this before.)
pkg/sql/exec/distinct_tmpl.go, line 244 at r6 (raw file):
// _CHECK_DISTINCT retrieves the value at the ith index of col, compares it // to the passed in lastVal, and sets the ith value of outputCol to true if the // compared values were distinct. It returns the value at the ith index of col.
[nit]: the method doesn't seem to return anything.
Release note: None
Remove some bounds checks and a conditional branch. ``` name old time/op new time/op delta SortedDistinct-24 3.10µs ± 1% 2.70µs ± 0% -12.85% (p=0.000 n=10+10) name old speed new speed delta SortedDistinct-24 7.92GB/s ± 1% 9.08GB/s ± 0% +14.74% (p=0.000 n=10+10) name old alloc/op new alloc/op delta SortedDistinct-24 0.00B 0.00B ~ (all equal) name old allocs/op new allocs/op delta SortedDistinct-24 0.00 0.00 ~ (all equal) ``` Release note: None
Release note: None
And pass them by value, to get a big savings in the non-const case. ``` ProjPlusInt64Int64ConstOp-12 406ns ± 2% 375ns ± 1% -7.62% (p=0.000 n=10+10) ProjPlusInt64Int64Op-12 495ns ± 1% 386ns ± 1% -21.90% (p=0.000 n=9+9) name old speed new speed delta ProjPlusInt64Int64ConstOp-12 20.2GB/s ± 2% 21.8GB/s ± 1% +8.24% (p=0.000 n=10+10) ProjPlusInt64Int64Op-12 33.1GB/s ± 1% 42.5GB/s ± 3% +28.41% (p=0.000 n=9+10) name old alloc/op new alloc/op delta ProjPlusInt64Int64ConstOp-12 0.00B 0.00B ~ (all equal) ProjPlusInt64Int64Op-12 0.00B 0.00B ~ (all equal) name old allocs/op new allocs/op delta ProjPlusInt64Int64ConstOp-12 0.00 0.00 ~ (all equal) ProjPlusInt64Int64Op-12 0.00 0.00 ~ (all equal) ``` Release note: None
``` Columnarize-24 810µs ± 1% 784µs ± 1% -3.19% (p=0.000 n=10+10) ColumnarizeMaterialize-24 1.69ms ± 1% 1.66ms ± 1% -1.73% (p=0.000 n=10+10) name old speed new speed delta Columnarize-24 198MB/s ± 1% 204MB/s ± 1% +3.29% (p=0.000 n=10+10) ColumnarizeMaterialize-24 94.7MB/s ± 1% 96.4MB/s ± 1% +1.76% (p=0.000 n=10+10) name old alloc/op new alloc/op delta Columnarize-24 164kB ± 0% 164kB ± 0% ~ (all equal) ColumnarizeMaterialize-24 327kB ± 0% 327kB ± 0% ~ (p=1.000 n=10+10) name old allocs/op new allocs/op delta Columnarize-24 20.0k ± 0% 20.0k ± 0% ~ (all equal) ColumnarizeMaterialize-24 21.3k ± 0% 21.3k ± 0% ~ (all equal) ``` Release note: None
It wasn't permitting post filter expressions due to failing to export its column types. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @solongordon and @yuzefovich)
pkg/sql/exec/bool_vec_to_sel.go, line 68 at r6 (raw file):
Previously, yuzefovich wrote…
Do we need
sel = sel[:n]
here as well? And in general, how do you check for this kind of things? Using something like "go.godbolt.com" to compile the code?
You can ask the compiler to spit out its analysis:
go build -gcflags='-d=ssa/check_bce' ./pkg/sql/exec
In this case, sel = sel[:n]
doesn't help us with anything because the compiler doesn't know how big idx
in the loop below can get. We know that it's guaranteed to stay less than n
, but I don't think there's any way to prove that to the compiler.
pkg/sql/exec/distinct_tmpl.go, line 211 at r6 (raw file):
Previously, yuzefovich wrote…
[question]: what is the goal of this instruction? (Sorry, if I asked this before.)
Even with the line above it, the compiler cannot prove that the bounds check is eliminated. This line helps the compiler realize that since it's already emitted a bounds check for outputCol
at the maximum that i
will be in the loop before, it does not have to emit any more bounds checks within the inner loop.
Use go build -gcflags='-d=ssa/check_bce' ./pkg/sql/exec
to understand further - it will print out one line per bounds check emitted.
pkg/sql/exec/distinct_tmpl.go, line 244 at r6 (raw file):
Previously, yuzefovich wrote…
[nit]: the method doesn't seem to return anything.
Oops, you are right. I left this in by accident from an earlier experiment where I tried to convert this to be an ordinary function in the hopes that it would be inlined just as we do manually with templating. The result of the experiment was that the function was inlined, bounds checks were eliminated, but the code was still inexplicably not as fast as the manually templated version, so I reverted it.
bors r+ |
36972: exec: some cleanup and small perf wins r=jordanlewis a=jordanlewis See individual commits for details. Just did a bunch of bounds check elimination stuff while I was on the plane. Also had an interesting observation which is that it's unnecessary to have pointer receivers for `Next` a lot of the time, which speeds things up by removing redundant checks from the compiler which couldn't prove in the pointer receiver case that data from the struct hadn't gotten modified by another goroutine. And fixed a bug in hash join planning that caused us to fail to run some TPCH queries. Co-authored-by: Jordan Lewis <[email protected]>
Build succeeded |
36988: exec: small expression planning refactor; support some new projections r=jordanlewis a=jordanlewis This PR is based off of #36972 - that one should be reviewed first. Previously, the planning for projection and selection scalar expressions was performed by a single function, which made it difficult to do things like project on the result of a comparison. This PR splits projection and selection scalar expression construction into separate routines. Also, add support for const projections (`select 3 from t`) and filtering on arbitrary boolean columns (`select * from t where bcol`). Co-authored-by: Jordan Lewis <[email protected]>
See individual commits for details. Just did a bunch of bounds check elimination stuff while I was on the plane. Also had an interesting observation which is that it's unnecessary to have pointer receivers for
Next
a lot of the time, which speeds things up by removing redundant checks from the compiler which couldn't prove in the pointer receiver case that data from the struct hadn't gotten modified by another goroutine.And fixed a bug in hash join planning that caused us to fail to run some TPCH queries.