exec: some cleanup and small perf wins #36972

jordanlewis · 2019-04-20T14:28:32Z

See individual commits for details. I did a bunch of bounds check elimination work while I was on the plane. I also noticed that `Next` often doesn't need a pointer receiver. Switching it to a value receiver speeds things up by removing redundant checks that the compiler otherwise emits, because with a pointer receiver it can't prove that the struct's data hasn't been modified by another goroutine.
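A minimal sketch of the value-receiver pattern described above. `batch` and `addConstOp` are hypothetical stand-ins, not the actual types in `pkg/sql/exec`. Because `Next` takes `op` by value, its fields are a local copy for the duration of the call rather than memory reached through a pointer. Whether this removes checks or speeds things up in practice depends on the Go version and the operator, so it is worth confirming with a benchmark.

```go
package main

import "fmt"

// batch is a hypothetical, heavily simplified stand-in for a columnar batch.
type batch struct {
	col []int64
	out []int64
}

// addConstOp is a hypothetical projection operator that adds a constant to
// every value in a column. Its fields never change after construction, so
// Next can use a value receiver: inside Next, op is a local copy, and
// op.constArg is a plain local value rather than a load through a pointer.
type addConstOp struct {
	constArg int64
}

// Next writes col[i]+constArg into out[i] for every row of b.
func (op addConstOp) Next(b *batch) {
	col := b.col
	out := b.out[:len(col)] // reslice so out is exactly as long as col
	for i := range col {
		out[i] = col[i] + op.constArg
	}
}

func main() {
	b := &batch{col: []int64{1, 2, 3}, out: make([]int64, 3)}
	op := addConstOp{constArg: 10}
	op.Next(b)
	fmt.Println(b.out) // [11 12 13]
}
```

Since the operator holds no state that `Next` needs to mutate, nothing is lost by copying it on each call.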

I also fixed a bug in hash join planning that prevented some TPC-H queries from running.


jordanlewis · 2019-04-22T16:13:12Z

Rebased on top of @solongordon's patch.

yuzefovich

Nice finds!

Reviewed 3 of 3 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 2 of 2 files at r5, 1 of 1 files at r6.

pkg/sql/exec/bool_vec_to_sel.go, line 68 at r6:

```go
		} else {
			batch.SetSelection(true)
			sel := batch.Selection()
```

Do we need `sel = sel[:n]` here as well? And in general, how do you check for these kinds of things? Using something like godbolt.org to compile the code?

pkg/sql/exec/distinct_tmpl.go, line 211 at r6:

```go
		col = col[startIdx:n]
		outputCol = outputCol[startIdx:n]
		_ = outputCol[len(col)-1]
```

[question]: What is the goal of this instruction? (Sorry if I asked this before.)

pkg/sql/exec/distinct_tmpl.go, line 244 at r6:

```go
// _CHECK_DISTINCT retrieves the value at the ith index of col, compares it
// to the passed in lastVal, and sets the ith value of outputCol to true if the
// compared values were distinct. It returns the value at the ith index of col.
```

[nit]: the method doesn't seem to return anything.


Remove some bounds checks and a conditional branch.

```
name               old time/op    new time/op    delta
SortedDistinct-24  3.10µs ± 1%    2.70µs ± 0%    -12.85%  (p=0.000 n=10+10)

name               old speed      new speed      delta
SortedDistinct-24  7.92GB/s ± 1%  9.08GB/s ± 0%  +14.74%  (p=0.000 n=10+10)

name               old alloc/op   new alloc/op   delta
SortedDistinct-24  0.00B          0.00B          ~        (all equal)

name               old allocs/op  new allocs/op  delta
SortedDistinct-24  0.00           0.00           ~        (all equal)
```

Release note: None


And pass them by value, to get a big savings in the non-const case.

```
name                          old time/op    new time/op    delta
ProjPlusInt64Int64ConstOp-12  406ns ± 2%     375ns ± 1%     -7.62%   (p=0.000 n=10+10)
ProjPlusInt64Int64Op-12       495ns ± 1%     386ns ± 1%     -21.90%  (p=0.000 n=9+9)

name                          old speed      new speed      delta
ProjPlusInt64Int64ConstOp-12  20.2GB/s ± 2%  21.8GB/s ± 1%  +8.24%   (p=0.000 n=10+10)
ProjPlusInt64Int64Op-12       33.1GB/s ± 1%  42.5GB/s ± 3%  +28.41%  (p=0.000 n=9+10)

name                          old alloc/op   new alloc/op   delta
ProjPlusInt64Int64ConstOp-12  0.00B          0.00B          ~        (all equal)
ProjPlusInt64Int64Op-12       0.00B          0.00B          ~        (all equal)

name                          old allocs/op  new allocs/op  delta
ProjPlusInt64Int64ConstOp-12  0.00           0.00           ~        (all equal)
ProjPlusInt64Int64Op-12       0.00           0.00           ~        (all equal)
```

Release note: None

```
name                       old time/op    new time/op    delta
Columnarize-24             810µs ± 1%     784µs ± 1%     -3.19%  (p=0.000 n=10+10)
ColumnarizeMaterialize-24  1.69ms ± 1%    1.66ms ± 1%    -1.73%  (p=0.000 n=10+10)

name                       old speed      new speed      delta
Columnarize-24             198MB/s ± 1%   204MB/s ± 1%   +3.29%  (p=0.000 n=10+10)
ColumnarizeMaterialize-24  94.7MB/s ± 1%  96.4MB/s ± 1%  +1.76%  (p=0.000 n=10+10)

name                       old alloc/op   new alloc/op   delta
Columnarize-24             164kB ± 0%     164kB ± 0%     ~       (all equal)
ColumnarizeMaterialize-24  327kB ± 0%     327kB ± 0%     ~       (p=1.000 n=10+10)

name                       old allocs/op  new allocs/op  delta
Columnarize-24             20.0k ± 0%     20.0k ± 0%     ~       (all equal)
ColumnarizeMaterialize-24  21.3k ± 0%     21.3k ± 0%     ~       (all equal)
```

Release note: None


jordanlewis

TFTR!


pkg/sql/exec/bool_vec_to_sel.go, line 68 at r6:

Previously, yuzefovich wrote…

Do we need `sel = sel[:n]` here as well? And in general, how do you check for these kinds of things? Using something like godbolt.org to compile the code?

You can ask the compiler to spit out its analysis:

```
go build -gcflags='-d=ssa/check_bce' ./pkg/sql/exec
```

In this case, `sel = sel[:n]` doesn't help, because the compiler doesn't know how large `idx` in the loop below can get. We know it's guaranteed to stay less than `n`, but I don't think there's any way to prove that to the compiler.
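To make the point above concrete, here is a hypothetical, simplified version of the kind of loop being discussed; it is not the actual `bool_vec_to_sel.go` code. `idx` only advances when `col[i]` is true, so it never exceeds `i`, but that relationship is not something the bounds-check eliminator tracks. Building a file like this with `-gcflags='-d=ssa/check_bce'` would likely still report a check on the `sel[idx]` line even after `sel = sel[:n]`.

```go
package main

import "fmt"

// boolVecToSel writes the indices of the true entries of col[:n] into sel
// and returns how many indices it wrote. Both slices must have length >= n.
func boolVecToSel(col []bool, sel []uint16, n int) int {
	col = col[:n]
	sel = sel[:n] // bounds len(sel), but says nothing about how large idx gets
	idx := 0
	for i := range col {
		if col[i] {
			// idx <= i < n always holds, but the compiler cannot prove it,
			// so this store likely keeps its bounds check.
			sel[idx] = uint16(i)
			idx++
		}
	}
	return idx
}

func main() {
	col := []bool{true, false, true, true, false}
	sel := make([]uint16, len(col))
	k := boolVecToSel(col, sel, len(col))
	fmt.Println(sel[:k]) // [0 2 3]
}
```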

pkg/sql/exec/distinct_tmpl.go, line 211 at r6:

Previously, yuzefovich wrote…

[question]: what is the goal of this instruction? (Sorry, if I asked this before.)

Even with the line above it, the compiler can't prove that the accesses to `outputCol` inside the loop are in bounds, so it can't eliminate their bounds checks. This line emits a single bounds check on `outputCol` at the maximum index that `i` will reach in the loop that follows. Once that check has passed, the compiler knows it doesn't need to emit any more bounds checks on `outputCol` within the inner loop.

Use `go build -gcflags='-d=ssa/check_bce' ./pkg/sql/exec` to dig further: it prints one line for each bounds check the compiler emits.
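A minimal sketch of the bounds-check hint explained above, using a hypothetical, simplified distinct loop rather than the templated code itself. The `_ = outputCol[len(col)-1]` line performs one check up front; if it passes, `len(outputCol) >= len(col)`, and together with `i < len(col)` the compiler should be able to drop the per-iteration check on `outputCol[i]`. Whether it actually does depends on the Go version, so confirm with the `check_bce` flag. Note the empty-input guard: without it, the hint itself would panic on an empty `col`.

```go
package main

import "fmt"

// markDistinct sets outputCol[i] to true when col[i] differs from the
// previous value (starting from lastVal) and returns the last value seen.
func markDistinct(col []int64, outputCol []bool, lastVal int64) int64 {
	if len(col) == 0 {
		return lastVal // the hint below would panic on an empty col
	}
	// Bounds-check hint: panics up front unless len(outputCol) >= len(col).
	// After it passes, i < len(col) <= len(outputCol) inside the loop.
	_ = outputCol[len(col)-1]
	for i := range col {
		v := col[i]
		if v != lastVal {
			outputCol[i] = true
		}
		lastVal = v
	}
	return lastVal
}

func main() {
	col := []int64{1, 1, 2, 3, 3}
	out := make([]bool, len(col))
	last := markDistinct(col, out, 0)
	fmt.Println(out, last) // [true false true true false] 3
}
```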

pkg/sql/exec/distinct_tmpl.go, line 244 at r6:

Previously, yuzefovich wrote…

[nit]: the method doesn't seem to return anything.

Oops, you're right. I left this in by accident from an earlier experiment, in which I tried converting this template into an ordinary function in the hope that the compiler would inline it, just as we do manually with templating. The function did get inlined and its bounds checks were eliminated, but the code was still inexplicably slower than the manually templated version, so I reverted it.
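The shape of the experiment described above can be sketched as follows. `checkDistinct` and `sortedDistinct` are hypothetical names; this is an illustration of replacing the `_CHECK_DISTINCT` template with a plain function, not the reverted code. A function this small is normally within the inliner's budget, and `go build -gcflags=-m` should report "can inline checkDistinct" if so. As noted above, in the author's benchmark this style still ended up slower than the templated version, so the sketch shows the idea, not a recommendation.

```go
package main

import "fmt"

// checkDistinct compares col[i] with lastVal, marks outputCol[i] when they
// differ, and returns col[i] so the caller can use it as the new lastVal.
func checkDistinct(i int, lastVal int64, col []int64, outputCol []bool) int64 {
	v := col[i]
	if v != lastVal {
		outputCol[i] = true
	}
	return v
}

// sortedDistinct calls checkDistinct once per row; if the compiler inlines
// the call, the loop body is equivalent to the hand-expanded template.
func sortedDistinct(col []int64, outputCol []bool, lastVal int64) int64 {
	for i := range col {
		lastVal = checkDistinct(i, lastVal, col, outputCol)
	}
	return lastVal
}

func main() {
	col := []int64{1, 1, 2, 3, 3}
	out := make([]bool, len(col))
	last := sortedDistinct(col, out, 0)
	fmt.Println(out, last) // [true false true true false] 3
}
```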

jordanlewis · 2019-04-22T16:58:06Z

bors r+


craig · 2019-04-22T17:38:21Z

Build succeeded


36988: exec: small expression planning refactor; support some new projections r=jordanlewis a=jordanlewis

This PR is based on #36972, which should be reviewed first.

Previously, the planning for projection and selection scalar expressions was performed by a single function, which made it difficult to do things like project on the result of a comparison. This PR splits projection and selection scalar expression construction into separate routines.

Also, add support for const projections (`select 3 from t`) and filtering on arbitrary boolean columns (`select * from t where bcol`).

Co-authored-by: Jordan Lewis

jordanlewis requested review from solongordon, yuzefovich and a team April 20, 2019 14:28

jordanlewis mentioned this pull request Apr 20, 2019

exec: eliminate a few bounds checks in selection #36780

Merged

jordanlewis force-pushed the cleanup branch from 1c6f4cb to 1bd20d0 April 21, 2019 04:30

jordanlewis mentioned this pull request Apr 22, 2019

exec: small expression planning refactor; support some new projections #36988

Merged

jordanlewis force-pushed the cleanup branch from 1bd20d0 to a512672 April 22, 2019 15:04

yuzefovich approved these changes Apr 22, 2019


jordanlewis added 6 commits April 22, 2019 09:45

exec: remove unnecessary quotes in distinct tmpl

78539c3

Release note: None

exec: eliminate a bounds check in count_agg

e5d1803

Release note: None

exec: bugfix in hash join planning

14cbaa6

It wasn't permitting post filter expressions due to failing to export its column types. Release note: None

jordanlewis force-pushed the cleanup branch from a512672 to 14cbaa6 April 22, 2019 16:45

jordanlewis commented Apr 22, 2019


craig bot merged commit 14cbaa6 into cockroachdb:master Apr 22, 2019

jordanlewis deleted the cleanup branch April 22, 2019 23:19
