Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exec: some cleanup and small perf wins #36972

Merged
merged 6 commits into from
Apr 22, 2019
Merged

Conversation

jordanlewis
Copy link
Member

See individual commits for details. Just did a bunch of bounds check elimination stuff while I was on the plane. Also had an interesting observation which is that it's unnecessary to have pointer receivers for Next a lot of the time, which speeds things up by removing redundant checks from the compiler which couldn't prove in the pointer receiver case that data from the struct hadn't gotten modified by another goroutine.

And fixed a bug in hash join planning that caused us to fail to run some TPCH queries.

@jordanlewis jordanlewis requested review from solongordon, yuzefovich and a team April 20, 2019 14:28
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jordanlewis
Copy link
Member Author

Rebased on top of @solongordon's patch.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice finds! :lgtm:

Reviewed 3 of 3 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 2 of 2 files at r5, 1 of 1 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)


pkg/sql/exec/bool_vec_to_sel.go, line 68 at r6 (raw file):

		} else {
			batch.SetSelection(true)
			sel := batch.Selection()

Do we need sel = sel[:n] here as well? And in general, how do you check for this kind of things? Using something like "go.godbolt.com" to compile the code?


pkg/sql/exec/distinct_tmpl.go, line 211 at r6 (raw file):

		col = col[startIdx:n]
		outputCol = outputCol[startIdx:n]
		_ = outputCol[len(col)-1]

[question]: what is the goal of this instruction? (Sorry, if I asked this before.)


pkg/sql/exec/distinct_tmpl.go, line 244 at r6 (raw file):

// _CHECK_DISTINCT retrieves the value at the ith index of col, compares it
// to the passed in lastVal, and sets the ith value of outputCol to true if the
// compared values were distinct. It returns the value at the ith index of col.

[nit]: the method doesn't seem to return anything.

Remove some bounds checks and a conditional branch.

```
name               old time/op    new time/op    delta
SortedDistinct-24    3.10µs ± 1%    2.70µs ± 0%  -12.85%  (p=0.000 n=10+10)

name               old speed      new speed      delta
SortedDistinct-24  7.92GB/s ± 1%  9.08GB/s ± 0%  +14.74%  (p=0.000 n=10+10)

name               old alloc/op   new alloc/op   delta
SortedDistinct-24     0.00B          0.00B          ~     (all equal)

name               old allocs/op  new allocs/op  delta
SortedDistinct-24      0.00           0.00          ~     (all equal)
```

Release note: None
And pass them by value, to get a big savings in the non-const case.

```
ProjPlusInt64Int64ConstOp-12     406ns ± 2%     375ns ± 1%   -7.62% (p=0.000 n=10+10)
ProjPlusInt64Int64Op-12          495ns ± 1%     386ns ± 1%  -21.90% (p=0.000 n=9+9)

name                          old speed      new speed      delta
ProjPlusInt64Int64ConstOp-12  20.2GB/s ± 2%  21.8GB/s ± 1%   +8.24% (p=0.000 n=10+10)
ProjPlusInt64Int64Op-12       33.1GB/s ± 1%  42.5GB/s ± 3%  +28.41% (p=0.000 n=9+10)

name                          old alloc/op   new alloc/op   delta
ProjPlusInt64Int64ConstOp-12     0.00B          0.00B          ~ (all equal)
ProjPlusInt64Int64Op-12          0.00B          0.00B          ~ (all equal)

name                          old allocs/op  new allocs/op  delta
ProjPlusInt64Int64ConstOp-12      0.00           0.00          ~ (all equal)
ProjPlusInt64Int64Op-12           0.00           0.00          ~ (all equal)
```

Release note: None
```
Columnarize-24                810µs ± 1%     784µs ± 1%  -3.19%  (p=0.000 n=10+10)
ColumnarizeMaterialize-24    1.69ms ± 1%    1.66ms ± 1%  -1.73%  (p=0.000 n=10+10)

name                       old speed      new speed      delta
Columnarize-24              198MB/s ± 1%   204MB/s ± 1%  +3.29%  (p=0.000 n=10+10)
ColumnarizeMaterialize-24  94.7MB/s ± 1%  96.4MB/s ± 1%  +1.76%  (p=0.000 n=10+10)

name                       old alloc/op   new alloc/op   delta
Columnarize-24                164kB ± 0%     164kB ± 0%    ~     (all equal)
ColumnarizeMaterialize-24     327kB ± 0%     327kB ± 0%    ~     (p=1.000 n=10+10)

name                       old allocs/op  new allocs/op  delta
Columnarize-24                20.0k ± 0%     20.0k ± 0%    ~     (all equal)
ColumnarizeMaterialize-24     21.3k ± 0%     21.3k ± 0%    ~     (all equal)
```

Release note: None
It wasn't permitting post filter expressions due to failing to export
its column types.

Release note: None
Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @solongordon and @yuzefovich)


pkg/sql/exec/bool_vec_to_sel.go, line 68 at r6 (raw file):

Previously, yuzefovich wrote…

Do we need sel = sel[:n] here as well? And in general, how do you check for this kind of things? Using something like "go.godbolt.com" to compile the code?

You can ask the compiler to spit out its analysis:

go build -gcflags='-d=ssa/check_bce' ./pkg/sql/exec

In this case, sel = sel[:n] doesn't help us with anything because the compiler doesn't know how big idx in the loop below can get. We know that it's guaranteed to stay less than n, but I don't think there's any way to prove that to the compiler.


pkg/sql/exec/distinct_tmpl.go, line 211 at r6 (raw file):

Previously, yuzefovich wrote…

[question]: what is the goal of this instruction? (Sorry, if I asked this before.)

Even with the line above it, the compiler cannot prove that the bounds check is eliminated. This line helps the compiler realize that since it's already emitted a bounds check for outputCol at the maximum that i will be in the loop before, it does not have to emit any more bounds checks within the inner loop.

Use go build -gcflags='-d=ssa/check_bce' ./pkg/sql/exec to understand further - it will print out one line per bounds check emitted.


pkg/sql/exec/distinct_tmpl.go, line 244 at r6 (raw file):

Previously, yuzefovich wrote…

[nit]: the method doesn't seem to return anything.

Oops, you are right. I left this in by accident from an earlier experiment where I tried to convert this to be an ordinary function in the hopes that it would be inlined just as we do manually with templating. The result of the experiment was that the function was inlined, bounds checks were eliminated, but the code was still inexplicably not as fast as the manually templated version, so I reverted it.

@jordanlewis
Copy link
Member Author

bors r+

craig bot pushed a commit that referenced this pull request Apr 22, 2019
36972: exec: some cleanup and small perf wins r=jordanlewis a=jordanlewis

See individual commits for details. Just did a bunch of bounds check elimination stuff while I was on the plane. Also had an interesting observation which is that it's unnecessary to have pointer receivers for `Next` a lot of the time, which speeds things up by removing redundant checks from the compiler which couldn't prove in the pointer receiver case that data from the struct hadn't gotten modified by another goroutine.

And fixed a bug in hash join planning that caused us to fail to run some TPCH queries.

Co-authored-by: Jordan Lewis <[email protected]>
@craig
Copy link
Contributor

craig bot commented Apr 22, 2019

Build succeeded

@craig craig bot merged commit 14cbaa6 into cockroachdb:master Apr 22, 2019
@jordanlewis jordanlewis deleted the cleanup branch April 22, 2019 23:19
craig bot pushed a commit that referenced this pull request Apr 24, 2019
36988: exec: small expression planning refactor; support some new projections r=jordanlewis a=jordanlewis

This PR is based off of #36972 - that one should be reviewed first.

Previously, the planning for projection and selection scalar expressions was performed by a single function, which made it difficult to do things like project on the result of a comparison.

This PR splits projection and selection scalar expression construction into separate routines.

Also, add support for const projections (`select 3 from t`) and filtering on arbitrary boolean columns (`select * from t where bcol`).

Co-authored-by: Jordan Lewis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants