Reduce overheads related to DictionaryBlock#getSizeInBytes() #10970

Merged

pettyjamesm
Member

Description

Reduces the overhead of calculating DictionaryBlock#getSizeInBytes() by making two changes to the Block interface definition:

  • Adds a new argument selectedPositionCount to Block#getPositionsSizeInBytes(boolean[] positions, int selectedPositionCount). Callers of the previous method getPositionsSizeInBytes(boolean[] positions) could trivially calculate and pass the total count of selected positions; because they did not, the called blocks had to count the true values in the positions array themselves, sometimes repeatedly (e.g., in RowBlock).
  • Adds a new method Block#fixedSizeInBytesPerPosition() that allows blocks to describe whether their getSizeInBytes() can be calculated directly from the position count, i.e., the size does not vary with the specific positions selected.
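
To make the two interface changes concrete, here is a minimal sketch. It is not the actual SPI code: SketchBlock, SketchIntBlock and SketchVarWidthBlock are hypothetical stand-ins, and the per-position byte costs are assumptions chosen only for illustration.

```java
import java.util.OptionalInt;

// Hypothetical, trimmed-down stand-in for the Block interface (only the methods discussed here).
interface SketchBlock
{
    int getPositionCount();

    // selectedPositionCount must equal the number of true entries in positions.
    long getPositionsSizeInBytes(boolean[] positions, int selectedPositionCount);

    // Present only when every position costs the same number of bytes.
    OptionalInt fixedSizeInBytesPerPosition();
}

// Fixed-width example: assumes 4 bytes of value plus 1 byte of null flag per position.
final class SketchIntBlock implements SketchBlock
{
    static final int SIZE_IN_BYTES_PER_POSITION = Integer.BYTES + Byte.BYTES;
    private final int[] values;

    SketchIntBlock(int[] values) { this.values = values; }

    public int getPositionCount() { return values.length; }

    public long getPositionsSizeInBytes(boolean[] positions, int selectedPositionCount)
    {
        // No scan of positions[] is needed because the caller supplied the count.
        return (long) SIZE_IN_BYTES_PER_POSITION * selectedPositionCount;
    }

    public OptionalInt fixedSizeInBytesPerPosition()
    {
        return OptionalInt.of(SIZE_IN_BYTES_PER_POSITION);
    }
}

// Variable-width example: the size depends on which positions are selected.
final class SketchVarWidthBlock implements SketchBlock
{
    private final byte[][] values;

    SketchVarWidthBlock(byte[][] values) { this.values = values; }

    public int getPositionCount() { return values.length; }

    public long getPositionsSizeInBytes(boolean[] positions, int selectedPositionCount)
    {
        // Fixed overhead (assumed: 4-byte offset + 1-byte null flag) uses the supplied count...
        long size = (long) (Integer.BYTES + Byte.BYTES) * selectedPositionCount;
        // ...but the payload still requires visiting the selected positions.
        for (int i = 0; i < positions.length; i++) {
            if (positions[i]) {
                size += values[i].length;
            }
        }
        return size;
    }

    public OptionalInt fixedSizeInBytesPerPosition()
    {
        return OptionalInt.empty();
    }
}
```

In this sketch a caller that sees a present fixedSizeInBytesPerPosition() can multiply by a position count and skip building or scanning a boolean[] altogether.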

Also includes a change to eagerly populate the unique position count and size in bytes on the results of DictionaryBlock#getPositions when the result is not compact, since the selected positions array may have already been created and would otherwise have to be reconstructed in a subsequent call to DictionaryBlock#getSizeInBytes().
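
As a rough illustration of why the unique position count matters, the sketch below computes a dictionary block's size from its ids. It is a simplification under stated assumptions, not the DictionaryBlock implementation: DictionarySizeSketch, PositionsSizer and sizeInBytes are hypothetical names, and the formula (dictionary bytes for referenced entries plus 4 bytes per id) is assumed for illustration.

```java
import java.util.OptionalInt;

final class DictionarySizeSketch
{
    // Hypothetical callback standing in for dictionary.getPositionsSizeInBytes(positions, selectedPositionCount).
    interface PositionsSizer
    {
        long sizeOf(boolean[] positions, int selectedPositionCount);
    }

    static long sizeInBytes(int[] ids, int dictionaryPositionCount, OptionalInt fixedSizePerPosition, PositionsSizer dictionary)
    {
        // Mark each referenced dictionary entry once and count the unique ones.
        boolean[] used = new boolean[dictionaryPositionCount];
        int uniqueIds = 0;
        for (int id : ids) {
            if (!used[id]) {
                used[id] = true;
                uniqueIds++;
            }
        }
        // With a fixed per-position size, the unique count alone is enough;
        // otherwise the dictionary has to inspect the selected positions.
        long dictionarySize = fixedSizePerPosition.isPresent()
                ? (long) fixedSizePerPosition.getAsInt() * uniqueIds
                : dictionary.sizeOf(used, uniqueIds);
        return dictionarySize + (long) Integer.BYTES * ids.length;
    }

    public static void main(String[] args)
    {
        int[] ids = {0, 2, 2, 0, 2};
        // Fixed-size path: the callback is never invoked.
        System.out.println(sizeInBytes(ids, 3, OptionalInt.of(5), (positions, count) -> {
            throw new AssertionError("not expected to be called");
        }));
    }
}
```

If the used array and the unique count are already available when getPositions builds its result, storing them there avoids repeating this loop on a later getSizeInBytes() call, which is the motivation given above.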

General information

Is this change a fix, improvement, new feature, refactoring, or other?

This is an improvement that reduces the overhead for common scenarios involving DictionaryBlock#getSizeInBytes().

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

This change affects the SPI interface of blocks and refactors the implementations of DictionaryBlock#getPositions and Block#getPositionsSizeInBytes to leverage the new information available.

How would you describe this change to a non-technical end user or system administrator?

Specifically describing this change to a non-technical audience should not be necessary.

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`5678`)

@cla-bot cla-bot bot added the cla-signed label Feb 7, 2022
@findepi findepi requested a review from sopel39 February 8, 2022 09:52
@pettyjamesm pettyjamesm force-pushed the improve-dictionary-size-calculation branch from d2878b6 to 533f5f0 Compare February 8, 2022 14:44
@pettyjamesm pettyjamesm marked this pull request as ready for review February 8, 2022 15:16
@pettyjamesm
Member Author

Benchmark results

Before

Benchmark                                                (selectedPositions)  (valueType)  Mode  Cnt     Score     Error  Units
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes                  100      varchar  avgt    5   441.915 ±  39.146  us/op
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes                  100      integer  avgt    5   357.674 ±  12.478  us/op
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes                 1000      varchar  avgt    5   436.585 ±   5.135  us/op
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes                 1000      integer  avgt    5   374.206 ±   4.870  us/op
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes                10000      varchar  avgt    5   838.399 ±  17.185  us/op
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes                10000      integer  avgt    5   647.638 ±   5.097  us/op
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes               100000      varchar  avgt    5  2437.982 ± 119.726  us/op
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes               100000      integer  avgt    5  1754.578 ±  87.259  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                         100      varchar  avgt    5   383.916 ±  9.507  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                         100      integer  avgt    5   329.267 ±  7.083  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                        1000      varchar  avgt    5   420.201 ±  4.594  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                        1000      integer  avgt    5   361.897 ± 28.052  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                       10000      varchar  avgt    5   836.371 ± 20.350  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                       10000      integer  avgt    5   636.434 ±  4.285  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                      100000      varchar  avgt    5  2489.630 ± 44.489  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                      100000      integer  avgt    5  1806.359 ± 24.145  us/op

After

Benchmark                                                (selectedPositions)  (valueType)  Mode  Cnt     Score    Error  Units
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes                  100      varchar  avgt    5   253.769 ± 10.734  us/op
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes                  100      integer  avgt    5    56.284 ±  1.955  us/op
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes                 1000      varchar  avgt    5   311.319 ±  7.131  us/op
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes                 1000      integer  avgt    5    51.795 ±  0.427  us/op
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes                10000      varchar  avgt    5   772.982 ± 13.421  us/op
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes                10000      integer  avgt    5   137.741 ±  8.286  us/op
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes               100000      varchar  avgt    5  2216.938 ± 45.850  us/op
BenchmarkDictionaryBlock.getPositionsThenGetSizeInBytes               100000      integer  avgt    5   623.966 ± 14.182  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                         100      varchar  avgt    5   303.344 ± 14.147  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                         100      integer  avgt    5    91.252 ± 42.979  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                        1000      varchar  avgt    5   374.223 ± 15.776  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                        1000      integer  avgt    5    85.253 ±  4.161  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                       10000      varchar  avgt    5   778.172 ± 16.958  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                       10000      integer  avgt    5   213.559 ±  1.584  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                      100000      varchar  avgt    5  2374.654 ± 10.486  us/op
BenchmarkDictionaryBlock.getPositionsSizeInBytes                      100000      integer  avgt    5   725.355 ± 26.155  us/op
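
For readers who want the before/after ratio rather than raw scores, the following small helper divides the two. The class and method names are hypothetical; the scores are copied from four of the getPositionsThenGetSizeInBytes rows in the tables above.

```java
final class BenchmarkSpeedup
{
    // Ratio of average time per operation before vs. after the change (higher is better).
    static double speedup(double beforeMicrosPerOp, double afterMicrosPerOp)
    {
        return beforeMicrosPerOp / afterMicrosPerOp;
    }

    public static void main(String[] args)
    {
        // (valueType)/(selectedPositions) labels with their Score values in us/op.
        String[] labels = {"integer/100", "varchar/100", "integer/100000", "varchar/100000"};
        double[] before = {357.674, 441.915, 1754.578, 2437.982};
        double[] after = {56.284, 253.769, 623.966, 2216.938};
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-15s %.2fx%n", labels[i], speedup(before[i], after[i]));
        }
    }
}
```

On these rows the integer cases come out several times faster, while the varchar cases improve more modestly, which is consistent with the fixed-size shortcut applying only to fixed-width dictionaries.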

@pettyjamesm pettyjamesm force-pushed the improve-dictionary-size-calculation branch from 805d041 to d6d3860 Compare February 8, 2022 19:26
@sopel39 sopel39 requested a review from skrzypo987 February 8, 2022 22:12
@pettyjamesm pettyjamesm force-pushed the improve-dictionary-size-calculation branch 2 times, most recently from 1c8e4cd to d56413d Compare February 9, 2022 18:21
@pettyjamesm pettyjamesm force-pushed the improve-dictionary-size-calculation branch 3 times, most recently from 9064e23 to 105467f Compare February 10, 2022 23:31
@pettyjamesm
Member Author

@skrzypo987 think you'll have a chance to take a full review pass on this PR sometime soon?

@skrzypo987
Member

> @skrzypo987 think you'll have a chance to take a full review pass on this PR sometime soon?

I'll take a look today

Member

@skrzypo987 skrzypo987 left a comment

Seems ok from my perspective. If you want to get it merged soon, ping some other maintainer, since @sopel39 is out this week.

@pettyjamesm pettyjamesm requested a review from dain February 15, 2022 18:52
@pettyjamesm
Member Author

@dain maybe you have a minute to take a look and decide whether to merge it while Karol is out?

@@ -103,6 +106,12 @@ public Block getRegion(int position, int length)
                getRawElementBlock());
    }

    @Override
    public OptionalInt fixedSizeInBytesPerPosition()
Member

nit: maybe just getSizeInBytesPerPosition?

Member Author

I think the "fixed" adds extra information here and makes it clearer (to me) that this method is only for blocks where the size in bytes does not depend on the value at any given position, but that's just my opinion. If you feel strongly, I'm willing to change it.

Member

The semantics were fairly natural to me even with the shorter name, since a single number is returned and it is guarded by an Optional to handle the case when we cannot compute it. But I do not feel strongly; it may stay.

        }
        else if (rawElementBlock instanceof RunLengthEncodedBlock) {
            // RLE blocks don't have fixed size per position, but accept null for the positions array
            elementsSizeInBytes = rawElementBlock.getPositionsSizeInBytes(null, countSelectedPositionsFromOffsets(positions, offsets, offsetBase));
Member

If we are exploiting the fact that rawElementBlock is of a specific class, do we need to compute countSelectedPositionsFromOffsets? This argument is ignored by RunLengthEncodedBlock.getPositionsSizeInBytes anyway.

Member Author

@pettyjamesm pettyjamesm Feb 22, 2022

That's true. I left this in because I was planning on making RunLengthEncodedBlock#getPositionsSizeInBytes do something like return selectedPositions > 0 ? value.getSizeInBytes() : 0; but that broke a lot of tests and didn't seem worth the headache, so I instead added tests that assert the opposite.

It doesn't quite feel right for AbstractArrayBlock to special-case the handling of RunLengthEncodedBlock by calling RunLengthEncodedBlock#getSizeInBytes, but it certainly would reduce the overhead involved.

I'm open to doing whatever you think makes the most sense here.
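
The two behaviours discussed in this thread can be contrasted in a small sketch. RleSizeSketch and its method names are hypothetical, and the value size is passed in as a plain long rather than read from a real block; the first method reflects the reviewer's remark that the argument is ignored, the second the alternative that was tried and dropped.

```java
final class RleSizeSketch
{
    // Behaviour described as kept: the selected position count is ignored
    // and the single value's size is reported regardless.
    static long currentBehaviour(long valueSizeInBytes, int selectedPositionCount)
    {
        return valueSizeInBytes;
    }

    // Alternative that was considered but abandoned because it broke existing tests:
    // report zero bytes when no positions are selected.
    static long consideredAlternative(long valueSizeInBytes, int selectedPositionCount)
    {
        return selectedPositionCount > 0 ? valueSizeInBytes : 0;
    }

    public static void main(String[] args)
    {
        long valueSize = 16;
        System.out.println(currentBehaviour(valueSize, 0));
        System.out.println(consideredAlternative(valueSize, 0));
    }
}
```

The two only differ when nothing is selected, which is why the count passed from AbstractArrayBlock has no effect under the behaviour that was kept.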

@losipiuk
Member

LGTM. But it is not the area I am most familiar with. @dain / @sopel39 want to take a look still?

@pettyjamesm pettyjamesm force-pushed the improve-dictionary-size-calculation branch 2 times, most recently from dcbbe4d to d992eb0 Compare February 23, 2022 23:46
Adds a selectedPositionCount argument to Block#getPositionsSizeInBytes
and adds a new method: Block#fixedSizeInBytesPerPosition() to
reduce the overhead associated with calculating DictionaryBlock
size in bytes when the underlying dictionary size in bytes can
be calculated without specific information about which positions
are referenced.
@pettyjamesm
Member Author

More complete benchmark results were easier to produce in the Presto equivalent PR, and links to those results can be found in my comment there.

@losipiuk
Member

@pettyjamesm The JMH benchmarks look nice! I have a question still. Did you have a chance to verify how much the change impacts end-to-end query execution? how much are we gaining in CPU/wall time over some verbose benchmark (e.g. tpc-h/ds), or over your production queries?
The change surely adds some complexity and it would be nice to have a good justification for adding that.

@pettyjamesm
Member Author

> Did you have a chance to verify how much the change impacts end-to-end query execution? how much are we gaining in CPU/wall time over some verbose benchmark (e.g. tpc-h/ds), or over your production queries?

I haven't had a chance to put these changes into our production environment, and in TPCDS / TPCH queries I wouldn't expect much of a measurable improvement at the macro level. The biggest gains are going to be in queries that make heavy use of RowBlock / ArrayBlock / DictionaryBlock combinations. I just tweaked the BenchmarkUnnestOperator implementation to call operatorContext.addProcessedInput(inputPage.getSizeInBytes(), inputPage.getPositionCount()) for the input page, and to do the same for the output page. That results in about a 45% throughput improvement on my laptop (e.g., ~6.5 ops/second to ~10.0 ops/second).

@losipiuk
Member

losipiuk commented Mar 1, 2022

> I haven't had a chance to put these changes into our production environment, and in TPCDS / TPCH queries I wouldn't expect much of a measurable improvement at the macro level. The biggest gains are going to be in queries that make heavy use of RowBlock / ArrayBlock / DictionaryBlock combinations. I just tweaked the BenchmarkUnnestOperator implementation to call operatorContext.addProcessedInput(inputPage.getSizeInBytes(), inputPage.getPositionCount()) for the input page, and to do the same for the output page. That results in about a 45% throughput improvement on my laptop (e.g., ~6.5 ops/second to ~10.0 ops/second).

Nice. Thanks.

@losipiuk losipiuk merged commit 7b96c8d into trinodb:master Mar 1, 2022
@pettyjamesm pettyjamesm deleted the improve-dictionary-size-calculation branch March 1, 2022 02:36
@github-actions github-actions bot added this to the 372 milestone Mar 1, 2022