Long aggregation improvements #11031
Conversation
Force-pushed 088029d to cb2031e
Left some comments.
@skrzypo987 Do you have some numbers for the improvements?
if (block.mayHaveNull()) {
    boolean haveNull = false;
...
int[] hashes = new int[batchSize];
I would reuse hashes array to limit GC pressure.
Also, should this be named hashPositions?
Renamed.
By reusing, you mean making it a class field? I don't think that would make much sense. This is a short-lived object allocated in eden and dying pretty quickly. GC loves objects like that.
    }
}

private void batchedPutIfAbsent(int batchSize, int lastPosition, Block block, long[] groupIds, int groupIdOffset)
lastPosition -> startPosition, groupIds -> outGroupIds?
Changed it to blockOffset. As for outGroupIds, I don't really like it. Maybe we can figure something else out.
}
int batchSize = min(min(leftToProcess, MAX_BATCH_SIZE), leftBeforeRehash);

long[] dummyGroupIds = new long[batchSize];
Move this outside of the loop.
Done. I hope the JIT is getting rid of it anyway.
long value = BIGINT.getLong(block, lastPosition + i);
long storedValue = values[hashes[i]];
boolean match = value == storedValue;
groupIds[groupIdOffset + i] = (groupIds[groupIdOffset + i] + 1) * (match ? 1 : 0) - 1;
What's the perf improvement from this line vs just if (!match) groupIds[groupIdOffset + i] = -1? I would hope that the if could be replaced with cmov by the JIT in bad cases.
If this stays, at least add a comment on what this line does.
match will (hopefully) be cast to a number by the compiler. That way there is no branch at all.
Added a comment
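The arithmetic trick discussed here can be checked in isolation. The following standalone sketch (hypothetical class and method names, not the Trino code) verifies that the expression is equivalent to the branchy if for non-negative group ids:

```java
// Hypothetical demonstration of the branchless group-id update from the PR.
public class BranchlessUpdateDemo
{
    // Branchy version: reset the group id to -1 when the probed value
    // does not match the stored value.
    static long branchy(long groupId, boolean match)
    {
        if (!match) {
            return -1;
        }
        return groupId;
    }

    // Branchless version: (groupId + 1) * (match ? 1 : 0) - 1 evaluates to
    // groupId when match is true and to -1 otherwise. The hope expressed in
    // the review is that the JIT lowers (match ? 1 : 0) to a flag-to-int
    // conversion, so the loop body contains no conditional jump.
    static long branchless(long groupId, boolean match)
    {
        return (groupId + 1) * (match ? 1 : 0) - 1;
    }

    public static void main(String[] args)
    {
        for (long groupId : new long[] {0, 1, 42, 123456}) {
            for (boolean match : new boolean[] {true, false}) {
                if (branchy(groupId, match) != branchless(groupId, match)) {
                    throw new AssertionError("mismatch for " + groupId + "/" + match);
                }
            }
        }
        System.out.println("branchless update matches branchy update");
    }
}
```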
for (int i = 0; i < batchSize; i++) {
    if (groupIds[groupIdOffset + i] >= 0) {
        long value = BIGINT.getLong(block, lastPosition + i);
        long storedValue = values[hashes[i]];
For a large number of groups, this is a cache miss. The same is true for the this.groupIds[hashes[i]] above.
Since this loop is not super tight, manually unrolling could help here (I have seen improvement from doing 4 items at once).
Gave it a try. No difference.
The loop actually is super tight IMO.
That's a damn good review.
Force-pushed 6eab0c5 to d1d7a9b
Comments addressed.
I also added another commit with batched dictionary processing. It brings a 5-10% improvement in microbenchmarks, so nothing significant.
I also started macro benchmarks for partitioned data and will share the results when they finish.
lgtm % comments
// if (!match)
//     groupIds[groupIdOffset + i] = -1;
// but without explicit branches
groupIds[groupIdOffset + i] = (groupIds[groupIdOffset + i] + 1) * (match ? 1 : 0) - 1;
if is nicer code and could be faster if compiled to cmov (cmov has 1 cycle latency). I would check what both expressions compile to and choose accordingly.
We had some benchmarks replacing simple ifs with this ? 1 : 0 trick and it was about 2x faster every single time.
IMHO a loop without a branch is always going to be faster than a loop with a branch.
cmov is not a branch.
Quite frankly I don't know what it is. We should be platform agnostic.
We use this "trick" throughout the Trino codebase, I see no reason why we should not use it here.
boolean finished;
do {
    finished = work.process();
    results.add(work.getResult()); // Pretend the results are used
You can use org.openjdk.jmh.infra.Blackhole for that. It avoids keeping unneeded stuff in memory.
This class seems serious. It cannot even be easily instantiated. I'd rather stay with basic solutions
You don't create it yourself; JMH will inject it if you have it as a param to the benchmark method.
This is the exact use case for it. See https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_09_Blackholes.java
And then you cannot use the benchmark method from outside the benchmark, i.e. the runGroupByTpch method, because you cannot create a Blackhole instance. I'd rather stay with the current option.
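The concern on both sides of this exchange is dead-code elimination. A minimal standalone sketch (hypothetical names, no JMH dependency) of the pattern the benchmark uses instead of a Blackhole: results are collected and returned so the JIT cannot prove the work is unused, and the method stays callable outside the JMH harness:

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeResultsDemo
{
    // Stand-in for the real unit of work (e.g. GroupByHash.getGroupIds);
    // this method is an assumption for illustration only.
    static long compute(int i)
    {
        return (long) i * i;
    }

    // Collect and return the results so they stay reachable: the JIT cannot
    // eliminate the loop as dead code, and no Blackhole instance is needed.
    static List<Long> runAndKeepResults(int iterations)
    {
        List<Long> results = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            results.add(compute(i)); // pretend the results are used
        }
        return results;
    }
}
```

Handing each result to an injected Blackhole achieves the same effect inside JMH; returning the results is the portable alternative chosen here.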
public enum ColumnType
{
    BIGINT(BigintType.BIGINT, (blockBuilder, positionCount, cardinality, seed) -> {
        Random r = new Random(seed);
why are different seeds needed?
So that two columns with the same type get different values.
why not use one static Random value
That way if you have two bigint columns their contents will be exactly the same. So 2 columns with cardinality of 10 will produce 10 groups. We want them to be independent from each other and have 100 groups in aggregation
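The effect can be shown with a small sketch (a hypothetical generator mirroring the benchmark's lambda, not the actual code): two columns built from the same seed are identical, so grouping on both yields only as many groups as one column's cardinality, while different seeds make the columns independent:

```java
import java.util.Random;

public class SeedDemo
{
    // Hypothetical column generator: draws positionCount values bounded by
    // cardinality from a seeded Random, like the benchmark's lambda.
    static long[] column(int positionCount, int cardinality, long seed)
    {
        Random r = new Random(seed);
        long[] values = new long[positionCount];
        for (int i = 0; i < positionCount; i++) {
            values[i] = r.nextInt(cardinality);
        }
        return values;
    }
}
```

With a single shared seed, column(100, 10, s) repeated twice produces identical arrays, collapsing the expected 10 x 10 = 100 groups down to 10.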
Force-pushed d1d7a9b to 6fc6ed9
Force-pushed 6fc6ed9 to cf181c0
Force-pushed 3e95042 to a924346
lgtm for "Replace LongBigArray with long[] ..."
I think you can split it into smaller PRs so that it lands faster.
/**
 * This class attempts to emulate aggregations done while running tpch queries on unpartitioned data.
 * The data has been acquired on a single node Trino using sf10 scale.
What does that mean?
The data has been acquired on a single node Trino using sf10 scale.
What is "data"? Please elaborate on what is benchmarked specifically and what parameters are affected.
@Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 20, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
public class BenchmarkGroupByHashOnTpch
While I think these benchmarks are beneficial, I think we might be overoptimizing for TPCH use cases. Could we just parametrize various use cases instead of trying to match TPCH data characteristic? Maybe we could use TPCH datagen directly instead of simulating already artificial data?
I would either parametrize it and not pretend to be related to TPCH, or use actual TPCH data.
BTW: take a look at io.trino.benchmark.GroupByAggregationSqlBenchmark. I don't think it's used, but maybe it's worth reviving. For one, I think that framework uses generated TPCH instead of pre-generated data.
Maybe we could use TPCH datagen directly
It takes about 10 minutes to generate sf10. Numbers for smaller scale factors are not correlated enough with the bigger ones
Actually the purpose of this PR is to optimize bigint aggregation, not to add some fancy benchmarks, so maybe I should just delete the commit? WDYT?
It takes about 10 minutes to generate sf10. Numbers for smaller scale factors are not correlated enough with the bigger ones
What do you mean by not being correlated enough?
Actually the purpose of this PR is to optimize bigint aggregation, bot add some fancy benchmarks so maybe I should just delete the commit? WDYT?
We can leave it out of this PR and create a new one. Do you have JMH numbers from existing benchmarks?
Anyway, it probably makes sense to revise that old Sql benchmarking framework
    this.distinctDictionaries = distinctDictionaries;
}

public Block[] createBlocks(int blockCount, int positionsPerBlock, int channel)
I think you can just use TPCH data generator directly. Something like io.trino.plugin.hive.benchmark.BenchmarkFileFormatsUtils#createTpchDataSet(io.trino.plugin.hive.benchmark.FileFormat, io.trino.tpch.TpchTable<E>, java.util.List<io.trino.tpch.TpchColumn<E>>)
Look at the comment above.
It takes about 10 minutes to generate sf10. Numbers for smaller scale factors are not correlated enough with the bigger ones
sf1 is not sufficient? sf20 will have different characteristics? I think these benchmarks are fine as long as they don't overspecialize for a specific case.
I think you can split it into smaller PRs so that it lands faster
No I cannot. If there is a problem with the process of merging multi-commit PRs we should fix the process, not the PRs.
Force-pushed a924346 to 304d70e
@skrzypo987 do you have macro benchmark results that you could attach here?
    }
}

private void batchedPutIfAbsentNoNull(int batchSize, int blockOffset, Block block, long[] groupIds, int groupIdOffset)
Here it could be:

private void batchedPutIfAbsentNoNull(
        Block block,
        int batchSize,
        int blockOffset,
        long[] groupIds,
        int groupIdOffset)
{
    SelectedPositions nonNullPositions = positionsRange(0, batchSize);
    int[] hashPositions = new int[batchSize];
    getHashPositions(nonNullPositions, blockOffset, hashPositions);
    getGroupIds(nonNullPositions, hashPositions, groupIds, groupIdOffset);

    SelectedPositions nonMatchingPositions = SelectedPositions.withSize(nonNullPositions.length());
    initialMatchPositions(nonNullPositions, block, blockOffset, hashPositions, groupIds, groupIdOffset, nonMatchingPositions);
    putRemainingPositions(nonMatchingPositions, block, blockOffset, hashPositions, groupIds, groupIdOffset);
}

The same building blocks can be reused between the non-null and nullable cases.
@@ -60,8 +60,8 @@
private int mask;

// the hash table from values to groupIds
private LongBigArray values;
Is this based on the understanding that https://bugs.openjdk.java.net/browse/JDK-8027959 has solved the problem of GC pauses due to humongous objects ?
Could we still suffer from OOMs due to heap fragmentation as reported in http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2017-November/002725.html (https://bugs.openjdk.java.net/browse/JDK-8191565) ?
I guess our HeapRegionSize=32M jvm config recommendation should mitigate it to some extent.
Could we still suffer from OOMs due to heap fragmentation as reported in http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2017-November/002725.html (https://bugs.openjdk.java.net/browse/JDK-8191565) ?
Possibly, but IMO the gains are worth it. Note that MultiChannelGroupByHash already uses raw arrays (albeit I suspect it will have fewer groups on average, as multi-channel group-bys are based on user GROUP BY).
I think #11011 will mitigate the risk of fragmenting memory.
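For context on the trade-off being discussed: a segmented structure like LongBigArray sidesteps humongous allocations by keeping every backing array small, while the flat long[] gains speed at the cost of one large allocation. A minimal sketch of the segmented idea (illustrative only, not Trino's implementation; the segment size is an assumption):

```java
// Illustrative segmented long array: no single allocation exceeds
// SEGMENT_SIZE longs, so G1 never sees a humongous object for the data,
// at the cost of an extra indirection per access.
public class SegmentedLongArray
{
    private static final int SEGMENT_SHIFT = 10; // 1024 longs per segment (assumption)
    private static final int SEGMENT_SIZE = 1 << SEGMENT_SHIFT;
    private static final int SEGMENT_MASK = SEGMENT_SIZE - 1;

    private final long[][] segments;

    public SegmentedLongArray(long capacity)
    {
        // Round up to whole segments
        int segmentCount = (int) ((capacity + SEGMENT_SIZE - 1) >> SEGMENT_SHIFT);
        segments = new long[segmentCount][SEGMENT_SIZE];
    }

    public long get(long index)
    {
        return segments[(int) (index >> SEGMENT_SHIFT)][(int) (index & SEGMENT_MASK)];
    }

    public void set(long index, long value)
    {
        segments[(int) (index >> SEGMENT_SHIFT)][(int) (index & SEGMENT_MASK)] = value;
    }
}
```

Replacing this with a plain long[] removes the double indirection in the hot probe loop, which is the point of the commit, at the price of a single large allocation per hash table.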
    }
}

public static void main(String[] args)
Can we add the benchmark commit first and include results before/after of this benchmark as well in the "Replace BigArrays with primitive" commit ?
The first commit is already extracted and merged. Did that for the others though.
Force-pushed 38a5ddb to d7fd0c7
Some comments
Force-pushed d7fd0c7 to 35350cb
Added nulls to the benchmark
public static class ChannelDefinition
{
    private final ColumnType columnType;
    private final int distinctValuesCount;
Add a comment explaining what these fields mean.
Seriously? columnType means column type and distinctValuesCount means distinct value count? What else can I write in the comment?
What else can I write in the comment?
Is it distinctValuesCount within a dictionary or a total? What does distinctDictionaries mean? What is a non-distinct dictionary? These parameters are not obvious to the reader.
Made some changes, see if you like it
private static final BlockTypeOperators TYPE_OPERATOR_FACTORY = new BlockTypeOperators(TYPE_OPERATORS);

@SuppressWarnings("MismatchedQueryAndUpdateOfCollection")
private final List<GroupByIdBlock> results = new ArrayList<>();
Why not create it in groupBy or use a JMH Blackhole? I think results can also be shared between threads.
If I create it in groupBy it may get removed by the compiler.
We already had a discussion about Blackhole here; that would complicate the class a bit.
Just return both the result page and List<GroupByIdBlock> results in groupBy then.
I'm not sure if List<GroupByIdBlock> results = new ArrayList<>(); will behave correctly in a multi-threaded environment.
There is no multithreading here.
Just return both result page and List results in groupBy then.
That's the best idea. Done.
lgtm %
Batch BigintGroupByHash: lgtm % comments
long[] dummyGroupIds = new long[MAX_BATCH_SIZE];
while (remainingPositions != 0) {
    int positionCountUntilRehash = maxFill - nextGroupId;
    if (positionCountUntilRehash == 0) {
Could you change it to:
if (positionCountUntilRehash < MAX_BATCH_SIZE) {
to avoid the problem @lukasz-stec described below? Then you wouldn't need the min with positionCountUntilRehash below too.
That may increase the memory footprint, but as you wish
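The loop shape under discussion can be sketched standalone (hypothetical names and a deliberately small MAX_BATCH_SIZE; the real code grows the table via tryRehash). Rehashing whenever fewer than MAX_BATCH_SIZE slots remain before the fill limit means every batch can use full-size scratch arrays allocated once outside the loop:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitDemo
{
    // Small value for illustration; the real constant is larger (assumption).
    public static final int MAX_BATCH_SIZE = 8;

    // Split positionCount input positions into batches, "rehashing" (here:
    // doubling maxFill) whenever fewer than MAX_BATCH_SIZE slots remain
    // before the fill limit, so no batch is truncated by an imminent rehash.
    static List<Integer> batchSizes(int positionCount, int maxFill)
    {
        List<Integer> batches = new ArrayList<>();
        int nextGroupId = 0;
        int remaining = positionCount;
        while (remaining > 0) {
            if (maxFill - nextGroupId < MAX_BATCH_SIZE) {
                maxFill *= 2; // stand-in for tryRehash() growing the table
            }
            int batchSize = Math.min(remaining, MAX_BATCH_SIZE);
            batches.add(batchSize);
            nextGroupId += batchSize; // worst case: every position is a new group
            remaining -= batchSize;
        }
        return batches;
    }
}
```

The memory-footprint remark above refers to rehashing slightly earlier than strictly necessary: the table may grow before it is completely full.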
Force-pushed 35350cb to 658cee3
Batched dictionary processing in BigintGroupByHash class: lgtm % comments
Mostly lgtm % benchmarks % comments % question about AddDictionaryPageWork batch performance.
Block[] dictionaries = new Block[distinctDictionaries];
Set<Integer>[] possibleIndexesSet = new Set[distinctDictionaries];
// Generate dictionaries and positions from those dictionaries that are actually used
"Generate dictionaries": what do these dictionaries contain? "positions from those dictionaries that are actually used": used where? I find it really hard to figure out what is happening in the code below.
I made it much simpler and less elaborate. Benchmark results are similar.
int positionInDictionary = block.getId(lastPosition);
registerGroupId(dictionary, positionInDictionary);
lastPosition++;
int remainingPositions = positionCount - lastPosition;
you mentioned something about dictionary size threshold in 737394d#r827988340, but I don't see a need for it in the code
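The dictionary path quoted above resolves a group id once per dictionary position rather than once per row. A hedged, standalone sketch of that idea (plain long values and a HashMap stand in for Trino's Block and hash table; all names here are illustrative, not Trino's API):

```java
import java.util.HashMap;
import java.util.Map;

public class DictionaryGroupIdSketch
{
    private final Map<Long, Integer> groupIds = new HashMap<>();

    private int registerGroupId(long value)
    {
        // assign the next free group id to an unseen value
        return groupIds.computeIfAbsent(value, v -> groupIds.size());
    }

    // dictionary holds the distinct values; ids holds each row's index into it
    public int[] getGroupIds(long[] dictionary, int[] ids)
    {
        int[] dictionaryGroupIds = new int[dictionary.length];
        for (int i = 0; i < dictionary.length; i++) {
            dictionaryGroupIds[i] = registerGroupId(dictionary[i]); // one probe per distinct value
        }
        int[] result = new int[ids.length];
        for (int position = 0; position < ids.length; position++) {
            result[position] = dictionaryGroupIds[ids[position]]; // cheap array lookup per row
        }
        return result;
    }

    public static void main(String[] args)
    {
        int[] groupIds = new DictionaryGroupIdSketch().getGroupIds(new long[] {10, 20}, new int[] {0, 1, 1, 0});
        assert groupIds[0] == 0 && groupIds[1] == 1 && groupIds[2] == 1 && groupIds[3] == 0;
        System.out.println("ok");
    }
}
```

The payoff grows with page size: the hash table is probed at most once per distinct dictionary value instead of once per position.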
{
    results.clear();
    for (Page page : pages) {
        Work<GroupByIdBlock> work = groupByHash.getGroupIds(page);
you benchmark io.trino.operator.BigintGroupByHash#getGroupIds only, but I think you should benchmark addPage too. For example, I'm suspicious if AddDictionaryPageWork benefits from batching (I think that only GetDictionaryGroupIdsWork benefits from a long[] groupIds array instead of a builder)
added
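The benchmark loop quoted above drives each Work item to completion. A hedged sketch of that pattern, with a local Work interface standing in for io.trino.operator.Work (process() may return false when the work cannot finish yet, e.g. while the hash table needs to grow, and must simply be called again):

```java
public class WorkLoopSketch
{
    interface Work<T>
    {
        boolean process();

        T getResult();
    }

    static <T> T processUntilDone(Work<T> work)
    {
        while (!work.process()) {
            // retry until the work item reports completion
        }
        return work.getResult();
    }

    public static void main(String[] args)
    {
        // toy work item that needs two process() calls before it is done
        Work<String> work = new Work<String>()
        {
            private int calls;

            @Override
            public boolean process()
            {
                calls++;
                return calls >= 2;
            }

            @Override
            public String getResult()
            {
                return "done";
            }
        };
        assert processUntilDone(work).equals("done");
        System.out.println("ok");
    }
}
```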
Force-pushed 658cee3 to 71afe0a
Addressed comments
There is no multithreading here.
"Just return both result page and List results in groupBy then." That's the best idea. Done.
Made some changes, see if you like it
lgtm % comments.
- it seems that the last commit is broken
- I would still look at the nullable case to improve it
    for (int i = 0; i < left; i++) {
        usedValues.add(r.nextInt(m));
    }
}
nit: alternative would be:
List<Integer> allNumbers = IntStream.range(0, bound).boxed().collect(toList());
Collections.shuffle(allNumbers, RANDOM);
return allNumbers.stream().limit(count).collect(toImmutableSet());
as in io.trino.block.BlockAssertions#chooseRandomUnique
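A self-contained version of that shuffle-based alternative (modeled on io.trino.block.BlockAssertions#chooseRandomUnique but using only the JDK, so the Trino helpers like toImmutableSet are replaced with a LinkedHashSet): shuffle 0..bound-1 and keep the first count elements, so uniqueness holds by construction and no rejection loop is needed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class ChooseRandomUnique
{
    static Set<Integer> chooseRandomUnique(Random random, int bound, int count)
    {
        List<Integer> all = new ArrayList<>(bound);
        for (int i = 0; i < bound; i++) {
            all.add(i);
        }
        Collections.shuffle(all, random);
        // the first `count` elements of a shuffled range are distinct by construction
        return new LinkedHashSet<>(all.subList(0, count));
    }

    public static void main(String[] args)
    {
        Set<Integer> picked = chooseRandomUnique(new Random(42), 100, 10);
        assert picked.size() == 10; // always exactly `count` distinct values
        for (int value : picked) {
            assert value >= 0 && value < 100;
        }
        System.out.println("ok");
    }
}
```

Unlike the rejection loop above, this runs in O(bound) regardless of how close count is to bound.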
core/trino-main/src/main/java/io/trino/operator/BigintGroupByHash.java
Force-pushed d721ef8 to 843d86d
% comments
Overall gain with small regressions for individual cases. Benchmarks before & after:

BenchmarkGroupByHashOnSimulatedData.groupBy ADD 0 BIGINT_2_GROUPS avgt 15 3,840 ± 0,005 ns/op 3,694 ± 0,104 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD 0 BIGINT_10_GROUPS avgt 15 3,853 ± 0,095 ns/op 3,664 ± 0,095 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD 0 BIGINT_1K_GROUPS avgt 15 4,840 ± 0,144 ns/op 4,876 ± 0,032 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD 0 BIGINT_10K_GROUPS avgt 15 11,389 ± 0,349 ns/op 10,717 ± 0,465 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD 0 BIGINT_100K_GROUPS avgt 15 11,245 ± 0,532 ns/op 10,544 ± 0,145 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD 0 BIGINT_1M_GROUPS avgt 15 28,606 ± 1,104 ns/op 26,988 ± 0,638 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD 0 BIGINT_10M_GROUPS avgt 15 89,223 ± 1,346 ns/op 79,167 ± 1,381 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .1 BIGINT_2_GROUPS avgt 15 5,511 ± 0,257 ns/op 4,760 ± 0,068 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .1 BIGINT_10_GROUPS avgt 15 5,335 ± 0,010 ns/op 5,087 ± 0,210 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .1 BIGINT_1K_GROUPS avgt 15 5,858 ± 0,279 ns/op 6,694 ± 0,304 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .1 BIGINT_10K_GROUPS avgt 15 11,595 ± 0,491 ns/op 13,082 ± 0,489 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .1 BIGINT_100K_GROUPS avgt 15 12,248 ± 0,107 ns/op 12,657 ± 0,421 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .1 BIGINT_1M_GROUPS avgt 15 28,886 ± 0,429 ns/op 28,475 ± 0,215 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .1 BIGINT_10M_GROUPS avgt 15 68,922 ± 1,376 ns/op 60,784 ± 1,994 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .5 BIGINT_2_GROUPS avgt 15 7,337 ± 0,091 ns/op 7,672 ± 0,218 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .5 BIGINT_10_GROUPS avgt 15 7,586 ± 0,366 ns/op 7,524 ± 0,356 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .5 BIGINT_1K_GROUPS avgt 15 7,528 ± 0,964 ns/op 8,544 ± 0,036 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .5 BIGINT_10K_GROUPS avgt 15 10,017 ± 0,550 ns/op 9,985 ± 0,119 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .5 BIGINT_100K_GROUPS avgt 15 11,768 ± 0,093 ns/op 11,893 ± 0,219 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .5 BIGINT_1M_GROUPS avgt 15 23,728 ± 1,584 ns/op 20,926 ± 0,396 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .5 BIGINT_10M_GROUPS avgt 15 48,788 ± 1,097 ns/op 41,916 ± 0,170 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .9 BIGINT_2_GROUPS avgt 15 3,684 ± 0,086 ns/op 2,975 ± 0,052 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .9 BIGINT_10_GROUPS avgt 15 3,788 ± 0,091 ns/op 3,125 ± 0,052 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .9 BIGINT_1K_GROUPS avgt 15 3,521 ± 0,022 ns/op 3,370 ± 0,017 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .9 BIGINT_10K_GROUPS avgt 15 4,476 ± 0,126 ns/op 3,908 ± 0,012 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .9 BIGINT_100K_GROUPS avgt 15 5,872 ± 0,180 ns/op 6,299 ± 0,048 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .9 BIGINT_1M_GROUPS avgt 15 8,190 ± 0,029 ns/op 7,391 ± 0,050 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy ADD .9 BIGINT_10M_GROUPS avgt 15 10,571 ± 0,149 ns/op 9,601 ± 0,155 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_2_GROUPS avgt 15 8,240 ± 0,468 ns/op 5,390 ± 0,029 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_10_GROUPS avgt 15 7,960 ± 0,373 ns/op 5,029 ± 0,081 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_1K_GROUPS avgt 15 7,619 ± 0,046 ns/op 7,447 ± 0,046 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_10K_GROUPS avgt 15 15,679 ± 0,390 ns/op 13,051 ± 0,194 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_100K_GROUPS avgt 15 14,323 ± 0,348 ns/op 12,606 ± 0,301 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_1M_GROUPS avgt 15 35,308 ± 2,198 ns/op 27,489 ± 0,724 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_10M_GROUPS avgt 15 96,250 ± 1,463 ns/op 80,487 ± 0,824 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_2_GROUPS avgt 15 9,271 ± 0,979 ns/op 7,115 ± 0,038 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_10_GROUPS avgt 15 9,244 ± 0,953 ns/op 6,559 ± 0,039 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_1K_GROUPS avgt 15 8,457 ± 0,351 ns/op 9,240 ± 0,153 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_10K_GROUPS avgt 15 15,021 ± 0,076 ns/op 15,271 ± 0,286 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_100K_GROUPS avgt 15 14,020 ± 0,276 ns/op 14,550 ± 0,539 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_1M_GROUPS avgt 15 34,683 ± 1,120 ns/op 29,855 ± 1,752 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_10M_GROUPS avgt 15 73,612 ± 0,484 ns/op 62,057 ± 1,362 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_2_GROUPS avgt 15 11,672 ± 2,091 ns/op 7,775 ± 0,357 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_10_GROUPS avgt 15 10,900 ± 1,606 ns/op 7,550 ± 0,301 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_1K_GROUPS avgt 15 9,998 ± 0,095 ns/op 9,034 ± 0,204 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_10K_GROUPS avgt 15 13,536 ± 0,287 ns/op 12,194 ± 0,299 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_100K_GROUPS avgt 15 13,482 ± 0,167 ns/op 11,710 ± 0,125 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_1M_GROUPS avgt 15 27,322 ± 0,666 ns/op 22,543 ± 0,340 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_10M_GROUPS avgt 15 51,799 ± 1,448 ns/op 42,143 ± 0,704 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_2_GROUPS avgt 15 7,991 ± 0,040 ns/op 3,324 ± 0,160 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_10_GROUPS avgt 15 8,796 ± 0,176 ns/op 3,321 ± 0,028 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_1K_GROUPS avgt 15 5,854 ± 0,124 ns/op 3,579 ± 0,030 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_10K_GROUPS avgt 15 6,427 ± 0,074 ns/op 4,353 ± 0,112 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_100K_GROUPS avgt 15 7,498 ± 0,182 ns/op 5,162 ± 0,145 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_1M_GROUPS avgt 15 10,800 ± 0,585 ns/op 8,169 ± 0,059 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_10M_GROUPS avgt 15 12,616 ± 0,383 ns/op 9,964 ± 0,106 ns/op
Instead of iterating over groups and jumping all over the hash table, we iterate over the hash table itself, minimizing random memory access. The change is done only in BigintGroupByHash; MultiChannelGroupByHash is already properly implemented.

Before:
BenchmarkGroupByHashOnSimulatedData.groupBy 0 BIGINT_1M_GROUPS ADD avgt 30 26,855 ± 0,359 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy 0 BIGINT_10M_GROUPS ADD avgt 30 79,430 ± 0,678 ns/op

After:
BenchmarkGroupByHashOnSimulatedData.groupBy 0 BIGINT_1M_GROUPS ADD avgt 30 25,910 ± 0,342 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy 0 BIGINT_10M_GROUPS ADD avgt 30 63,016 ± 0,748 ns/op
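The access-pattern change described in the commit message above can be sketched in isolation: instead of probing the table once per group id (random access), scan the table arrays sequentially and scatter each occupied slot's value to its group's output position. Plain arrays stand in for BigintGroupByHash's internals here; the field layout and names are illustrative, not the actual implementation.

```java
public class HashTableScanSketch
{
    // values[slot] holds the key stored in that slot; groupIdsBySlot[slot] is
    // the group id assigned to it, or -1 when the slot is empty
    public static long[] valuesByGroupId(long[] values, int[] groupIdsBySlot, int groupCount)
    {
        long[] result = new long[groupCount];
        for (int slot = 0; slot < values.length; slot++) { // one cache-friendly sequential pass
            int groupId = groupIdsBySlot[slot];
            if (groupId >= 0) {
                result[groupId] = values[slot];
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        long[] result = valuesByGroupId(new long[] {0, 42, 0, 7}, new int[] {-1, 1, -1, 0}, 2);
        assert result[0] == 7 && result[1] == 42;
        System.out.println("ok");
    }
}
```

The total work is the same, but the sequential scan lets the prefetcher hide memory latency, which matters most for tables that no longer fit in cache (the 10M-groups case above).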
Benchmarks: for the ADD work type, changes are minimal. For GET_GROUPS some regressions are present, but the overall gain is about 5%.

Before:
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_2_GROUPS_1_SMALL_DICTIONARY avgt 30 6,251 ± 0,146 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_2_GROUPS_1_BIG_DICTIONARY avgt 30 6,602 ± 0,400 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY avgt 30 6,306 ± 0,117 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY avgt 30 6,786 ± 0,469 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_10K_GROUPS_1_DICTIONARY avgt 30 5,579 ± 0,142 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY avgt 30 19,389 ± 0,201 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_2_GROUPS_1_SMALL_DICTIONARY avgt 30 4,462 ± 0,139 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_2_GROUPS_1_BIG_DICTIONARY avgt 30 4,317 ± 0,065 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY avgt 30 6,481 ± 1,010 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY avgt 30 4,687 ± 0,117 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_10K_GROUPS_1_DICTIONARY avgt 30 4,389 ± 0,038 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY avgt 30 21,026 ± 0,405 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_2_GROUPS_1_SMALL_DICTIONARY avgt 30 4,340 ± 0,170 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_2_GROUPS_1_BIG_DICTIONARY avgt 30 5,991 ± 0,786 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY avgt 30 4,490 ± 0,100 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY avgt 30 4,712 ± 0,119 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_10K_GROUPS_1_DICTIONARY avgt 30 4,508 ± 0,072 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY avgt 30 17,024 ± 0,433 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_2_GROUPS_1_SMALL_DICTIONARY avgt 30 4,547 ± 0,155 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_2_GROUPS_1_BIG_DICTIONARY avgt 30 4,402 ± 0,150 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY avgt 30 5,282 ± 0,808 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY avgt 30 4,792 ± 0,096 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_10K_GROUPS_1_DICTIONARY avgt 30 4,599 ± 0,126 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY avgt 30 9,405 ± 0,142 ns/op

After:
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_2_GROUPS_1_SMALL_DICTIONARY avgt 30 4,791 ± 0,146 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_2_GROUPS_1_BIG_DICTIONARY avgt 30 4,487 ± 0,400 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY avgt 30 4,562 ± 0,117 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY avgt 30 5,114 ± 0,469 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_10K_GROUPS_1_DICTIONARY avgt 30 4,351 ± 0,142 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS 0 BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY avgt 30 19,531 ± 0,201 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_2_GROUPS_1_SMALL_DICTIONARY avgt 30 5,286 ± 0,139 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_2_GROUPS_1_BIG_DICTIONARY avgt 30 5,521 ± 0,065 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY avgt 30 4,576 ± 1,010 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY avgt 30 5,014 ± 0,117 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_10K_GROUPS_1_DICTIONARY avgt 30 3,809 ± 0,038 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .1 BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY avgt 30 20,453 ± 0,405 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_2_GROUPS_1_SMALL_DICTIONARY avgt 30 5,184 ± 0,170 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_2_GROUPS_1_BIG_DICTIONARY avgt 30 5,400 ± 0,786 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY avgt 30 4,664 ± 0,100 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY avgt 30 5,213 ± 0,119 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_10K_GROUPS_1_DICTIONARY avgt 30 3,849 ± 0,072 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .5 BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY avgt 30 17,651 ± 0,433 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_2_GROUPS_1_SMALL_DICTIONARY avgt 30 5,612 ± 0,155 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_2_GROUPS_1_BIG_DICTIONARY avgt 30 5,487 ± 0,150 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY avgt 30 4,699 ± 0,808 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY avgt 30 5,118 ± 0,096 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_10K_GROUPS_1_DICTIONARY avgt 30 3,621 ± 0,126 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy GET_GROUPS .9 BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY avgt 30 8,173 ± 0,142 ns/op
Force-pushed 843d86d to 8878eaf
@skrzypo987 should we close this?
yep. Nothing more to do here
Description
Details in commit messages
General information
improvement
core query engine
slow -> improvement -> fast
Related issues, pull requests, and links
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: