Long aggregation improvements #11031

Closed

Conversation

skrzypo987
Member

Description

Details in commit messages

General information

Is this change a fix, improvement, new feature, refactoring, or other?

improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

core query engine

How would you describe this change to a non-technical end user or system administrator?

slow -> improvement -> fast

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Section
* Improve aggregation performance of bigint columns

@cla-bot cla-bot bot added the cla-signed label Feb 14, 2022
@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from 088029d to cb2031e Compare February 14, 2022 10:31
@lukasz-stec lukasz-stec (Member) left a comment

Left some comments.
@skrzypo987 Do you have some numbers for the improvements?

if (block.mayHaveNull()) {
boolean haveNull = false;

int[] hashes = new int[batchSize];
Member

I would reuse hashes array to limit GC pressure.

Member

also, should this be named hashPositions?

Member Author

Renamed.
By reusing, do you mean making it a class field? I don't think that would make much sense. This is a short-lived object allocated in eden and dying pretty quickly; GC loves objects like that.

}
}

private void batchedPutIfAbsent(int batchSize, int lastPosition, Block block, long[] groupIds, int groupIdOffset)
Member

lastPosition -> startPosition, groupIds -> outGroupIds?

Member Author

Changed it to blockOffset.
As for outGroupIds, I don't really like it. Maybe we can figure something else out.

}
int batchSize = min(min(leftToProcess, MAX_BATCH_SIZE), leftBeforeRehash);

long[] dummyGroupIds = new long[batchSize];
Member

move this outside of loop

Member Author

Done. I hope the JIT gets rid of it anyway.

long value = BIGINT.getLong(block, lastPosition + i);
long storedValue = values[hashes[i]];
boolean match = value == storedValue;
groupIds[groupIdOffset + i] = (groupIds[groupIdOffset + i] + 1) * (match ? 1 : 0) - 1;
Member

What's the perf improvement from this line vs. just if (!match) groupIds[groupIdOffset + i] = -1? I would hope that the if could be replaced with a cmov by the JIT in bad cases.
If this stays, at least add a comment explaining what this line does.

Member Author

match will (hopefully) be cast to a number by the compiler. That way there is no branch at all.
Added a comment.

for (int i = 0; i < batchSize; i++) {
if (groupIds[groupIdOffset + i] >= 0) {
long value = BIGINT.getLong(block, lastPosition + i);
long storedValue = values[hashes[i]];
Member

For a large number of groups, this is a cache miss. The same is true for the this.groupIds[hashes[i]] above.
Since this loop is not super tight, manual unrolling could help here (I have seen an improvement from doing 4 items at once).

Member Author

Gave it a try. No difference.
The loop actually is super tight IMO.

@skrzypo987 (Member Author)

Left some comments. @skrzypo987 Do you have some numbers for the improvements?

That's a damn good review.
I ran some macro benchmarks on unpartitioned data:
TPC-H: 4.7% CPU, 3.6% wall time. TPC-DS: no significant changes.
For partitioned data, gains should be slightly smaller.

@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch 2 times, most recently from 6eab0c5 to d1d7a9b Compare February 16, 2022 08:29
@skrzypo987 skrzypo987 (Member Author) left a comment

Comments addressed.
I also added another commit with batched dictionary processing. It brings a 5-10% improvement in microbenchmarks, so nothing significant.
I also started macro benchmarks for partitioned data and will share the results when they finish.


@lukasz-stec lukasz-stec (Member) left a comment

lgtm % comments

// if (!match)
// groupIds[groupIdOffset + i] = -1;
// but without explicit branches
groupIds[groupIdOffset + i] = (groupIds[groupIdOffset + i] + 1) * (match ? 1 : 0) - 1;
Member

An if is nicer code and could be faster if compiled to a cmov (cmov has 1-cycle latency). I would check what both expressions compile to and choose accordingly.

Member Author

We had some benchmarks replacing simple ifs with this ? 1 : 0 trick and it was about 2x faster every single time.
IMHO a loop without a branch is always going to be faster than a loop with a branch.

Member

cmov is not a branch

Member Author

Quite frankly, I don't know what it is. We should be platform agnostic.
We use this "trick" throughout the Trino codebase; I see no reason why we should not use it here.
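
For readers following this thread, here is a minimal standalone sketch of the branch-free idiom under discussion (the method and array names are illustrative, not from the PR):

// Illustrative sketch only; names are hypothetical.
// Semantics: if matches[i] is false, reset groupIds[i] to -1; otherwise keep it.
//   branchy:     if (!matches[i]) { groupIds[i] = -1; }
//   branch-free: groupIds[i] = (groupIds[i] + 1) * (matches[i] ? 1 : 0) - 1;
// When the match is true:  (g + 1) * 1 - 1 == g   (unchanged)
// When the match is false: (g + 1) * 0 - 1 == -1  (reset)
static void resetNonMatching(long[] groupIds, boolean[] matches)
{
    for (int i = 0; i < groupIds.length; i++) {
        groupIds[i] = (groupIds[i] + 1) * (matches[i] ? 1 : 0) - 1;
    }
}

Whether the ternary compiles to a flag materialization, a cmov, or a branch is up to the JIT and the target platform, which is exactly the disagreement above.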

boolean finished;
do {
finished = work.process();
results.add(work.getResult()); // Pretend the results are used
Member

You can use org.openjdk.jmh.infra.Blackhole for that. It avoids keeping unneeded stuff in memory.

Member Author

This class seems serious. It cannot even be easily instantiated. I'd rather stay with basic solutions.

Member

You don't create it yourself; JMH will inject it if you have it as a param to the benchmark method.
This is the exact use case for it. See https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_09_Blackholes.java

Member Author

And then you cannot use the benchmark method from outside the benchmark, e.g. from the runGroupByTpch method, because you cannot create a Blackhole instance. I'd rather stay with the current option.
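
For reference, a minimal sketch of the Blackhole pattern lukasz-stec describes, with a trivial stand-in workload (the class and doWork method are hypothetical):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class ExampleBenchmark
{
    @Benchmark
    public void groupBy(Blackhole blackhole)
    {
        // JMH injects the Blackhole parameter; consuming the result prevents
        // dead-code elimination without accumulating results in memory.
        blackhole.consume(doWork());
    }

    private long doWork()
    {
        long sum = 0;
        for (int i = 0; i < 1_000; i++) {
            sum += i;
        }
        return sum;
    }
}

As discussed above, the trade-off is that a method with a Blackhole parameter is awkward to call from outside the JMH harness.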

public enum ColumnType
{
BIGINT(BigintType.BIGINT, (blockBuilder, positionCount, cardinality, seed) -> {
Random r = new Random(seed);
Member

why are different seeds needed?

Member Author

So that two columns with the same type get different values.

Member

why not use one static Random value

Member Author

That way, if you have two bigint columns, their contents will be exactly the same, so 2 columns with a cardinality of 10 will produce only 10 groups. We want them to be independent of each other so the aggregation produces 100 groups.
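
A minimal sketch of the point (the values and seeds here are made up):

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class SeedDemo
{
    public static void main(String[] args)
    {
        // With the same seed, both columns would be identical and grouping on
        // (a, b) would yield only 10 groups; distinct seeds make the columns
        // independent, giving close to 10 * 10 = 100 groups.
        long[] a = new Random(42).ints(1000, 0, 10).asLongStream().toArray();
        long[] b = new Random(43).ints(1000, 0, 10).asLongStream().toArray();
        Set<Long> groups = new HashSet<>();
        for (int i = 0; i < a.length; i++) {
            groups.add(a[i] * 100 + b[i]);
        }
        System.out.println("distinct groups: " + groups.size());
    }
}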

@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from d1d7a9b to 6fc6ed9 Compare February 16, 2022 14:51

@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from 6fc6ed9 to cf181c0 Compare February 21, 2022 09:36
@skrzypo987 skrzypo987 requested review from sopel39 and removed request for sopel39 February 22, 2022 16:12
@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch 2 times, most recently from 3e95042 to a924346 Compare February 28, 2022 14:55
@sopel39 sopel39 (Member) left a comment

lgtm Replace LongBigArray with long[] ...

I think you can split it into smaller PRs so that it lands faster


/**
* This class attempts to emulate aggregations done while running tpch queries on unpartitioned data.
* The data has been acquired on a single node Trino using sf10 scale.
Member

What does that mean?
The data has been acquired on a single node Trino using sf10 scale.
What is the data?

Please elaborate on what is benchmarked specifically and which parameters are affected.

@Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 20, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
public class BenchmarkGroupByHashOnTpch
Member

While I think these benchmarks are beneficial, I think we might be overoptimizing for TPCH use cases. Could we just parametrize various use cases instead of trying to match TPCH data characteristics? Maybe we could use TPCH datagen directly instead of simulating already-artificial data?

Member

I would either parametrize it and not pretend to be related to TPCH, or use actual TPCH data.

BTW: take a look at io.trino.benchmark.GroupByAggregationSqlBenchmark. I don't think it's used, but maybe it's worth reviving. For one, I think that framework uses generated TPCH data instead of pre-generated data.

Member Author

Maybe we could use TPCH datagen directly

It takes about 10 minutes to generate sf10. Numbers for smaller scale factors are not correlated enough with the bigger ones

Member Author

Actually, the purpose of this PR is to optimize bigint aggregation, not to add some fancy benchmarks, so maybe I should just delete the commit? WDYT?

Member

It takes about 10 minutes to generate sf10. Numbers for smaller scale factors are not correlated enough with the bigger ones

What do you mean by not being correlated enough?

Actually, the purpose of this PR is to optimize bigint aggregation, not to add some fancy benchmarks, so maybe I should just delete the commit? WDYT?

We can leave it out of this PR and create a new one. Do you have JMH numbers from existing benchmarks?

Anyway, it probably makes sense to revise that old Sql benchmarking framework

this.distinctDictionaries = distinctDictionaries;
}

public Block[] createBlocks(int blockCount, int positionsPerBlock, int channel)
Member

I think you can just use the TPCH data generator directly. Something like io.trino.plugin.hive.benchmark.BenchmarkFileFormatsUtils#createTpchDataSet(io.trino.plugin.hive.benchmark.FileFormat, io.trino.tpch.TpchTable<E>, java.util.List<io.trino.tpch.TpchColumn<E>>)

Member Author

Look at the comment above.

Member

It takes about 10 minutes to generate sf10. Numbers for smaller scale factors are not correlated enough with the bigger ones

sf1 is not sufficient? sf20 will have different characteristics? I think these benchmarks are fine as long as they don't overspecialize for a specific case.

@skrzypo987 skrzypo987 (Member Author) left a comment

I think you can split it into smaller PRs so that it lands faster

No I cannot. If there is a problem with the process of merging multi-commit PRs we should fix the process, not the PRs.


@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from a924346 to 304d70e Compare March 3, 2022 08:38

sopel39 commented Mar 9, 2022

@skrzypo987 do you have macro benchmark results that you could attach here?

@sopel39 sopel39 requested a review from raunaqmorarka March 9, 2022 11:39
}
}

private void batchedPutIfAbsentNoNull(int batchSize, int blockOffset, Block block, long[] groupIds, int groupIdOffset)
Member

here it could be:

private void batchedPutIfAbsentNoNull(
    Block block,
    int batchSize,
    int blockOffset,
    long[] groupIds,
    int groupIdOffset) {
  SelectedPositions nonNullPositions = positionsRange(0, batchSize);
  int[] hashPositions = new int[batchSize];
  getHashPositions(nonNullPositions, blockOffset, hashPositions);
  getGroupIds(nonNullPositions, hashPositions, groupIds, groupIdOffset);
  SelectedPositions nonMatchingPositions = SelectedPositions.withSize(nonNullPositions.length());
  initialMatchPositions(nonNullPositions, block, blockOffset, hashPositions, groupIds, groupIdOffset, nonMatchingPositions);
  putRemainingPositions(nonMatchingPositions, block, blockOffset, hashPositions, groupIds, groupIdOffset);
}

The same building blocks can be reused between the non-null and nullable cases.

@@ -60,8 +60,8 @@
private int mask;

// the hash table from values to groupIds
private LongBigArray values;
@raunaqmorarka raunaqmorarka (Member) Mar 10, 2022

Is this based on the understanding that https://bugs.openjdk.java.net/browse/JDK-8027959 has solved the problem of GC pauses due to humongous objects?
Could we still suffer from OOMs due to heap fragmentation as reported in http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2017-November/002725.html (https://bugs.openjdk.java.net/browse/JDK-8191565)?
I guess our HeapRegionSize=32M JVM config recommendation should mitigate it to some extent.

Member

Could we still suffer from OOMs due to heap fragmentation as reported in http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2017-November/002725.html (https://bugs.openjdk.java.net/browse/JDK-8191565) ?

Possibly, but IMO the gains are worth it. Note that MultiChannelGroupByHash already uses raw arrays (albeit I suspect it will have fewer groups on average, as multi-channel group-bys are based on user GROUP BY clauses).

I think #11011 will mitigate risk of fragmenting memory.
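
For background, a generic illustration of the trade-off being weighed here. This is not Trino's LongBigArray implementation, just a sketch of the segmented-array idea: each segment stays well below G1's humongous threshold, at the price of an extra indirection that a flat long[] avoids.

// Generic sketch, not Trino's LongBigArray.
final class SegmentedLongArray
{
    private static final int SEGMENT_SHIFT = 10; // 1024 elements, ~8 KB per segment
    private static final int SEGMENT_MASK = (1 << SEGMENT_SHIFT) - 1;

    private final long[][] segments;

    SegmentedLongArray(long size)
    {
        int segmentCount = Math.toIntExact((size + SEGMENT_MASK) >> SEGMENT_SHIFT);
        segments = new long[segmentCount][1 << SEGMENT_SHIFT];
    }

    long get(long index)
    {
        return segments[(int) (index >> SEGMENT_SHIFT)][(int) (index & SEGMENT_MASK)];
    }

    void set(long index, long value)
    {
        segments[(int) (index >> SEGMENT_SHIFT)][(int) (index & SEGMENT_MASK)] = value;
    }
}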

}
}

public static void main(String[] args)
Member

Can we add the benchmark commit first, and include before/after results of this benchmark in the "Replace BigArrays with primitive" commit as well?

Member Author

The first commit is already extracted and merged. Did that for the others, though.

@skrzypo987 skrzypo987 requested a review from lukasz-stec March 11, 2022 11:59
@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from 38a5ddb to d7fd0c7 Compare March 14, 2022 07:39
@lukasz-stec lukasz-stec (Member) left a comment

Some comments

@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from d7fd0c7 to 35350cb Compare March 15, 2022 10:11
@skrzypo987 (Member Author)

Added nulls to the benchmark

public static class ChannelDefinition
{
private final ColumnType columnType;
private final int distinctValuesCount;
Member

Add a comment explaining what these fields mean.

Member Author

Seriously?
columnType means column type and distinctValuesCount means distinct value count?
What else can I write in the comment?

Member

What else can I write in the comment?

Is it distinctValuesCount within a dictionary or a total?

What does distinctDictionaries mean? What is a non-distinct dictionary?

These parameters are not obvious to the reader.

Member Author

Made some changes, see if you like it

private static final BlockTypeOperators TYPE_OPERATOR_FACTORY = new BlockTypeOperators(TYPE_OPERATORS);

@SuppressWarnings("MismatchedQueryAndUpdateOfCollection")
private final List<GroupByIdBlock> results = new ArrayList<>();
@sopel39 sopel39 (Member) Mar 15, 2022

Why not create it in groupBy or use a JMH Blackhole? I think results can also be shared between threads.

Member Author

If I create it in groupBy it may get removed by the compiler.
We already had a discussion about Blackhole here. That would complicate the class a bit

@sopel39 sopel39 (Member) Mar 17, 2022

Just return both the result page and List<GroupByIdBlock> results in groupBy then.

I'm not sure if List<GroupByIdBlock> results = new ArrayList<>(); will behave correctly in a multi-threaded environment.

Member Author

There is no multithreading here.

Just return both result page and List results in groupBy then.

That's the best idea. Done.


sopel39 commented Mar 15, 2022

lgtm % Optimize rehashing hash table in aggregation. I suggest extracting it as a separate PR.

@sopel39 sopel39 (Member) left a comment

Batch BigintGroupByHash lgtm % comments

long[] dummyGroupIds = new long[MAX_BATCH_SIZE];
while (remainingPositions != 0) {
int positionCountUntilRehash = maxFill - nextGroupId;
if (positionCountUntilRehash == 0) {
Member

Could you change it to:

if (positionCountUntilRehash < MAX_BATCH_SIZE) {

to avoid the problem @lukasz-stec described below?

Then you wouldn't need the min with positionCountUntilRehash below either.

Member Author

That may increase the memory footprint, but as you wish.
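
A minimal sketch of the loop shape suggested above (tryRehash and processBatch are hypothetical placeholders; maxFill, nextGroupId, and MAX_BATCH_SIZE follow the diff context):

// Hypothetical sketch of the suggested batch sizing.
while (remainingPositions != 0) {
    int positionCountUntilRehash = maxFill - nextGroupId;
    if (positionCountUntilRehash < MAX_BATCH_SIZE) {
        // Rehash as soon as a full batch might not fit, so the batch size
        // below never has to be clamped by positionCountUntilRehash.
        tryRehash();
        continue;
    }
    int batchSize = Math.min(remainingPositions, MAX_BATCH_SIZE);
    processBatch(batchSize);
    remainingPositions -= batchSize;
}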

@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from 35350cb to 658cee3 Compare March 16, 2022 13:07
@sopel39 sopel39 (Member) left a comment

Batched dictionary processing in BigintGroupByHash class lgtm % comments


@sopel39 sopel39 (Member) left a comment

mostly lgtm % benchmarks % comments % question about AddDictionaryPageWork batch performance


Block[] dictionaries = new Block[distinctDictionaries];
Set<Integer>[] possibleIndexesSet = new Set[distinctDictionaries];
// Generate dictionaries and positions from those dictionaries that are actually used
@sopel39 sopel39 (Member) Mar 17, 2022

Generate dictionaries

What do these dictionaries contain?

positions from those dictionaries that are actually used

Used where?

I find it really hard to figure out what is happening in the code below.

Member Author

I made it much simpler and less elaborate. Benchmark results are similar.

int positionInDictionary = block.getId(lastPosition);
registerGroupId(dictionary, positionInDictionary);
lastPosition++;
int remainingPositions = positionCount - lastPosition;
Member

You mentioned something about a dictionary size threshold in 737394d#r827988340, but I don't see a need for it in the code.

{
results.clear();
for (Page page : pages) {
Work<GroupByIdBlock> work = groupByHash.getGroupIds(page);
@sopel39 sopel39 (Member) Mar 17, 2022

You benchmark io.trino.operator.BigintGroupByHash#getGroupIds only, but I think you should benchmark addPage too. For example, I'm skeptical that AddDictionaryPageWork benefits from batching (I think that only GetDictionaryGroupIdsWork benefits from the long[] groupIds array instead of a builder).

Member Author

added

@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from 658cee3 to 71afe0a Compare March 22, 2022 12:52
@skrzypo987 skrzypo987 (Member Author) left a comment

Addressed comments.


@sopel39 sopel39 (Member) left a comment

lgtm % comments.

  1. It seems the last commit is broken.
  2. I would still look at the nullable case to improve it.

for (int i = 0; i < left; i++) {
usedValues.add(r.nextInt(m));
}
}
Member

nit: an alternative would be:

        List<Integer> allNumbers = IntStream.range(0, bound).boxed().collect(toList());
        Collections.shuffle(allNumbers, RANDOM);
        return allNumbers.stream().limit(count).collect(toImmutableSet());

as in io.trino.block.BlockAssertions#chooseRandomUnique

@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch 2 times, most recently from d721ef8 to 843d86d Compare April 15, 2022 17:36
@sopel39 sopel39 (Member) left a comment

% comments

skrzypo987 added 3 commits April 20, 2022 11:49
Overall gain with small regressions for individual cases.

Benchmarks before & after
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD             0     BIGINT_2_GROUPS  avgt   15   3,840 ± 0,005  ns/op    3,694 ± 0,104 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD             0    BIGINT_10_GROUPS  avgt   15   3,853 ± 0,095  ns/op    3,664 ± 0,095 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD             0    BIGINT_1K_GROUPS  avgt   15   4,840 ± 0,144  ns/op    4,876 ± 0,032 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD             0   BIGINT_10K_GROUPS  avgt   15  11,389 ± 0,349  ns/op    10,717 ± 0,465 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD             0  BIGINT_100K_GROUPS  avgt   15  11,245 ± 0,532  ns/op    10,544 ± 0,145 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD             0    BIGINT_1M_GROUPS  avgt   15  28,606 ± 1,104  ns/op    26,988 ± 0,638 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD             0   BIGINT_10M_GROUPS  avgt   15  89,223 ± 1,346  ns/op    79,167 ± 1,381 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .1     BIGINT_2_GROUPS  avgt   15   5,511 ± 0,257  ns/op    4,760 ± 0,068 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .1    BIGINT_10_GROUPS  avgt   15   5,335 ± 0,010  ns/op    5,087 ± 0,210 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .1    BIGINT_1K_GROUPS  avgt   15   5,858 ± 0,279  ns/op    6,694 ± 0,304 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .1   BIGINT_10K_GROUPS  avgt   15  11,595 ± 0,491  ns/op    13,082 ± 0,489 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .1  BIGINT_100K_GROUPS  avgt   15  12,248 ± 0,107  ns/op    12,657 ± 0,421 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .1    BIGINT_1M_GROUPS  avgt   15  28,886 ± 0,429  ns/op    28,475 ± 0,215 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .1   BIGINT_10M_GROUPS  avgt   15  68,922 ± 1,376  ns/op    60,784 ± 1,994 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .5     BIGINT_2_GROUPS  avgt   15   7,337 ± 0,091  ns/op    7,672 ± 0,218 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .5    BIGINT_10_GROUPS  avgt   15   7,586 ± 0,366  ns/op    7,524 ± 0,356 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .5    BIGINT_1K_GROUPS  avgt   15   7,528 ± 0,964  ns/op    8,544 ± 0,036 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .5   BIGINT_10K_GROUPS  avgt   15  10,017 ± 0,550  ns/op    9,985 ± 0,119 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .5  BIGINT_100K_GROUPS  avgt   15  11,768 ± 0,093  ns/op    11,893 ± 0,219 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .5    BIGINT_1M_GROUPS  avgt   15  23,728 ± 1,584  ns/op    20,926 ± 0,396 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .5   BIGINT_10M_GROUPS  avgt   15  48,788 ± 1,097  ns/op    41,916 ± 0,170 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .9     BIGINT_2_GROUPS  avgt   15   3,684 ± 0,086  ns/op    2,975 ± 0,052 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .9    BIGINT_10_GROUPS  avgt   15   3,788 ± 0,091  ns/op    3,125 ± 0,052 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .9    BIGINT_1K_GROUPS  avgt   15   3,521 ± 0,022  ns/op    3,370 ± 0,017 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .9   BIGINT_10K_GROUPS  avgt   15   4,476 ± 0,126  ns/op    3,908 ± 0,012 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .9  BIGINT_100K_GROUPS  avgt   15   5,872 ± 0,180  ns/op    6,299 ± 0,048 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .9    BIGINT_1M_GROUPS  avgt   15   8,190 ± 0,029  ns/op    7,391 ± 0,050 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .9   BIGINT_10M_GROUPS  avgt   15  10,571 ± 0,149  ns/op    9,601 ± 0,155 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0     BIGINT_2_GROUPS  avgt   15   8,240 ± 0,468  ns/op    5,390 ± 0,029 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0    BIGINT_10_GROUPS  avgt   15   7,960 ± 0,373  ns/op    5,029 ± 0,081 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0    BIGINT_1K_GROUPS  avgt   15   7,619 ± 0,046  ns/op    7,447 ± 0,046 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0   BIGINT_10K_GROUPS  avgt   15  15,679 ± 0,390  ns/op    13,051 ± 0,194 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0  BIGINT_100K_GROUPS  avgt   15  14,323 ± 0,348  ns/op    12,606 ± 0,301 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0    BIGINT_1M_GROUPS  avgt   15  35,308 ± 2,198  ns/op    27,489 ± 0,724 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0   BIGINT_10M_GROUPS  avgt   15  96,250 ± 1,463  ns/op    80,487 ± 0,824 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1     BIGINT_2_GROUPS  avgt   15   9,271 ± 0,979  ns/op    7,115 ± 0,038 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1    BIGINT_10_GROUPS  avgt   15   9,244 ± 0,953  ns/op    6,559 ± 0,039 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1    BIGINT_1K_GROUPS  avgt   15   8,457 ± 0,351  ns/op    9,240 ± 0,153 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1   BIGINT_10K_GROUPS  avgt   15  15,021 ± 0,076  ns/op    15,271 ± 0,286 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1  BIGINT_100K_GROUPS  avgt   15  14,020 ± 0,276  ns/op    14,550 ± 0,539 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1    BIGINT_1M_GROUPS  avgt   15  34,683 ± 1,120  ns/op    29,855 ± 1,752 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1   BIGINT_10M_GROUPS  avgt   15  73,612 ± 0,484  ns/op    62,057 ± 1,362 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5     BIGINT_2_GROUPS  avgt   15  11,672 ± 2,091  ns/op    7,775 ± 0,357 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5    BIGINT_10_GROUPS  avgt   15  10,900 ± 1,606  ns/op    7,550 ± 0,301 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5    BIGINT_1K_GROUPS  avgt   15   9,998 ± 0,095  ns/op    9,034 ± 0,204 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5   BIGINT_10K_GROUPS  avgt   15  13,536 ± 0,287  ns/op    12,194 ± 0,299 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5  BIGINT_100K_GROUPS  avgt   15  13,482 ± 0,167  ns/op    11,710 ± 0,125 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5    BIGINT_1M_GROUPS  avgt   15  27,322 ± 0,666  ns/op    22,543 ± 0,340 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5   BIGINT_10M_GROUPS  avgt   15  51,799 ± 1,448  ns/op    42,143 ± 0,704 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9     BIGINT_2_GROUPS  avgt   15   7,991 ± 0,040  ns/op    3,324 ± 0,160 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9    BIGINT_10_GROUPS  avgt   15   8,796 ± 0,176  ns/op    3,321 ± 0,028 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9    BIGINT_1K_GROUPS  avgt   15   5,854 ± 0,124  ns/op    3,579 ± 0,030 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9   BIGINT_10K_GROUPS  avgt   15   6,427 ± 0,074  ns/op    4,353 ± 0,112 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9  BIGINT_100K_GROUPS  avgt   15   7,498 ± 0,182  ns/op    5,162 ± 0,145 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9    BIGINT_1M_GROUPS  avgt   15  10,800 ± 0,585  ns/op    8,169 ± 0,059 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9   BIGINT_10M_GROUPS  avgt   15  12,616 ± 0,383  ns/op    9,964 ± 0,106 ns/op
Instead of iterating over groups and jumping all over the hash table,
we iterate over the hash table itself, minimizing random memory access.
The change is done only in BigintGroupByHash; MultiChannelGroupByHash
is already properly implemented.

Before
BenchmarkGroupByHashOnSimulatedData.groupBy             0   BIGINT_1M_GROUPS         ADD  avgt   30  26,855 ± 0,359  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy             0  BIGINT_10M_GROUPS         ADD  avgt   30  79,430 ± 0,678  ns/op
After
BenchmarkGroupByHashOnSimulatedData.groupBy             0   BIGINT_1M_GROUPS         ADD  avgt   30  25,910 ± 0,342  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy             0  BIGINT_10M_GROUPS         ADD  avgt   30  63,016 ± 0,748  ns/op
Benchmarks:
For the ADD work type, changes are minimal.
For GET_GROUPS, some regressions are present, but the overall gain is about 5%.

Before:
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   6,251 ± 0,146  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   6,602 ± 0,400  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   6,306 ± 0,117  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   6,786 ± 0,469  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   5,579 ± 0,142  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30  19,389 ± 0,201  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   4,462 ± 0,139  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   4,317 ± 0,065  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   6,481 ± 1,010  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   4,687 ± 0,117  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   4,389 ± 0,038  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30  21,026 ± 0,405  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   4,340 ± 0,170  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   5,991 ± 0,786  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   4,490 ± 0,100  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   4,712 ± 0,119  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   4,508 ± 0,072  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30  17,024 ± 0,433  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   4,547 ± 0,155  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   4,402 ± 0,150  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   5,282 ± 0,808  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   4,792 ± 0,096  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   4,599 ± 0,126  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30   9,405 ± 0,142  ns/op
After:
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   4,791 ± 0,146  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   4,487 ± 0,400  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   4,562 ± 0,117  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   5,114 ± 0,469  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   4,351 ± 0,142  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30  19,531 ± 0,201  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   5,286 ± 0,139  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   5,521 ± 0,065  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   4,576 ± 1,010  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   5,014 ± 0,117  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   3,809 ± 0,038  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30  20,453 ± 0,405  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   5,184 ± 0,170  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   5,400 ± 0,786  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   4,664 ± 0,100  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   5,213 ± 0,119  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   3,849 ± 0,072  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30  17,651 ± 0,433  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   5,612 ± 0,155  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   5,487 ± 0,150  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   4,699 ± 0,808  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   5,118 ± 0,096  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   3,621 ± 0,126  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30   8,173 ± 0,142  ns/op
@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from 843d86d to 8878eaf Compare April 20, 2022 10:04

sopel39 commented Apr 29, 2022

@skrzypo987 should we close this?

@skrzypo987 (Member Author)

Yep. Nothing more to do here.

@skrzypo987 skrzypo987 closed this Apr 29, 2022