Long aggregation improvements #11031

Closed

Conversation

skrzypo987
Member

Description

Details in commit messages

General information

Is this change a fix, improvement, new feature, refactoring, or other?

improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

core query engine

How would you describe this change to a non-technical end user or system administrator?

slow -> improvement -> fast

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Section
* Improve aggregation performance of bigint columns

@cla-bot cla-bot bot added the cla-signed label Feb 14, 2022
@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from 088029d to cb2031e Compare February 14, 2022 10:31
@lukasz-stec lukasz-stec (Member) left a comment

Left some comments.
@skrzypo987 Do you have some numbers for the improvements?

if (block.mayHaveNull()) {
boolean haveNull = false;

int[] hashes = new int[batchSize];
Member

I would reuse hashes array to limit GC pressure.

Member

also, should this be named hashPositions?

Member Author

Renamed.
By reusing, do you mean making it a class field? I don't think that would make much sense. This is a short-lived object allocated in eden and dying pretty quickly; GC loves objects like that.

}
}

private void batchedPutIfAbsent(int batchSize, int lastPosition, Block block, long[] groupIds, int groupIdOffset)
Member

lastPosition -> startPosition, groupIds -> outGroupIds?

Member Author

Changed it to blockOffset.
As for outGroupIds, I don't really like it. Maybe we can figure something else out.

}
int batchSize = min(min(leftToProcess, MAX_BATCH_SIZE), leftBeforeRehash);

long[] dummyGroupIds = new long[batchSize];
Member

move this outside of loop

Member Author

Done. I hope the JIT gets rid of it anyway.

long value = BIGINT.getLong(block, lastPosition + i);
long storedValue = values[hashes[i]];
boolean match = value == storedValue;
groupIds[groupIdOffset + i] = (groupIds[groupIdOffset + i] + 1) * (match ? 1 : 0) - 1;
Member

What's the perf improvement from this line vs. just if (!match) groupIds[groupIdOffset + i] = -1? I would hope that the if could be replaced with a cmov by the JIT in bad cases.
If this stays, at least add a comment explaining what this line does.

Member Author

match will (hopefully) be cast to a number by the compiler. That way there is no branch at all.
Added a comment.

for (int i = 0; i < batchSize; i++) {
if (groupIds[groupIdOffset + i] >= 0) {
long value = BIGINT.getLong(block, lastPosition + i);
long storedValue = values[hashes[i]];
Member

For a large number of groups, this is a cache miss. The same is true for the this.groupIds[hashes[i]] above.
Since this loop is not super tight, manual unrolling could help here (I have seen an improvement from doing 4 items at once).

Member Author

Gave it a try. No difference.
The loop actually is super tight IMO.

@skrzypo987 (Member Author)

Left some comments. @skrzypo987 Do you have some numbers for the improvements?

That's a damn good review.
I ran some macro benchmarks on unpartitioned data:
TPC-H: 4.7% CPU, 3.6% wall time. TPC-DS: no significant changes.
For partitioned data, gains should be slightly smaller.

@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch 2 times, most recently from 6eab0c5 to d1d7a9b Compare February 16, 2022 08:29
@skrzypo987 skrzypo987 (Member Author) left a comment

Comments addressed.
I also added another commit with batched dictionary processing. It brings a 5-10% improvement in microbenchmarks, so nothing significant.
I also started macro benchmarks for partitioned data and will share the results when they finish.


@lukasz-stec lukasz-stec (Member) left a comment

lgtm % comments

// if (!match)
// groupIds[groupIdOffset + i] = -1;
// but without explicit branches
groupIds[groupIdOffset + i] = (groupIds[groupIdOffset + i] + 1) * (match ? 1 : 0) - 1;
Member

An if is nicer code and could be faster if compiled to a cmov (cmov has 1-cycle latency). I would check what both expressions compile to and choose accordingly.

Member Author

We had some benchmarks replacing simple ifs with this ? 1 : 0 trick and it was about 2x faster every single time.
IMHO a loop without a branch is always going to be faster than a loop with a branch.

Member

cmov is not a branch

Member Author

Quite frankly, I don't know what it is. We should be platform agnostic.
We use this "trick" throughout the Trino codebase; I see no reason why we should not use it here.
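
For readers following this thread, here is a minimal standalone sketch of the branch-free idiom under discussion (the method and array names are illustrative, not from the PR):

// Illustrative sketch only; names are hypothetical.
// Semantics: if matches[i] is false, reset groupIds[i] to -1; otherwise keep it.
//   branchy:     if (!matches[i]) { groupIds[i] = -1; }
//   branch-free: groupIds[i] = (groupIds[i] + 1) * (matches[i] ? 1 : 0) - 1;
// When the match is true:  (g + 1) * 1 - 1 == g   (unchanged)
// When the match is false: (g + 1) * 0 - 1 == -1  (reset)
static void resetNonMatching(long[] groupIds, boolean[] matches)
{
    for (int i = 0; i < groupIds.length; i++) {
        groupIds[i] = (groupIds[i] + 1) * (matches[i] ? 1 : 0) - 1;
    }
}

Whether the ternary compiles to a flag materialization, a cmov, or a branch is up to the JIT and the target platform, which is exactly the disagreement above.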

boolean finished;
do {
finished = work.process();
results.add(work.getResult()); // Pretend the results are used
Member

You can use org.openjdk.jmh.infra.Blackhole for that. It avoids keeping unneeded stuff in memory.

Member Author

This class seems serious. It cannot even be easily instantiated. I'd rather stay with basic solutions.

Member

You don't create it yourself; JMH will inject it if you have it as a param to the benchmark method.
This is the exact use case for it. See https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_09_Blackholes.java

Member Author

And then you cannot use the benchmark method from outside the benchmark, e.g. from the runGroupByTpch method, because you cannot create a Blackhole instance. I'd rather stay with the current option.
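
For reference, a minimal sketch of the Blackhole pattern lukasz-stec describes, with a trivial stand-in workload (the class and doWork method are hypothetical):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class ExampleBenchmark
{
    @Benchmark
    public void groupBy(Blackhole blackhole)
    {
        // JMH injects the Blackhole parameter; consuming the result prevents
        // dead-code elimination without accumulating results in memory.
        blackhole.consume(doWork());
    }

    private long doWork()
    {
        long sum = 0;
        for (int i = 0; i < 1_000; i++) {
            sum += i;
        }
        return sum;
    }
}

As discussed above, the trade-off is that a method with a Blackhole parameter is awkward to call from outside the JMH harness.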

public enum ColumnType
{
BIGINT(BigintType.BIGINT, (blockBuilder, positionCount, cardinality, seed) -> {
Random r = new Random(seed);
Member

why are different seeds needed?

Member Author

So that two columns with the same type get different values.

Member

why not use one static Random value

Member Author

That way, if you have two bigint columns, their contents will be exactly the same, so 2 columns with a cardinality of 10 will produce only 10 groups. We want them to be independent of each other so the aggregation produces 100 groups.
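
A minimal sketch of the point (the values and seeds here are made up):

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class SeedDemo
{
    public static void main(String[] args)
    {
        // With the same seed, both columns would be identical and grouping on
        // (a, b) would yield only 10 groups; distinct seeds make the columns
        // independent, giving close to 10 * 10 = 100 groups.
        long[] a = new Random(42).ints(1000, 0, 10).asLongStream().toArray();
        long[] b = new Random(43).ints(1000, 0, 10).asLongStream().toArray();
        Set<Long> groups = new HashSet<>();
        for (int i = 0; i < a.length; i++) {
            groups.add(a[i] * 100 + b[i]);
        }
        System.out.println("distinct groups: " + groups.size());
    }
}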

@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from d1d7a9b to 6fc6ed9 Compare February 16, 2022 14:51

@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from 6fc6ed9 to cf181c0 Compare February 21, 2022 09:36
@skrzypo987 skrzypo987 requested review from sopel39 and removed request for sopel39 February 22, 2022 16:12
@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch 2 times, most recently from 3e95042 to a924346 Compare February 28, 2022 14:55
@sopel39 sopel39 (Member) left a comment

lgtm Replace LongBigArray with long[] ...

I think you can split it into smaller PRs so that it lands faster


/**
* This class attempts to emulate aggregations done while running tpch queries on unpartitioned data.
* The data has been acquired on a single node Trino using sf10 scale.
Member

What does that mean?
The data has been acquired on a single node Trino using sf10 scale.
What is the data?

Please elaborate on what is benchmarked specifically and which parameters are affected.

@Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 20, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
public class BenchmarkGroupByHashOnTpch
Member

While I think these benchmarks are beneficial, I think we might be overoptimizing for TPCH use cases. Could we just parametrize various use cases instead of trying to match TPCH data characteristics? Maybe we could use TPCH datagen directly instead of simulating already-artificial data?

Member

I would either parametrize it and not pretend to be related to TPCH, or use actual TPCH data.

BTW: take a look at io.trino.benchmark.GroupByAggregationSqlBenchmark. I don't think it's used, but maybe it's worth reviving. For one, I think that framework uses generated TPCH data instead of pre-generated data.

Member Author

Maybe we could use TPCH datagen directly

It takes about 10 minutes to generate sf10. Numbers for smaller scale factors are not correlated enough with the bigger ones

Member Author

Actually, the purpose of this PR is to optimize bigint aggregation, not to add some fancy benchmarks, so maybe I should just delete the commit? WDYT?

Member

It takes about 10 minutes to generate sf10. Numbers for smaller scale factors are not correlated enough with the bigger ones

What do you mean by not being correlated enough?

Actually, the purpose of this PR is to optimize bigint aggregation, not to add some fancy benchmarks, so maybe I should just delete the commit? WDYT?

We can leave it out of this PR and create a new one. Do you have JMH numbers from existing benchmarks?

Anyway, it probably makes sense to revise that old Sql benchmarking framework

this.distinctDictionaries = distinctDictionaries;
}

public Block[] createBlocks(int blockCount, int positionsPerBlock, int channel)
Member

I think you can just use the TPCH data generator directly. Something like io.trino.plugin.hive.benchmark.BenchmarkFileFormatsUtils#createTpchDataSet(io.trino.plugin.hive.benchmark.FileFormat, io.trino.tpch.TpchTable<E>, java.util.List<io.trino.tpch.TpchColumn<E>>)

Member Author

Look at the comment above.

Member

It takes about 10 minutes to generate sf10. Numbers for smaller scale factors are not correlated enough with the bigger ones

sf1 is not sufficient? sf20 will have different characteristics? I think these benchmarks are fine as long as they don't overspecialize for a specific case.

@skrzypo987 skrzypo987 (Member Author) left a comment

I think you can split it into smaller PRs so that it lands faster

No I cannot. If there is a problem with the process of merging multi-commit PRs we should fix the process, not the PRs.


@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from a924346 to 304d70e Compare March 3, 2022 08:38

sopel39 commented Mar 9, 2022

@skrzypo987 do you have macro benchmark results that you could attach here?

@sopel39 sopel39 requested a review from raunaqmorarka March 9, 2022 11:39
}
}

private void batchedPutIfAbsentNoNull(int batchSize, int blockOffset, Block block, long[] groupIds, int groupIdOffset)
Member

here it could be:

private void batchedPutIfAbsentNoNull(
    Block block,
    int batchSize,
    int blockOffset,
    long[] groupIds,
    int groupIdOffset) {
  SelectedPositions nonNullPositions = positionsRange(0, batchSize);
  int[] hashPositions = new int[batchSize];
  getHashPositions(nonNullPositions, blockOffset, hashPositions);
  getGroupIds(nonNullPositions, hashPositions, groupIds, groupIdOffset);
  SelectedPositions nonMatchingPositions = SelectedPositions.withSize(nonNullPositions.length());
  initialMatchPositions(nonNullPositions, block, blockOffset, hashPositions, groupIds, groupIdOffset, nonMatchingPositions);
  putRemainingPositions(nonMatchingPositions, block, blockOffset, hashPositions, groupIds, groupIdOffset);
}

The same building blocks can be reused between the non-null and nullable cases.

@@ -60,8 +60,8 @@
private int mask;

// the hash table from values to groupIds
private LongBigArray values;
@raunaqmorarka raunaqmorarka (Member) Mar 10, 2022

Is this based on the understanding that https://bugs.openjdk.java.net/browse/JDK-8027959 has solved the problem of GC pauses due to humongous objects?
Could we still suffer from OOMs due to heap fragmentation as reported in http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2017-November/002725.html (https://bugs.openjdk.java.net/browse/JDK-8191565)?
I guess our HeapRegionSize=32M JVM config recommendation should mitigate it to some extent.

Member

Could we still suffer from OOMs due to heap fragmentation as reported in http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2017-November/002725.html (https://bugs.openjdk.java.net/browse/JDK-8191565) ?

Possibly, but IMO the gains are worth it. Note that MultiChannelGroupByHash already uses raw arrays (albeit I suspect it will have fewer groups on average, as multi-channel group-bys are based on user GROUP BY clauses).

I think #11011 will mitigate risk of fragmenting memory.
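
For background, a generic illustration of the trade-off being weighed here. This is not Trino's LongBigArray implementation, just a sketch of the segmented-array idea: each segment stays well below G1's humongous threshold, at the price of an extra indirection that a flat long[] avoids.

// Generic sketch, not Trino's LongBigArray.
final class SegmentedLongArray
{
    private static final int SEGMENT_SHIFT = 10; // 1024 elements, ~8 KB per segment
    private static final int SEGMENT_MASK = (1 << SEGMENT_SHIFT) - 1;

    private final long[][] segments;

    SegmentedLongArray(long size)
    {
        int segmentCount = Math.toIntExact((size + SEGMENT_MASK) >> SEGMENT_SHIFT);
        segments = new long[segmentCount][1 << SEGMENT_SHIFT];
    }

    long get(long index)
    {
        return segments[(int) (index >> SEGMENT_SHIFT)][(int) (index & SEGMENT_MASK)];
    }

    void set(long index, long value)
    {
        segments[(int) (index >> SEGMENT_SHIFT)][(int) (index & SEGMENT_MASK)] = value;
    }
}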

}
}

public static void main(String[] args)
Member

Can we add the benchmark commit first, and include before/after results of this benchmark in the "Replace BigArrays with primitive" commit as well?

Member Author

The first commit is already extracted and merged. Did that for the others, though.

@skrzypo987 skrzypo987 requested a review from lukasz-stec March 11, 2022 11:59
@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from 38a5ddb to d7fd0c7 Compare March 14, 2022 07:39
@lukasz-stec lukasz-stec (Member) left a comment

Some comments

@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from d7fd0c7 to 35350cb Compare March 15, 2022 10:11
@skrzypo987 (Member Author)

Added nulls to the benchmark

public static class ChannelDefinition
{
private final ColumnType columnType;
private final int distinctValuesCount;
Member

Add a comment explaining what these fields mean.

Member Author

Seriously?
columnType means column type and distinctValuesCount means distinct value count?
What else can I write in the comment?

Member

What else can I write in the comment?

Is it distinctValuesCount within a dictionary or a total?

What does distinctDictionaries mean? What is a non-distinct dictionary?

These parameters are not obvious to the reader.

Member Author

Made some changes, see if you like it

private static final BlockTypeOperators TYPE_OPERATOR_FACTORY = new BlockTypeOperators(TYPE_OPERATORS);

@SuppressWarnings("MismatchedQueryAndUpdateOfCollection")
private final List<GroupByIdBlock> results = new ArrayList<>();
@sopel39 sopel39 (Member) Mar 15, 2022

Why not create it in groupBy or use a JMH Blackhole? I think results can also be shared between threads.

Member Author

If I create it in groupBy it may get removed by the compiler.
We already had a discussion about Blackhole here. That would complicate the class a bit

@sopel39 sopel39 (Member) Mar 17, 2022

Just return both the result page and List<GroupByIdBlock> results in groupBy then.

I'm not sure if List<GroupByIdBlock> results = new ArrayList<>(); will behave correctly in a multi-threaded environment.

Member Author

There is no multithreading here.

Just return both result page and List results in groupBy then.

That's the best idea. Done.


sopel39 commented Mar 15, 2022

lgtm % Optimize rehashing hash table in aggregation. I suggest extracting it as a separate PR.

@sopel39 sopel39 (Member) left a comment

Batch BigintGroupByHash lgtm % comments

long[] dummyGroupIds = new long[MAX_BATCH_SIZE];
while (remainingPositions != 0) {
int positionCountUntilRehash = maxFill - nextGroupId;
if (positionCountUntilRehash == 0) {
Member

Could you change it to:

if (positionCountUntilRehash < MAX_BATCH_SIZE) {

to avoid the problem @lukasz-stec described below?

Then you wouldn't need the min with positionCountUntilRehash below either.

Member Author

That may increase the memory footprint, but as you wish.
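
A minimal sketch of the loop shape suggested above (tryRehash and processBatch are hypothetical placeholders; maxFill, nextGroupId, and MAX_BATCH_SIZE follow the diff context):

// Hypothetical sketch of the suggested batch sizing.
while (remainingPositions != 0) {
    int positionCountUntilRehash = maxFill - nextGroupId;
    if (positionCountUntilRehash < MAX_BATCH_SIZE) {
        // Rehash as soon as a full batch might not fit, so the batch size
        // below never has to be clamped by positionCountUntilRehash.
        tryRehash();
        continue;
    }
    int batchSize = Math.min(remainingPositions, MAX_BATCH_SIZE);
    processBatch(batchSize);
    remainingPositions -= batchSize;
}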

@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from 35350cb to 658cee3 Compare March 16, 2022 13:07
@sopel39 sopel39 (Member) left a comment

Batched dictionary processing in BigintGroupByHash class lgtm % comments


@sopel39 sopel39 (Member) left a comment

mostly lgtm % benchmarks % comments % question about AddDictionaryPageWork batch performance


Block[] dictionaries = new Block[distinctDictionaries];
Set<Integer>[] possibleIndexesSet = new Set[distinctDictionaries];
// Generate dictionaries and positions from those dictionaries that are actually used
@sopel39 sopel39 (Member) Mar 17, 2022

Generate dictionaries

What do these dictionaries contain?

positions from those dictionaries that are actually used

Used where?

I find it really hard to figure out what is happening in the code below.

Member Author

I made it much simpler and less elaborate. Benchmark results are similar.

int positionInDictionary = block.getId(lastPosition);
registerGroupId(dictionary, positionInDictionary);
lastPosition++;
int remainingPositions = positionCount - lastPosition;
Member

You mentioned something about a dictionary size threshold in 737394d#r827988340, but I don't see a need for it in the code.

{
results.clear();
for (Page page : pages) {
Work<GroupByIdBlock> work = groupByHash.getGroupIds(page);
@sopel39 sopel39 (Member) Mar 17, 2022

You benchmark io.trino.operator.BigintGroupByHash#getGroupIds only, but I think you should benchmark addPage too. For example, I'm skeptical that AddDictionaryPageWork benefits from batching (I think that only GetDictionaryGroupIdsWork benefits from the long[] groupIds array instead of a builder).

Member Author

added

@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from 658cee3 to 71afe0a Compare March 22, 2022 12:52
@skrzypo987 skrzypo987 (Member Author) left a comment

Addressed comments.


@sopel39 sopel39 (Member) left a comment

lgtm % comments.

  1. It seems the last commit is broken.
  2. I would still look at the nullable case to improve it.

for (int i = 0; i < left; i++) {
usedValues.add(r.nextInt(m));
}
}
Member

nit: an alternative would be:

        List<Integer> allNumbers = IntStream.range(0, bound).boxed().collect(toList());
        Collections.shuffle(allNumbers, RANDOM);
        return allNumbers.stream().limit(count).collect(toImmutableSet());

as in io.trino.block.BlockAssertions#chooseRandomUnique

@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch 2 times, most recently from d721ef8 to 843d86d Compare April 15, 2022 17:36
@sopel39 sopel39 (Member) left a comment

% comments

skrzypo987 added 3 commits April 20, 2022 11:49
Overall gain with small regressions for individual cases.

Benchmarks before & after
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD             0     BIGINT_2_GROUPS  avgt   15   3,840 ± 0,005  ns/op    3,694 ± 0,104 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD             0    BIGINT_10_GROUPS  avgt   15   3,853 ± 0,095  ns/op    3,664 ± 0,095 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD             0    BIGINT_1K_GROUPS  avgt   15   4,840 ± 0,144  ns/op    4,876 ± 0,032 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD             0   BIGINT_10K_GROUPS  avgt   15  11,389 ± 0,349  ns/op    10,717 ± 0,465 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD             0  BIGINT_100K_GROUPS  avgt   15  11,245 ± 0,532  ns/op    10,544 ± 0,145 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD             0    BIGINT_1M_GROUPS  avgt   15  28,606 ± 1,104  ns/op    26,988 ± 0,638 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD             0   BIGINT_10M_GROUPS  avgt   15  89,223 ± 1,346  ns/op    79,167 ± 1,381 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .1     BIGINT_2_GROUPS  avgt   15   5,511 ± 0,257  ns/op    4,760 ± 0,068 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .1    BIGINT_10_GROUPS  avgt   15   5,335 ± 0,010  ns/op    5,087 ± 0,210 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .1    BIGINT_1K_GROUPS  avgt   15   5,858 ± 0,279  ns/op    6,694 ± 0,304 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .1   BIGINT_10K_GROUPS  avgt   15  11,595 ± 0,491  ns/op    13,082 ± 0,489 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .1  BIGINT_100K_GROUPS  avgt   15  12,248 ± 0,107  ns/op    12,657 ± 0,421 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .1    BIGINT_1M_GROUPS  avgt   15  28,886 ± 0,429  ns/op    28,475 ± 0,215 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .1   BIGINT_10M_GROUPS  avgt   15  68,922 ± 1,376  ns/op    60,784 ± 1,994 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .5     BIGINT_2_GROUPS  avgt   15   7,337 ± 0,091  ns/op    7,672 ± 0,218 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .5    BIGINT_10_GROUPS  avgt   15   7,586 ± 0,366  ns/op    7,524 ± 0,356 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .5    BIGINT_1K_GROUPS  avgt   15   7,528 ± 0,964  ns/op    8,544 ± 0,036 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .5   BIGINT_10K_GROUPS  avgt   15  10,017 ± 0,550  ns/op    9,985 ± 0,119 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .5  BIGINT_100K_GROUPS  avgt   15  11,768 ± 0,093  ns/op    11,893 ± 0,219 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .5    BIGINT_1M_GROUPS  avgt   15  23,728 ± 1,584  ns/op    20,926 ± 0,396 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .5   BIGINT_10M_GROUPS  avgt   15  48,788 ± 1,097  ns/op    41,916 ± 0,170 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .9     BIGINT_2_GROUPS  avgt   15   3,684 ± 0,086  ns/op    2,975 ± 0,052 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .9    BIGINT_10_GROUPS  avgt   15   3,788 ± 0,091  ns/op    3,125 ± 0,052 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .9    BIGINT_1K_GROUPS  avgt   15   3,521 ± 0,022  ns/op    3,370 ± 0,017 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .9   BIGINT_10K_GROUPS  avgt   15   4,476 ± 0,126  ns/op    3,908 ± 0,012 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .9  BIGINT_100K_GROUPS  avgt   15   5,872 ± 0,180  ns/op    6,299 ± 0,048 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .9    BIGINT_1M_GROUPS  avgt   15   8,190 ± 0,029  ns/op    7,391 ± 0,050 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy          ADD            .9   BIGINT_10M_GROUPS  avgt   15  10,571 ± 0,149  ns/op    9,601 ± 0,155 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0     BIGINT_2_GROUPS  avgt   15   8,240 ± 0,468  ns/op    5,390 ± 0,029 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0    BIGINT_10_GROUPS  avgt   15   7,960 ± 0,373  ns/op    5,029 ± 0,081 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0    BIGINT_1K_GROUPS  avgt   15   7,619 ± 0,046  ns/op    7,447 ± 0,046 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0   BIGINT_10K_GROUPS  avgt   15  15,679 ± 0,390  ns/op    13,051 ± 0,194 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0  BIGINT_100K_GROUPS  avgt   15  14,323 ± 0,348  ns/op    12,606 ± 0,301 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0    BIGINT_1M_GROUPS  avgt   15  35,308 ± 2,198  ns/op    27,489 ± 0,724 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0   BIGINT_10M_GROUPS  avgt   15  96,250 ± 1,463  ns/op    80,487 ± 0,824 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1     BIGINT_2_GROUPS  avgt   15   9,271 ± 0,979  ns/op    7,115 ± 0,038 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1    BIGINT_10_GROUPS  avgt   15   9,244 ± 0,953  ns/op    6,559 ± 0,039 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1    BIGINT_1K_GROUPS  avgt   15   8,457 ± 0,351  ns/op    9,240 ± 0,153 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1   BIGINT_10K_GROUPS  avgt   15  15,021 ± 0,076  ns/op    15,271 ± 0,286 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1  BIGINT_100K_GROUPS  avgt   15  14,020 ± 0,276  ns/op    14,550 ± 0,539 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1    BIGINT_1M_GROUPS  avgt   15  34,683 ± 1,120  ns/op    29,855 ± 1,752 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1   BIGINT_10M_GROUPS  avgt   15  73,612 ± 0,484  ns/op    62,057 ± 1,362 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5     BIGINT_2_GROUPS  avgt   15  11,672 ± 2,091  ns/op    7,775 ± 0,357 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5    BIGINT_10_GROUPS  avgt   15  10,900 ± 1,606  ns/op    7,550 ± 0,301 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5    BIGINT_1K_GROUPS  avgt   15   9,998 ± 0,095  ns/op    9,034 ± 0,204 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5   BIGINT_10K_GROUPS  avgt   15  13,536 ± 0,287  ns/op    12,194 ± 0,299 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5  BIGINT_100K_GROUPS  avgt   15  13,482 ± 0,167  ns/op    11,710 ± 0,125 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5    BIGINT_1M_GROUPS  avgt   15  27,322 ± 0,666  ns/op    22,543 ± 0,340 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5   BIGINT_10M_GROUPS  avgt   15  51,799 ± 1,448  ns/op    42,143 ± 0,704 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9     BIGINT_2_GROUPS  avgt   15   7,991 ± 0,040  ns/op    3,324 ± 0,160 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9    BIGINT_10_GROUPS  avgt   15   8,796 ± 0,176  ns/op    3,321 ± 0,028 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9    BIGINT_1K_GROUPS  avgt   15   5,854 ± 0,124  ns/op    3,579 ± 0,030 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9   BIGINT_10K_GROUPS  avgt   15   6,427 ± 0,074  ns/op    4,353 ± 0,112 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9  BIGINT_100K_GROUPS  avgt   15   7,498 ± 0,182  ns/op    5,162 ± 0,145 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9    BIGINT_1M_GROUPS  avgt   15  10,800 ± 0,585  ns/op    8,169 ± 0,059 ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9   BIGINT_10M_GROUPS  avgt   15  12,616 ± 0,383  ns/op    9,964 ± 0,106 ns/op
Instead of iterating over groups and jumping all over the hash table,
we iterate over the hash table itself, minimizing random memory access.
The change is done only in BigintGroupByHash; MultiChannelGroupByHash
is already properly implemented.

Before
BenchmarkGroupByHashOnSimulatedData.groupBy             0   BIGINT_1M_GROUPS         ADD  avgt   30  26,855 ± 0,359  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy             0  BIGINT_10M_GROUPS         ADD  avgt   30  79,430 ± 0,678  ns/op
After
BenchmarkGroupByHashOnSimulatedData.groupBy             0   BIGINT_1M_GROUPS         ADD  avgt   30  25,910 ± 0,342  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy             0  BIGINT_10M_GROUPS         ADD  avgt   30  63,016 ± 0,748  ns/op
Benchmarks:
For the ADD work type, changes are minimal.
For GET_GROUPS, some regressions are present, but the overall gain is about 5%.

Before:
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   6,251 ± 0,146  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   6,602 ± 0,400  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   6,306 ± 0,117  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   6,786 ± 0,469  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   5,579 ± 0,142  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30  19,389 ± 0,201  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   4,462 ± 0,139  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   4,317 ± 0,065  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   6,481 ± 1,010  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   4,687 ± 0,117  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   4,389 ± 0,038  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30  21,026 ± 0,405  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   4,340 ± 0,170  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   5,991 ± 0,786  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   4,490 ± 0,100  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   4,712 ± 0,119  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   4,508 ± 0,072  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30  17,024 ± 0,433  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   4,547 ± 0,155  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   4,402 ± 0,150  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   5,282 ± 0,808  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   4,792 ± 0,096  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   4,599 ± 0,126  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30   9,405 ± 0,142  ns/op
After:
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   4,791 ± 0,146  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   4,487 ± 0,400  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   4,562 ± 0,117  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   5,114 ± 0,469  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   4,351 ± 0,142  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS             0      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30  19,531 ± 0,201  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   5,286 ± 0,139  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   5,521 ± 0,065  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   4,576 ± 1,010  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   5,014 ± 0,117  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   3,809 ± 0,038  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .1      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30  20,453 ± 0,405  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   5,184 ± 0,170  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   5,400 ± 0,786  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   4,664 ± 0,100  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   5,213 ± 0,119  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   3,849 ± 0,072  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .5      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30  17,651 ± 0,433  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9         BIGINT_2_GROUPS_1_SMALL_DICTIONARY  avgt   30   5,612 ± 0,155  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9           BIGINT_2_GROUPS_1_BIG_DICTIONARY  avgt   30   5,487 ± 0,150  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9  BIGINT_2_GROUPS_MULTIPLE_SMALL_DICTIONARY  avgt   30   4,699 ± 0,808  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9    BIGINT_2_GROUPS_MULTIPLE_BIG_DICTIONARY  avgt   30   5,118 ± 0,096  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9             BIGINT_10K_GROUPS_1_DICTIONARY  avgt   30   3,621 ± 0,126  ns/op
BenchmarkGroupByHashOnSimulatedData.groupBy   GET_GROUPS            .9      BIGINT_10K_GROUPS_MULTIPLE_DICTIONARY  avgt   30   8,173 ± 0,142  ns/op
@skrzypo987 skrzypo987 force-pushed the skrzypo/055-long-aggregation branch from 843d86d to 8878eaf Compare April 20, 2022 10:04

sopel39 commented Apr 29, 2022

@skrzypo987 should we close this?

@skrzypo987 (Member Author)

Yep. Nothing more to do here.

@skrzypo987 skrzypo987 closed this Apr 29, 2022