Fine grained manipulation on hashing partitioned data #4631

Closed
5 of 6 tasks
fuzhe1989 opened this issue Apr 12, 2022 · 1 comment · Fixed by #5149
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@fuzhe1989
Contributor

fuzhe1989 commented Apr 12, 2022

Enhancement

We know that some operators (agg, join, etc.) could benefit from prehashing. If each inputstream belongs to one hash bucket:

  1. For agg, we do not need the merging phase; each thread can directly generate the final blocks (see the sketch after this list).
  2. For the build side of join, we do not need any lock, and resizing the hash table affects only one thread.
  3. For the probe side of join, probing can start as soon as its corresponding build job ends, rather than waiting for the whole build phase.
  4. For the exchange sender, we do not need to partition the data again; we can just send it directly.
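
To make points 1 and 2 concrete, here is a minimal standalone sketch (plain C++, not TiFlash code; `Row`, `Stream`, and the thread-per-stream layout are illustrative assumptions): when each input stream already holds exactly one hash bucket, every thread aggregates its own stream into a private hash table, with no lock and no merge phase.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

struct Row { std::string key; long value; };
using Stream = std::vector<Row>;                          // one pre-hashed bucket per stream
using AggResult = std::unordered_map<std::string, long>;  // sum per key

int main()
{
    // Pre-hashed inputs: a key that appears in one stream never appears in another.
    std::vector<Stream> streams = {{{"a", 1}, {"a", 2}}, {{"b", 3}, {"c", 4}}};

    std::vector<AggResult> results(streams.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < streams.size(); ++i)
        workers.emplace_back([&streams, &results, i] {
            // Private hash table per thread: no lock is needed and resizing
            // only affects this thread; the result is already final, so no
            // cross-thread merge phase is required.
            for (const Row & row : streams[i])
                results[i][row.key] += row.value;
        });
    for (auto & w : workers)
        w.join();
    return 0;
}
```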

Note that all the potential enhancements require:

  1. All stages use the same (or a compatible) hash function.
  2. The prehash key is a superset of the sink hash key.

So how to do prehashing? Here are some ideas:

  1. For a hash-partitioned table, just let each inputstream correspond to one partition (an N:1 or 1:1 relation). The key is how to carry the hash info along with the inputstream (see the sketch after this list).
  2. For large data volumes, agg merges data into 256 buckets, but it then discards the hashing info by merging them into one inputstream. The operators above agg could benefit if agg exposed that hashing info.
  3. We could insert virtual hash-partition operators near the TableScan and Exchange operators (the leaves of an MPP task) to generate inputstreams with hashing info.
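
A minimal sketch of what "carrying the hash info along with the inputstream" could look like (`PartitionInfo`, `PartitionedStream`, and `canReusePrehash` are hypothetical names, not existing TiFlash classes): each stream is tagged with the bucket it holds and the hash scheme that produced it, so downstream operators can check whether the prehashing is reusable.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical annotation carried with an inputstream: which bucket the
// stream holds, how many buckets exist, and an id of the hash scheme used.
struct PartitionInfo
{
    uint32_t bucket_index = 0;
    uint32_t bucket_count = 1;    // 1 means "not partitioned"
    uint32_t hash_scheme_id = 0;  // downstream operators check compatibility by this id
};

struct Block { /* columns omitted */ };

// Hypothetical stream wrapper: the blocks plus the partition info they satisfy.
struct PartitionedStream
{
    PartitionInfo partition;
    std::vector<Block> blocks;
};

// A downstream operator may reuse the prehashing only if the stream is really
// partitioned and was hashed with a scheme the operator understands.
bool canReusePrehash(const PartitionedStream & s, uint32_t required_scheme)
{
    return s.partition.bucket_count > 1 && s.partition.hash_scheme_id == required_scheme;
}

int main()
{
    PartitionedStream s{{/*bucket_index*/ 3, /*bucket_count*/ 16, /*hash_scheme_id*/ 1}, {}};
    return canReusePrehash(s, /*required_scheme*/ 1) ? 0 : 1;
}
```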

In particular, for Exchange, ExchangeSender is a better place to partition data than ExchangeReceiver: partitioning is already part of ExchangeSender's job.

Note: blindly splitting data at ExchangeSender could make packets smaller and weaken the effect of vectorization. A previous demo showed that blind splitting decreases performance (with the current implementation), while well-designed batching improves it significantly (a batching sketch follows).
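
A minimal sketch of the batching idea (hypothetical types, not the real ExchangeSender interface; the `send` callback stands in for the actual packet-sending call): rows are routed into per-partition buffers and a buffer is flushed only once it holds a full batch, so partitioning does not shrink packets and hurt vectorization.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct Row { std::string key; long value; };

// Hypothetical batching sender: rows go into per-partition buffers, and a buffer
// is flushed as one packet only after it has accumulated `batch_size` rows, so
// partitioning does not produce tiny, vectorization-unfriendly packets.
class BatchingSender
{
public:
    BatchingSender(std::size_t partitions, std::size_t batch_size_, std::function<void(std::size_t, std::vector<Row>)> send_)
        : buffers(partitions), batch_size(batch_size_), send(std::move(send_)) {}

    void write(const Row & row)
    {
        std::size_t p = std::hash<std::string>{}(row.key) % buffers.size();
        buffers[p].push_back(row);
        if (buffers[p].size() >= batch_size)
            flush(p);
    }

    void finish()  // send whatever is left as final, possibly smaller packets
    {
        for (std::size_t p = 0; p < buffers.size(); ++p)
            if (!buffers[p].empty())
                flush(p);
    }

private:
    void flush(std::size_t p)
    {
        send(p, std::move(buffers[p]));
        buffers[p].clear();  // moved-from vector becomes a valid empty buffer again
    }

    std::vector<std::vector<Row>> buffers;
    std::size_t batch_size;
    std::function<void(std::size_t, std::vector<Row>)> send;
};

int main()
{
    std::size_t packets = 0;
    BatchingSender sender(/*partitions*/ 4, /*batch_size*/ 2,
                          [&packets](std::size_t, std::vector<Row>) { ++packets; });
    for (long i = 0; i < 10; ++i)
        sender.write({std::to_string(i % 4), i});
    sender.finish();
    return packets > 0 ? 0 : 1;
}
```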

Subtasks for window function:

Subtasks for HashAgg/Join:

  • HashAgg + fine-grained shuffle
fuzhe1989 added the type/enhancement label on Apr 12, 2022
@fuzhe1989
Contributor Author

Example: table scan and two-phase aggregation.

tablescan -> partial-agg -> exchange-sender -> exchange-receiver -> final-agg

We know that even a partial-agg (from TiDB's point of view) is executed in two phases inside TiFlash. Here we could perform two optimizations:

  1. Let partial-agg output K inputstreams, where K is the partition number of exchange-sender. Since partial-agg already aggregates data into 256 buckets by hashing, regrouping those buckets into K inputstreams has little cost (see the sketch after this list). Then exchange-sender won't have to partition the data again; all it needs to do is pack blocks into packets and send them out.
  2. Pre-partition the data of each connection into P partitions at exchange-sender, then let exchange-receiver create P inputstreams, one per partition. This optimization allows final-agg to eliminate its merge phase.
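
A minimal sketch of optimization 1 (illustrative only; `regroupBuckets` and the `bucket_id % k` mapping are assumptions and must agree with the exchange hash scheme to be correct): the 256 buckets that partial-agg already produced are regrouped into K output streams, after which exchange-sender only forwards them.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Block { /* aggregated columns omitted */ };

// Partial-agg already produced its data as (here) 256 hash buckets. Regrouping
// those buckets into K output streams is cheap: blocks are only moved between
// containers and no row is re-hashed. The bucket -> stream mapping below is an
// assumption; it is only valid if it agrees with the hash scheme the
// exchange-sender would otherwise have applied.
std::vector<std::vector<Block>> regroupBuckets(std::vector<std::vector<Block>> buckets, std::size_t k)
{
    std::vector<std::vector<Block>> streams(k);
    for (std::size_t bucket_id = 0; bucket_id < buckets.size(); ++bucket_id)
    {
        std::size_t target = bucket_id % k;  // assumed bucket-to-partition mapping
        for (Block & block : buckets[bucket_id])
            streams[target].push_back(std::move(block));
    }
    return streams;
}

int main()
{
    std::vector<std::vector<Block>> buckets(256, std::vector<Block>(1));
    auto streams = regroupBuckets(std::move(buckets), /*K = sender partitions*/ 4);
    return streams.size() == 4 ? 0 : 1;
}
```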
