Fine grained manipulation on hashing partitioned data #4631
Labels: type/enhancement (the issue or PR belongs to an enhancement)
Example: table scan and two-phase aggregation.
We know that even a partial agg (from TiDB's point of view) is executed in two phases inside TiFlash. Here we could perform two optimizations:
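A minimal Python sketch of what "two phases" means for an aggregation. It assumes the phases are a per-thread hash-table build followed by a merge, as in ClickHouse-style aggregators, and uses a SUM over hypothetical `(key, value)` rows with one input stream per thread. The function names are illustrative, not TiFlash's API. The merge phase is where extra work appears, because the same key can show up in several thread-local tables.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # hypothetical row layout: (group key, value)


def local_aggregate(stream: Iterable[Row]) -> Dict[str, int]:
    """Phase 1: each thread builds its own hash table over its input stream."""
    table: Dict[str, int] = defaultdict(int)
    for key, value in stream:
        table[key] += value
    return dict(table)


def merge_tables(tables: List[Dict[str, int]]) -> Dict[str, int]:
    """Phase 2: merge the thread-local tables; a key may appear in several."""
    merged: Dict[str, int] = defaultdict(int)
    for table in tables:
        for key, partial in table.items():
            merged[key] += partial
    return dict(merged)


def two_phase_sum(streams: List[List[Row]]) -> Dict[str, int]:
    """Run phase 1 on every stream, then combine the results in phase 2."""
    return merge_tables([local_aggregate(s) for s in streams])
```

For example, `two_phase_sum([[("a", 1), ("b", 2)], [("a", 3)]])` has to combine the two partial sums for `"a"` during the merge.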
This was referenced Jun 8, 2022. ti-chi-bot pushed a commit that referenced this issue on Jul 13, 2022; Lloyd-Pottiger pushed a commit to Lloyd-Pottiger/tiflash that referenced this issue on Jul 19, 2022.
Enhancement
We know that some operators (agg, join, etc.) could benefit from prehashing. If each input stream belongs to exactly one hash bucket:
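To illustrate why one hash bucket per input stream helps, here is a minimal sketch using the same hypothetical SUM over `(key, value)` rows. Rows are routed by `hash(key) % num_buckets`, so every key lands in exactly one stream and the per-stream results can simply be unioned instead of merged. Python's built-in `hash` stands in for whatever hash function TiFlash would actually use, and `prehash` / `aggregate_prehashed` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # hypothetical row layout: (group key, value)


def prehash(rows: List[Row], num_buckets: int) -> List[List[Row]]:
    """Route each row to a bucket by hashing its key; one bucket feeds one stream."""
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[hash(key) % num_buckets].append((key, value))
    return buckets


def aggregate_prehashed(streams: List[List[Row]]) -> Dict[str, int]:
    """Aggregate each stream independently and union the results."""
    result: Dict[str, int] = {}
    for stream in streams:
        local: Dict[str, int] = defaultdict(int)
        for key, value in stream:
            local[key] += value
        # Keys are disjoint across streams, so no merge arithmetic is needed.
        assert result.keys().isdisjoint(local.keys())
        result.update(local)
    return result
```

The design point is that the hashing cost is paid once, when rows are routed, and the merge phase disappears.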
Note that all the potential enhancements require:
So how to do prehashing? Here are some ideas:
In particular, for Exchange, ExchangeSender is a better place to partition data than ExchangeReceiver, because partitioning is already part of ExchangeSender's job.
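Since a hash-partitioning sender already decides a target partition for every row, splitting its output by hash bucket is close to free. A hypothetical sketch of that step on a columnar block (a dict from column name to values): compute a partition id ("selector") per row once, then scatter every column by it. `scatter` and the block layout are assumptions for illustration, not TiFlash's ExchangeSender API.

```python
from typing import Dict, List

Block = Dict[str, List[int]]  # hypothetical columnar block: column name -> values


def scatter(block: Block, key: str, num_partitions: int) -> List[Block]:
    """Split one block into num_partitions blocks by hashing the key column."""
    # Compute the target partition of every row once.
    selector = [hash(v) % num_partitions for v in block[key]]
    parts: List[Block] = [{name: [] for name in block} for _ in range(num_partitions)]
    # Scatter each column with the same selector so rows stay aligned.
    for name, column in block.items():
        for row, value in enumerate(column):
            parts[selector[row]][name].append(value)
    return parts
```

Every output block keeps all columns, so a downstream operator receiving partition `i` sees only the keys that hash to `i`.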
Note: blindly splitting data at ExchangeSender could make packets smaller and weaken the effect of vectorization. A previous demo showed that blind splitting decreases performance (with the current implementation), while well-designed batching increases performance significantly.
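One possible reading of "well-designed batching": instead of emitting a small packet for each partition's share of every input block, buffer rows per partition and send a packet only once it reaches a target size. In this hypothetical sketch, rows are plain integers hashed on themselves, `batch_size` is an assumed tuning knob, and "sending" just records the packet in a list.

```python
from typing import List, Tuple


class BatchingPartitioner:
    """Buffers rows per partition and emits a packet only when a buffer is full."""

    def __init__(self, num_partitions: int, batch_size: int) -> None:
        self.batch_size = batch_size
        self.buffers: List[List[int]] = [[] for _ in range(num_partitions)]
        # Packets "sent" so far, as (partition id, rows).
        self.packets: List[Tuple[int, List[int]]] = []

    def add(self, rows: List[int]) -> None:
        """Route rows to per-partition buffers, emitting full buffers as packets."""
        for row in rows:
            pid = hash(row) % len(self.buffers)
            buf = self.buffers[pid]
            buf.append(row)
            if len(buf) >= self.batch_size:
                self.packets.append((pid, buf))
                self.buffers[pid] = []

    def flush(self) -> None:
        """At end of input, send whatever is left in each buffer."""
        for pid, buf in enumerate(self.buffers):
            if buf:
                self.packets.append((pid, buf))
                self.buffers[pid] = []
```

Packets stay close to `batch_size` rows regardless of how small each incoming block's share of a partition is. Only the final flush can produce a short packet.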
Subtasks for window function:
Subtasks for HashAgg/Join: