Support window functions with PARTITION BY clause #299
We'd like to support window functions in three or more steps:
@Dandandan I don't think it's that straightforward. When using [...]: given this, we can parallelize across partitions if we make sure all rows within a partition are processed together with their order preserved, but that also means that splitting the data and deciding on the number of partitions requires knowledge of the actual data (i.e. dependencies on [...]
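A minimal sketch of why all rows of one window partition have to be processed together: if a single partition is split across two workers, `ROW_NUMBER()` restarts in the second chunk. The `row_number` helper and the `(k, ts)` row layout are hypothetical, for illustration only.

```python
def row_number(rows):
    """Emulates ROW_NUMBER() OVER (PARTITION BY k ORDER BY ts) for (k, ts) rows.
    Only correct if `rows` contains every row of each partition it numbers."""
    out = []
    prev_key, n = None, 0
    for k, ts in sorted(rows):  # restore (k, ts) order before numbering
        n = n + 1 if k == prev_key else 1
        prev_key = k
        out.append((k, ts, n))
    return out

rows = [("a", 1), ("a", 2), ("a", 3), ("a", 4)]

# Correct: the whole partition "a" is handled by one worker.
whole = row_number(rows)

# Incorrect: partition "a" is split across two hypothetical workers,
# so numbering restarts at 1 in the second chunk.
split = row_number(rows[:2]) + row_number(rows[2:])
```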
No, but AFAIK you can pre-partition based on the partition expression, as we do for hash joins, for example. You still have to evaluate the partitions inside the window function implementation, but after a hash repartition each output partition contains all rows that share the same partition values. So a [...]
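A rough sketch of the property described above, assuming Python's built-in `hash` as a stand-in for the engine's hash function (this is not DataFusion's `Repartition::Hash` implementation): rows with equal partition keys always land in the same output bucket, so no window partition is split across buckets. Both helper names are hypothetical.

```python
def hash_repartition(rows, key_fn, num_partitions):
    """Route each row to one of `num_partitions` buckets by hashing its
    PARTITION BY key (a simplified stand-in for a hash repartition)."""
    buckets = [[] for _ in range(num_partitions)]
    for row in rows:
        buckets[hash(key_fn(row)) % num_partitions].append(row)
    return buckets

def keys_split_across_buckets(buckets, key_fn):
    """Return the keys that appear in more than one bucket (expected: none)."""
    first_bucket = {}
    split = set()
    for i, bucket in enumerate(buckets):
        for row in bucket:
            k = key_fn(row)
            if first_bucket.setdefault(k, i) != i:
                split.add(k)
    return split

rows = [("a", 1), ("b", 1), ("c", 1), ("a", 2), ("b", 2), ("a", 3)]
buckets = hash_repartition(rows, key_fn=lambda r: r[0], num_partitions=3)
```

The number of buckets can be chosen freely (for example, one per thread) without inspecting the data, because equal keys are kept together by construction.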
I wonder why Postgres decided to use sorting instead of hash partitioning regardless. Most likely due to code reuse?
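For comparison, a minimal sketch of the sort-based strategy on a single thread (illustrative only, not PostgreSQL's code): sort once by (partition key, order key), then scan sequentially and reset the aggregate at each partition boundary. The function name and the `(k, ts, v)` layout are assumptions.

```python
def sort_based_running_sum(rows):
    """SUM(v) OVER (PARTITION BY k ORDER BY ts ROWS UNBOUNDED PRECEDING)
    for (k, ts, v) rows: one sort on (k, ts), then a single sequential scan."""
    out = []
    prev_key = object()  # sentinel that never compares equal to a real key
    total = 0
    for k, ts, v in sorted(rows, key=lambda r: (r[0], r[1])):
        if k != prev_key:
            # Partition boundary: reset the running aggregate.
            prev_key, total = k, 0
        total += v
        out.append((k, ts, v, total))
    return out

rows = [("b", 2, 1), ("a", 1, 10), ("b", 1, 4), ("a", 2, 5)]
result = sort_based_running_sum(rows)
```

A single sort covers both the partitioning and the ordering, so one thread does no extra work, which fits the single-threaded execution discussed in the reply.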
PostgreSQL uses a minimal amount of multithreading, as it is designed mostly for transactional processing (OLTP) on smaller datasets. When executing on one thread, the extra partitioning work would only slow it down a bit, so it is better not to do it at all. We do the same for hash joins: the partitioning is only applied when [...]

I believe Spark, for example, always partitions the data based on the `PARTITION BY` expressions, which makes execution much faster and more scalable in the presence of a `PARTITION BY` clause, as each worker/thread can process its part independently.
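A sketch of the "each worker/thread processes its part independently" idea, under stated assumptions: rows are `(k, ts)` tuples, buckets come from a simplified hash repartition using Python's `hash`, and a `ThreadPoolExecutor` stands in for engine worker threads. In CPython the GIL means this shows the structure rather than a real speedup. All function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def hash_repartition(rows, num_partitions):
    """Send every row with the same PARTITION BY key (row[0]) to the same bucket."""
    buckets = [[] for _ in range(num_partitions)]
    for row in rows:
        buckets[hash(row[0]) % num_partitions].append(row)
    return buckets

def row_number_in_bucket(bucket):
    """ROW_NUMBER() OVER (PARTITION BY k ORDER BY ts) for one bucket's (k, ts) rows.
    Correct because each bucket holds complete window partitions."""
    counters = defaultdict(int)
    out = []
    for k, ts in sorted(bucket):  # orders by k, then ts, within this bucket only
        counters[k] += 1
        out.append((k, ts, counters[k]))
    return out

def parallel_row_number(rows, num_partitions=4):
    buckets = hash_repartition(rows, num_partitions)
    # Buckets share no window partition, so each can be handled by its own worker.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        per_bucket = list(pool.map(row_number_in_bucket, buckets))
    # Concatenate; order across buckets is arbitrary, as in SQL without ORDER BY.
    return [row for part in per_bucket for row in part]

rows = [("a", 3), ("b", 1), ("a", 1), ("c", 2), ("a", 2), ("b", 2)]
result = parallel_row_number(rows)
```

The result matches a single-threaded evaluation up to row order, which is why the partitioning only pays off when there are several workers to hand the buckets to.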
That's a valid point. I guess my plan here is / continues to be:
That sounds like a great plan, @jimexist. I agree: let's tackle correctness / completeness first, then try to improve performance based on some benchmarks 👍
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Window functions have a `PARTITION BY` clause to split the data into partitions and calculate window functions over each partition individually.

Describe the solution you'd like
We can use `Repartition::Hash` to parallelize the execution.

Describe alternatives you've considered
n/a
Additional context
http://www.vldb.org/pvldb/vol8/p1058-leis.pdf