Support window functions with PARTITION BY clause #299

Closed
Dandandan opened this issue May 9, 2021 · 7 comments · Fixed by #558
Labels
enhancement New feature or request

Comments

@Dandandan (Contributor)

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Window functions have a PARTITION BY clause that splits the data into partitions and computes the window function over each partition individually.

Describe the solution you'd like
We can use `Repartition::Hash` to parallelize the execution.

Describe alternatives you've considered
n/a

Additional context
Efficient Processing of Window Functions in Analytical SQL Queries (Leis et al., VLDB 2015): http://www.vldb.org/pvldb/vol8/p1058-leis.pdf

@jimexist (Member) commented May 19, 2021

@jimexist (Member) commented Jun 11, 2021

> We can use `Repartition::Hash` to parallelize the execution.

@Dandandan I don't think it's that straightforward.

When repartitioning, we can hash M partitions into N partitions as long as the aggregate's partial states can be merged: e.g. for the avg accumulator it is easy to combine two partial results into one ((sum_a + sum_b) / (count_a + count_b)). This is not the case for window functions, especially when the function relies on relative positions and sliding windows: you can't simply split a partition in two and combine the results back into one.

Given this, we can parallelize across partitions if we make sure all rows within a partition are processed together with order preserved, but that also means splitting and deciding on the number of partitions requires knowledge of the actual data (i.e. dependencies on `&RecordBatch`). Intra-partition parallelism is even harder, because a window frame, when present, can imply a serial relationship between rows, making the computation inherently sequential.
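To make that distinction concrete, here is a minimal Rust sketch (illustrative only, not DataFusion's actual accumulator API): avg carries a mergeable partial state, while a positional function like rank() needs the whole sorted partition at once.

```rust
/// Partial state for avg: a running sum and count.
#[derive(Clone, Copy)]
struct AvgState {
    sum: f64,
    count: u64,
}

impl AvgState {
    /// Two partial avg states merge losslessly:
    /// (sum_a + sum_b) / (count_a + count_b).
    fn merge(self, other: AvgState) -> AvgState {
        AvgState {
            sum: self.sum + other.sum,
            count: self.count + other.count,
        }
    }

    fn finalize(self) -> f64 {
        self.sum / self.count as f64
    }
}

/// rank() has no mergeable partial state: a row's rank depends on its
/// position among *all* rows of the partition in sort order, so the
/// whole partition must be processed together.
fn rank(sorted_partition: &[i64]) -> Vec<u64> {
    let mut ranks = Vec::with_capacity(sorted_partition.len());
    let mut rank = 1u64;
    for (i, v) in sorted_partition.iter().enumerate() {
        if i > 0 && *v != sorted_partition[i - 1] {
            rank = i as u64 + 1; // RANK() leaves gaps after ties
        }
        ranks.push(rank);
    }
    ranks
}
```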

@Dandandan (Contributor, Author)

@jimexist

No, but AFAIK you can pre-partition based on the partition expression, as we do for hash joins, for example.

You still have to evaluate the partitioning inside the window function implementation, but after a hash repartition each partition holds all rows with equal partition values.

So `HashPartition(partition_by_expr) -> Window(partition_by_expr, order_by)` (per partition) should produce the same result as `Window(partition_by_expr, order_by)` (on 1 partition).
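A minimal sketch of that pre-partitioning step, with toy types rather than DataFusion's actual `Repartition::Hash` operator: rows are routed by a hash of the PARTITION BY key, so no group ever spans two output partitions, and a window operator can then run on each output partition independently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A toy row: (PARTITION BY value, payload).
type Row = (String, i64);

/// Route each row to one of `n` output partitions by hashing only the
/// PARTITION BY expression. Rows with equal keys always land together.
fn hash_repartition(rows: Vec<Row>, n: usize) -> Vec<Vec<Row>> {
    let mut partitions = vec![Vec::new(); n];
    for row in rows {
        let mut hasher = DefaultHasher::new();
        row.0.hash(&mut hasher);
        let target = (hasher.finish() as usize) % n;
        partitions[target].push(row);
    }
    partitions
}
```

Because every group is complete within a single output partition, running the window operator per partition and concatenating the results matches the single-partition plan.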

@jimexist (Member) commented Jun 11, 2021

> @jimexist
>
> No, but AFAIK you can pre-partition based on the partition expression, as we do for hash joins, for example.
>
> You still have to evaluate the partitioning inside the window function implementation, but after a hash repartition each partition holds all rows with equal partition values.
>
> So `HashPartition(partition_by_expr) -> Window(partition_by_expr, order_by)` (per partition) should produce the same result as `Window(partition_by_expr, order_by)` (on 1 partition).

I wonder why Postgres decided to use a sort instead of a hash partition regardless:

```
# explain select max(c2) over (partition by c3) from test;
                            QUERY PLAN
-------------------------------------------------------------------
 WindowAgg  (cost=44.96..55.81 rows=620 width=6)
   ->  Sort  (cost=44.96..46.51 rows=620 width=6)
         Sort Key: c3
         ->  Seq Scan on test  (cost=0.00..16.20 rows=620 width=6)
(4 rows)
```

Most likely due to code reuse?

@Dandandan (Contributor, Author) commented Jun 11, 2021


> I wonder why Postgres decided to use a sort instead of a hash partition regardless:
>
> ```
> # explain select max(c2) over (partition by c3) from test;
>                             QUERY PLAN
> -------------------------------------------------------------------
>  WindowAgg  (cost=44.96..55.81 rows=620 width=6)
>    ->  Sort  (cost=44.96..46.51 rows=620 width=6)
>          Sort Key: c3
>          ->  Seq Scan on test  (cost=0.00..16.20 rows=620 width=6)
> (4 rows)
> ```
>
> Most likely due to code reuse?

PostgreSQL uses a minimal amount of multithreading, as it is designed mostly for transactional processing (OLTP) on smaller datasets. When executing on one thread, doing the extra work would slow it down a bit, so it is better not to do it at all. We do the same for hash joins: the partitioning is only applied when concurrency > 1.
Only for really big tables / costly queries will PostgreSQL opt to use multiple workers (which is visible in the query plan), and I'm not sure whether it will even use hash repartitioning in that case.

I believe e.g. Spark always partitions based on PARTITION BY, which makes execution much faster and more scalable in the presence of a PARTITION BY clause, as each worker/thread can process its part individually.
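As a rough illustration of that style of execution (toy types and hypothetical helper names, not Spark's or DataFusion's API), each thread below evaluates `max(v) OVER (PARTITION BY key)` on its own hash partition, e.g. one produced by the repartitioning sketch earlier. This is safe because no group spans two partitions:

```rust
use std::collections::HashMap;
use std::thread;

/// A toy row: (PARTITION BY value, argument to max()).
type Row = (String, i64);

/// max(v) OVER (PARTITION BY key) within one hash partition.
fn window_max_per_group(rows: Vec<Row>) -> Vec<(Row, i64)> {
    let mut maxes: HashMap<String, i64> = HashMap::new();
    for (key, v) in &rows {
        let entry = maxes.entry(key.clone()).or_insert(i64::MIN);
        *entry = (*entry).max(*v);
    }
    rows.iter().map(|row| (row.clone(), maxes[&row.0])).collect()
}

/// One worker thread per hash partition; the per-partition results
/// never need merging because every group is fully contained in one
/// partition.
fn parallel_window(partitions: Vec<Vec<Row>>) -> Vec<Vec<(Row, i64)>> {
    thread::scope(|s| {
        let handles: Vec<_> = partitions
            .into_iter()
            .map(|p| s.spawn(move || window_max_per_group(p)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```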

@jimexist (Member) commented Jun 11, 2021


> PostgreSQL uses a minimal amount of multithreading, as it is designed mostly for transactional processing (OLTP) on smaller datasets. When executing on one thread, doing the extra work would slow it down a bit, so it is better not to do it at all. We do the same for hash joins: the partitioning is only applied when concurrency > 1.
>
> Only for really big tables / costly queries will PostgreSQL opt to use multiple workers (which is visible in the query plan), and I'm not sure whether it will even use hash repartitioning in that case.
>
> I believe e.g. Spark always partitions based on PARTITION BY, which makes execution much faster and more scalable in the presence of a PARTITION BY clause, as each worker/thread can process its part individually.

That's a valid point. I guess my plan here is / continues to be:

  1. implement a correct version using a global sort that covers PARTITION BY, ORDER BY, and the window frame (a sketch of this follows the list)
  2. set up integration tests that compare results
  3. come up with a more realistic benchmark dataset that's much larger than 100 rows
  4. migrate to repartitioning for inter-partition parallelism
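A minimal sketch of what step 1 could look like (illustrative names, not DataFusion's actual operators): a single global sort covers both PARTITION BY and ORDER BY, and one pass resets the accumulator at each partition boundary, mirroring the WindowAgg-over-Sort plan PostgreSQL shows above.

```rust
/// A toy row: (PARTITION BY value, ORDER BY value, argument to max()).
type Row = (String, i64, i64);

/// max(v) OVER (PARTITION BY key ORDER BY ord
///              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
fn window_max_sorted(mut rows: Vec<Row>) -> Vec<(Row, i64)> {
    // One global sort handles both PARTITION BY and ORDER BY.
    rows.sort_by(|a, b| a.0.cmp(&b.0).then(a.1.cmp(&b.1)));

    let mut out = Vec::with_capacity(rows.len());
    let mut running_max = i64::MIN;
    for i in 0..rows.len() {
        if i == 0 || rows[i].0 != rows[i - 1].0 {
            running_max = i64::MIN; // partition boundary: reset state
        }
        running_max = running_max.max(rows[i].2);
        out.push((rows[i].clone(), running_max));
    }
    out
}
```

This is correct but inherently single-threaded, which is what step 4 would later address by sorting within each hash partition instead.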

@Dandandan (Contributor, Author) commented Jun 12, 2021

That sounds like a great plan, @jimexist.

I agree, let's tackle correctness / completeness first, then try to improve performance based on some benchmarks 👍
