Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add partition by constructs in window functions and modify logical planning #501

Merged
merged 2 commits into from
Jun 9, 2021

Conversation

jimexist
Copy link
Member

@jimexist jimexist commented Jun 4, 2021

Which issue does this PR close?

Add partition by constructs and modify logical planning.

Partly contributes to #299
Based on #492 so review that first

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@codecov-commenter
Copy link

codecov-commenter commented Jun 4, 2021

Codecov Report

Merging #501 (595fe99) into master (8495f95) will decrease coverage by 0.06%.
The diff coverage is 57.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #501      +/-   ##
==========================================
- Coverage   76.09%   76.03%   -0.07%     
==========================================
  Files         157      157              
  Lines       26913    26990      +77     
==========================================
+ Hits        20480    20521      +41     
- Misses       6433     6469      +36     
Impacted Files Coverage Δ
...sta/rust/core/src/serde/logical_plan/from_proto.rs 35.20% <0.00%> (-0.21%) ⬇️
...lista/rust/core/src/serde/logical_plan/to_proto.rs 61.34% <0.00%> (-0.31%) ⬇️
...ta/rust/core/src/serde/physical_plan/from_proto.rs 37.71% <0.00%> (-0.80%) ⬇️
datafusion/src/optimizer/utils.rs 45.63% <0.00%> (-2.42%) ⬇️
datafusion/src/logical_plan/expr.rs 84.28% <70.00%> (-0.28%) ⬇️
datafusion/src/sql/planner.rs 84.62% <81.57%> (+0.11%) ⬆️
datafusion/src/sql/utils.rs 68.85% <88.23%> (+1.23%) ⬆️
datafusion/src/logical_plan/plan.rs 81.50% <100.00%> (+0.43%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8495f95...595fe99. Read the comment docs.

@jimexist jimexist marked this pull request as ready for review June 4, 2021 15:28
@jimexist jimexist changed the title Add partition by constructs and modify logical planning Add partition by constructs and modify logical planning Jun 4, 2021
@jimexist jimexist changed the title Add partition by constructs and modify logical planning Add partition by constructs in window functions and modify logical planning Jun 4, 2021
@jimexist jimexist force-pushed the add-partition-by branch from 1912ec6 to 8bf1ca2 Compare June 4, 2021 15:44
let sql = "SELECT order_id, MAX(qty) OVER (PARTITION BY order_id) from orders";
let expected = "\
Projection: #order_id, #MAX(qty)\
\n WindowAggr: windowExpr=[[MAX(#qty)]] partitionBy=[]\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like order_id should appear in the partitionBy list

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me remove the list printing altogether for now to keep parity with psql

I can always add it back later but for now the partition information is already captured by the child sort by plan

@alamb alamb added the datafusion Changes in the datafusion crate label Jun 4, 2021
@jimexist jimexist force-pushed the add-partition-by branch 2 times, most recently from b6c6023 to c2bfd60 Compare June 7, 2021 12:03
@jimexist jimexist force-pushed the add-partition-by branch from 67c3ac8 to 1a96fd7 Compare June 7, 2021 15:08
@alamb
Copy link
Contributor

alamb commented Jun 7, 2021

This is on my review queue for tomorrow

@jimexist jimexist force-pushed the add-partition-by branch from 1a96fd7 to afd174d Compare June 7, 2021 23:30
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jimexist -- I think this PR looks reasonable. I had two suggestions I think are worth considering (handling errors in planner and including PARTITION BY in the explain plan)

I think it would also be fine to make those changes in a follow on PR if that would be easier for you

datafusion/src/optimizer/utils.rs Outdated Show resolved Hide resolved
datafusion/src/optimizer/utils.rs Outdated Show resolved Hide resolved
.map(|window_frame| window_frame.clone().try_into())
.transpose()?;
let fun = window_functions::WindowFunction::from_str(&name);
if let Ok(window_functions::WindowFunction::AggregateFunction(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misreading this, but it looks like the error message may have gotten lost.

Stylistically, this might be cleaner using a match rather than an if/else chain (and the compiler will tell you if you missed a case) something like

let fun = window_functions::WindowFunction::from_str(&name)?; // note question mark
use window_functions::WindowFunction::*;

match fun {
  AggregateFunction(aggregate_fun) =>{ .. code .. },
  BuiltInWindowFunction(window_fun) =>{ .. code .. },
}

Perhaps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a good point. i was previously thinking maybe deferring the else the subsequent parsing part but then it's with over clause so the allowed listed of function names shall be known to be limited.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is cool now to see the compiler ensuring all cases are covered.

format!("{:?}", err)
);
fn over_partition_by() {
let sql = "SELECT order_id, MAX(qty) OVER (PARTITION BY order_id) from orders";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think including the PARTITION BY information somewhere in this plan would be valuable -- maybe it could be added to WindowExpr formatting?

I may be missing a reason to not include it as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it can be added in a subsequent PR, where i can revisit all the planning related printing (to be less verbose perhaps)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's address in #526

@jimexist jimexist force-pushed the add-partition-by branch from 1b38ae2 to b9d943e Compare June 9, 2021 01:04
@alamb alamb merged commit d5bca0e into apache:master Jun 9, 2021
@jimexist jimexist deleted the add-partition-by branch June 10, 2021 01:49
@houqp houqp added ballista enhancement New feature or request labels Jul 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants