-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Part1] Partition and Sort Enforcement, PhysicalExpr enhancement #3969
Conversation
@alamb @tustvold @yahoNanJing |
I plan to review this carefully tomorrow |
@alamb Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good @mingmwang -- thank you!
I think the only thing that needs changing / explaining is the seemingly missing code in ScalarFunctionExpr::PartialEq
and I left some style comments / suggestions elsewhere
For other reviewers, you can see the larger context of what this PR enables in #3855
cc @isidentical as you have been working in this area
cc @yahoNanJing @Dandandan @avantgardnerio @andygrove and @thinkharderdev for your awareness
@@ -66,6 +67,14 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug { | |||
fn expr_stats(&self) -> Arc<dyn PhysicalExprStats> { | |||
Arc::new(BasicExpressionStats {}) | |||
} | |||
/// Get a list of child PhysicalExpr that provide the input for this expr. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 these are foundational changes needed to do PhysicalExpr rewrites
self: Arc<Self>, | ||
children: Vec<Arc<dyn PhysicalExpr>>, | ||
) -> Result<Arc<dyn PhysicalExpr>> { | ||
Ok(Arc::new(BinaryExpr::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we should also assert_eq!(children.len(), 2)
to catch bugs earlier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already add the length check in the method with_new_children_if_necessary()
, and expr.with_new_children() should not be called directly. This is also aligned with the ExecutionPlan
.
pub fn with_new_children_if_necessary(
expr: Arc<dyn PhysicalExpr>,
children: Vec<Arc<dyn PhysicalExpr>>,
) -> Result<Arc<dyn PhysicalExpr>> {
if children.len() != expr.children().len() {
Err(DataFusionError::Internal(
"PhysicalExpr: Wrong number of children".to_string(),
))
} else if children.is_empty()
|| children
.iter()
.zip(expr.children().iter())
.any(|(c1, c2)| !Arc::ptr_eq(c1, c2))
{
expr.with_new_children(children)
} else {
Ok(expr)
}
}
chileren | ||
} | ||
|
||
// For physical CaseExpr, we do not allow modifying children size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Some(_) => None, | ||
_ => Some(children[0].clone()), | ||
}; | ||
let else_expr = match children[children.len() - 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would find it easier to read if the else_expr
handling was after the when_then_expr
handling to mirror the children()
but I think this is fine too
/// | ||
/// The nodes are visited using the following order | ||
/// ```text | ||
/// pre_visit(ParentNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these examples need to be change to use the terms transform_down
and transform_up
and transform
I also think it would be nice to use the same terms here as are used in the rewriter (previsit / postvisit) but I realize they don't quite match up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I keep the current naming ? As you had mentioned, they do not match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to make sure the doc comments and code match -- the naming of the code is fine with me (for the reasons you mention)
|
||
impl PartialEq<dyn Any> for ScalarFunctionExpr { | ||
fn eq(&self, _other: &dyn Any) -> bool { | ||
false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems incorrect / unfinshed -- shouldn't it be checking the contents of the other
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I leave it here because I don't know how to compare fun
member, its type is the boxed Closure.
Or can we skip the fun
member and just compare name
, args
and return_type
?
pub struct ScalarFunctionExpr {
fun: ScalarFunctionImplementation,
name: String,
args: Vec<Arc<dyn PhysicalExpr>>,
return_type: DataType,
}
pub type ScalarFunctionImplementation =
Arc<dyn Fn(&[ColumnarValue]) -> Result<ColumnarValue> + Send + Sync>;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing name, args and return type seems like a pretty good comparison to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
@@ -128,6 +137,232 @@ impl PhysicalExprStats for BasicExpressionStats { | |||
} | |||
} | |||
|
|||
/// a Trait for marking tree node types that are rewritable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend putting the rewriting traits into some other module, such as datafusion/physical-expr/src/rewrite.rs
or example
We can do it as a follow on PR too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I can change it in this PR also.
/// For example two InListExpr can be considered to be equals no matter the order: | ||
/// | ||
/// In('a','b','c') == In('c','b','a') | ||
pub fn expr_list_eq_any_order( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it could be in a utils or some other module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, currently there is no utils, I will add one. In the following PRs more util methods will be added.
…nto issue-3854-part1
d233c69
to
cb941b2
Compare
@alamb Please help to take a look. |
@alamb Could you please approve and merge the PR? Then I can work on the following parts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks good to me -- thank you @mingmwang
Any concerns about merging this in @yahoNanJing @andygrove @liukun4515 @Dandandan @isidentical @avantgardnerio ? Does anyone want longer to review it? |
.map(|x| { | ||
self.expr.eq(&x.expr) | ||
&& self.cast_type == x.cast_type | ||
&& self.cast_options.safe == x.cast_options.safe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a bit error-prone, considering that cast_options
might get new members and this part will not count them. Can we consider adding equality comparison support to CastOptions
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, it seems like we can't change it here. Created a ticket on arrow-rs.
assert!(expr_list_eq_any_order(list4.as_slice(), list3.as_slice())); | ||
assert!(expr_list_eq_any_order(list3.as_slice(), list3.as_slice())); | ||
assert!(expr_list_eq_any_order(list4.as_slice(), list4.as_slice())); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I didn't miss any, it seems like expr_list_eq_strict_order
can be also tested with these (it currently seems like missing tests):
assert!(expr_list_eq_strict_order(list1.as_slice(), list1.as_slice()); | |
assert!(expr_list_eq_strict_order(list2.as_slice(), list2.as_slice()); | |
assert!(expr_list_eq_strict_order(list3.as_slice(), list3.as_slice()); | |
assert!(expr_list_eq_strict_order(list4.as_slice(), list4.as_slice()); |
Only a minor testing suggestion, but other than that at least a brief overview of the general logic on comparisons and utilities looks great to me. |
No problems merging this in. The overall goal seems great to me, but not much time now to go over the code. |
Will plan to merge this in tomorrow unless I hear otherwise -- if @mingmwang hasn't had a chance, we can do the changes as a follow on PR |
I am merging this PR in as it has already been around for a while and is blocking work. @mingmwang can you please respond to @isidentical 's comment #3969 (comment) and submit another PR with additional tests, if appropriate? |
.downcast_ref::<Self>() | ||
.map(|x| { | ||
self.expr.eq(&x.expr) | ||
&& self.cast_type == x.cast_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a todo note in #4012
Benchmark runs are scheduled for baseline = 925a962 and contender = 6fae0bd. 6fae0bd is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
@alamb @isidentical |
…che#3969) * [Part1] Partition and Sort Enforcement, PhysicalExpr enhancement * Resolve review comments * tiny fix to doc
…che#3969) * [Part1] Partition and Sort Enforcement, PhysicalExpr enhancement * Resolve review comments * tiny fix to doc
Which issue does this PR close?
Partially Closes #3854. This PR covers the PhysicalExpr related enhancement.
Rationale for this change
What changes are included in this PR?
PartialEq<dyn Any>
forPhysicalExpr
PhysicalExpr::children()
andPhysicalExpr::new_with_children()
TreeNodeRewritable
Trait forPhysicalExpr
rewritingAre there any user-facing changes?