Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Part1] Partition and Sort Enforcement, PhysicalExpr enhancement #3969

Merged
merged 5 commits into from
Oct 29, 2022

Conversation

mingmwang
Copy link
Contributor

@mingmwang mingmwang commented Oct 26, 2022

Which issue does this PR close?

Partially Closes #3854. This PR covers the PhysicalExpr related enhancement.

Rationale for this change

What changes are included in this PR?

  1. Implement PartialEq<dyn Any> for PhysicalExpr
  2. Add PhysicalExpr::children() and PhysicalExpr::new_with_children()
  3. Add TreeNodeRewritable Trait for PhysicalExpr rewriting

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Physical Expressions label Oct 26, 2022
@mingmwang
Copy link
Contributor Author

@alamb @tustvold @yahoNanJing
Please help to review.

@alamb
Copy link
Contributor

alamb commented Oct 26, 2022

I plan to review this carefully tomorrow

@mingmwang
Copy link
Contributor Author

@alamb Thanks

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good @mingmwang -- thank you!

I think the only thing that needs changing / explaining is the seemingly missing code in ScalarFunctionExpr::PartialEq and I left some style comments / suggestions elsewhere

For other reviewers, you can see the larger context of what this PR enables in #3855

cc @isidentical as you have been working in this area
cc @yahoNanJing @Dandandan @avantgardnerio @andygrove and @thinkharderdev for your awareness

@@ -66,6 +67,14 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug {
fn expr_stats(&self) -> Arc<dyn PhysicalExprStats> {
Arc::new(BasicExpressionStats {})
}
/// Get a list of child PhysicalExpr that provide the input for this expr.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 these are foundational changes needed to do PhysicalExpr rewrites

self: Arc<Self>,
children: Vec<Arc<dyn PhysicalExpr>>,
) -> Result<Arc<dyn PhysicalExpr>> {
Ok(Arc::new(BinaryExpr::new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we should also assert_eq!(children.len(), 2) to catch bugs earlier

Copy link
Contributor Author

@mingmwang mingmwang Oct 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already add the length check in the method with_new_children_if_necessary(), and expr.with_new_children() should not be called directly. This is also aligned with the ExecutionPlan.

pub fn with_new_children_if_necessary(
    expr: Arc<dyn PhysicalExpr>,
    children: Vec<Arc<dyn PhysicalExpr>>,
) -> Result<Arc<dyn PhysicalExpr>> {
    if children.len() != expr.children().len() {
        Err(DataFusionError::Internal(
            "PhysicalExpr: Wrong number of children".to_string(),
        ))
    } else if children.is_empty()
        || children
            .iter()
            .zip(expr.children().iter())
            .any(|(c1, c2)| !Arc::ptr_eq(c1, c2))
    {
        expr.with_new_children(children)
    } else {
        Ok(expr)
    }
}

datafusion/physical-expr/src/expressions/case.rs Outdated Show resolved Hide resolved
chileren
}

// For physical CaseExpr, we do not allow modifying children size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Some(_) => None,
_ => Some(children[0].clone()),
};
let else_expr = match children[children.len() - 1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would find it easier to read if the else_expr handling was after the when_then_expr handling to mirror the children() but I think this is fine too

///
/// The nodes are visited using the following order
/// ```text
/// pre_visit(ParentNode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these examples need to be change to use the terms transform_down and transform_up and transform

I also think it would be nice to use the same terms here as are used in the rewriter (previsit / postvisit) but I realize they don't quite match up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I keep the current naming ? As you had mentioned, they do not match.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to make sure the doc comments and code match -- the naming of the code is fine with me (for the reasons you mention)

datafusion/physical-expr/src/physical_expr.rs Outdated Show resolved Hide resolved

impl PartialEq<dyn Any> for ScalarFunctionExpr {
fn eq(&self, _other: &dyn Any) -> bool {
false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems incorrect / unfinshed -- shouldn't it be checking the contents of the other?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I leave it here because I don't know how to compare fun member, its type is the boxed Closure.
Or can we skip the fun member and just compare name, args and return_type ?


pub struct ScalarFunctionExpr {
    fun: ScalarFunctionImplementation,
    name: String,
    args: Vec<Arc<dyn PhysicalExpr>>,
    return_type: DataType,
}


pub type ScalarFunctionImplementation =
    Arc<dyn Fn(&[ColumnarValue]) -> Result<ColumnarValue> + Send + Sync>;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing name, args and return type seems like a pretty good comparison to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@@ -128,6 +137,232 @@ impl PhysicalExprStats for BasicExpressionStats {
}
}

/// a Trait for marking tree node types that are rewritable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend putting the rewriting traits into some other module, such as datafusion/physical-expr/src/rewrite.rs or example

We can do it as a follow on PR too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I can change it in this PR also.

/// For example two InListExpr can be considered to be equals no matter the order:
///
/// In('a','b','c') == In('c','b','a')
pub fn expr_list_eq_any_order(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it could be in a utils or some other module

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, currently there is no utils, I will add one. In the following PRs more util methods will be added.

…nto issue-3854-part1
@mingmwang
Copy link
Contributor Author

@alamb Please help to take a look.

@mingmwang
Copy link
Contributor Author

@alamb Could you please approve and merge the PR? Then I can work on the following parts.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good to me -- thank you @mingmwang

@alamb
Copy link
Contributor

alamb commented Oct 28, 2022

Any concerns about merging this in @yahoNanJing @andygrove @liukun4515 @Dandandan @isidentical @avantgardnerio ? Does anyone want longer to review it?

.map(|x| {
self.expr.eq(&x.expr)
&& self.cast_type == x.cast_type
&& self.cast_options.safe == x.cast_options.safe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a bit error-prone, considering that cast_options might get new members and this part will not count them. Can we consider adding equality comparison support to CastOptions directly?

Copy link
Contributor

@isidentical isidentical Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, it seems like we can't change it here. Created a ticket on arrow-rs.

assert!(expr_list_eq_any_order(list4.as_slice(), list3.as_slice()));
assert!(expr_list_eq_any_order(list3.as_slice(), list3.as_slice()));
assert!(expr_list_eq_any_order(list4.as_slice(), list4.as_slice()));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I didn't miss any, it seems like expr_list_eq_strict_order can be also tested with these (it currently seems like missing tests):

Suggested change
assert!(expr_list_eq_strict_order(list1.as_slice(), list1.as_slice());
assert!(expr_list_eq_strict_order(list2.as_slice(), list2.as_slice());
assert!(expr_list_eq_strict_order(list3.as_slice(), list3.as_slice());
assert!(expr_list_eq_strict_order(list4.as_slice(), list4.as_slice());

@isidentical
Copy link
Contributor

Only a minor testing suggestion, but other than that at least a brief overview of the general logic on comparisons and utilities looks great to me.

@Dandandan
Copy link
Contributor

No problems merging this in. The overall goal seems great to me, but not much time now to go over the code.

@alamb
Copy link
Contributor

alamb commented Oct 28, 2022

Will plan to merge this in tomorrow unless I hear otherwise -- if @mingmwang hasn't had a chance, we can do the changes as a follow on PR

@alamb
Copy link
Contributor

alamb commented Oct 29, 2022

I am merging this PR in as it has already been around for a while and is blocking work. @mingmwang can you please respond to @isidentical 's comment #3969 (comment) and submit another PR with additional tests, if appropriate?

@alamb alamb merged commit 6fae0bd into apache:master Oct 29, 2022
.downcast_ref::<Self>()
.map(|x| {
self.expr.eq(&x.expr)
&& self.cast_type == x.cast_type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a todo note in #4012

@ursabot
Copy link

ursabot commented Oct 29, 2022

Benchmark runs are scheduled for baseline = 925a962 and contender = 6fae0bd. 6fae0bd is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@mingmwang
Copy link
Contributor Author

I am merging this PR in as it has already been around for a while and is blocking work. @mingmwang can you please respond to @isidentical 's comment #3969 (comment) and submit another PR with additional tests, if appropriate?

@alamb @isidentical
Sure, will add one more UT in the following PR.

jimexist pushed a commit to jimexist/arrow-datafusion that referenced this pull request Oct 31, 2022
…che#3969)

* [Part1] Partition and Sort Enforcement, PhysicalExpr enhancement

* Resolve review comments

* tiny fix to doc
Dandandan pushed a commit to yuuch/arrow-datafusion that referenced this pull request Nov 5, 2022
…che#3969)

* [Part1] Partition and Sort Enforcement, PhysicalExpr enhancement

* Resolve review comments

* tiny fix to doc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Partition and Sort Enforcement
5 participants