Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement fast path of with_new_children() in ExecutionPlan #2168

Merged
merged 3 commits into from
Apr 7, 2022

Conversation

mingmwang
Copy link
Contributor

Which issue does this PR close?

Closes #1965.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added ballista datafusion Changes in the datafusion crate labels Apr 6, 2022
@mingmwang
Copy link
Contributor Author

@yjshen @alamb

Please help to take a look.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great @mingmwang -- thank you very much.

Another thing we might want to consider doing would be to change the signature of ExecutionPlan trait to require an Arc to call with_new_children, like so:

    fn with_new_children(
        self: Arc<Self>,
        children: Vec<Arc<dyn ExecutionPlan>>,
    ) -> Result<Arc<dyn ExecutionPlan>> {
  // put with_new_children_if_necessary implementation here
...
}

That would ensure that we could always apply the with_new_children_if_necessary logic

"TopKExec wrong number of children".to_string(),
)),
}
Ok(Arc::new(TopKExec {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you think about adding an assert here such asassert_eq!(children.len(), 1) on the theory it might catch bugs faster when making changes?

The same question applies to other with_new_children implementations below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the length check in the with_new_children() method and move the check to with_new_children_if_necessary() so that we can avoid the duplicate logic in all the trait implementations. In future if someone misuse this and still call the with_new_children() and pass the wrong children param, it will lose the check, but if this is the case, the real problem is he should not call with_new_children() and everyone should call with_new_children_if_necessary().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure in Rust whether I can declare a method in the Trait to protected so that it can only be visible in this module and in the Trait implementations.

@@ -258,6 +258,32 @@ pub trait ExecutionPlan: Debug + Send + Sync {
fn statistics(&self) -> Statistics;
}

/// Returns a new plan where all children were replaced by new plans if the provided children
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@mingmwang
Copy link
Contributor Author

ExecutionPlan

I think this looks great @mingmwang -- thank you very much.

Another thing we might want to consider doing would be to change the signature of ExecutionPlan trait to require an Arc to call with_new_children, like so:

    fn with_new_children(
        self: Arc<Self>,
        children: Vec<Arc<dyn ExecutionPlan>>,
    ) -> Result<Arc<dyn ExecutionPlan>> {
  // put with_new_children_if_necessary implementation here
...
}

That would ensure that we could always apply the with_new_children_if_necessary logic

Done.

@apache apache deleted a comment from alamb Apr 7, 2022
Copy link
Member

@yjshen yjshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Minor comments on the doc.

Comment on lines 261 to 262
/// Returns a new plan where all children were replaced by new plans if the provided children
/// do not share the same point references with the existing children.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Returns a new plan where all children were replaced by new plans if the provided children
/// do not share the same point references with the existing children.
/// Returns a copy of this plan if we change any child according to pointer comparison.
///

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// do not share the same point references with the existing children.
/// The size of `children` must be equal to the size of `ExecutionPlan::children()`.
/// Allow the vtable address comparisons for ExecutionPlan Trait Objects,it is harmless even
/// in the case of 'false-native'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be false-positive I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be 'false-native'(two expected equal things end up not comparing equal) for Trait objects.

https://stackoverflow.com/questions/67109860/how-to-compare-trait-objects-within-an-arc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are reporting two equal items to be non-equal, it results in making a copy, so I term it 'false-positive'.

It should be 'false negative' if you only mean data equality. 😆

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a step in the right direction -- thank you @mingmwang

@alamb alamb merged commit fa9e016 into apache:master Apr 7, 2022
gandronchik pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 30, 2022
)

* Implement fast path of with_new_children() in ExecutionPlan

* resolve review comments

* refine comments
gandronchik pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 31, 2022
)

* Implement fast path of with_new_children() in ExecutionPlan

* resolve review comments

* refine comments
gandronchik pushed a commit to cube-js/arrow-datafusion that referenced this pull request Sep 2, 2022
)

* Implement fast path of with_new_children() in ExecutionPlan

* resolve review comments

* refine comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fast path of with_new_children() in ExecutionPlan
3 participants