-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Analyzer phase to DataFusion , add basic validation logic to Subquery Plans and Expressions #5570
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mingmwang -- I think this PR's structure looks great.
Do you plan to move the https://github.com/apache/arrow-datafusion/blob/main/datafusion/optimizer/src/type_coercion.rs logic into the analyzer as well?
I had some suggestions on how to reduce replication which I think would be nice but not strictly required.
use datafusion_expr::LogicalPlan; | ||
|
||
/// a Trait for marking tree node types that are rewritable | ||
pub trait TreeNodeRewritable: Clone { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to largely be the same as https://github.com/apache/arrow-datafusion/blob/146a949218ec970784974137277cde3b4e547d0a/datafusion/physical-expr/src/rewrite.rs#L25 but they are both generic,
Would it be possible to move the trait into datafusion_common
and only keep the impls in logical and physical plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the past, I had tried to move the Trait to a common place but failed due to Rust's orphan rule limitations.
Either the trait or the type for which we are implementing the trait must be defined in the same crate. The major problem is the Arc
struct is from the std crate.
impl TreeNodeRewritable for Arc<dyn ExecutionPlan>
impl TreeNodeRewritable for Arc<dyn PhysicalExpr>
Sure, I can work on this in future. But recently I plan to address the subquery related gaps first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A great job🚀
|
||
/// `AnalyzerRule` transforms the unresolved ['LogicalPlan']s and unresolved ['Expr']s into | ||
/// the resolved form. | ||
pub trait AnalyzerRule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should add a new dir for analyzer.
Because we can pull some analyzer rule into it.
}) | ||
} | ||
|
||
fn check_subquery_expr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the future, we also can move type_coercion into analyzer
Will these new analyzer phase rules be visible in output of explain commands? If not, then worth a follow-on ticket to track it |
Just log a ticket to track the effort. |
I think this is ready to go -- any concerns about merging @mingmwang ? |
Sure, please help to merge this. |
Which issue does this PR close?
Related to #2361, #5463.
Rationale for this change
What changes are included in this PR?
This PR add an Analyzer phase to DataFusion. It also adds some basic validation logic to Subquery plans and expressions.
It does not check all the unsupported cases. This PR just come up with a framework to do the necessary checking:
In the current DataFusion, there are other gaps to correctly identify the out reference columns in the
SqlToRel
process.Those gaps will be addressed in other PRs.
Are these changes tested?
Are there any user-facing changes?