-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48966][SQL] Improve error message with invalid unresolved column reference in UDTF call #47447
Conversation
|
||
// Check for unresolved TABLE arguments after the main check above to allow other analysis | ||
// errors to apply first, providing better error messages. | ||
getAllExpressions(operator).foreach(_.foreachUp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we check this together with getAllExpressions(operator).foreach(_.foreachUp {
in line 290 so we don't need to iterate the plan again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because we need to check all the expressions for "unresolved attribute" or other errors first, before we do this check for FunctionTableSubqueryArgumentExpression
instances.
That code on L290 starts with this:
getAllExpressions(operator).foreach(_.foreachUp {
case a: Attribute if !a.resolved =>
failUnresolvedAttribute(operator, a, "UNRESOLVED_COLUMN")
...
We need to make sure we perform that "unresolved attribute" check for all expressions in the plan before we proceed further to check for FunctionTableSubqueryArgumentExpression
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I get it now. This is a node that should have been rewritten during analysis. cc @cloud-fan do you know if there are better places to add this?
case _: FunctionTableSubqueryArgumentExpression => | ||
expr.failAnalysis( | ||
errorClass = "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT", | ||
messageParameters = Map("treeNode" -> planToString(plan))) | ||
// Do nothing here, since we will check for this pattern later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we just remove this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally tried this, but found that we encountered a MatchError
here in the presence of the FunctionTableSubqueryArgumentExpression
. Therefore I had to add this "do nothing" case back here, to allow the new code on L395-403 to check for this pattern separately later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @allisonwang-db thanks for your review! Responded to your comments, please take another look.
case _: FunctionTableSubqueryArgumentExpression => | ||
expr.failAnalysis( | ||
errorClass = "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT", | ||
messageParameters = Map("treeNode" -> planToString(plan))) | ||
// Do nothing here, since we will check for this pattern later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally tried this, but found that we encountered a MatchError
here in the presence of the FunctionTableSubqueryArgumentExpression
. Therefore I had to add this "do nothing" case back here, to allow the new code on L395-403 to check for this pattern separately later.
|
||
// Check for unresolved TABLE arguments after the main check above to allow other analysis | ||
// errors to apply first, providing better error messages. | ||
getAllExpressions(operator).foreach(_.foreachUp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because we need to check all the expressions for "unresolved attribute" or other errors first, before we do this check for FunctionTableSubqueryArgumentExpression
instances.
That code on L290 starts with this:
getAllExpressions(operator).foreach(_.foreachUp {
case a: Attribute if !a.resolved =>
failUnresolvedAttribute(operator, a, "UNRESOLVED_COLUMN")
...
We need to make sure we perform that "unresolved attribute" check for all expressions in the plan before we proceed further to check for FunctionTableSubqueryArgumentExpression
.
Merged to master. |
…mn reference in UDTF call ### What changes were proposed in this pull request? This bug covers improving an error message in the event of invalid UDTF calls. For example: ``` select * from udtf( observed => TABLE(select column from t), value_col => classic_dollars ) ``` Currently we get: ``` Unsupported subquery expression: Table arguments are used in a function where they are not supported: 'UnresolvedTableValuedFunction [udtf], [observed => table-argument#68918 [], value_col => 'classic_dollars, false +- Project ... +- SubqueryAlias ... +- Relation ... ``` But the real error is that the user passed column identifier classic_dollars rather than string "classic_dollars" into the string argument. The core reason is that `CheckAnalysis` checks the analyzer output tree for unresolved TABLE arguments first before checking for remaining unresolved attribute references. To fix it, we update `CheckAnalysis` to move the check for remaining `TABLE` arguments to later. Now that query returns something like: ``` { "errorClass" : "UNRESOLVED_COLUMN.WITHOUT_SUGGESTION", "sqlState" : "42703", "messageParameters" : { "objectName" : "`classic_dollars`" }, "queryContext" : [ { "objectType" : "", "objectName" : "", "startIndex" : 93, "stopIndex" : 109, "fragment" : "classic_dollars" } ] } ``` ### Why are the changes needed? This improves error messages for SQL queries that are invalid, but that a user might reasonably create accidentally while figuring out how the syntax works for calling table-valued functions. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? This PR adds and updates golden file test coverage. ### Was this patch authored or co-authored using generative AI tooling? Just some casual, everyday GitHub CoPilot usage. Closes apache#47447 from dtenedor/improve-error-udtf. Authored-by: Daniel Tenedorio <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
…mn reference in UDTF call ### What changes were proposed in this pull request? This bug covers improving an error message in the event of invalid UDTF calls. For example: ``` select * from udtf( observed => TABLE(select column from t), value_col => classic_dollars ) ``` Currently we get: ``` Unsupported subquery expression: Table arguments are used in a function where they are not supported: 'UnresolvedTableValuedFunction [udtf], [observed => table-argument#68918 [], value_col => 'classic_dollars, false +- Project ... +- SubqueryAlias ... +- Relation ... ``` But the real error is that the user passed column identifier classic_dollars rather than string "classic_dollars" into the string argument. The core reason is that `CheckAnalysis` checks the analyzer output tree for unresolved TABLE arguments first before checking for remaining unresolved attribute references. To fix it, we update `CheckAnalysis` to move the check for remaining `TABLE` arguments to later. Now that query returns something like: ``` { "errorClass" : "UNRESOLVED_COLUMN.WITHOUT_SUGGESTION", "sqlState" : "42703", "messageParameters" : { "objectName" : "`classic_dollars`" }, "queryContext" : [ { "objectType" : "", "objectName" : "", "startIndex" : 93, "stopIndex" : 109, "fragment" : "classic_dollars" } ] } ``` ### Why are the changes needed? This improves error messages for SQL queries that are invalid, but that a user might reasonably create accidentally while figuring out how the syntax works for calling table-valued functions. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? This PR adds and updates golden file test coverage. ### Was this patch authored or co-authored using generative AI tooling? Just some casual, everyday GitHub CoPilot usage. Closes apache#47447 from dtenedor/improve-error-udtf. Authored-by: Daniel Tenedorio <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
…mn reference in UDTF call ### What changes were proposed in this pull request? This bug covers improving an error message in the event of invalid UDTF calls. For example: ``` select * from udtf( observed => TABLE(select column from t), value_col => classic_dollars ) ``` Currently we get: ``` Unsupported subquery expression: Table arguments are used in a function where they are not supported: 'UnresolvedTableValuedFunction [udtf], [observed => table-argument#68918 [], value_col => 'classic_dollars, false +- Project ... +- SubqueryAlias ... +- Relation ... ``` But the real error is that the user passed column identifier classic_dollars rather than string "classic_dollars" into the string argument. The core reason is that `CheckAnalysis` checks the analyzer output tree for unresolved TABLE arguments first before checking for remaining unresolved attribute references. To fix it, we update `CheckAnalysis` to move the check for remaining `TABLE` arguments to later. Now that query returns something like: ``` { "errorClass" : "UNRESOLVED_COLUMN.WITHOUT_SUGGESTION", "sqlState" : "42703", "messageParameters" : { "objectName" : "`classic_dollars`" }, "queryContext" : [ { "objectType" : "", "objectName" : "", "startIndex" : 93, "stopIndex" : 109, "fragment" : "classic_dollars" } ] } ``` ### Why are the changes needed? This improves error messages for SQL queries that are invalid, but that a user might reasonably create accidentally while figuring out how the syntax works for calling table-valued functions. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? This PR adds and updates golden file test coverage. ### Was this patch authored or co-authored using generative AI tooling? Just some casual, everyday GitHub CoPilot usage. Closes apache#47447 from dtenedor/improve-error-udtf. Authored-by: Daniel Tenedorio <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
What changes were proposed in this pull request?
This bug covers improving an error message in the event of invalid UDTF calls. For example:
Currently we get:
But the real error is that the user passed column identifier classic_dollars rather than string "classic_dollars" into the string argument.
The core reason is that
CheckAnalysis
checks the analyzer output tree for unresolved TABLE arguments first before checking for remaining unresolved attribute references.To fix it, we update
CheckAnalysis
to move the check for remainingTABLE
arguments to later. Now that query returns something like:Why are the changes needed?
This improves error messages for SQL queries that are invalid, but that a user might reasonably create accidentally while figuring out how the syntax works for calling table-valued functions.
Does this PR introduce any user-facing change?
No
How was this patch tested?
This PR adds and updates golden file test coverage.
Was this patch authored or co-authored using generative AI tooling?
Just some casual, everyday GitHub CoPilot usage.