[SPARK-49646][SQL] add spark config for fixing subquery decorrelation for union/set operations when parentOuterReferences has references not covered in collectedChildOuterReferences

### What changes were proposed in this pull request?

Adds a Spark config that gates the change made in apache#48109.

### Why are the changes needed?

For safer backports

### Does this PR introduce _any_ user-facing change?

Yes. This adds a user-facing config, `spark.sql.optimizer.decorrelateUnionOrSetOpUnderLimit.enabled`. Setting it to true enables decorrelation of subqueries that have correlated references under Union/Set operators that are themselves under Limit operators. It is true by default; setting it to false makes Spark revert to the incorrect legacy behavior, which raises exceptions when decorrelating the query patterns above.
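
As a minimal sketch of toggling the config at runtime, assuming a Spark build that contains this commit and a local session (the object name and app name are hypothetical, chosen only for illustration):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object; only the config key comes from this commit.
object DecorrelateConfExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("decorrelate-conf-example") // hypothetical app name
      .master("local[1]")
      .getOrCreate()

    val key = "spark.sql.optimizer.decorrelateUnionOrSetOpUnderLimit.enabled"

    // Opt out of the fix and fall back to the legacy behavior.
    spark.conf.set(key, "false")
    println(s"$key = ${spark.conf.get(key)}")

    // Restore the default.
    spark.conf.set(key, "true")

    spark.stop()
  }
}
```

Since the flag is a regular SQL conf, it should also be settable per session with `SET` in SQL; turning it off is only useful when the previous behavior needs to be reproduced, for example while validating a backport.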

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

no

Closes apache#49536 from AveryQi115/SPARK-49646-2.

Lead-authored-by: Avery Qi <[email protected]>
Co-authored-by: Avery <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
AveryQi115 authored and cloud-fan committed Jan 22, 2025
1 parent 6658846 commit 6bde48c
Showing 2 changed files with 18 additions and 1 deletion.
@@ -1064,7 +1064,14 @@ object DecorrelateInnerQuery extends PredicateHelper {
   // Project, they could get added at the beginning or the end of the output columns
   // depending on the child plan.
   // The inner expressions for the domain are the values of newOuterReferenceMap.
-  val domainProjections = newOuterReferences.map(newOuterReferenceMap(_))
+  val domainProjections =
+    if (SQLConf.get.getConf(
+      SQLConf.DECORRELATE_UNION_OR_SET_OP_UNDER_LIMIT_ENABLED
+    )) {
+      newOuterReferences.map(newOuterReferenceMap(_))
+    } else {
+      collectedChildOuterReferences.map(newOuterReferenceMap(_))
+    }
   val newChild = Project(child.output ++ domainProjections, decorrelatedChild)
   (newChild, newJoinCond, newOuterReferenceMap)
 }
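
The effect of the branch in `DecorrelateInnerQuery` can be modeled with plain Scala collections. This is a simplified sketch, not the Catalyst code: attribute names are plain strings, `DomainProjectionSketch` and the sample values are hypothetical, and it assumes (following the commit title) that `newOuterReferences` may contain parent references missing from `collectedChildOuterReferences`.

```scala
// Simplified model of the flag-dependent choice of domain projections.
// Strings stand in for Catalyst attributes; all names here are illustrative.
object DomainProjectionSketch {
  def domainProjections(
      enabled: Boolean,
      newOuterReferences: Seq[String],
      collectedChildOuterReferences: Seq[String],
      newOuterReferenceMap: Map[String, String]): Seq[String] = {
    // Flag on: project every outer reference, including parent-only ones.
    // Flag off: project only the references collected from the child (legacy).
    val refs = if (enabled) newOuterReferences else collectedChildOuterReferences
    refs.map(newOuterReferenceMap(_))
  }

  def main(args: Array[String]): Unit = {
    val collected = Seq("t1.a")
    // Assumption: t1.b is referenced only by a parent operator.
    val all = Seq("t1.a", "t1.b")
    val mapping = Map("t1.a" -> "a_inner", "t1.b" -> "b_inner")

    println(domainProjections(enabled = true, all, collected, mapping))
    println(domainProjections(enabled = false, all, collected, mapping))
  }
}
```

In this model the flag-on call projects both inner attributes, while the flag-off call drops the one that only the parent references, which is the gap the commit title describes.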
@@ -3998,6 +3998,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)

+  val DECORRELATE_UNION_OR_SET_OP_UNDER_LIMIT_ENABLED =
+    buildConf("spark.sql.optimizer.decorrelateUnionOrSetOpUnderLimit.enabled")
+      .internal()
+      .doc("Decorrelate UNION or SET operation under LIMIT operator. If not enabled," +
+        "revert to legacy incorrect behavior for certain subqueries with correlation under" +
+        "UNION/SET operator with a LIMIT operator above it.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(true)
+
   val DECORRELATE_EXISTS_IN_SUBQUERY_LEGACY_INCORRECT_COUNT_HANDLING_ENABLED =
     buildConf("spark.sql.optimizer.decorrelateExistsSubqueryLegacyIncorrectCountHandling.enabled")
       .internal()
