[SPARK-49836][SQL][SS] Fix possibly broken query when window is provided to window/session_window fn

### What changes were proposed in this pull request?

This PR fixes a correctness issue in which operators are lost during analysis. It happens when `window` is provided to the window()/session_window() function.

The rules `TimeWindowing` and `SessionWindowing` are responsible for resolving the time window functions. When the window function has `window` as its parameter (time column), in other words, when building a time window from a time window, the rule wraps window with the WindowTime function so that the rule ResolveWindowTime will resolve it further. (TimeWindowing/SessionWindowing will then resolve it again against the result of ResolveWindowTime.)

The issue is that the rule uses "return" for the above, intended as an "early return" because the other branch is much longer than this one. Unfortunately this does not work as intended: the intention is just to leave the current local scope (mostly the end of the curly brace), but it seems to break the loop of execution on the "outer" side. (I haven't debugged further, but it's clear that it doesn't work as intended.)

Quoting from the Scala doc:

> Nonlocal returns are implemented by throwing and catching scala.runtime.NonLocalReturnException-s.

It's not very clear where NonLocalReturnException is caught in the call stack; it might exit execution for a much broader scope (context) than expected. It is also deprecated as of Scala 3.2 and is likely to be removed in the future.

https://dotty.epfl.ch/docs/reference/dropped-features/nonlocal-returns.html

Interestingly, it does not break every query with chained time window aggregations. Spark already has several tests using the DataFrame API, and they haven't failed. The reproducer in the community report uses a SQL statement, where each aggregation is considered a subquery.

This PR fixes the rules to NOT use early return and instead use one large if/else.

### Why are the changes needed?

Described above.

### Does this PR introduce _any_ user-facing change?

Yes, this fixes the possible query breakage. The set of impacted workloads may not be very large, as chained time window aggregation is an advanced usage, and the bug does not break every query that uses it.

### How was this patch tested?

New UTs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48309 from HeartSaVioR/SPARK-49836.

Lead-authored-by: Jungtaek Lim <[email protected]>
Co-authored-by: Andrzej Zera <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
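
For readers unfamiliar with nonlocal returns, the sketch below shows one way a `return` inside a rewrite rule's function literal can drop outer operators. It is a toy model, not Spark code: `Node`, `Leaf`, `Unary`, `transformUp`, `buggyRule`, and `fixedRule` are hypothetical names, and `transformUp` only loosely imitates a bottom-up plan rewrite. The commit message does not confirm that this is the exact mechanism inside the analyzer.

```scala
object NonLocalReturnSketch {
  // Hypothetical stand-in for a logical plan: a chain of named operators.
  sealed trait Node
  final case class Leaf(name: String) extends Node
  final case class Unary(name: String, child: Node) extends Node

  // Bottom-up rewrite: rewrite the child first, then apply the rule to the
  // current node. Nodes the rule is not defined for are left unchanged.
  def transformUp(node: Node)(rule: PartialFunction[Node, Node]): Node = {
    val withNewChild = node match {
      case Unary(n, c) => Unary(n, transformUp(c)(rule))
      case leaf        => leaf
    }
    rule.applyOrElse(withNewChild, identity[Node])
  }

  // "Early return" inside the function literal. In Scala 2 this `return`
  // leaves buggyRule itself, not just the case body, so the rewritten
  // subtree becomes the result of the whole rule.
  def buggyRule(plan: Node): Node = transformUp(plan) {
    case u @ Unary(name, child) =>
      if (name == "window") {
        return Unary("windowTime", child)
      }
      u
  }

  // Same rewrite expressed as if/else: the value flows back to transformUp,
  // which keeps the operators above the rewritten node.
  def fixedRule(plan: Node): Node = transformUp(plan) {
    case u @ Unary(name, child) =>
      if (name == "window") Unary("windowTime", child)
      else u
  }

  def main(args: Array[String]): Unit = {
    val plan = Unary("project", Unary("window", Leaf("scan")))
    println(buggyRule(plan)) // the outer "project" node is missing
    println(fixedRule(plan)) // the outer "project" node is preserved
  }
}
```

In this toy model the buggy variant returns only the rewritten subtree, which is the kind of operator loss the description reports, while the if/else variant keeps the full tree.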
1 parent 0c653db, commit d8c04cf
Showing 3 changed files with 232 additions and 127 deletions.