[SPARK-42851][SQL] Replace EquivalentExpressions with mutable map in PhysicalAggregation #40488
@rednaxelafx, @cloud-fan let me know if this PR is a viable alternative to #40473. Or maybe I should do a little cleanup like peter-toth@90421cb, either in this PR or in a follow-up...
Before the recent rounds of changes to EquivalentExpressions, the old Your proposed PR here further orphans that function from any actual use. Which is okay for keeping binary compatibility as much as possible. BTW I updated my PR's test case because it makes more sense to check the return value from
case a
  if AggregateExpression.isAggregate(a) && !equivalentAggregateExpressions.addExpr(a) =>
What's wrong with `addExpr` here? It does simplify the code IMO.
The line of thought would be: adding the `supportedExpression` guard to `addExpr()` would cause a performance regression, so let's just close our eyes and make the only remaining use of `addExpr` break away and do its own deduplication in the old logic without taking things like `NamedLambdaVariable` into account -- which is the way it's been for quite a few releases. This PR essentially inlines the `addExpr` path of the old `EquivalentExpressions` into `PhysicalAggregation` to recover what it used to do.
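As a rough illustration of "doing its own deduplication", here is a minimal sketch using a mutable map keyed by a canonical form. `ToyAgg`, its `canonicalized` method and `DedupSketch` are hypothetical stand-ins, not Spark classes, and this is not the code from the PR; it only assumes that semantically equal expressions share a canonical form.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an aggregate expression (NOT a Spark class).
final case class ToyAgg(function: String, column: String, exprId: Int) {
  // Assumed canonical form: ignores the cosmetic exprId and letter case.
  def canonicalized: (String, String) = (function.toLowerCase, column.toLowerCase)
}

object DedupSketch {
  // Keeps the first occurrence of each semantically equal expression,
  // preserving first-seen order.
  def dedup(exprs: Seq[ToyAgg]): Seq[ToyAgg] = {
    val seen = mutable.LinkedHashMap.empty[(String, String), ToyAgg]
    exprs.foreach { e =>
      // getOrElseUpdate stores e only when its canonical form is new.
      seen.getOrElseUpdate(e.canonicalized, e)
    }
    seen.values.toSeq
  }
}
```

With this shape, whole-expression deduplication needs no subexpression tracking at all; the map lookup is the entire mechanism.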
Besides the above, although `.addExpr()` fits here well and does the job, isn't it a bit weird that an add-like method of a collection-like object doesn't return `true` when a new item was added, but actually flips the meaning of the return value? If it were used in multiple places I would keep it, but we use it only here. But maybe I'm just nitpicking...
Anyways, I'm ok with #40473 too.
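To make the flipped return value concrete, here is a toy model of the contract as it is described in this thread (returns `false` when the expression was missing, and records it only if it is deterministic). `ToyEquivalentExpressions` and its string-keyed `addExpr` are hypothetical and only mimic that description; they are not Spark's implementation.

```scala
import scala.collection.mutable

object AddSemanticsSketch {
  // Hypothetical model of the described contract (NOT Spark's class).
  final class ToyEquivalentExpressions {
    private val seen = mutable.HashSet.empty[String]

    // Returns true if expr was ALREADY present, false if it was missing.
    // A missing expression is recorded only when it is deterministic.
    def addExpr(expr: String, deterministic: Boolean): Boolean =
      if (seen.contains(expr)) {
        true
      } else {
        if (deterministic) seen.add(expr)
        false
      }
  }
}
```

By contrast, `scala.collection.mutable.HashSet#add` returns `true` when the element was newly added, so a caller has to negate `addExpr`'s result to detect "first time seen", as the `!equivalentAggregateExpressions.addExpr(a)` guard does.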
What changes were proposed in this pull request?
This PR proposes replacing `EquivalentExpressions` with a simple mutable map in `PhysicalAggregation`, the only place where `EquivalentExpressions.addExpr()` is used. `EquivalentExpressions` is useful for common subexpression elimination, but in `PhysicalAggregation` it is used only to deduplicate whole expressions, which can easily be done with a simple map.
Why are the changes needed?
`EquivalentExpressions.addExpr()` is not guarded by `supportedExpression()`, so it can cause inconsistent results when used together with `EquivalentExpressions.getExprState()`. This PR proposes replacing `.addExpr()` with other alternatives, as its boolean result is a bit counter-intuitive compared with other collections' `.add()` methods: it returns `false` if the expression was missing, and then either adds the expression or not depending on whether the expression is deterministic.
After this PR we no longer use `EquivalentExpressions.addExpr()`, so it can be deprecated or even removed...
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added new UTs from @rednaxelafx's PR: #40473. Please note that those UTs actually pass after #40475, but they are added here to make sure there will be no regression in the future.
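For readers unfamiliar with the inconsistency mentioned under "Why are the changes needed?", the following toy model sketches one way a guard mismatch can show up. It assumes the lookup side applies a `supported` check while the add side does not; the class, the string-keyed methods and the counter state are all hypothetical, not Spark's API.

```scala
import scala.collection.mutable

object GuardMismatchSketch {
  // Hypothetical model (NOT Spark's class); `supported` plays the role of a guard.
  final class ToyEquivalentExpressions(supported: String => Boolean) {
    private val states = mutable.HashMap.empty[String, Int]

    // Unguarded: records any expression; returns true if it was already present.
    def addExpr(expr: String): Boolean = {
      val present = states.contains(expr)
      states.update(expr, states.getOrElse(expr, 0) + 1)
      present
    }

    // Guarded: never reports state for unsupported expressions.
    def getExprState(expr: String): Option[Int] =
      if (supported(expr)) states.get(expr) else None
  }
}
```

In this model an unsupported expression can be reported as "already present" by `addExpr` while `getExprState` still returns `None` for it, which is the kind of disagreement a plain map in `PhysicalAggregation` sidesteps.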