convert outer join to inner join to improve performance #1585
Comments
@xudong963 I'd like to pick this up once #1618 is (hopefully) eventually merged. Question re: implementation - do you think this should be a separate optimizer pass or something else? Apologies in advance if this is all obvious - the reason I ask is that it seems that this rule benefits from filter pushdown occurring both before and after the join rewrite happens. For example, consider the following contrived query:
SELECT * FROM (
SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.uid
)
WHERE t2.uid IS NOT NULL
If we do not push
-- Push filter into subquery
SELECT * FROM (
SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.uid
WHERE t2.uid IS NOT NULL
)
-- Rewrite join
SELECT * FROM (
SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.uid
WHERE t2.uid IS NOT NULL
)
However, at this point we would benefit from another pass with the filter pushdown rule, as It looks like optimizer passes are sequenced in a vector defined in |
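As a quick sanity check of the rewrite in the example above, here is a minimal sketch that runs both forms of the query against an in-memory SQLite database and compares the results. The sample rows are made up for illustration, and SQLite is only a stand-in engine here, not the optimizer discussed in this thread.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (uid INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (4);
""")

# Plain LEFT JOIN: t1 rows without a match are padded with NULL.
unfiltered_rows = conn.execute(
    "SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.uid ORDER BY t1.id"
).fetchall()

# LEFT JOIN with a filter that rejects the NULL-padded rows.
left_rows = conn.execute(
    "SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.uid "
    "WHERE t2.uid IS NOT NULL ORDER BY t1.id"
).fetchall()

# The rewritten form: an INNER JOIN, where the filter is now redundant.
inner_rows = conn.execute(
    "SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.uid ORDER BY t1.id"
).fetchall()

print(unfiltered_rows)
print(left_rows)
print(inner_rows)
```

The filtered LEFT JOIN and the INNER JOIN return the same rows because the only rows the outer join adds are the NULL-padded ones, and `t2.uid IS NOT NULL` discards exactly those.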
My opinion is that it would be cleaner to manage it as a separate rule, because the join rewrite is not really related to predicate pushdown. I think we have a guiding rule that a plan optimization run should be agnostic to the order of plan rules defined in |
Could you explain more specifically about the |
Relevant discussion: #1618 (comment) You can see a version of this code in Spark here: https://github.com/apache/spark/blob/aaf0e5e71509a2324e110e45366b753c7926c64b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L119-L135 |
I think this issue is finished, so let's close it! |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Under certain conditions, convert outer join to inner join to improve performance.
Describe the solution you'd like
During predicate pushdown, if the filter condition (where) only references the non-nullable side of the join, it can be pushed down. But if the filter condition (where) references the nullable side, it can't be pushed down below the outer join, which hurts performance.
Fortunately, if the filter conditions are strict, we can convert the outer join to an inner join, after which all filter conditions can be pushed down.
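Broadly speaking, a function, operator, or expression is considered strict if, whenever an input argument is NULL, the output is either NULL or FALSE. A strict filter on the nullable side therefore rejects every NULL-padded row that the outer join would add.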
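To make the strictness definition concrete, here is a small sketch that models SQL three-valued logic in Python (`None` stands for NULL) and tests a predicate by evaluating it on a row whose nullable-side columns are all NULL. The helper names (`is_null`, `gt`, `rejects_nulls`, and so on) are hypothetical and for illustration only, and the check assumes the predicate references only nullable-side columns.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[int]]
Predicate = Callable[[Row], Optional[bool]]  # True, False, or None (NULL)


def is_null(col: str) -> Predicate:
    return lambda row: row[col] is None


def is_not_null(col: str) -> Predicate:
    return lambda row: row[col] is not None


def gt(col: str, value: int) -> Predicate:
    # A comparison with NULL yields NULL (unknown).
    return lambda row: None if row[col] is None else row[col] > value


def and_(a: Predicate, b: Predicate) -> Predicate:
    def evaluate(row: Row) -> Optional[bool]:
        x, y = a(row), b(row)
        if x is False or y is False:
            return False
        if x is None or y is None:
            return None
        return True
    return evaluate


def or_(a: Predicate, b: Predicate) -> Predicate:
    def evaluate(row: Row) -> Optional[bool]:
        x, y = a(row), b(row)
        if x is True or y is True:
            return True
        if x is None or y is None:
            return None
        return False
    return evaluate


def rejects_nulls(predicate: Predicate, nullable_columns: List[str]) -> bool:
    """True if the predicate is NULL or FALSE on an all-NULL row.

    Assumes the predicate only references columns in nullable_columns.
    """
    null_row: Row = {col: None for col in nullable_columns}
    result = predicate(null_row)
    return result is None or result is False


cols = ["uid"]
print(rejects_nulls(is_not_null("uid"), cols))
print(rejects_nulls(gt("uid", 10), cols))
print(rejects_nulls(is_null("uid"), cols))
print(rejects_nulls(or_(gt("uid", 10), is_null("uid")), cols))
```

Under these assumptions, `uid IS NOT NULL` and `uid > 10` would allow the outer-to-inner rewrite, while `uid IS NULL` or `uid > 10 OR uid IS NULL` would not, since they keep the NULL-padded rows.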
Describe alternatives you've considered
No
Additional context
AFAIK, Postgres has this feature.