[BUG] Update AST join split condition code to avoid idempotent issues #9680

revans2 · 2023-11-13T17:04:29Z

Describe the bug
This is related to #9635 (comment)

I am not 100% sure that this is a bug at all.

The current code for non-AST-able join condition splitting will do three passes through the join condition.

The first pass will verify that we can fix up the join condition to make this work.

The second pass will build come up with a plan on how to fix-up the condition

The third pass will actually execute the plan and update the condition.

The issue is that in both the second and third passes the join condition will be converted, partially or fully, to the GPU. This assumes that the conversion will be idempotent. I don't know of any case where it is not, but it could become a problem in the future because I don't think we ever guarantee that will happen.

The first pass happens both when we tag the expression to see if we can support this operation on the GPU, but also as we are trying to do the conversion. I think this is to avoid caching some data in-between tagging and conversion.

To me I think the long term solution is to operate on CPU expressions + BaseExprMeta instances instead of GPU expressions. This way we don't need to convert anything to the GPU until we are fully done with rewriting the query. This is not simple because the Meta objects are not setup to be modified in this kind of a way. In general I would like to see us walk through the the condition in almost exactly the same way that we walk through it to check if we can split up the join condition. We might even be able to use the same code for walking the tree. When we hit a spot where we would need to split we do the split right then and there. We put in a new Alias and append the sub-tree to an input project operation. If we need to we can also do deduping of sub-expressions that are replaced. For that I would do the ExpressionEquals check on the CPU expressions instead of the GPU expressions.

revans2 added bug Something isn't working ? - Needs Triage Need team to review and classify tech debt labels Nov 13, 2023

revans2 mentioned this issue Nov 13, 2023

Support split non-AST-able join condition for BroadcastNestedLoopJoin #9635

Merged

winningsix mentioned this issue Nov 14, 2023

[FEA][Follow On] Support non AST rewrite as join condition #9661

Open

6 tasks

mattahrens removed the ? - Needs Triage Need team to review and classify label Nov 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Update AST join split condition code to avoid idempotent issues #9680

[BUG] Update AST join split condition code to avoid idempotent issues #9680

revans2 commented Nov 13, 2023

[BUG] Update AST join split condition code to avoid idempotent issues #9680

[BUG] Update AST join split condition code to avoid idempotent issues #9680

Comments

revans2 commented Nov 13, 2023