Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Update AST join split condition code to avoid idempotent issues #9680

Open
revans2 opened this issue Nov 13, 2023 · 0 comments
Open

[BUG] Update AST join split condition code to avoid idempotent issues #9680

revans2 opened this issue Nov 13, 2023 · 0 comments
Labels
bug Something isn't working tech debt

Comments

@revans2
Copy link
Collaborator

revans2 commented Nov 13, 2023

Describe the bug
This is related to #9635 (comment)

I am not 100% sure that this is a bug at all.

The current code for non-AST-able join condition splitting will do three passes through the join condition.

The first pass will verify that we can fix up the join condition to make this work.

The second pass will build come up with a plan on how to fix-up the condition

The third pass will actually execute the plan and update the condition.

The issue is that in both the second and third passes the join condition will be converted, partially or fully, to the GPU. This assumes that the conversion will be idempotent. I don't know of any case where it is not, but it could become a problem in the future because I don't think we ever guarantee that will happen.

The first pass happens both when we tag the expression to see if we can support this operation on the GPU, but also as we are trying to do the conversion. I think this is to avoid caching some data in-between tagging and conversion.

To me I think the long term solution is to operate on CPU expressions + BaseExprMeta instances instead of GPU expressions. This way we don't need to convert anything to the GPU until we are fully done with rewriting the query. This is not simple because the Meta objects are not setup to be modified in this kind of a way. In general I would like to see us walk through the the condition in almost exactly the same way that we walk through it to check if we can split up the join condition. We might even be able to use the same code for walking the tree. When we hit a spot where we would need to split we do the split right then and there. We put in a new Alias and append the sub-tree to an input project operation. If we need to we can also do deduping of sub-expressions that are replaced. For that I would do the ExpressionEquals check on the CPU expressions instead of the GPU expressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working tech debt
Projects
None yet
Development

No branches or pull requests

2 participants