Describe the bug
This is related to #9635 (comment)
I am not 100% sure that this is a bug at all.
The current code for non-AST-able join condition splitting will do three passes through the join condition.
The first pass will verify that we can fix up the join condition to make this work.
The second pass will come up with a plan for how to fix up the condition.
The third pass will actually execute the plan and update the condition.
The issue is that in both the second and third passes the join condition will be converted, partially or fully, to the GPU. This assumes that the conversion is idempotent, meaning that converting the same expression twice gives equivalent results. I don't know of any case where it is not, but it could become a problem in the future because I don't think we ever guarantee it.
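To illustrate the concern, here is a minimal Python sketch of how a conversion with hidden state stops being repeatable. Everything in it is hypothetical: `pure_convert`, `make_stateful_converter`, and `is_repeatable` are toy stand-ins operating on strings, not plugin code, and I am not claiming any real conversion behaves like the stateful one.

```python
import itertools


def pure_convert(expr: str) -> str:
    """Toy conversion with no hidden state: same input, same output."""
    return f"gpu({expr})"


def make_stateful_converter():
    """Toy conversion that hands out a fresh id on every call (hypothetical)."""
    ids = itertools.count()

    def convert(expr: str) -> str:
        return f"gpu_{next(ids)}({expr})"

    return convert


def is_repeatable(convert, expr: str) -> bool:
    """Convert the same expression twice and compare the two results."""
    return convert(expr) == convert(expr)
```

If the planning pass and the execution pass each ran something like the stateful converter, the plan built in one pass would no longer line up with what the next pass produces.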
The first pass happens both when we tag the expression to see if we can support this operation on the GPU and again when we do the conversion. I think this is to avoid caching data between tagging and conversion.
To me, the long-term solution is to operate on CPU expressions plus BaseExprMeta instances instead of GPU expressions. This way we don't need to convert anything to the GPU until we are fully done rewriting the query. This is not simple because the Meta objects are not set up to be modified in this kind of way. In general I would like to see us walk through the condition in almost exactly the same way that we walk through it to check whether we can split up the join condition. We might even be able to use the same code for walking the tree. When we hit a spot where we would need to split, we do the split right then and there: we put in a new Alias and append the sub-tree to an input project operation. If we need to, we can also dedupe the sub-expressions that are replaced. For that I would do the ExpressionEquals check on the CPU expressions instead of the GPU expressions.
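A rough sketch of the single-pass idea, written in Python on a toy expression tree rather than on real Catalyst expressions or Meta objects. The `Expr` class, the `AST_ABLE_OPS` set, the `_split_N` alias names, and `split_condition` are all made up for illustration. Structural equality of the frozen dataclass stands in for the ExpressionEquals check, and the sketch assumes each replaced sub-tree only references one side of the join, which the real code would have to verify.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical set of operators the AST evaluator supports in this toy model.
AST_ABLE_OPS = {"and", "eq", "lt", "col", "lit"}


@dataclass(frozen=True)
class Expr:
    op: str
    children: Tuple["Expr", ...] = ()
    name: str = ""  # column name or literal text for leaf nodes


def col(name: str) -> Expr:
    return Expr("col", (), name)


def split_condition(cond: Expr) -> Tuple[Expr, List[Tuple[str, Expr]]]:
    """Walk the condition once, splitting as soon as a non-AST-able node is hit.

    Returns the rewritten condition and the (alias, sub-tree) pairs that
    would be appended to an input project operation.
    """
    projected: Dict[Expr, str] = {}  # dedupes structurally equal sub-trees

    def walk(e: Expr) -> Expr:
        if e.op not in AST_ABLE_OPS:
            # Split right here: reuse an existing alias or create a new one.
            alias = projected.get(e)
            if alias is None:
                alias = f"_split_{len(projected)}"
                projected[e] = alias
            return col(alias)
        if not e.children:
            return e
        return Expr(e.op, tuple(walk(c) for c in e.children), e.name)

    new_cond = walk(cond)
    return new_cond, [(alias, sub) for sub, alias in projected.items()]


# upper(...) is treated as non-AST-able and appears twice in the condition.
upper_a = Expr("upper", (col("a.name"),))
condition = Expr("and", (
    Expr("eq", (upper_a, col("b.name"))),
    Expr("lt", (upper_a, Expr("lit", (), "m"))),
))
new_condition, projections = split_condition(condition)
```

Here both occurrences of `upper(a.name)` are replaced by a reference to the same `_split_0` column, and the sub-tree is projected only once. Nothing in the walk needs a converted GPU expression, which is the point of doing the rewrite on the CPU side.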