-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-1865904 fix query gen when nested cte node is partitioned #2816
SNOW-1865904 fix query gen when nested cte node is partitioned #2816
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a couple of nits. Thanks!
Co-authored-by: Hazem Elmeleegy <[email protected]>
for parent in parents: | ||
replace_child(parent, child, temp_table_selectable, self._query_generator) | ||
|
||
nodes_to_reset = list(parents) | ||
while nodes_to_reset: | ||
node = nodes_to_reset.pop() | ||
if node in updated_nodes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you give some more detailed description about what is the problem if we skip the node update here ? the cte optimizaiton is doing a level to level update, so it can skip the node update if it is already updated, but for large query breakdown, i recall you were doing a dfs, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider a tree below
node1
|
node2
/ \
node3 node4
suppose we start by re-resolving node3 -> node2 -> node1
. In this process, node2
and node1
are marked in updated_nodes
. Now, when we go updating the ancestors of node4
, re-resolution of node2
and node1
would be skipped. This is not ideal if node4
update can also trigger a re-update of node2
and node1
.
For example, this could be problematic is when node3
and node4
before the update had referenced_cte
map. After the first update node3 -> node2 -> node1
, node2
will be resolved with an older version of node4
. If after a re-resolved node4
there are no referenced_ctes, then node2
will not be updated with this information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it is because you are doing a dfs, if you are doing updated in the order of node3, node4 then node2, last node1 you shouldn't run into problems.
@@ -185,7 +185,6 @@ def compile(self) -> Dict[PlanQueryType, List[Query]]: | |||
error_type=type(e).__name__, | |||
error_message=str(e), | |||
) | |||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have a bug before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no. I just noticed an unnecessary pass
so I cleaned it up.
…-cte-node-is-partitioned
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1865904
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Update query generation for with query block where cte query is updated if the child plan is re-resolved during optimization process.