-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-1869362: Plan plotter improvements #2813
Conversation
os.environ["ENABLE_SNOWPARK_LOGICAL_PLAN_PLOTTING"] = str(enabled) | ||
os.environ["SNOWPARK_LOGICAL_PLAN_PLOTTING_THRESHOLD"] = str( | ||
plotting_score_threshold | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be better done with something like:
with mock.patch.dict(os.environ, {...}):
try: | ||
os.environ["ENABLE_SNOWPARK_LOGICAL_PLAN_PLOTTING"] = str(enabled) | ||
os.environ["SNOWPARK_LOGICAL_PLAN_PLOTTING_THRESHOLD"] = str( | ||
plotting_score_threshold | ||
) | ||
tmp_dir = tempfile.gettempdir() | ||
|
||
with patch("graphviz.Graph.render") as mock_render: | ||
large_query_df.collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we perhaps add a comment explaining that the actual complexity for large_query_df
falls somewhere between 0 and 10M?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
@@ -381,15 +383,27 @@ def plot_plan_if_enabled(root: LogicalPlan, filename: str) -> None: | |||
): | |||
return | |||
|
|||
if int( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this Plotting threshold used for? seems it is used for restricting the complexity score? maybe call this SNOWPARK_LOGICAL_PLAN_PLOTTING_COMPLEXITY_THRESHOLD to be more clear
if node is None: | ||
return "EMPTY_SOURCE_PLAN" # pragma: no cover | ||
addr = hex(id(node)) | ||
name = str(type(node)).split(".")[-1].split("'")[0] | ||
return f"{name}({addr})" | ||
suffix = "" | ||
if isinstance(node, SnowflakeCreateTable): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment here about what are the different printing used here
@@ -381,15 +383,27 @@ def plot_plan_if_enabled(root: LogicalPlan, filename: str) -> None: | |||
): | |||
return | |||
|
|||
if int( | |||
os.environ.get("SNOWPARK_LOGICAL_PLAN_PLOTTING_THRESHOLD", 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's simply make the default threshold -1, be clear that by default plot out all nodes.
was there a reason about why we want to add this threshold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. In my tests, I generally want to plot and debug "big" plans but sometime the plans get overwritten by smaller plan if they are present somewhere. That's why I added this variable. I don't think this is the best way - I'm open to suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be more specific about " sometime the plans get overwritten by smaller plan if they are present somewhere"? not quite getting this part, and what information you want to get to help your debugging process?
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1869362
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Made the following improvements with this PR: