SNOW-1869362: Plan plotter improvements #2813

Merged
sfc-gh-aalam merged 4 commits into main from aalam-SNOW-1869362-improve-plan-plotter on Jan 3, 2025


sfc-gh-aalam
Contributor

@sfc-gh-aalam sfc-gh-aalam commented Dec 29, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1869362

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.

    Made the following improvements with this PR:

  • Add a threshold so that only plans whose complexity score is above the threshold are plotted.
  • Note the name of each WITH query block.
  • Print the name of SnowflakeCreateTable.
  • Print the name of SelectableEntity.
  • Shade WITH query blocks in gray.
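
A minimal sketch of how the threshold gate from the list above might behave. `should_plot` and its `env` parameter are hypothetical names, not part of this PR. The two environment variable names come from the test in this PR; the "true" string check and the choice to plot a plan whose score equals the threshold are assumptions.

```python
import os
from typing import Mapping


def should_plot(complexity_score: int, env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: decide whether a plan should be plotted."""
    # Assumption: plotting is on only when the flag is the string "true"
    # (case-insensitive), which matches str(True) as set in the test.
    enabled = env.get("ENABLE_SNOWPARK_LOGICAL_PLAN_PLOTTING", "false").lower() == "true"
    if not enabled:
        return False
    # Default of 0 mirrors the os.environ.get(..., 0) call in the diff.
    threshold = int(env.get("SNOWPARK_LOGICAL_PLAN_PLOTTING_THRESHOLD", 0))
    # Assumption: a score equal to the threshold still gets plotted.
    return complexity_score >= threshold
```

Passing the environment as a mapping keeps the check easy to exercise in a unit test without touching the real process environment.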

@sfc-gh-aalam sfc-gh-aalam added the NO-CHANGELOG-UPDATES label (This pull request does not need to update CHANGELOG.md) Dec 29, 2024
@sfc-gh-aalam sfc-gh-aalam marked this pull request as ready for review January 2, 2025 19:13
@sfc-gh-aalam sfc-gh-aalam requested review from a team as code owners January 2, 2025 19:13
Comment on lines 798 to 801
os.environ["ENABLE_SNOWPARK_LOGICAL_PLAN_PLOTTING"] = str(enabled)
os.environ["SNOWPARK_LOGICAL_PLAN_PLOTTING_THRESHOLD"] = str(
    plotting_score_threshold
)
Contributor

This might be better done with something like:
with mock.patch.dict(os.environ, {...}):
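
A short sketch of what that suggestion could look like. `run_with_plotting_env` is a hypothetical wrapper used only for illustration; the real test body (rendering and collecting the DataFrame) is replaced here by a read of the patched variables. `mock.patch.dict` restores `os.environ` when the block exits, even if the body raises, so no manual cleanup is needed.

```python
import os
from unittest import mock


def run_with_plotting_env(enabled: bool, plotting_score_threshold: int) -> dict:
    """Hypothetical wrapper: run a test body with temporary env overrides."""
    overrides = {
        "ENABLE_SNOWPARK_LOGICAL_PLAN_PLOTTING": str(enabled),
        "SNOWPARK_LOGICAL_PLAN_PLOTTING_THRESHOLD": str(plotting_score_threshold),
    }
    with mock.patch.dict(os.environ, overrides):
        # The real test body would go here; this placeholder just records
        # the values visible while the patch is active.
        seen = {key: os.environ[key] for key in overrides}
    # On exit, os.environ is back to its previous contents.
    return seen
```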

try:
    os.environ["ENABLE_SNOWPARK_LOGICAL_PLAN_PLOTTING"] = str(enabled)
    os.environ["SNOWPARK_LOGICAL_PLAN_PLOTTING_THRESHOLD"] = str(
        plotting_score_threshold
    )
    tmp_dir = tempfile.gettempdir()

    with patch("graphviz.Graph.render") as mock_render:
        large_query_df.collect()
Contributor

Can we perhaps add a comment explaining that the actual complexity for large_query_df falls somewhere between 0 and 10M?

Contributor

@sfc-gh-helmeleegy sfc-gh-helmeleegy left a comment

Looks good, thanks!

@@ -381,15 +383,27 @@ def plot_plan_if_enabled(root: LogicalPlan, filename: str) -> None:
):
return

if int(
Collaborator

What is this plotting threshold used for? It seems to restrict plotting by complexity score. Maybe call it SNOWPARK_LOGICAL_PLAN_PLOTTING_COMPLEXITY_THRESHOLD to be clearer.

if node is None:
return "EMPTY_SOURCE_PLAN" # pragma: no cover
addr = hex(id(node))
name = str(type(node)).split(".")[-1].split("'")[0]
return f"{name}({addr})"
suffix = ""
if isinstance(node, SnowflakeCreateTable):
Collaborator

Add a comment here describing the different print formats used.
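
One way the commented naming logic might read, as a sketch only. The two dataclasses are stand-ins, not the real Snowpark classes, and the attribute names `table_name` and `entity_name`, the ` :: ` separator, and the use of `type(node).__name__` in place of the string splitting shown in the diff are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnowflakeCreateTable:  # stand-in, not the real Snowpark class
    table_name: str  # hypothetical attribute name


@dataclass
class SelectableEntity:  # stand-in, not the real Snowpark class
    entity_name: str  # hypothetical attribute name


def get_name(node: Optional[object]) -> str:
    """Build a display label for a plan node."""
    if node is None:
        return "EMPTY_SOURCE_PLAN"
    # Base label: the node's class name.
    name = type(node).__name__
    # Suffix: extra detail that only some node types carry.
    suffix = ""
    if isinstance(node, SnowflakeCreateTable):
        # Table-creating nodes also show the target table.
        suffix = f" :: {node.table_name}"
    elif isinstance(node, SelectableEntity):
        # Entity nodes also show the entity they select from.
        suffix = f" :: {node.entity_name}"
    return f"{name}{suffix}"
```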

@@ -381,15 +383,27 @@ def plot_plan_if_enabled(root: LogicalPlan, filename: str) -> None:
):
return

if int(
os.environ.get("SNOWPARK_LOGICAL_PLAN_PLOTTING_THRESHOLD", 0)
Collaborator

Let's simply make the default threshold -1, to be clear that by default all nodes are plotted.

Was there a reason for adding this threshold?

Contributor Author

Yeah. In my tests, I generally want to plot and debug "big" plans, but sometimes the plans get overwritten by smaller plans if they are present somewhere. That's why I added this variable. I don't think this is the best way, so I'm open to suggestions.

Collaborator

Can you be more specific about "sometime the plans get overwritten by smaller plan if they are present somewhere"? I'm not quite getting this part. What information do you want to get to help your debugging process?

@sfc-gh-aalam sfc-gh-aalam merged commit e75b506 into main Jan 3, 2025
40 checks passed
@sfc-gh-aalam sfc-gh-aalam deleted the aalam-SNOW-1869362-improve-plan-plotter branch January 3, 2025 18:23
@github-actions github-actions bot locked and limited conversation to collaborators Jan 3, 2025
4 participants