Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add REST API to generate DOT graph for individual query stage #310

Merged
merged 7 commits into from
Oct 5, 2022

Conversation

andygrove
Copy link
Member

Which issue does this PR close?

Closes N/A

Rationale for this change

For large and complex queries, I would like to be able to view the query plan for an individual query stage.

What changes are included in this PR?

http://localhost:50050/api/job/rGk0Imj/stage/2/dot
digraph G {
		stage_2_0 [shape=box, label="ShuffleWriter [3 partitions]"]
		stage_2_0_0 [shape=box, label="CoalesceBatches [batchSize=4096]"]
		stage_2_0_0_0 [shape=box, label="Filter: d_year@1 = 2000"]
		stage_2_0_0_0_0 [shape=box, label="Parquet: mnt/bigdata/tpcds/sf100-parquet/date_dim.parquet [3 files] [3 partitions]"]
		stage_2_0_0_0_0 -> stage_2_0_0_0
		stage_2_0_0_0 -> stage_2_0_0
		stage_2_0_0 -> stage_2_0
}

Are there any user-facing changes?

Copy link
Contributor

@avantgardnerio avantgardnerio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

/// Generate a dot graph for the specified job id and return as plain text
pub(crate) async fn get_job_dot_graph<T: AsLogicalPlan, U: AsExecutionPlan>(
data_server: SchedulerServer<T, U>,
job_id: String,
) -> Result<String, Rejection> {
let graph = data_server
let maybe_graph = data_server
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked just "graph" 😢

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored this to avoid the need for this intermediate variable

/// Get the name of the variant
pub(crate) fn name(&self) -> &str {
match self {
ExecutionStage::UnResolved(_) => "Unresolved",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this simply be the impl Display of ExecutionStage?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed this to variant_name() to make it clear that this is just returning the variant name, not the full details of the stage

)?);
}
}
writeln!(&mut dot, "\t\tlabel = \"Stage {} [{}]\";", id, stage.name())?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactor 😃

@andygrove
Copy link
Member Author

@avantgardnerio I pushed changes to address feeback and also changed job name from Option<String> to just String as you suggested in a previous PR.

@andygrove andygrove merged commit 0e6f046 into apache:master Oct 5, 2022
@andygrove andygrove deleted the query-stage-dot branch October 5, 2022 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants