Add progress bar support for fault tolerant execution #16036

Merged
merged 1 commit into from
Apr 15, 2023

Conversation

linzebing
Member

Description

Unlike pipelined execution, fault tolerant execution doesn't execute stages all at once. This means in the middle of execution, some stages will be in PLANNED state, which is seen as not scheduled. Therefore, no progress will be displayed until the very end for fault tolerant execution.

This PR adds progress bar support for fault tolerant execution.

Additional context and related issues

Fix #13072

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Section
* Add progress bar support for fault tolerant execution. ({issue}`13072`)

@cla-bot cla-bot bot added the cla-signed label Feb 9, 2023
@github-actions github-actions bot added the jdbc Relates to Trino JDBC driver label Feb 9, 2023
@@ -264,6 +266,7 @@ public QueryStats(
this.peakTaskRevocableMemory = requireNonNull(peakTaskRevocableMemory, "peakTaskRevocableMemory is null");
this.peakTaskTotalMemory = requireNonNull(peakTaskTotalMemory, "peakTaskTotalMemory is null");
this.scheduled = scheduled;
this.faultTolerantStageScheduled = faultTolerantStageScheduled;
Member

This feels like a bit too much for this single use case. Maybe instead let's have a flag indicating whether the query is using FTE, plus a counter of how many stages are already scheduled?

Member Author

I have thought about that. However, it may confuse end users, because we won't be able to know whether we are using FTE here:

StatementStats.builder()
        .setState(state.toString())
        .setQueued(state == QUEUED)
        .setElapsedTimeMillis(elapsedTime.toMillis())
        .setQueuedTimeMillis(queuedTime.toMillis())
        .build(),

Member

I do not think it is a big deal that we always return false here.
We could also make the value an OptionalBoolean.
Or we could have a String retryMode field holding one of
NONE, QUERY, TASK, or UNKNOWN, and use UNKNOWN in this context.
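
The third option above could look roughly like the following. This is only a sketch under stated assumptions: `RetryModeSketch` and `fromString` are hypothetical names made up for illustration, not part of Trino's client API, and the lenient parsing is one possible choice rather than what the PR implemented.

```java
import java.util.Locale;

// Hypothetical enum for illustration; the constants mirror the values proposed in the comment.
enum RetryModeSketch
{
    NONE, QUERY, TASK, UNKNOWN;

    // Parse the string field leniently: a missing or unrecognized value maps to UNKNOWN,
    // so a client never fails on a value it does not understand.
    static RetryModeSketch fromString(String value)
    {
        if (value == null) {
            return UNKNOWN;
        }
        try {
            return valueOf(value.toUpperCase(Locale.ROOT));
        }
        catch (IllegalArgumentException e) {
            return UNKNOWN;
        }
    }
}
```

With this shape, a context that cannot tell which mode is in use would simply report `UNKNOWN` instead of a misleading `false`.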

Member

@losipiuk losipiuk left a comment

The formulas look good. Let's settle on how we mark a query as FTE in stats.

@linzebing linzebing force-pushed the fte-progress-bar branch 2 times, most recently from d8a47d9 to 66958fb Compare February 23, 2023 22:37
Contributor

@arhimondr arhimondr left a comment

LGTM % nits

    if (!scheduled || totalSplits == 0) {
        return OptionalDouble.empty();
    }
    return OptionalDouble.of(min(100, (completedSplits * 100.0) / totalSplits));
}

public int getFaultTolerantExecutionRunningPercentage()
Contributor

nit: maybe extract the implementation into a helper method (you can pass a boolean flag on what to compute)
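
One way to read this suggestion is a single helper that takes flags for what to count. The sketch below is illustrative only: the class name, method name, and parameters are hypothetical, it works on query-level split counts, and it does not reproduce the per-stage logic of the actual PR.

```java
import java.util.OptionalDouble;

// Hypothetical helper for illustration; not the code merged in the PR.
final class SplitPercentageSketch
{
    private SplitPercentageSketch() {}

    // Returns the share of totalSplits made up by the selected split counts, capped at 100.
    static OptionalDouble splitPercentage(
            long completedSplits,
            long runningSplits,
            long totalSplits,
            boolean includeCompletedSplits,
            boolean includeRunningSplits)
    {
        if (totalSplits == 0) {
            // Nothing to divide by, so no percentage can be reported.
            return OptionalDouble.empty();
        }
        long counted = 0;
        if (includeCompletedSplits) {
            counted += completedSplits;
        }
        if (includeRunningSplits) {
            counted += runningSplits;
        }
        return OptionalDouble.of(Math.min(100, (counted * 100.0) / totalSplits));
    }
}
```

The public getters would then be one-line wrappers, for example `splitPercentage(c, r, t, true, false)` for progress and `splitPercentage(c, r, t, false, true)` for the running share.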

stats.getTotalSplits());
String progressBar;
if (stats.isFaultTolerantExecution()) {
progressBar = formatProgressBar(progressWidth, progressPercentage, stats.getFaultTolerantExecutionRunningPercentage(), 100);
Contributor

nit: maybe have stats#getRunningPercentage that has a condition based on what execution mode is used?

Member Author

@linzebing linzebing Feb 23, 2023

Not really feasible if we don't want to change the signature of formatProgressBar:

if (stats.isFaultTolerantExecution()) {
    progressBar = formatProgressBar(progressWidth, progressPercentage, stats.getFaultTolerantExecutionRunningPercentage(), 100);
}
else {
    progressBar = formatProgressBar(progressWidth,
            stats.getCompletedSplits(),
            max(0, stats.getRunningSplits()),
            stats.getTotalSplits());
}

return getFaultTolerantExecutionSplitPercentage(false, true);
}

private OptionalDouble getFaultTolerantExecutionSplitPercentage(boolean includeCompletedSplits, boolean includeRunningSplits)
Member

Does this guarantee that progress will not decrease? totalSplits can grow for a stage, so I do not believe it is guaranteed. Am I missing something, or do we just not care?

Member Author

Oh, I didn't know totalSplits can grow per stage. Still, this seems to be the best we can do for now.
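
To make the concern concrete, here is a small worked example using the split-based formula quoted earlier in this review. The class name and the split counts are made up for illustration; only the formula itself comes from the diff.

```java
import java.util.OptionalDouble;

// Illustration only: shows how a growing totalSplits can make the reported percentage drop.
final class ProgressRegressionSketch
{
    private ProgressRegressionSketch() {}

    // Same shape as the formula in the diff: completed splits as a share of total splits, capped at 100.
    static OptionalDouble progressPercentage(boolean scheduled, long completedSplits, long totalSplits)
    {
        if (!scheduled || totalSplits == 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Math.min(100, (completedSplits * 100.0) / totalSplits));
    }

    public static void main(String[] args)
    {
        // Hypothetical first poll: 50 of 100 splits are done.
        double before = progressPercentage(true, 50, 100).getAsDouble();
        // Hypothetical second poll: 10 more splits are done, but the total grew to 200.
        double after = progressPercentage(true, 60, 200).getAsDouble();
        System.out.println(before + " -> " + after); // prints "50.0 -> 30.0"
    }
}
```

More work has completed between the two polls, yet the reported value falls from 50% to 30%, which is exactly the non-monotonic behavior asked about above.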

Member

@losipiuk losipiuk left a comment

One question

@losipiuk
Member

I tested it out on tpch.sf10 with:

SELECT
  c.custkey,
  c.name,
  sum(l.extendedprice * (1 - l.discount)) AS revenue,
  c.acctbal,
  n.name,
  c.address,
  c.phone,
  c.comment
FROM
  "lineitem" AS l,
  "orders" AS o,
  "customer" AS c,
  "nation" AS n
WHERE
  c.custkey = o.custkey
  AND l.orderkey = o.orderkey
  AND o.orderdate >= DATE '1993-10-01'
  AND o.orderdate < DATE '1993-10-01' + INTERVAL '3' MONTH
  AND l.returnflag = 'R'
  AND c.nationkey = n.nationkey
GROUP BY
  c.custkey,
  c.name,
  c.acctbal,
  c.phone,
  n.name,
  c.address,
  c.comment
ORDER BY
  revenue DESC
LIMIT 20
;

For quite a few iterations the running splits are reported as 0, which leaves the progress bar without a '>' at the end. It is kind of accurate, but it does not look pretty.
I think the question is why we have 0 running splits for prolonged period of time and still get progress. @arhimondr ideas?

@arhimondr
Contributor

> I think the question is why we have 0 running splits for prolonged period of time and still get progress.

Interesting. I don't really know how it is possible to have 0 running splits for a prolonged period. Did we check what the running tasks report?

Member

@electrum electrum left a comment

We should take this opportunity to move the progress percentage logic into the server. Rather than including a faultTolerantExecution flag, we simply add a progressPercentage property to the StatementStats constructor.

Many years ago, we added the getProgressPercentage() method with @JsonProperty, so the server returns it and this is used by the web UI and maybe other clients as well. We should complete this transition by moving the computation into the constructor, so that StatementStats directly accepts the computed value.

This will mean that any clients using the property will automatically get the FTE behavior.

@electrum
Member

Looking at this closer, seems like we need to add a runningPercentage stat as well.

Does the scheduled flag make sense for FTE? The CLI currently does

if (stats.isScheduled()) {

but we could change that to

if (stats.getProgressPercentage().isPresent()) {

since that already includes the scheduled flag.

@linzebing
Member Author

@electrum: we can't really. The logic for formatting the progress bar differs between streaming mode and FTE mode (to prevent the progress bar from going back and forth): FTE mode uses runningPercentage but streaming mode doesn't. This means we need to include the faultTolerantExecution flag anyway.

@electrum
Member

What if we add runningPercentage for both modes and use it if present, otherwise falling back to the old mode (for new clients talking to older servers)?

The goal here is to keep as much of the logic on the server as possible. We don't want every client to have to understand all of these details. Does this make sense?
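
A client-side fallback along these lines might look like the sketch below. It assumes, hypothetically, that the server-provided runningPercentage reaches the client as an `OptionalDouble` that is empty when talking to an older server. The class and method names are illustrative and not part of Trino's client code.

```java
import java.util.OptionalDouble;

// Hypothetical client-side helper for illustration; not the code merged in the PR.
final class RunningPercentageFallbackSketch
{
    private RunningPercentageFallbackSketch() {}

    static double runningPercentage(OptionalDouble serverRunningPercentage, long runningSplits, long totalSplits)
    {
        // Newer server (assumption): trust the value computed on the server, whatever the execution mode.
        if (serverRunningPercentage.isPresent()) {
            return serverRunningPercentage.getAsDouble();
        }
        // Older server: fall back to deriving the share from raw split counts.
        if (totalSplits == 0) {
            return 0.0;
        }
        return Math.min(100, (Math.max(0, runningSplits) * 100.0) / totalSplits);
    }
}
```

The point of this design is that the client never needs to know whether the query runs in streaming or FTE mode: it renders whatever the server sends and keeps the old arithmetic only as a compatibility path.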

@linzebing
Member Author

OK, I will address this some time later

@linzebing linzebing requested a review from dedep March 30, 2023 06:20
Contributor

@arhimondr arhimondr left a comment

LGTM % There's a chance that removing the scheduled field breaks something we don't know about. Unless there's a good reason to remove it, I would recommend keeping it.

@linzebing linzebing force-pushed the fte-progress-bar branch 3 times, most recently from 598df8b to 6dbe38e Compare March 31, 2023 03:07
@linzebing linzebing force-pushed the fte-progress-bar branch 2 times, most recently from fe73abe to be73172 Compare April 3, 2023 14:33
@losipiuk losipiuk merged commit 51f10c6 into trinodb:master Apr 15, 2023
@github-actions github-actions bot added this to the 414 milestone Apr 15, 2023
Labels
cla-signed jdbc Relates to Trino JDBC driver
Development

Successfully merging this pull request may close these issues.

Add progress bar support for fault tolerant execution
4 participants