-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add progress bar support for fault tolerant execution #16036
Conversation
@@ -264,6 +266,7 @@ public QueryStats( | |||
this.peakTaskRevocableMemory = requireNonNull(peakTaskRevocableMemory, "peakTaskRevocableMemory is null"); | |||
this.peakTaskTotalMemory = requireNonNull(peakTaskTotalMemory, "peakTaskTotalMemory is null"); | |||
this.scheduled = scheduled; | |||
this.faultTolerantStageScheduled = faultTolerantStageScheduled; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a bit to much for this single usecase. Maybe instead let's have a flag if query is using FTE and a counter how many stages are already scheduled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have thought about that. However, that may cause confusion to the end users as we won't be able to know whether we are using FTE here
trino/core/trino-main/src/main/java/io/trino/dispatcher/QueuedStatementResource.java
Lines 282 to 287 in 49e4c06
StatementStats.builder() | |
.setState(state.toString()) | |
.setQueued(state == QUEUED) | |
.setElapsedTimeMillis(elapsedTime.toMillis()) | |
.setQueuedTimeMillis(queuedTime.toMillis()) | |
.build(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think it is a big deal that we always return false
here.
We can also make value OptionalBoolean.
Or we can have a String retryMode
field where we can put
NONE
, QUERY
, TASK
and UNKNOWN
. We can then use UNKNOWN
in this context
b60577b
to
27ecf31
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The formulas looks like. Let's settle on how we mark query as FTE in stats.
d8a47d9
to
66958fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % nits
if (!scheduled || totalSplits == 0) { | ||
return OptionalDouble.empty(); | ||
} | ||
return OptionalDouble.of(min(100, (completedSplits * 100.0) / totalSplits)); | ||
} | ||
|
||
public int getFaultTolerantExecutionRunningPercentage() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe extract the implementation into a helper method (you can pass a boolean flag on what to compute)
stats.getTotalSplits()); | ||
String progressBar; | ||
if (stats.isFaultTolerantExecution()) { | ||
progressBar = formatProgressBar(progressWidth, progressPercentage, stats.getFaultTolerantExecutionRunningPercentage(), 100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe have stats#getRunningPercentage
that has a condition based on what execution mode is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really feasible if we don't want to change the signature of formatProgressBar
:
if (stats.isFaultTolerantExecution()) {
progressBar = formatProgressBar(progressWidth, progressPercentage, stats.getFaultTolerantExecutionRunningPercentage(), 100);
}
else {
progressBar = formatProgressBar(progressWidth,
stats.getCompletedSplits(),
max(0, stats.getRunningSplits()),
stats.getTotalSplits());
}
core/trino-main/src/main/java/io/trino/execution/QueryStateMachine.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/execution/QueryStateMachine.java
Outdated
Show resolved
Hide resolved
66958fb
to
42ebf47
Compare
return getFaultTolerantExecutionSplitPercentage(false, true); | ||
} | ||
|
||
private OptionalDouble getFaultTolerantExecutionSplitPercentage(boolean includeCompletedSplits, boolean includeRunningSplits) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this one guarantee that progress will not decrease? totalSplits can grow for stage, so I do not believe it is guaranteed. Am I missing something, or we just not care?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn't know totalSplits
can grow per stage. But still, seems this is the best thing we can do now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question
42ebf47
to
69180a1
Compare
I tested it out on tcph.sf10 with:
For quite a few iterations the running splits are reported as 0 which make progress bar not hav '>' at the end. It is kinda accurate but does not look pretty. |
Interesting. I don't really know how is it possible to have 0 running splits for prolonged period. Did we check what do running tasks report? |
69180a1
to
54f4cf0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should take this opportunity to move the progress percentage logic into the server. Rather than including a faultTolerantExecution
flag, we simply add a progressPercentage
property to the StatementStats
constructor.
Many years ago, we added the getProgressPercentage()
method with @JsonProperty
, so the server returns it and this is used by the web UI and maybe other clients as well. We should complete this transition by moving the computation into the constructor, so that StatementStats
directly accepts the computed value.
This will mean that any clients using the property will automatically get the FTE behavior.
Looking at this closer, seems like we need to add a Does the if (stats.isScheduled()) { but we could change that to if (stats.getProgressPercentage().isPresent()) { since that already includes the |
@electrum : we can't really. The logic for formatting a progress bar is different for streaming mode and FTE mode (to prevent progress bar from going back and forth): FTE mode uses |
What if we add The goal here is to keep as much of the logic on the server as possible. We don't want every client to have to understand all of these details. Does this make sense? |
OK, I will address this some time later |
54f4cf0
to
d003de8
Compare
core/trino-main/src/main/java/io/trino/server/protocol/ProtocolUtil.java
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/server/protocol/ProtocolUtil.java
Show resolved
Hide resolved
d003de8
to
0fe651d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % There's a chance something may break we don't know of once we remove the scheduled
field. Unless there's a good reason why to remove it I would recommend keeping it.
598df8b
to
6dbe38e
Compare
fe73abe
to
be73172
Compare
be73172
to
28150ed
Compare
Description
Unlike pipelined execution, fault tolerant execution doesn't execute stages all at once. This means in the middle of execution, some stages will be in PLANNED state, which is seen as not scheduled. Therefore, no progress will be displayed until the very end for fault tolerant execution.
This PR adds progress bar support for fault tolerant execution.
Additional context and related issues
Fix #13072
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: