Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task retries related Web UI improvements #12099

Merged
merged 6 commits into from
Apr 27, 2022

Conversation

losipiuk
Copy link
Member

@losipiuk losipiuk commented Apr 22, 2022

Description

Render task failures related info per stage in web UI

Is this change a fix, improvement, new feature, refactoring, or other?

improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Web UI

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Web UI
* Group information about tasks in by stage. ({issue}`12099`)
* Show aggregated statistics for failed tasks, for queries that are executed with `retry-policy` set to `TASK`. ({issue}`12099`)

@cla-bot cla-bot bot added the cla-signed label Apr 22, 2022
@losipiuk losipiuk requested review from arhimondr and linzebing April 22, 2022 11:57
@losipiuk
Copy link
Member Author

image

@findepi findepi added the ui Web UI label Apr 22, 2022
@losipiuk losipiuk changed the title Render task failures related info per stage in web UI Task retries related Web UI improvements Apr 22, 2022
@losipiuk
Copy link
Member Author

image

@arhimondr
Copy link
Contributor

With task level retries we should expect for stages to have more tasks than usual. It would be great to group task information per stage. Here is similar improvement from Presto for the reference: prestodb/presto#13158

It would also be great to group "logical tasks" together and show number of retries and CPU time spent / wasted on a "logical task".

Additionally it would be great to show memory estimate / reservation / peak reservation for each task, largest task in a stage (on a stage level) and largest task in a query.

@losipiuk
Copy link
Member Author

losipiuk commented Apr 22, 2022

With task level retries we should expect for stages to have more tasks than usual. It would be great to group task information per stage. Here is similar improvement from Presto for the reference: prestodb/presto#13158

It would also be great to group "logical tasks" together and show number of retries and CPU time spent / wasted on a "logical task".

Additionally it would be great to show memory estimate / reservation / peak reservation for each task, largest task in a stage (on a stage level) and largest task in a query.

Makes sense. Would you prefer to have layout with tasks grouped per stage always present. Or only when we have task level retries enabled? I think changing it to always groups tasks by stages would be less confusing to users - but tell me what you think.

@martint opinion here. I like what @arhimondr proposes - but it is somewhat invasive. Is it ok to just do the change like that to the UI?

@arhimondr
Copy link
Contributor

I think changing it to always groups tasks by stages would be less confusing to users - but tell me what you think.

Yeah, I agree. I thing there's no real reason why we shouldn't always try to group them.

@losipiuk losipiuk force-pushed the lo/batch-ui-improvements branch from 71bb414 to b6e6b79 Compare April 25, 2022 12:14
@losipiuk
Copy link
Member Author

image

@losipiuk
Copy link
Member Author

@arhimondr, @martint, @linzebing PTAL

@losipiuk losipiuk force-pushed the lo/batch-ui-improvements branch from b6e6b79 to c7727ab Compare April 25, 2022 12:57
Copy link
Contributor

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me so far.

Do you think it would make sense to also update the home page (with the list of queries running) to include number of failed tasks (and maybe wasted CPU time?) for each query to make it easier to identify queries that experienced a task failure?

@@ -79,6 +79,7 @@
private final boolean completeInfo;
private final Optional<ResourceGroupId> resourceGroupId;
private final Optional<QueryType> queryType;
private final String retryPolicy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not the enum?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK - I think I for some reason thought that QueryInfo is in SPI and you cannot do that. WIll change.

@losipiuk losipiuk force-pushed the lo/batch-ui-improvements branch from c7727ab to df7bdf7 Compare April 26, 2022 12:03
@losipiuk
Copy link
Member Author

Do you think it would make sense to also update the home page (with the list of queries running) to include number of failed tasks (and maybe wasted CPU time?) for each query to make it easier to identify queries that experienced a task failure?

Good idea. I would go with just failed tasks. If that metrics goes up you probably want to click the query anyway and look at other, more detailed stats (also I have no clue what icon to use for wasted CPU - the icons we have so far are not very easy to comprehend already ;) )

@losipiuk
Copy link
Member Author

image

@losipiuk losipiuk requested a review from arhimondr April 26, 2022 14:22
@losipiuk losipiuk force-pushed the lo/batch-ui-improvements branch from 1ed0bac to 1db62fa Compare April 27, 2022 11:28
@losipiuk losipiuk merged commit 0c699e8 into trinodb:master Apr 27, 2022
@github-actions github-actions bot added this to the 379 milestone Apr 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

4 participants