Task retries related Web UI improvements #12099
With task-level retries we should expect stages to have more tasks than usual. It would be great to group task information per stage. Here is a similar improvement from Presto for reference: prestodb/presto#13158. It would also be great to group "logical tasks" together and show the number of retries and the CPU time spent / wasted on each "logical task". Additionally, it would be great to show the memory estimate / reservation / peak reservation for each task, as well as the largest task in a stage (on the stage level) and the largest task in a query.
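As a rough illustration of the grouping requested here, the sketch below collects task attempts per stage and folds the attempts of each "logical task" into a retry count plus spent and wasted CPU time. The `TaskAttempt` and `LogicalTask` records, their fields, and the rule that "wasted" CPU means CPU consumed by failed attempts are all assumptions made for this example; they are not taken from the Trino code base or from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TaskGrouping {
    // Hypothetical, simplified view of one task attempt (not Trino's TaskInfo).
    record TaskAttempt(int stageId, int partitionId, int attemptId, boolean failed, long cpuMillis) {}

    // One "logical task": every attempt that ran for a given stage/partition pair.
    record LogicalTask(int stageId, int partitionId, int retries, long totalCpuMillis, long wastedCpuMillis) {}

    static Map<Integer, List<LogicalTask>> groupByStage(List<TaskAttempt> attempts) {
        // stageId -> partitionId -> attempts, kept sorted for stable display order
        Map<Integer, Map<Integer, List<TaskAttempt>>> grouped = new TreeMap<>();
        for (TaskAttempt attempt : attempts) {
            grouped.computeIfAbsent(attempt.stageId(), key -> new TreeMap<>())
                    .computeIfAbsent(attempt.partitionId(), key -> new ArrayList<>())
                    .add(attempt);
        }

        Map<Integer, List<LogicalTask>> result = new TreeMap<>();
        for (Map.Entry<Integer, Map<Integer, List<TaskAttempt>>> stage : grouped.entrySet()) {
            List<LogicalTask> logicalTasks = new ArrayList<>();
            for (Map.Entry<Integer, List<TaskAttempt>> partition : stage.getValue().entrySet()) {
                long total = 0;
                long wasted = 0;
                for (TaskAttempt attempt : partition.getValue()) {
                    total += attempt.cpuMillis();
                    // Assumption: CPU used by a failed attempt counts as wasted.
                    if (attempt.failed()) {
                        wasted += attempt.cpuMillis();
                    }
                }
                // Every attempt beyond the first is counted as a retry.
                int retries = partition.getValue().size() - 1;
                logicalTasks.add(new LogicalTask(stage.getKey(), partition.getKey(), retries, total, wasted));
            }
            result.put(stage.getKey(), logicalTasks);
        }
        return result;
    }
}
```

A UI could render one section per map key (stage) and one row per `LogicalTask`; the memory figures mentioned above would be aggregated the same way.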
Makes sense. Would you prefer a layout with tasks always grouped per stage, or only when task-level retries are enabled? I think always grouping tasks by stage would be less confusing to users, but tell me what you think. @martint, your opinion here? I like what @arhimondr proposes, but it is somewhat invasive. Is it OK to just change the UI like that?
Yeah, I agree. I think there's no real reason why we shouldn't always try to group them.
71bb414 to b6e6b79
@arhimondr, @martint, @linzebing PTAL
b6e6b79 to c7727ab
Looks good to me so far.
Do you think it would make sense to also update the home page (with the list of running queries) to include the number of failed tasks (and maybe wasted CPU time?) for each query, to make it easier to identify queries that experienced a task failure?
@@ -79,6 +79,7 @@
private final boolean completeInfo;
private final Optional<ResourceGroupId> resourceGroupId;
private final Optional<QueryType> queryType;
private final String retryPolicy;
Why not the enum?
IDK. For some reason I thought QueryInfo was in the SPI, where you cannot do that. Will change.
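For readers wondering what the enum buys over a `String` field, here is a minimal sketch. The `RetryPolicy` constants and the `showsTaskRetries` helper are assumptions made for illustration and may not match the actual enum or any UI logic in this PR.

```java
import java.util.Locale;

class RetryPolicyExample {
    // Assumed set of constants, for illustration only.
    enum RetryPolicy { NONE, TASK, QUERY }

    // With a String field, a typo such as "taks" goes unnoticed until some
    // consumer tries to interpret it; converting to the enum rejects it up front.
    static RetryPolicy parse(String value) {
        return RetryPolicy.valueOf(value.trim().toUpperCase(Locale.ENGLISH));
    }

    // Hypothetical helper: the switch has no default branch, so the compiler
    // reports an error here if a new constant is added and left unhandled.
    static boolean showsTaskRetries(RetryPolicy policy) {
        return switch (policy) {
            case TASK -> true;
            case NONE, QUERY -> false;
        };
    }
}
```

The trade-off is that every consumer of the serialized object then has to be able to see the enum type, which is the concern about the SPI raised in the reply above.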
c7727ab to df7bdf7
Good idea. I would go with just failed tasks. If that metric goes up, you probably want to click the query anyway and look at the other, more detailed stats. (Also, I have no clue what icon to use for wasted CPU; the icons we have so far are already not very easy to comprehend ;) )
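The per-query number discussed here could be derived along the following lines. This is only a sketch: the `TaskStatus` record and the `failedTasksPerQuery` method are hypothetical names invented for the example, not part of this PR.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FailedTaskSummary {
    // Hypothetical minimal task record: owning query and whether the task failed.
    record TaskStatus(String queryId, boolean failed) {}

    // Counts failed tasks per query. Queries without failures still appear
    // with a count of 0, so a query list can show the column for every row.
    static Map<String, Integer> failedTasksPerQuery(List<TaskStatus> tasks) {
        Map<String, Integer> counts = new TreeMap<>();
        for (TaskStatus task : tasks) {
            counts.merge(task.queryId(), task.failed() ? 1 : 0, Integer::sum);
        }
        return counts;
    }
}
```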
1ed0bac to 1db62fa
Description
Render task failures related info per stage in web UI
improvement
Web UI
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: