[APM] Improve UI of list of services page and match with latest service map design #262

alex-fedotyev · 2020-04-30T15:07:26Z

Summary of the problem (If there are multiple problems or use cases, prioritize them)
(7/15: updated the mock-up to better quality and added a health chart column)

The APM home page currently shows a list of all services.
As a user, I want to quickly get to a service that is having a problem, or that I'm interested in for another reason.

The proposal is to improve the overall UI of the page, match the latest service map design, and introduce sparklines to the page:

Remove the "Agent" column, which lists the technology type (like java or node.js)
Add icons similar to those used on the service map to visually show the technology (based on agent type)
Add service health indicators (similar to [APM] Service maps health indicators: Indicate alert-based health status on Service nodes kibana#64144)
Add a service health spark chart showing a timeline of alerts and ML anomalies for each service
Add sparklines to each metric to show its trend over the selected time frame

User stories
As a user, I want to visually identify when any of the services are experiencing problems such as higher response times, higher error rates, and/or a significant drop in request volume.

List known (technical) restrictions and requirements

If in doubt, don’t hesitate to reach out to the #observability-design Slack channel.


elasticmachine · 2020-04-30T15:07:28Z

Pinging @elastic/observability-design (design)

alex-fedotyev · 2020-07-16T00:20:38Z

Added a better-quality mock-up, plus one more column to display health as a spark chart.
The health chart would be a timeline of the alerts and ML anomalies raised for each service during the selected time frame.

Open question:

Currently, the service map displays service health as a color indicator, calculated from the worst state across the selected time range.
Would it be appropriate to match that and show the health state as a color for each service together with the health timeline spark chart, or would showing the trend be enough?
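The "worst state across the selected time range" rule can be sketched in a few lines of TypeScript. This is a minimal sketch, assuming four hypothetical status values ranked by severity; the names and ranking are illustrative and not taken from the service map implementation.

```typescript
// Hypothetical health statuses, ranked from least to most severe.
type HealthStatus = "unknown" | "healthy" | "warning" | "critical";

const SEVERITY_RANK: Record<HealthStatus, number> = {
  unknown: 0,
  healthy: 1,
  warning: 2,
  critical: 3,
};

// Collapse a timeline of health samples (one per time bucket) into a single
// status by keeping the most severe one. An empty timeline stays "unknown".
function worstHealth(timeline: HealthStatus[]): HealthStatus {
  return timeline.reduce<HealthStatus>(
    (worst, status) =>
      SEVERITY_RANK[status] > SEVERITY_RANK[worst] ? status : worst,
    "unknown",
  );
}

// A service that was critical for only one bucket is still shown as critical.
console.log(worstHealth(["healthy", "healthy", "critical", "healthy"])); // critical
```

A single color computed this way cannot tell apart a service that was red the whole time from one that was red only briefly, which is the gap a health trend spark chart would fill.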

formgeist · 2020-08-05T12:59:04Z

Here's an initial draft of a hi-res mock-up of the proposed updates.

A few comments on the design:

I opted to display health as a status that falls into the same severity levels we have from ML, which we should possibly also show in the service map (as those are the levels our current statuses are based on). I'm not sure a health trend is feasible at this time, but I'm happy to make an example of what that trend chart would look like as well.
The sparklines and example values are obviously not correct; this is mostly to get the layout and overall elements in place.

Thoughts on this?

graphaelli · 2020-08-05T13:38:43Z

I really like reclaiming the space used for the agent name, and the addition of health. Did you consider putting health in the first column? I'm not advocating for that, but I'm interested in your thoughts on it.

Are the metrics next to the sparklines the latest value or the average over the period shown? Is there any interaction with the sparkline, like hover?

alex-fedotyev · 2020-08-05T16:57:59Z

Looks great already!

Regarding health indicators:

Using the same health indicators as on the service map looks awesome!
I think going forward we should consider showing a health trend spark chart to make the UX more useful for multi-hour or multi-day time ranges. Quickly seeing whether services were "red" the whole time, or only in the last 15 minutes, is very informative. Additionally, showing the health trend in a popup on the service map would keep the two views aligned. @formgeist, could you put together a mock-up of how that could look?
@graphaelli My initial thought was to move the health indicators to the left as well. But now that the health indicators would include a health trend spark chart, it feels better to keep them together with the other metrics and sparklines.

Metrics:

Good question about showing the last value vs. the average value. I think showing averages is more useful, as it enables better column sorting logic. From a workflow perspective, the user's goal is to align the time range selection with the time when there were production issues, and then to select the slowest or most erroneous services.
Interactions could be rather simple, like showing the numeric value at that point in time on hover. It would be cool to sync the mouse position across all charts, but I'm not sure that's reasonable with sparklines. Have you come across any interesting designs for interactions with sparklines?

alex-fedotyev · 2020-08-05T19:12:31Z

Suggestion: would it be better to show the total number of calls instead of TPM?

This thought follows my comment above about showing the average duration and average error percentage instead of the last value.
The goal of sorting by "Calls" is to show which services processed the most requests in the selected time frame. Using calls/min risks showing values below 1 for lightly loaded services or staging environments, while the total number of calls is always a whole number.
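The arithmetic behind this concern is easy to show. A minimal sketch, assuming a 24-hour time range and two hypothetical services (the names and call counts are made up for illustration):

```typescript
// Hypothetical service with its total call count over the selected range.
interface ServiceLoad {
  name: string;
  totalCalls: number;
}

// Selected time range: last 24 hours (example value), in minutes.
const rangeMinutes = 24 * 60;

// Rate = total calls divided by the length of the range in minutes.
function callsPerMinute(totalCalls: number, minutes: number): number {
  return totalCalls / minutes;
}

const services: ServiceLoad[] = [
  { name: "checkout", totalCalls: 1_440_000 }, // busy production service
  { name: "staging-worker", totalCalls: 30 }, // lightly loaded staging service
];

for (const s of services) {
  const rate = callsPerMinute(s.totalCalls, rangeMinutes);
  console.log(`${s.name}: ${s.totalCalls} calls total, ${rate.toFixed(2)} calls/min`);
}
// checkout: 1440000 calls total, 1000.00 calls/min
// staging-worker: 30 calls total, 0.02 calls/min
```

A rate of 0.02 calls/min is technically correct but hard to read, whereas the total of 30 calls is immediately clear.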

nehaduggal · 2020-08-05T23:42:31Z

I love the design!

I second Gil's feedback about moving the health indicator to the left. If we add a spark chart to track health, it might make more sense to give it its own column. For maps we were planning on adding alert-based indicators too. How would those be represented on this health chart? Would the sparkline chart track both, or would we add another column?

The agent type icons on the services page are aesthetically pleasing and aligned with the service map view, but from a user perspective I'm not sure they are actually helpful. Moving them to the left definitely claims back some pixels and improves the look of this page, but would users lose much if we removed them completely?

eyalkoren · 2020-08-06T05:45:04Z

This looks really good!

Regarding latest measurement vs. average (or max, or whatever): maybe it makes sense to assume two different workflows. The first is the one @alex-fedotyev described: starting an issue investigation from a time range, where average/max makes sense because it lets you sort and drill down.
The second workflow is using this view (or the service map, or metrics, for that matter) as a dashboard that constantly presents the updated state and health of the environment. I think auto-refresh being enabled is a good indication of that, so maybe we can use it: if auto-refresh is enabled, show the latest measurement; otherwise, show the aggregated value.
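This proposal amounts to a simple switch on the auto-refresh setting. The sketch below assumes a hypothetical `Sample` shape and uses the average as the aggregate (max would work the same way); it illustrates the rule and is not Kibana code.

```typescript
// Hypothetical metric sample for one service.
interface Sample {
  timestamp: number;
  value: number;
}

// Aggregate over the whole selected range (investigation workflow).
function average(samples: Sample[]): number | null {
  if (samples.length === 0) return null;
  return samples.reduce((sum, s) => sum + s.value, 0) / samples.length;
}

// Most recent measurement (live dashboard workflow).
function latest(samples: Sample[]): number | null {
  if (samples.length === 0) return null;
  return samples.reduce((a, b) => (b.timestamp > a.timestamp ? b : a)).value;
}

// Proposed rule: auto-refresh on means "dashboard", so show the latest value;
// otherwise show the aggregate over the selected time range.
function displayedValue(samples: Sample[], autoRefreshEnabled: boolean): number | null {
  return autoRefreshEnabled ? latest(samples) : average(samples);
}

const samples: Sample[] = [
  { timestamp: 1, value: 100 },
  { timestamp: 2, value: 200 },
  { timestamp: 3, value: 600 },
];
console.log(displayedValue(samples, true)); // 600
console.log(displayedValue(samples, false)); // 300
```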

Regarding agent icons - I think they are both nice visually and useful.

formgeist · 2020-08-12T10:48:06Z

Design update, 12 Aug 2020

First and foremost, thank you for all the positive and constructive feedback on the initial mock-up. A lot of the same points were echoed, but there were also some possibly conflicting ideas. I've tried to distill most of them into changes in the updated mock-ups.

In the updated design, I've made the following changes:

Moved the health status to the first column in the table.
Added an example of a non-Elastic agent, e.g. Jaeger, which will use the default service type icon. (For Jaeger specifically, we could choose to include the Jaeger icon, because it's officially supported.)
Changed the health status indicator to a badge. The current EuiHealth indicator is typically a dot and a label, but I find the three statuses hard to distinguish that way, which is why we're changing it to an EuiBadge. We'll propose to the EUI team that they reconsider the current EuiHealth component design and treat this as an enhancement.
Combined the agent and name columns, so agent name is no longer a separate column (and therefore can't be sorted either). We weren't sure how often users would sort the table by agent name, and they can also filter by agent name using the UI filters on the left.
The sparkline charts next to each metric have been changed from area charts to line charts. This reduces noise when viewing the charts, especially in tables with a lot of rows per page.
Additionally, I suggest that we don't add any interactions to the sparklines at this time, both to reduce scope and because the user can go to each individual service's landing page to get a better view.

Open questions that haven't been addressed yet:

What should the default sort column be? Health? Alphabetical, until the user chooses to sort by one of the available columns?
What happens if none of the services have ML anomaly detection? Do we simply hide the health status if all services report unknown, and show a callout indicating that ML anomaly detection would provide this information in the table?
Calls per min. vs. total calls as suggested by Alex

I ❤️ all the ideas in the comments above, but I'm also aware of the time constraint we have for completing this design so the devs can pick it up and get cracking on it. I'm sure there'll be more feedback once we have a working version of the current proposal.

alex-fedotyev · 2020-08-12T20:16:04Z

Re: Sorting

I think sorting by load (requests) in descending order would be a good default.
In my experience, the most important services are often among the top 10-20 by load, whereas response time or error rate can mean different things for different services and may not bring the most important ones to the first page (depending on the service's role, 50ms could be too slow while 3 seconds is just fine!).

graphaelli · 2020-08-12T20:33:17Z

I buy the reasoning, @alex-fedotyev.

What happens if none of the services have ML anomaly detection? Do we simply hide the health status if all services report unknown, and show a callout indicating that ML anomaly detection would provide this information in the table?

That sounds a lot better than an unknown in every row.
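The rule agreed on here (hide health when every service reports unknown, and show a callout instead) could look like the sketch below. The status names are hypothetical, and treating an empty service list as "all unknown" is an assumption of this sketch.

```typescript
// Hypothetical health statuses; "unknown" means no ML anomaly detection data.
type HealthStatus = "unknown" | "healthy" | "warning" | "critical";

interface HealthColumnState {
  showHealthColumn: boolean;
  showAnomalyDetectionCallout: boolean;
}

// If every service reports "unknown", hide the column and show the callout
// instead of rendering "unknown" in every row. Note: every() returns true for
// an empty list, so an empty table also gets the callout.
function healthColumnState(statuses: HealthStatus[]): HealthColumnState {
  const allUnknown = statuses.every((s) => s === "unknown");
  return {
    showHealthColumn: !allUnknown,
    showAnomalyDetectionCallout: allUnknown,
  };
}

console.log(healthColumnState(["unknown", "unknown"]));
console.log(healthColumnState(["unknown", "healthy"]));
```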

Calls per min. vs. total calls as suggested by Alex

I think a rate is preferable to the total. I like per min since that's what we show now.

formgeist · 2020-08-19T13:39:53Z

Based on the feedback received from the last update, I've made some minor changes and extended mocks to show different states.

Example of the health indication not available/all unknown, but without the callout as proposed

Example of a callout enticing the user to add anomaly detection to get health indication on their Services list

nehaduggal · 2020-08-19T14:23:28Z

The sorting should be two-fold: first by health status, so the red services are on top, and then by load. If a user doesn't have ML enabled, it should default to load.
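The two-fold ordering, with the load-only fallback, can be sketched as a comparator. This is a minimal sketch under assumptions: the status names, the severity ranking, placing "unknown" below "healthy", and using calls per minute as "load" are illustrative choices, not the implemented behavior.

```typescript
// Hypothetical health statuses and a severity rank used for sorting.
type HealthStatus = "unknown" | "healthy" | "warning" | "critical";

const SEVERITY_RANK: Record<HealthStatus, number> = {
  unknown: 0,
  healthy: 1,
  warning: 2,
  critical: 3,
};

interface ServiceRow {
  name: string;
  health: HealthStatus;
  callsPerMinute: number; // used as "load"
}

// Sort by health severity (most severe first), then by load (highest first).
// Without any ML data, sort by load only. The explicit check documents the
// fallback; with all-"unknown" rows the health comparison is a no-op anyway.
function sortServices(rows: ServiceRow[]): ServiceRow[] {
  const hasHealthData = rows.some((r) => r.health !== "unknown");
  return [...rows].sort((a, b) => {
    if (hasHealthData) {
      const byHealth = SEVERITY_RANK[b.health] - SEVERITY_RANK[a.health];
      if (byHealth !== 0) return byHealth;
    }
    return b.callsPerMinute - a.callsPerMinute;
  });
}

const rows: ServiceRow[] = [
  { name: "frontend", health: "healthy", callsPerMinute: 500 },
  { name: "checkout", health: "critical", callsPerMinute: 10 },
  { name: "search", health: "warning", callsPerMinute: 100 },
  { name: "api", health: "healthy", callsPerMinute: 900 },
];
console.log(sortServices(rows).map((r) => r.name).join(", "));
// checkout, search, api, frontend
```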

formgeist · 2020-08-19T20:51:27Z

Moving the design to implementation to be tracked in elastic/kibana#75252

alex-fedotyev added the design label Apr 30, 2020

alex-fedotyev changed the title ~~Improve UI of list of services page and match with latest service map~~ Improve UI for list of services page and match with latest service map Apr 30, 2020

alex-fedotyev changed the title ~~Improve UI for list of services page and match with latest service map~~ Improve UI of list of services page and match with latest service map design Apr 30, 2020

formgeist added [zube]: Inbox Team:apm labels May 4, 2020

felixbarny mentioned this issue May 19, 2020

APM Services landing experience #268 (closed)

katrin-freihofner added [zube]: Ready and removed [zube]: Inbox labels Jun 8, 2020

katrin-freihofner assigned katrin-freihofner and formgeist and unassigned katrin-freihofner Jun 8, 2020

formgeist added [zube]: In Progress and removed [zube]: Ready labels Aug 5, 2020

formgeist added [zube]: In Review and removed [zube]: In Progress labels Aug 12, 2020

dgieselaar mentioned this issue Aug 18, 2020

[APM] Service inventory redesign: Add health status and sparkline trends for each metric on the Services list page elastic/kibana#75252 (closed)

formgeist changed the title ~~Improve UI of list of services page and match with latest service map design~~ [APM] Improve UI of list of services page and match with latest service map design Aug 19, 2020

formgeist closed this as completed Aug 19, 2020

zube bot added [zube]: Done and removed [zube]: In Review labels Aug 19, 2020

katrin-freihofner removed the [zube]: Done label Oct 12, 2020
