[Infra UI] : Enhanced Disk Space Support for Hosts #164151

Closed
1 task
Tracked by #179044
roshan-elastic opened this issue Aug 17, 2023 · 4 comments
Labels
enhancement, Feature:Metrics UI, Feature:ObsHosts, needs-refinement, Team:obs-ux-infra_services

Comments


roshan-elastic commented Aug 17, 2023

🔗 Key Links

Issues

Note: This will be completed once the epic has been refined and issues have been created

Issues/Tasks


📖 Description

We are hiding cases where a disk (mount point) is running out of space, because we average the used space across all mount points on the host. The average loses the fact that an individual mount point may be nearly full.

This issue is to update the Hosts experience so the SRE can easily see when a host has a mount point that is close to running out of space, and can analyse it.

Background

A consistent use case called out by our users is the ability to 'catch' hosts before they run out of storage so they can address it before it causes the host to stop functioning correctly.

To cater for this, we show average(system.filesystem.used.pct) prominently throughout the Hosts experience:

Hosts List
image

Overview fly-out - KPI tile
image

Overview fly-out metrics
image

Disk Space by mount point (asset detail view)

What's the problem?

When a host has multiple mount points, multiple documents are emitted - one per mount point:

image

This allows Elastic to understand the space available per mount point:

Space per mount point for a host
image

However, when we show the space available across the host, we show the average(). This hides from the user that a volume is close to running out of space, because the nearly full mount point is disguised by the average of all mount points:

Average space per mount point for a host
image
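To make the masking effect concrete, here is a minimal sketch with made-up numbers. The mount points and values are hypothetical and do not come from the screenshots above; the values are fractions between 0 and 1, as in system.filesystem.used.pct.

```python
from statistics import mean

# Hypothetical per-mount-point readings for a single host (fractions, 0.0 to 1.0).
used_pct_by_mount = {
    "/": 0.20,
    "/var": 0.25,
    "/data": 0.97,  # nearly full
}

values = list(used_pct_by_mount.values())
average_used = mean(values)
max_used = max(values)

print(f"average: {average_used:.0%}")  # average: 47%
print(f"max:     {max_used:.0%}")      # max:     97%
```

With these numbers the host looks about half full on average, even though /data has only 3% of its space left.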

Here is an example from @Danouchka:

Example from his host
Image

💡 Solution Proposal

This is a starting point for ideas; there are likely better solutions

We swap out the current disk space usage metric average(system.filesystem.used.pct) for a max(system.filesystem.used.pct) so the user can always see at a glance if a host is about to run out of space:

Sample data illustrating the difference between average() and max()
image
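As a rough illustration of the swap, the sketch below builds an Elasticsearch search request body that aggregates disk usage per host. It is not the query the Hosts view actually runs: the function name `disk_usage_by_host` and the aggregation names `hosts` and `disk_used` are invented for this example, and grouping on host.name is an assumption.

```python
def disk_usage_by_host(metric: str = "max") -> dict:
    """Build a search request body aggregating disk usage per host.

    `metric` selects the Elasticsearch metric aggregation: "avg" is the
    current behaviour described above, "max" is the proposed replacement.
    """
    if metric not in ("avg", "max"):
        raise ValueError(f"unsupported metric: {metric!r}")
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {"exists": {"field": "system.filesystem.used.pct"}},
        "aggs": {
            "hosts": {  # hypothetical aggregation name
                "terms": {"field": "host.name"},
                "aggs": {
                    # One value per host across all of its mount point documents.
                    "disk_used": {metric: {"field": "system.filesystem.used.pct"}}
                },
            }
        },
    }


current_body = disk_usage_by_host("avg")
proposed_body = disk_usage_by_host("max")
```

The two bodies differ only in the metric aggregation, which is why the change is cheap to make at the query level; the open question is how the UI explains the new number.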

Limitations

If we show max() instead of average(), users won't easily be able to dig into which mount point is running out of space. The breakdown will be available in the host detail view, but they are unlikely to find it.

How might we allow them to easily debug which mount point is close to running out of space?
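One possible direction, sketched here under assumptions, is to carry the mount point name along with the maximum so the UI can name the culprit. The `MountReading` type, the `fullest_mount_per_host` function and the 0.9 threshold are hypothetical, and the sample readings are made up.

```python
from typing import Dict, Iterable, NamedTuple


class MountReading(NamedTuple):
    host: str
    mount_point: str
    used_pct: float  # fraction between 0.0 and 1.0


def fullest_mount_per_host(
    readings: Iterable[MountReading], threshold: float = 0.9
) -> Dict[str, MountReading]:
    """Return, per host, the fullest mount point if it is at or above `threshold`."""
    worst: Dict[str, MountReading] = {}
    for reading in readings:
        current = worst.get(reading.host)
        if current is None or reading.used_pct > current.used_pct:
            worst[reading.host] = reading
    # Keep only hosts whose fullest mount point crosses the threshold.
    return {host: r for host, r in worst.items() if r.used_pct >= threshold}


# Made-up sample data: host-a has one nearly full mount point, host-b has none.
sample = [
    MountReading("host-a", "/", 0.20),
    MountReading("host-a", "/data", 0.97),
    MountReading("host-b", "/", 0.40),
    MountReading("host-b", "/var", 0.55),
]
at_risk = fullest_mount_per_host(sample)
```

Here `at_risk` contains only host-a, paired with its /data reading, which is the detail a user would need next to the max() value.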

✔️ Acceptance criteria

What must this feature have?

1. Must Have

Must be delivered in this issue in order for the release to be valuable

| Name | Description | Notes |
| --- | --- | --- |
| The Host List, fly-out and detail views must allow the SRE to know at a glance if space for a mount point on a host is running low | - | - |

2. Should Have

| Name | Description | Notes |
| --- | --- | --- |
| The SRE should be able to easily debug which mount point is running out of space | - | Once the SRE sees a mount point is running out of space, they will want to debug which one it is and understand why. |

3. Could Have

Would be nice to have but not critical

Name Description Notes
- - -

4. Will Not Have (for now)

Explicitly will not be looked at within this issue

| Name | Description | Notes |
| --- | --- | --- |
| Predictions of when it might run out | - | @Danouchka (senior solutions architect) has a very powerful illustration leveraging ML which predicts when mount points are running out of space (I'll try and get a video showing this) |

🚗 Use Cases

A selection of use cases to think about

  • As an SRE, I need to understand whether hosts have any mount points which have run out of space (or could soon) so I can ensure this is handled before it impacts the performance of the host.

📈 Telemetry Process

  • Telemetry requirements must be part of the acceptance criteria (above) (defined by the Epic creator, e.g. the Product Manager) during refinement.
  • See Telemetry Process for full details/process/implementation conventions.
roshan-elastic added the Feature:Metrics UI, Team:Infra Monitoring UI - DEPRECATED, needs-refinement, and Feature:ObsHosts labels on Aug 17, 2023
@elasticmachine
Contributor

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

smith added the Team:obs-ux-infra_services label and removed the Team:Infra Monitoring UI - DEPRECATED label on Nov 14, 2023
@elasticmachine
Contributor

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

roshan-elastic changed the title from "[Infra UI] : Disk Space Available - Handle Mount Points" to "[Infra UI] : Enhanced Disk Space Support for Hosts" on Nov 14, 2023

botelastic bot commented May 12, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

botelastic bot added the stale label on May 12, 2024
smith added the enhancement label on May 18, 2024
botelastic bot removed the stale label on May 18, 2024
@roshan-elastic
Author

A lot of this has been delivered. Can create new issues for more disk work when needed

roshan-elastic closed this as not planned on Aug 15, 2024
Projects
None yet
Development

No branches or pull requests

3 participants