[Infra UI] : Enhanced Disk Space Support for Hosts #164151
Labels: enhancement, Feature:Metrics UI, Feature:ObsHosts, needs-refinement, Team:obs-ux-infra_services
🔗 Key Links
Issues
Issues/Tasks
📖 Description
We are obscuring when a disk (mount point) may be running out of space, because we average disk usage across all mount points on the host and so lose the fact that an individual mount point may be running out of space.
This issue is to update the Hosts experience so that an SRE can easily see when a host has a mount point that is close to running out of space, and can analyse it.
Background
A consistent use case called out by our users is the ability to 'catch' hosts before they run out of storage so they can address it before it causes the host to stop functioning correctly.
To cater for this, we show average(system.filesystem.used.pct) prominently around the Hosts experience:
Hosts List

Overview fly-out - KPI tile

Overview fly-out metrics

Disk Space by mount point (asset detail view)
What's the problem?
When a host has multiple mount points, multiple documents are emitted - one per mount point:
This allows Elastic to understand the space available per mount point:
Space per mount point for a host

However, when we show disk space for the host as a whole, we show the average(), which means the user cannot tell that a volume is close to running out of space (it is masked by the average across all mount points):
Average space per mount point for a host
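To make the masking effect concrete, here is a minimal sketch using hypothetical sample documents for one host. It assumes Metricbeat-style field names (system.filesystem.mount_point, system.filesystem.used.pct), written as flattened dotted keys for brevity, and assumes used.pct is a fraction between 0 and 1. The host name, mount points and values are invented for illustration.

```python
from statistics import mean

# Hypothetical per-mount-point documents for one host (flattened dotted keys,
# Metricbeat-style field names assumed; values invented for illustration).
DOCS = [
    {"host.name": "host-1", "system.filesystem.mount_point": "/", "system.filesystem.used.pct": 0.40},
    {"host.name": "host-1", "system.filesystem.mount_point": "/boot", "system.filesystem.used.pct": 0.20},
    {"host.name": "host-1", "system.filesystem.mount_point": "/var", "system.filesystem.used.pct": 0.97},
]


def host_disk_usage(docs):
    """Return the average and the maximum used.pct across a host's mount points."""
    used = [d["system.filesystem.used.pct"] for d in docs]
    return {"average": mean(used), "max": max(used)}


usage = host_disk_usage(DOCS)
print(f"average: {usage['average']:.2f}")  # prints "average: 0.52" (looks healthy)
print(f"max: {usage['max']:.2f}")          # prints "max: 0.97" (/var is almost full)
```

The average of 0.52 gives no hint that /var is at 97%, which is exactly the situation described above.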

Here is an example from @Danouchka:
Example on his host

💡 Solution Proposal
We swap out the current disk space usage metric, average(system.filesystem.used.pct), for max(system.filesystem.used.pct), so the user can always see at a glance if a host is about to run out of space:
Sample data illustrating the difference between average() and max()
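As a rough illustration of the swap at the query level, the sketch below builds a plain Elasticsearch search body: a terms aggregation on host.name with either an avg or a max metric sub-aggregation on system.filesystem.used.pct. This is not necessarily how the Hosts UI builds its queries; the helper name, the 15-minute window and the default number of hosts are assumptions for illustration.

```python
import json


def build_disk_usage_query(agg_type: str = "max", hosts: int = 10) -> dict:
    """Build a hypothetical search body returning one disk usage value per host.

    agg_type is "avg" (current behaviour) or "max" (proposed behaviour);
    both are standard Elasticsearch metric aggregations.
    """
    if agg_type not in ("avg", "max"):
        raise ValueError("agg_type must be 'avg' or 'max'")
    return {
        "size": 0,  # only the aggregations are needed, not the raw documents
        "query": {"range": {"@timestamp": {"gte": "now-15m"}}},  # assumed time window
        "aggs": {
            "per_host": {
                "terms": {"field": "host.name", "size": hosts},
                "aggs": {
                    # The only difference between current and proposed: avg vs max.
                    "disk_used": {agg_type: {"field": "system.filesystem.used.pct"}}
                },
            }
        },
    }


print(json.dumps(build_disk_usage_query("max"), indent=2))
```

One point to keep in mind: over a time window, max returns the highest sample recorded in that window for any mount point, which may be slightly older than the most recent reading. For an "about to run out of space" signal this arguably errs on the side of caution.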

Limitations
If we show max() instead of average(), users won't easily be able to dig into which mount point is running out of space (it will be available in the host detail view, but they won't easily be able to find it). How might we allow them to easily debug which mount point is close to running out of space?
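One possible direction for this limitation (a sketch, not a decided design): alongside the max() value, keep track of which mount point produced it, so the Hosts list or fly-out could name that mount point directly. The helper names, sample documents and the 0.9 threshold below are hypothetical, and the field names assume the same Metricbeat-style layout as above.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical sample documents across two hosts (flattened dotted keys,
# Metricbeat-style field names assumed; values invented for illustration).
DOCS = [
    {"host.name": "host-1", "system.filesystem.mount_point": "/", "system.filesystem.used.pct": 0.40},
    {"host.name": "host-1", "system.filesystem.mount_point": "/var", "system.filesystem.used.pct": 0.97},
    {"host.name": "host-2", "system.filesystem.mount_point": "/", "system.filesystem.used.pct": 0.55},
    {"host.name": "host-2", "system.filesystem.mount_point": "/data", "system.filesystem.used.pct": 0.61},
]


def fullest_mount_point_per_host(docs: Iterable[dict]) -> Dict[str, Tuple[str, float]]:
    """Map each host to (mount point, used.pct) for its fullest mount point."""
    fullest: Dict[str, Tuple[str, float]] = {}
    for d in docs:
        host = d["host.name"]
        pct = d["system.filesystem.used.pct"]
        # Keep this mount point if it is the first seen for the host or fuller than the current one.
        if host not in fullest or pct > fullest[host][1]:
            fullest[host] = (d["system.filesystem.mount_point"], pct)
    return fullest


def hosts_at_risk(docs: Iterable[dict], threshold: float = 0.9) -> Dict[str, Tuple[str, float]]:
    """Keep only hosts whose fullest mount point is at or above the (hypothetical) threshold."""
    return {h: mp for h, mp in fullest_mount_point_per_host(docs).items() if mp[1] >= threshold}


for host, (mount, pct) in hosts_at_risk(DOCS).items():
    print(f"{host}: {mount} is {pct:.0%} full")  # prints "host-1: /var is 97% full"
```

In Elasticsearch itself, a comparable result could be obtained by nesting, under the per-host terms aggregation, a terms aggregation on the mount point field with size 1, ordered by a max sub-aggregation on system.filesystem.used.pct.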
✔️ Acceptance criteria
What must this feature have?
1. Must Have
Must be delivered in this issue in order for the release to be valuable
2. Should Have
3. Could Have
Would be nice to have but not critical
4. Will Not Have (for now)
Explicitly will not be looked at within this issue
🚗 Use Cases
A selection of use cases to think about
📈 Telemetry Process