[RFC] host metrics #2129
This RFC adds a proposal to bring host metrics to ECS. These metrics should build the foundation to deliver a minimal set of metrics related to a host. The current list is as follows:

* host.cpu.system.norm.pct
* host.cpu.user.norm.pct
* host.fsstats.total_size.used (in bytes)
* host.fsstats.total_size.total (in bytes)
* host.fsstats.total_size.used.pct
* host.load.norm.1
* host.load.norm.5
* host.load.norm.15
* host.memory.actual.used.bytes
* host.memory.actual.used.pct
* host.memory.total
* host.network.egress.bytes
* host.network.ingress.bytes

One of the main challenges around this RFC is whether we should prefix with `host.*` or `system.*`. See some more details in the RFC itself. It would be great to hear opinions on it.
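One thing worth checking in a flat list like this is whether any name is used both as a leaf field and as the parent of another field. As far as I know, Elasticsearch's default object mapping (without `subobjects: false`) cannot hold a scalar and an object under the same path. Below is a minimal Python sketch of that check. The helper name `find_leaf_object_conflicts` is hypothetical and not part of any ECS tooling; the field list is copied from the description above.

```python
# Field names copied from the PR description above.
PROPOSED_FIELDS = [
    "host.cpu.system.norm.pct",
    "host.cpu.user.norm.pct",
    "host.fsstats.total_size.used",
    "host.fsstats.total_size.total",
    "host.fsstats.total_size.used.pct",
    "host.load.norm.1",
    "host.load.norm.5",
    "host.load.norm.15",
    "host.memory.actual.used.bytes",
    "host.memory.actual.used.pct",
    "host.memory.total",
    "host.network.egress.bytes",
    "host.network.ingress.bytes",
]


def find_leaf_object_conflicts(fields):
    """Return (leaf, child) pairs where `leaf` is also a parent path of `child`.

    Hypothetical helper for illustration only.
    """
    names = set(fields)
    conflicts = []
    for child in sorted(names):
        parts = child.split(".")
        # Test every proper dotted prefix of the field name.
        for i in range(1, len(parts)):
            prefix = ".".join(parts[:i])
            if prefix in names:
                conflicts.append((prefix, child))
    return conflicts


if __name__ == "__main__":
    for leaf, child in find_leaf_object_conflicts(PROPOSED_FIELDS):
        print(f"{leaf} is both a field and the parent of {child}")
```

With the list as written, the only pair flagged is `host.fsstats.total_size.used` against `host.fsstats.total_size.used.pct`, so one of the two would likely need a different name (for example a `.bytes` suffix, as the memory fields already use).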
Converted from draft to review to start conversations.
What's the relationship between this new proposal and RFC 0005 - host metric fields? cc @kaiyan-sheng who authored RFC 0005.
RFC 0005 established a small group of metric fields under
Thanks for pointing out https://github.com/elastic/ecs/blob/main/rfcs/text/0005-host-metric-fields.md @ebeahan It definitely points in the direction of using There is an overlap with RFC 5 here and it expands on it. I was aware of the network and disk fields but missed the cpu field. @kaiyan-sheng Can you comment on where these fields are used today? How do they compare to the fields proposed here?
Can I get some reviews on this PR to get things moving?
I found this issue created to switch over to using some of the new ECS host fields. It looks like So, from the list we can remove:
rfcs/text/0037-host-metrics.md
## Concerns

Currently Elastic Agent and metricbeat ship data host/system metrics under the `system.*` prefix. This would change it to `host.*`. One of the reasons for this is that some metrics for network already exist under this prefix in ECS. Another advantage is that some of these fields might use newer field types like `gauge` and `counter` delivered by TSDB in Elasticsearch which is possible without a breaking change. One of the big advantages is it needs to be figured out how to migrate to it with the existing shippers.
> Currently Elastic Agent and metricbeat ship data host/system metrics under the `system.*` prefix. This would change it to `host.*`. One of the reasons for this is that some metrics for network already exist under this prefix in ECS. Another advantage is that some of these fields might use newer field types like `gauge` and `counter` delivered by TSDB in Elasticsearch which is possible without a breaking change

Should this be under "Scope of impact"?

> One of the big advantages is it needs to be figured out how to migrate to it with the existing shippers.

I'm confused by this sentence. How is this a big advantage? Seems more like a concern.
The `s/advantage/concern` was a pretty big typo. Fixed it now.
The other part I moved under scope of impact. It somehow is a mix between both.
I made some comments and nits, but overall no objections to the premise of the proposal and merging the RFC as stage 0.
- name: host.fsstats.total_size.used
  type: long
  format: bytes
  time_series_metric: gauge
Not related to the proposal itself, but I don't believe ECS supports a `time_series_metric` attribute today; we'll need to add in support for it.
Does ECS have a spec for the fields.yml file where this can be added?
There's no one source of truth for the ECS spec. I filed #2176 capturing the requirements needed to support.
Thanks for the issue. For the schema, maybe some parts can be copied over from the package-spec where the same fields.yml structure exists: https://github.com/elastic/package-spec
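To make the `time_series_metric` discussion concrete, here is a rough Python sketch of the kind of check a schema validator could run on a single field definition. It is not existing ECS or package-spec tooling. The function name `check_time_series_metric`, the allowed set (`gauge` and `counter`, the two types named in this RFC), and the restriction to numeric field types are all assumptions made for illustration.

```python
# Assumption: only the two metric types named in this RFC are accepted.
ALLOWED_METRIC_TYPES = {"gauge", "counter"}

# Assumption: the attribute is only meaningful on numeric field types.
NUMERIC_TYPES = {
    "byte", "short", "integer", "long", "unsigned_long",
    "half_float", "float", "double", "scaled_float",
}


def check_time_series_metric(field):
    """Return a list of problems found in one field definition (a dict).

    Hypothetical validator sketch; an empty list means no problem was found.
    """
    errors = []
    metric = field.get("time_series_metric")
    if metric is None:
        # The attribute is optional, so its absence is not an error.
        return errors
    if metric not in ALLOWED_METRIC_TYPES:
        errors.append(f"{field.get('name')}: unknown time_series_metric '{metric}'")
    if field.get("type") not in NUMERIC_TYPES:
        errors.append(f"{field.get('name')}: time_series_metric needs a numeric type")
    return errors


if __name__ == "__main__":
    # The definition quoted in the diff above, as a Python dict.
    fsstats_used = {
        "name": "host.fsstats.total_size.used",
        "type": "long",
        "format": "bytes",
        "time_series_metric": "gauge",
    }
    print(check_time_series_metric(fsstats_used))
```

The field from the diff passes this check; a definition such as a `keyword` field marked as `gauge` would be reported.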
### CPU ###

# The CPU metrics must indicate under how much load the system is.
Would we need any guidance on how these new proposed `host.cpu.*` fields are related to the existing `host.cpu.usage`?
Yes. I added this to the list of concerns.
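As a small illustration of what the `norm` part of `host.cpu.system.norm.pct` and `host.cpu.user.norm.pct` could mean, here is a Python sketch. It assumes that "normalized" means the CPU share summed over all cores divided by the number of cores, so the result stays between 0 and 1. The function name `normalized_cpu_pct` and the sample numbers are made up for the example, and how the results relate to `host.cpu.usage` is left open, as in the thread.

```python
def normalized_cpu_pct(summed_pct, cores):
    """Divide a CPU share summed across all cores by the core count.

    Hypothetical helper: `summed_pct` is expressed as a fraction per core
    added up over cores (so it can reach `cores`), and the result is in 0..1.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return summed_pct / cores


if __name__ == "__main__":
    cores = 4
    # Made-up sample values: kernel and user shares summed over 4 cores.
    system_norm = normalized_cpu_pct(0.8, cores)
    user_norm = normalized_cpu_pct(1.6, cores)
    print(f"system.norm.pct={system_norm:.2f} user.norm.pct={user_norm:.2f}")
```

Under this assumption the two fields stay comparable across hosts with different core counts, which is also what makes the kernel versus user split readable on a single chart.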
Co-authored-by: Eric Beahan <[email protected]>
@neptunian Would you remove these from the initial proposal because they are not used or do you think we should keep them as they will be useful?
Maybe I'm missing something.
The egress fields I only put in for completeness: https://github.com/elastic/ecs/pull/2129/files#diff-eb1c37e580b8c563a0079437420aa4214f94d05d5b683b743b34ca09b0695446R175 If it is confusing, I can also remove these. For the cpu usage, I think it is important for users to see what the cpu usage of the kernel vs user space is. If we decide we don't need this, the
@ruflin I see, thanks.
Sounds good to me. If it's helpful,
Will move forward merging as stage 0 with the two approvers. We'll continue refining the details and addressing any outstanding concerns in subsequent stages.
Thanks @ebeahan for getting this in.