[RFC] host metrics #2129

ruflin · 2023-01-05T14:01:51Z

This RFC proposes bringing host metrics to ECS. These metrics should form the foundation for delivering a minimal set of host-related metrics. The current list is as follows:

* host.cpu.system.norm.pct
* host.cpu.user.norm.pct
* host.fsstats.total_size.used (in bytes)
* host.fsstats.total_size.total (in bytes)
* host.fsstats.total_size.used.pct
* host.load.norm.1
* host.load.norm.5
* host.load.norm.15
* host.memory.actual.used.bytes
* host.memory.actual.used.pct
* host.memory.total
* host.network.egress.bytes
* host.network.ingress.bytes

One of the main challenges around this RFC is whether the fields should be prefixed with `host.*` or `system.*`. See the RFC itself for more details. It would be great to hear opinions on this.
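Several fields in this list are derived from raw readings. The following is a minimal Python sketch of those relationships, not the RFC's implementation. The input dict, its keys, and the `build_host_metrics` helper are hypothetical. It assumes percentages are fractions where 1.0 means 100% (as for the existing ECS `host.cpu.usage`), and that `host.load.norm.*` is the load average divided by the number of cores, following Metricbeat's `system.load.norm.*` convention. Field names are kept as flat dotted keys.

```python
# Hypothetical raw readings for one host; keys and values are illustrative only.
raw = {
    "cores": 4,
    "load": {"1": 2.0, "5": 1.0, "15": 0.5},
    "fs_used_bytes": 250_000_000_000,
    "fs_total_bytes": 1_000_000_000_000,
    "mem_actual_used_bytes": 6 * 1024**3,
    "mem_total_bytes": 16 * 1024**3,
}


def build_host_metrics(raw: dict) -> dict:
    """Derive the proposed host.* fields; percentages are fractions (1.0 == 100%)."""
    cores = raw["cores"]
    doc = {
        "host.fsstats.total_size.used": raw["fs_used_bytes"],
        "host.fsstats.total_size.total": raw["fs_total_bytes"],
        "host.fsstats.total_size.used.pct": raw["fs_used_bytes"] / raw["fs_total_bytes"],
        "host.memory.actual.used.bytes": raw["mem_actual_used_bytes"],
        "host.memory.actual.used.pct": raw["mem_actual_used_bytes"] / raw["mem_total_bytes"],
        "host.memory.total": raw["mem_total_bytes"],
    }
    # Assumption: load averages are normalized by core count, as in Metricbeat's system.load.norm.*.
    for window, value in raw["load"].items():
        doc[f"host.load.norm.{window}"] = value / cores
    return doc


metrics = build_host_metrics(raw)
print(metrics["host.fsstats.total_size.used.pct"], metrics["host.load.norm.1"])
```

Normalizing load by core count makes the value comparable across hosts of different sizes: a load of 2.0 is light on a 4-core host (0.5) but saturating on a 2-core one (1.0).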

ruflin · 2023-01-09T09:35:57Z

Converted from draft to review to start conversations.

ebeahan · 2023-01-13T19:54:18Z

What's the relationship between this new proposal and RFC 0005 - host metric fields?

cc @kaiyan-sheng who authored RFC 0005.

> One of the main challenges around this RFC is if we should prefix with host.* or system.*. See some more details in the RFC itself. It would be great to hear opinions around it.

RFC 0005 established a small group of metric fields under `host.*`. With those fields already added to the spec, I'd propose continuing to use `host.*`.

ruflin · 2023-01-16T08:09:47Z

Thanks for pointing out https://github.com/elastic/ecs/blob/main/rfcs/text/0005-host-metric-fields.md @ebeahan. It definitely points in the direction of using `host.*` for the fields. For now, I will go with the assumption that we keep the `host.*` prefix until someone objects.

This RFC overlaps with RFC 0005 and expands on it. I was aware of the network and disk fields but missed the CPU field. @kaiyan-sheng Can you comment on where these fields are used today? How do they compare to the fields proposed here?

ruflin · 2023-02-20T13:01:09Z

Can I get some reviews on this PR to get things moving?

neptunian · 2023-02-22T12:53:56Z

> Can you comment on where these fields are used today? How does it compare to the fields proposed here?

I found this issue, which was created to switch over to some of the new ECS host fields. It looks like the switch to `host.cpu.usage` wasn't done. Currently, CPU usage is calculated here as (average of `system.cpu.user.pct` + average of `system.cpu.system.pct`) / `system.cpu.cores`. The result is very close to `host.cpu.usage` and would match exactly if we used the normalized system values instead. We should remove this calculation and switch to `host.cpu.usage`.

So, from the list we can remove:

* host.cpu.system.norm.pct
* host.cpu.user.norm.pct
* host.network.egress.bytes
* host.network.ingress.bytes
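The calculation described in this comment can be sketched in Python as follows. The sample values are made up, and the sketch assumes a constant core count across samples and that a normalized value is the raw percentage divided by `system.cpu.cores` (as with Metricbeat's `*.norm.pct` fields). Under those assumptions, averaging the raw values and dividing by the core count gives the same number as summing the averages of the normalized user and system values. Both function names are hypothetical.

```python
from statistics import mean

# Hypothetical samples from a 4-core host. Raw (non-normalized) percentages
# range from 0 to the number of cores, so 1.2 means 120% of a single core.
samples = [
    {"system.cpu.user.pct": 1.2, "system.cpu.system.pct": 0.4, "system.cpu.cores": 4},
    {"system.cpu.user.pct": 0.8, "system.cpu.system.pct": 0.6, "system.cpu.cores": 4},
]


def usage_from_raw(samples: list) -> float:
    """Current calculation: (avg user + avg system) / cores, assuming a constant core count."""
    cores = samples[0]["system.cpu.cores"]
    user = mean(s["system.cpu.user.pct"] for s in samples)
    system = mean(s["system.cpu.system.pct"] for s in samples)
    return (user + system) / cores


def usage_from_normalized(samples: list) -> float:
    """Same quantity from normalized values (pct / cores): avg user.norm + avg system.norm."""
    user_norm = mean(s["system.cpu.user.pct"] / s["system.cpu.cores"] for s in samples)
    system_norm = mean(s["system.cpu.system.pct"] / s["system.cpu.cores"] for s in samples)
    return user_norm + system_norm


print(round(usage_from_raw(samples), 3), round(usage_from_normalized(samples), 3))
```

If the core count could change between samples (for example, after resizing a VM), the two approaches would diverge, which is one argument for storing already-normalized values.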

neptunian · 2023-02-22T13:02:07Z

rfcs/text/0037-host-metrics.md

+
+## Concerns
+
+Currently Elastic Agent and metricbeat ship data host/system metrics under the `system.*` prefix. This would change it to `host.*`. One of the reasons for this is that some metrics for network already exist under this prefix in ECS. Another advantage is that some of these fields might use newer field types like `gauge` and `counter` delivered by TSDB in Elasticsearch which is possible without a breaking change. One of the big advantages is it needs to be figured out how to migrate to it with the existing shippers.


> Currently Elastic Agent and metricbeat ship data host/system metrics under the system.* prefix. This would change it to host.*. One of the reasons for this is that some metrics for network already exist under this prefix in ECS. Another advantage is that some of these fields might use newer field types like gauge and counter delivered by TSDB in Elasticsearch which is possible without a breaking change

Should this be under "Scope of impact"?

> One of the big advantages is it needs to be figured out how to migrate to it with the existing shippers.

I'm confused by this sentence. How is this a big advantage? Seems more like a concern.

The s/advantage/concern was a pretty big typo. Fixed it now.

I moved the other part under "Scope of impact"; it is somewhat a mix of both.
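To make the migration concern more concrete, here is a minimal Python sketch of one possible transition strategy: copying existing `system.*` values to the proposed `host.*` names while keeping the originals. The mapping table assumes the current Metricbeat field names (note `system.fsstat` versus the proposed `host.fsstats`) and is illustrative only; the RFC does not define an official mapping, and `migrate` is a hypothetical helper.

```python
# Assumed correspondence between existing Metricbeat system.* fields and the
# proposed host.* names. Illustrative only; not an official migration mapping.
SYSTEM_TO_HOST = {
    "system.cpu.system.norm.pct": "host.cpu.system.norm.pct",
    "system.cpu.user.norm.pct": "host.cpu.user.norm.pct",
    "system.fsstat.total_size.used": "host.fsstats.total_size.used",
    "system.fsstat.total_size.total": "host.fsstats.total_size.total",
    "system.load.norm.1": "host.load.norm.1",
    "system.load.norm.5": "host.load.norm.5",
    "system.load.norm.15": "host.load.norm.15",
    "system.memory.actual.used.bytes": "host.memory.actual.used.bytes",
    "system.memory.actual.used.pct": "host.memory.actual.used.pct",
    "system.memory.total": "host.memory.total",
}


def migrate(event: dict) -> dict:
    """Return a copy of a flat event with host.* duplicates of known system.* fields.

    The original system.* keys are kept so existing consumers keep working.
    """
    out = dict(event)
    for old, new in SYSTEM_TO_HOST.items():
        if old in event:
            out[new] = event[old]
    return out


event = {"system.load.norm.1": 0.42, "system.memory.total": 17179869184}
print(migrate(event))
```

Keeping both sets of fields during a transition avoids breaking existing dashboards, at the cost of storing duplicate values until the `system.*` fields are retired.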

ebeahan

I made some comments and nits, but overall no objections to the premise of the proposal and merging the RFC as stage 0.

rfcs/text/0037-host-metrics.md

ebeahan · 2023-02-23T22:58:51Z

rfcs/text/0037/host.yml

+- name: host.fsstats.total_size.used
+  type: long
+  format: bytes
+  time_series_metric: gauge


Not related to the proposal itself, but I don't believe ECS supports a `time_series_metric` attribute today; we'll need to add support for it.

Does ECS have a spec for the fields.yml file where this can be added?

There's no single source of truth for the ECS spec. I filed #2176 capturing the requirements needed to support it.

Thanks for the issue. For the schema, maybe some parts can be copied over from the package-spec, where the same fields.yml structure exists: https://github.com/elastic/package-spec
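A minimal sketch of what supporting the attribute could look like: a hypothetical helper that validates `time_series_metric` on a fields.yml-style definition and carries it into an Elasticsearch mapping entry. It assumes the allowed values are `gauge` and `counter`, matching the Elasticsearch `time_series_metric` mapping parameter, and treats `format` as a display hint that does not belong in the mapping. The function name and dict layout are assumptions, not existing ECS tooling.

```python
# Assumed allowed values, mirroring Elasticsearch's `time_series_metric` mapping parameter.
ALLOWED_TIME_SERIES_METRICS = {"gauge", "counter"}


def to_es_mapping(field: dict) -> dict:
    """Translate one fields.yml-style definition into a flat Elasticsearch mapping entry.

    Hypothetical helper: ECS tooling does not support `time_series_metric` today.
    `format` is treated as a display hint and intentionally not copied.
    """
    mapping = {"type": field["type"]}
    metric = field.get("time_series_metric")
    if metric is not None:
        if metric not in ALLOWED_TIME_SERIES_METRICS:
            raise ValueError(f"{field['name']}: unsupported time_series_metric {metric!r}")
        mapping["time_series_metric"] = metric
    return {field["name"]: mapping}


field = {
    "name": "host.fsstats.total_size.used",
    "type": "long",
    "format": "bytes",
    "time_series_metric": "gauge",
}
print(to_es_mapping(field))
```

Rejecting unknown values at generation time surfaces typos in fields.yml before they reach an index template.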

ebeahan · 2023-02-23T23:03:57Z

rfcs/text/0037/host.yml

+
+### CPU ###
+
+# The CPU metrics must indicate under how much load the system is. 


Do we need any guidance on how these new proposed `host.cpu.*` fields relate to the existing `host.cpu.usage`?

Yes. I added this to the list of concerns.

Co-authored-by: Eric Beahan <[email protected]>

ruflin · 2023-02-24T13:38:37Z

> So, from the list we can remove:
>
> * host.cpu.system.norm.pct
> * host.cpu.user.norm.pct
> * host.network.egress.bytes
> * host.network.ingress.bytes

@neptunian Would you remove these from the initial proposal because they are not used or do you think we should keep them as they will be useful?

neptunian · 2023-02-24T15:09:19Z

>> So, from the list we can remove:
>>
>> * host.cpu.system.norm.pct
>> * host.cpu.user.norm.pct
>> * host.network.egress.bytes
>> * host.network.ingress.bytes
>
> @neptunian Would you remove these from the initial proposal because they are not used or do you think we should keep them as they will be useful?

Maybe I'm missing something, but `host.network.egress.bytes` and `host.network.ingress.bytes` are already ECS fields. For CPU usage, it looks like we could use `host.cpu.usage`, which is already an ECS field, in place of `host.cpu.system.norm.pct` and `host.cpu.user.norm.pct`.

ruflin · 2023-02-27T08:14:08Z

I only included the network egress/ingress fields for completeness: https://github.com/elastic/ecs/pull/2129/files#diff-eb1c37e580b8c563a0079437420aa4214f94d05d5b683b743b34ca09b0695446R175 If this is confusing, I can also remove them.

For CPU usage, I think it is important for users to see how much CPU time is spent in the kernel versus user space. If we decide we don't need this, the `host.cpu.usage` value can be used. What is not clear to me is what exactly `host.cpu.usage` shows. Unfortunately, its definition does not contain any details (which should be fixed).

neptunian · 2023-02-28T15:47:42Z

> The egress fields I only put in for completeness: https://github.com/elastic/ecs/pull/2129/files#diff-eb1c37e580b8c563a0079437420aa4214f94d05d5b683b743b34ca09b0695446R175 If it is confusing, can also remove these.

@ruflin I see, thanks.

> For the cpu usage, I think it is important for users to see what cpu usage of the kernel vs user space is. If we decide we don't need this, the host.cpu.usage value can be used. The part that is not clear to me, what exactly do we show for host.cpu.usage? Unfortunately the definition does not contain any details around it (which should be fixed).

Sounds good to me. If it's helpful, host.cpu.usage looks to be the equivalent of system.cpu.total.pct / system.cpu.cores.
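Following the equivalence suggested in this comment, here is a minimal Python sketch mapping a Metricbeat-style `system.cpu` sample to `host.cpu.usage` and the proposed kernel/user split. It assumes normalization means dividing by `system.cpu.cores`, and that `system.cpu.total.pct` covers more CPU states than user and system alone (for example nice, irq, or steal time), so the two split fields need not add up to `host.cpu.usage`. The `cpu_fields` helper and the sample values are hypothetical.

```python
def cpu_fields(sample: dict) -> dict:
    """Derive host.cpu.* values from one Metricbeat-style system.cpu sample (hypothetical helper)."""
    cores = sample["system.cpu.cores"]
    return {
        # Equivalence suggested in the discussion: usage = total.pct / cores.
        "host.cpu.usage": sample["system.cpu.total.pct"] / cores,
        # Kernel vs user space split proposed in the RFC, normalized the same way.
        "host.cpu.system.norm.pct": sample["system.cpu.system.pct"] / cores,
        "host.cpu.user.norm.pct": sample["system.cpu.user.pct"] / cores,
    }


sample = {
    "system.cpu.cores": 4,
    "system.cpu.total.pct": 2.0,  # raw values range from 0 to the number of cores
    "system.cpu.user.pct": 1.2,
    "system.cpu.system.pct": 0.6,
}
fields = cpu_fields(sample)
# Share of usage not attributed to user or kernel time (assumed: other busy states).
other = fields["host.cpu.usage"] - fields["host.cpu.user.norm.pct"] - fields["host.cpu.system.norm.pct"]
print(round(fields["host.cpu.usage"], 3), round(other, 3))
```

Exposing the split alongside `host.cpu.usage` lets users see whether load comes from user space or the kernel, while the single usage value stays the simplest signal for dashboards.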

ebeahan · 2023-03-01T18:04:59Z

I'll move forward with merging this as stage 0, given the two approvals. We'll continue refining the details and addressing any outstanding concerns in subsequent stages.

ruflin · 2023-03-02T06:09:03Z

Thanks @ebeahan for getting this in.

ruflin self-assigned this Jan 5, 2023

ruflin added 6 commits January 5, 2023 15:20

add load metrics

eda890a

add note around windows

67ebdf0

add memory metrics

ec76080

add network metrics

c96e5ac

fix network metrics prefix

cd0309b

add additional links to RFC

1b584ad

ruflin mentioned this pull request Jan 5, 2023

Schema for metrics #474

Open

ruflin added 2 commits January 5, 2023 15:46

add PR number to rfc

f05721c

add missing fields and list of reviewers

47ae362

ruflin marked this pull request as ready for review January 9, 2023 09:35

ruflin requested a review from a team as a code owner January 9, 2023 09:35

ruflin requested a review from neptunian January 9, 2023 09:36

neptunian reviewed Feb 22, 2023

ebeahan approved these changes Feb 23, 2023

ruflin and others added 3 commits February 24, 2023 14:33

Update rfcs/text/0037-host-metrics.md

8d448ce

Co-authored-by: Eric Beahan <[email protected]>

Update rfcs/text/0037-host-metrics.md

9dcf41b

Co-authored-by: Eric Beahan <[email protected]>

Merge branch 'main' into host-metrics

9163411

ruflin added 2 commits February 24, 2023 14:42

cleanup based on reviews

2c8061a

add concern around host.cpu.usage

53de171

neptunian approved these changes Feb 28, 2023

ruflin and others added 2 commits March 1, 2023 16:45

Merge branch 'main' into host-metrics

2d8d2e8

set date for stage 0

09c5029

ebeahan merged commit d5d48c9 into elastic:main Mar 1, 2023

ruflin deleted the host-metrics branch March 2, 2023 06:08

ChrsMark mentioned this pull request Jul 20, 2023

Resource attributes for network addresses of a host open-telemetry/semantic-conventions#131

Closed
