System metrics semantic conventions #937
Conventions from [OTEP 119](open-telemetry/oteps#119)
Is there a definition of "system" and "process" anywhere, and how it relates to the different Resources that could be present? For instance, the system metrics being output will be very different depending on what that system might be: physical, virtual, Kube Pod, Kube Container, etc. I couldn't see a way that the system metrics can be tied back into the "type" of a system it came from, but I may have missed it. Coming from a JVM background, would anything related to JVM metrics fall under the "process" metrics section? Will the process metrics be coming in a later PR?
The "type" info should be in the Resource attached to these metrics, which should follow these resource semantic conventions (e.g. attributes for k8s and containers).
I believe all of the JVM metrics would be under the
This is a very useful document, thank you.
**time** instruments are a special case of **usage** metrics, where the
**limit** can usually be calculated as the sum of **time** over all label
values. **utilization** can also be calculated and useful, for example
I am not sure I understand what this tries to say.
As an example, the sum over all state labels of system.cpu.time
(idle, user, etc.) gives system.cpu.limit
Does that make sense? Happy to remove this too if it's not very useful.
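To make the relationship concrete, here is a minimal sketch in Python, assuming made-up cumulative CPU seconds per state for a single CPU. The dictionary, its values, and the helper names `cpu_limit` and `cpu_utilization` are hypothetical and only illustrate the arithmetic described above; they are not part of the spec or of any SDK.

```python
# Hypothetical system.cpu.time readings (seconds) keyed by the "state" label.
# The values are invented for illustration only.
cpu_time_by_state = {"idle": 90.0, "user": 6.0, "system": 3.0, "interrupt": 1.0}


def cpu_limit(time_by_state):
    """Sum time over all state label values; this plays the role of the limit."""
    return sum(time_by_state.values())


def cpu_utilization(time_by_state):
    """Fraction of the limit spent in each state (empty if no time was recorded)."""
    limit = cpu_limit(time_by_state)
    if limit == 0:
        return {}
    return {state: t / limit for state, t in time_by_state.items()}


limit = cpu_limit(cpu_time_by_state)
utilization = cpu_utilization(cpu_time_by_state)
print(limit)
print(utilization["user"])
```

With these sample numbers the per-state fractions sum to 1, which is the sense in which the limit falls out of the time instrument without being reported separately.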
🚀
Co-authored-by: Tigran Najaryan <[email protected]>
Thanks for this! I especially found the general semantic conventions addition to be useful.
LGTM
TIL of UCUM
LGTM just a few nits
Co-authored-by: James Bebbington <[email protected]>
Co-authored-by: Joshua MacDonald <[email protected]>
🎉
* System metrics semantic conventions Conventions from [OTEP 119](open-telemetry/oteps#119)
* change process count to UpDownSumObserver
* fix system.cpu.utilization, use better example
* first several comments
* add description columns, update units to UCUM
* markdown-toc
* clarify OS process level metrics
* clarify load average example
* move general conventions + OTEP 108 into README.md
* renamed swap -> paging
* add additional fs labels
* fix links
* fix link
* Update specification/metrics/semantic_conventions/README.md Co-authored-by: Tigran Najaryan <[email protected]>
* Update specification/metrics/semantic_conventions/README.md Co-authored-by: Tigran Najaryan <[email protected]>
* Apply suggestions from code review Co-authored-by: Tigran Najaryan <[email protected]>
* fix tigran comments
* add disk io_time and operation_time
* add descriptions/footnotes for dropped packets and net errors
* lint, more info for net dropped packets/errors
* "dropped_packets" -> "dropped"
* Apply suggestions from James' code review Co-authored-by: James Bebbington <[email protected]>
* comments from James' code review
* clarify windows perf counter
* Update specification/metrics/semantic_conventions/README.md Co-authored-by: Joshua MacDonald <[email protected]>
* reflow text

Co-authored-by: Tigran Najaryan <[email protected]>
Co-authored-by: James Bebbington <[email protected]>
Co-authored-by: Joshua MacDonald <[email protected]>
Fixes #818
Changes
Adds system metric conventions/instruments from open-telemetry/oteps#119 to the spec. This PR does not include process level metrics (just a placeholder and TODO), which I can do in a separate PR. In addition to what is in open-telemetry/oteps#119:
* system.process.count (requested here).
* UpDownSumObserver to ValueObserver (discussion in Writing system metrics conventions into the specification #818)

Related issues #818
Related oteps open-telemetry/oteps#119