Entities semantic conventions: host #1752

Open
dmitryax opened this issue Jan 16, 2025 · 1 comment
Labels
enhancement New feature or request experts needed This issue or pull request is outside an area where general approvers feel they can approve triage:needs-triage

Comments

@dmitryax
Member

dmitryax commented Jan 16, 2025

Is your change request related to a problem? Please describe.

The Entities SIG has reached the point where we need to start defining semantic conventions for entities. Because the tooling isn't ready for that yet, we can start the discussion in issues. This issue is intended to define the attributes for the host entity.

The most important goal is to find a combination of identifying attributes that is sufficient to uniquely identify a host within its parent entity.

Describe the solution you'd like

Definition: A host is defined as a computing instance. Examples include physical servers, virtual machines, switches, and disk arrays.

Parent entity: data center (?)

Identifying attributes:

| Attribute | Type | Description | Examples |
|---|---|---|---|
| host.id | string | Unique host ID taken from machine-id [1] | fdbf79e8af94cb7f9e8df36789187052 |

Non-identifying attributes:
All other attributes that are already defined in https://github.com/open-telemetry/semantic-conventions/blob/8d9d4a1ce84b060e4d13b5f44b3495648767deb5/docs/attributes-registry/host.md

Notes:

[1] The current definition of the host.id attribute prioritizes the value assigned by a cloud provider to the corresponding VM, if applicable. However, this means one attribute is shared by two different entities (a host and a cloud VM). This problem has to be resolved before we can define conventions for the host entity type. We might need to change the recommendation for the value source and restrict host.id to the value taken from machine-id.
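
To make the identifying vs. non-identifying split concrete, here is a minimal Python sketch. The `Entity` class, its field names, and the `identity()` helper are hypothetical and exist only for illustration; they are not part of any OpenTelemetry API. It assumes that two observations refer to the same host when their entity type and identifying attributes match, regardless of the other attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Entity:
    """Hypothetical entity model: a type plus two attribute groups."""
    type: str
    identifying: Dict[str, str]
    non_identifying: Dict[str, str] = field(default_factory=dict)

    def identity(self) -> Tuple[str, Tuple[Tuple[str, str], ...]]:
        # Only the entity type and the identifying attributes take part
        # in identity; non-identifying attributes are ignored.
        return (self.type, tuple(sorted(self.identifying.items())))


# Two observations of a host that differ only in a non-identifying attribute.
before = Entity("host", {"host.id": "fdbf79e8af94cb7f9e8df36789187052"},
                {"host.name": "web-1"})
after = Entity("host", {"host.id": "fdbf79e8af94cb7f9e8df36789187052"},
               {"host.name": "web-1-renamed"})

same_host = before.identity() == after.identity()
```

Under this assumption, `same_host` is true even though `host.name` differs, which is why the choice of value source for host.id matters so much.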

@christos68k
Member

christos68k commented Jan 16, 2025

For some more context regarding machine-id, see #581. The Elastic Universal Profiling product comes with deployment instructions that map machine-id inside the container, and host.id is populated from it (albeit not verbatim, see the hashing note below). This enables stable correlation across thousands of deployed agents that would otherwise not be possible. Enabling this volume mount is not uncommon in production, and even Docker on macOS supports it (i.e., since there's no /etc/machine-id on the macOS host, Docker virtualizes it and makes it available inside the container).

If we use machine-id, we should hash the value and NOT use it verbatim. According to machine-id(5):

It should be considered "confidential", and must not be exposed in untrusted environments, in
 particular on the network. If a stable unique identifier that is tied to the machine is needed
 for some application, the machine ID or any part of it must not be used directly. Instead the
 machine ID should be hashed with a cryptographic, keyed hash function, using a fixed,
 application-specific key.
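
As a rough illustration of that recommendation, here is a minimal Python sketch using HMAC-SHA256 from the standard library. The key value, the function names, and the choice of SHA-256 are assumptions made for the example; nothing here is an agreed convention.

```python
import hashlib
import hmac

# Hypothetical fixed, application-specific key (an assumption for this sketch).
APP_KEY = b"example-application-key"


def read_machine_id(path: str = "/etc/machine-id") -> str:
    """Read the raw machine ID; the file holds a single hex string and a newline."""
    with open(path, "r", encoding="ascii") as f:
        return f.read().strip()


def hashed_machine_id(machine_id: str, key: bytes = APP_KEY) -> str:
    """Return a keyed hash of the machine ID instead of the raw value."""
    digest = hmac.new(key, machine_id.strip().encode("ascii"), hashlib.sha256)
    return digest.hexdigest()


# Example with the machine-id value from the issue description.
example = hashed_machine_id("fdbf79e8af94cb7f9e8df36789187052")
```

The same input and key always give the same output, so correlation stays stable, while a different key yields an unrelated identifier.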

We could use UUIDv5 as there's precedent in #312.
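
A minimal sketch of what a UUIDv5-based derivation could look like, using Python's standard `uuid` module. The namespace below is a made-up placeholder derived from a hypothetical name; an actual convention would have to fix its own namespace UUID. Note that UUIDv5 is a SHA-1 hash over namespace plus name, so the namespace takes the role of the fixed application-specific value rather than being an HMAC-style key.

```python
import uuid

# Hypothetical namespace, derived from a placeholder name (an assumption for this sketch).
HOST_ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "host-id.example.invalid")


def host_id_from_machine_id(machine_id: str) -> str:
    """Derive a stable UUIDv5 string from the machine ID."""
    return str(uuid.uuid5(HOST_ID_NAMESPACE, machine_id.strip()))


# Example with the machine-id value from the issue description.
derived = host_id_from_machine_id("fdbf79e8af94cb7f9e8df36789187052")
```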
