-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add agent semantic conventions #950
base: main
Are you sure you want to change the base?
Add agent semantic conventions #950
Conversation
@open-telemetry/collector-approvers @open-telemetry/collector-contrib-approvers @open-telemetry/opamp-spec-approvers @open-telemetry/opamp-go-approvers Please review. |
Resolves open-telemetry#396 Contributes to open-telemetry/opamp-spec#131 We need a way to record more information about agents than is currently possible using existing semantic conventions. Otel Collector in particular today uses service.name,service.instance.id,service.version attributes to report its own telemetry. These are useful but not sufficient, particularly we are missing the information about which distribution of Otel Collector it is. agent.type/agent.version/agent.id conventions are also aligned with ECS: https://www.elastic.co/guide/en/ecs/current/ecs-agent.html With introduction of this conventions the following attributes change in Otel Collector's own telemetry output: service.name -> agent.type service.version -> agent.version service.instance.id -> agent.id agent.distro will be added as one more property, the equivalent of which did not exist in the past.
c79fca9
to
084df5d
Compare
<!-- semconv registry.agent(omit_requirement_level) --> | ||
| Attribute | Type | Description | Examples | Stability | | ||
|---|---|---|---|---| | ||
| `agent.distro` | string | Agent distribution. [1] | `github.com/signalfx/splunk-otel-collector`; `github.com/aws-observability/aws-otel-collector` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to at least discuss what the value for this would be for core and contrib
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use the github repo values, e.g. github.com/open-telemetry/opentelemetry-collector-contrib
for contrib. Do you think there are other, better options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with those, in terms of implementation, would we add the distro value as a field to component.BuildInfo
or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we restrict the number of examples ( columns)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with those, in terms of implementation, would we add the distro value as a field to
component.BuildInfo
or something similar?
Yes, I think so. That's where we put the short commands like "otelcol" today so we can add the corresponding distro string as another field of BuildInfo
struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we restrict the number of examples ( columns)?
@samiura I am not sure I understand. I have 2 examples for agent.distro
currently. Please clarify what you would like to restrict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not resolving to wait on @samiura's comment, but consider my comment resolved here :)
| Attribute | Type | Description | Examples | Stability | | ||
|---|---|---|---|---| | ||
| `agent.distro` | string | Agent distribution. [1] | `github.com/signalfx/splunk-otel-collector`; `github.com/aws-observability/aws-otel-collector` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | ||
| `agent.id` | string | Unique identifier of agent instance. [2] | `627cc493-f310-47de-96bd-71410b7dec09` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference with service.instance.id
if any? Is this persistent across restarts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference with service.instance.id if any?
In the case when telemetry is being collected for the OTel Collector itself there's maybe no difference.
However, the idea of agent.*
fields in ECS is also to have the possibility attach that meta-information to telemetry data for a different, observed service.
For example, let's say the OTel collector collects metrics (or so) from an external service (let's say NGINX).
Then the telemetry data could contain:
service.name: NGINX
service.instance.id: 123
agent.type: io.opentelemetry.collector
agent.id: xyz
So with that we can differentiate the observer service from the observed service.
@tigrannajaryan not sure if this PR had also this kind of scenario in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference with service.instance.id if any?
It is the agent's instance id. Conceptually the same thing.
Is this persistent across restarts?
Preferably it should, but I don't want to mandate that in semconv, since some agents may not have persistence capability. We can add recommendations.
So with that we can differentiate the observer service from the observed service.
It was not a goal for me, but appears useful. When recording both of these attribute sets you essentially allow answering the question "who collected this telemetry?". My one concern would be that it bloats all telemetry coming from the Collector, so I am not sure if we should do it.
Can you clarify if this should also encompass additional attributes, such as additional metadata related to the collector build or the OS the collector runs on, or is that out of scope and defined elsewhere? |
@atoulme For things like OS I think we should use existing conventions, e.g. If there are other additional attributes that you would want to record for Collector we should either add them as more |
@open-telemetry/opamp-spec-approvers @open-telemetry/opamp-go-approvers please take a look and comment. |
prefix: agent | ||
type: resource | ||
brief: > | ||
An agent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the risk of being verbose, could we give a sentence or two to describe how we define "agent" in this context? The docs seem to define agent as a deployment pattern, but I think these conventions are agnostic of a deployment model and use a different definition for the term.
I do see that this doesn't seem to appear anywhere in the readme output, so maybe there's no point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@evan-bradley I am not sure how much do we want to narrow down the definition of the agent to be honest. We could say "data collection agent" like OpAMP does, if that's useful.
This is a draft that I am opening to start discussion about agent semantic conventions. We will need input from semconv SIG and from Collector SIG.
Resolves #396
Contributes to open-telemetry/opamp-spec#131
Alternate to #575 which did not get traction.
Changes
We need a way to record more information about agents than is currently possible using existing semantic conventions. Otel Collector in particular today uses
service.name
,service.instance.id
,service.version attributes
to report its own telemetry. These are useful but not sufficient, particularly we are missing the information about which distribution of Otel Collector it is.The proposed
agent.type
,agent.version
,agent.id
conventions are also aligned with ECS: https://www.elastic.co/guide/en/ecs/current/ecs-agent.htmlWith introduction of this conventions the following attributes change in Otel Collector's own telemetry output:
service.name
->agent.type
service.version
->agent.version
service.instance.id
->agent.id
agent.distro
is added as one more property, the equivalent of which did not exist in the past.Usage
Case 1. Own Telemetry.
The agents will use these attributes in the Resource of the telemetry they about themselves.
Case 2. OpAMP.
These will be used as identifying attributes of the agent in OpAMP.
Merge requirement checklist
I will add changelog/schema after we agree to move forward with this proposal.
[chore]