
Scrapers - Capture host and port of monitored endpoint as resource attributes #7081

Closed
djaglowski opened this issue Jan 7, 2022 · 9 comments · Fixed by #8266
Assignees
Labels
comp:prometheus (Prometheus related issues), enhancement (New feature or request), spec:metrics

Comments

@djaglowski
Member

Metric scrapers typically pull data from a known host and port. The metrics produced by a scraper would be more useful if placed in the context of the resource they describe. Therefore, scrapers should capture these values as resource attributes, when possible.

The semantic conventions define the attributes net.host.name and net.host.port here, and I believe these are appropriate for this use case.

Additional Context

Prometheus produces telemetry with essentially the same information via the instance label, although it is a combined field, defined as the <host>:<port> part of the target's URL that was scraped.

The prometheusreceiver splits the instance label and saves the host and port as host.name and port. We likely want to use the same attribute names in both the prometheus and non-prometheus cases, but perhaps some prometheus experts can shed light on whether there is a distinction here I'm missing.

@djaglowski added the enhancement (New feature or request) and spec:metrics labels on Jan 7, 2022
@djaglowski
Member Author

@Aneurysm9, @dashpole, as code owners of the prometheusreceiver, do you have an opinion on this?

@dashpole
Contributor

dashpole commented Jan 7, 2022

Are the net semantic conventions only for tracing, since they are in the trace directory?

host.name and net.host.name seem identical. Are there other scraping receivers in the collector that we can follow?

@djaglowski
Member Author

Are the net semantic conventions only for tracing, since they are in the trace directory?

This is a good point. I don't think we are necessarily beholden to them in the context of metrics. My assumption is that we should align across signal types wherever reasonable though.

host.name and net.host.name seem identical. Are there other scraping receivers in the collector that we can follow?

I'm not aware of any others.

@Aneurysm9
Member

Resource attributes should be common to all signal types, I would think, as they describe the resource that produced the signal. I'm not sure that the host.name resource attribute is correct, since it is an address from the caller's perspective and may not relate to the hostname of the target system at all.

That said, there don't seem to be any resource attributes that are appropriate here and, to the extent that we're deriving them from information available on the scraping side and not having the resource identify itself, we probably will not have any good options.

@djaglowski
Member Author

an address from the caller's perspective and may not relate to the hostname of the target system at all.

That's true, but I think it would often be correct. When it's not, I would think it's still the closest we can get to meaningfully identifying the resource from which the metrics were pulled. To speculate a bit: is this why prometheus uses the more generic term instance, to allow for a degree of uncertainty?

@jsuereth
Contributor

Are the net semantic conventions only for tracing, since they are in the trace directory?

They're actually used for HTTP metrics as well.

In this instance, net.peer.ip and net.peer.port are more representative of what's going on (pulling from a remote process/server).

@dashpole
Contributor

They're actually used for HTTP metrics as well.

Ah, perfect. Then I think those conventions should apply to the prometheus receiver.

In this instance, net.peer.ip and net.peer.port are more representative of what's going on (pulling from a remote process/server).

I would agree if this were a metric about the prometheus receiver itself, but I'm not sure it makes sense for metrics scraped from an application.

If the collector is doing resource detection on a metric produced by an application, I think attributes should be detected from the perspective of the application, rather than the perspective of the collector. The prometheus endpoint is technically the peer of the collector, but I would find it confusing to see a net.peer.ip resource attribute on a go_goroutines metric (my goroutine has a peer?). Additionally, that would mean that changing where resource detection happens (application vs collector) would change the resources detected. That doesn't impact prometheus, as prometheus format doesn't have a notion of resource labels, and thus can't do resource detection in the application, but it would matter for OTLP, for example.

@quentinmit
Contributor

If the canonical attributes are net.host.name and net.host.port, how would you annotate metrics retrieved from a Unix socket? The socket path in net.host.name and no port specified?

@djaglowski
Member Author

If the collector is doing resource detection on a metric produced by an application, I think attributes should be detected from the perspective of the application, rather than the perspective of the collector.

+1. I think this supports the original proposal of using net.host.name and net.host.port?

how would you annotate metrics retrieved from a Unix socket? The socket path in net.host and no port specified?

I would be in favor of setting net.host.name from os.Hostname, and also capturing the socket path in an appropriately specific attribute. Trace semantic conventions suggest using net.peer.name for a socket path (along with setting net.transport to unix).

@dashpole added the comp:prometheus (Prometheus related issues) label on Mar 9, 2022
@dashpole self-assigned this on Mar 9, 2022
animetauren pushed a commit to animetauren/opentelemetry-collector-contrib that referenced this issue Apr 4, 2023
…y#7081)

This change simplifies the generated pdata code so that it does not wrap orig fields in the internal package for structs that are not used by other packages.

The code generator is adjusted to generate wrapped or unwrapped code only for structs that need it, based on the package name. The only exception is the `Slice` struct, which was pulled from the generator because:
- We don't have, and don't expect to have, any new slices that are used by other packages.
- The `Slice` struct has two additional methods, `AsRaw` and `FromRaw`, that are not generated but defined in a separate file, which is a bit confusing.