
Prometheus Receiver: failed to register service discovery metrics #32123

Closed
cpheps opened this issue Apr 2, 2024 · 5 comments · Fixed by #32202
Labels
bug (Something isn't working), receiver/prometheus (Prometheus receiver)

Comments

@cpheps
Contributor

cpheps commented Apr 2, 2024

Component(s)

receiver/prometheus

What happened?

Description

When using the collector as an API (the collector's lifetime doesn't correspond to the process's lifetime), the Prometheus receiver returns an error when its Start function is called twice within the same process.

In our collector distribution we wrap the collector and run it as an API. We are able to swap out the config and restart the collector via OpAMP without the process going down. When you reload a config twice that has a Prometheus receiver, you hit the following error:

cannot start pipelines: failed to register service discovery metrics: failed to create service discovery metrics

The CreateAndRegisterSDMetrics method called here causes the issue as SDMetrics can only be registered once per process according to the library here.
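For reference, this is the general failure mode in the Prometheus client library: registering two collectors that describe the same metric (same name and labels) on one registry returns an error. A minimal, self-contained sketch of that behavior (not the receiver's actual code; the metric name is made up):

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	reg := prometheus.NewRegistry()

	newCounter := func() prometheus.Counter {
		return prometheus.NewCounter(prometheus.CounterOpts{
			Name: "sd_example_total", // stand-in for a service discovery metric
			Help: "Example metric standing in for a service discovery metric.",
		})
	}

	// First registration succeeds.
	fmt.Println(reg.Register(newCounter())) // <nil>

	// Registering the same metric again fails; this is the same class of
	// error behind "failed to create service discovery metrics".
	fmt.Println(reg.Register(newCounter())) // duplicate metrics collector registration attempted
}
```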

I guess a starting question is: does the receiver use the service discovery metrics, or can those be omitted?

Collector version

v0.97.0

Environment information

Environment

OS: macOS 14.4.1 (23E224)
Compiler (if manually compiled): go version go1.22.0 darwin/arm64

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

@cpheps cpheps added the bug (Something isn't working) and needs triage (New item requiring triage) labels Apr 2, 2024
@github-actions github-actions bot added the receiver/prometheus (Prometheus receiver) label Apr 2, 2024
Contributor

github-actions bot commented Apr 2, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dashpole
Contributor

dashpole commented Apr 2, 2024

We plan to add the service discovery metrics in the future.

We should probably unregister metrics during shutdown to fix this. DiscovererMetrics has an Unregister function:

https://github.com/prometheus/prometheus/blob/9b7de4778732ca2f2ab5028e9d1955109f440c4c/discovery/manager.go#L70

https://github.com/prometheus/prometheus/blob/9b7de4778732ca2f2ab5028e9d1955109f440c4c/discovery/discovery.go#L45
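A rough sketch of what that could look like, assuming the caller holds on to the metrics returned by CreateAndRegisterSDMetrics; treat the exact signatures here as an approximation of the linked discovery API rather than a drop-in patch:

```go
package sketch

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/prometheus/discovery"
)

// startAndStop registers the SD metrics on start and releases them on
// shutdown, so a second start in the same process doesn't collide.
func startAndStop(reg prometheus.Registerer) error {
	sdMetrics, err := discovery.CreateAndRegisterSDMetrics(reg)
	if err != nil {
		return err
	}

	// ... create the discovery manager and run the scrape loop ...

	// On shutdown, unregister everything that was registered above so the
	// next start can register the metrics again.
	for _, m := range sdMetrics {
		m.Unregister()
	}
	return nil
}
```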

You can actually register the same metric twice in a process, as long as it has a constant label that is unique. We use the receiver name to do this today:

prometheus.Labels{"receiver": set.ID.String()}

But if you create two receivers with the same name during the lifetime of the process, that will currently break.
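A small illustration of that constant-label escape hatch, using a plain client_golang registry (the metric name and receiver IDs here are made up):

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// newReceiverCounter mimics the pattern above: the same metric, distinguished
// only by a per-receiver constant label.
func newReceiverCounter(receiverID string) prometheus.Counter {
	return prometheus.NewCounter(prometheus.CounterOpts{
		Name:        "sd_example_total",
		Help:        "Example metric standing in for a service discovery metric.",
		ConstLabels: prometheus.Labels{"receiver": receiverID},
	})
}

func main() {
	reg := prometheus.NewRegistry()

	// Different constant label values are distinct series, so both succeed.
	fmt.Println(reg.Register(newReceiverCounter("prometheus/one"))) // <nil>
	fmt.Println(reg.Register(newReceiverCounter("prometheus/two"))) // <nil>

	// Reusing the same receiver name collides, which is the broken case
	// described above.
	fmt.Println(reg.Register(newReceiverCounter("prometheus/one"))) // duplicate registration error
}
```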

@cpheps
Contributor Author

cpheps commented Apr 2, 2024

But if you create two receivers with the same name during the lifetime of the process, that will currently break

I think this is what we're seeing. I can take a look at unregistering service discovery metrics on shutdown.

@crobert-1 crobert-1 removed the needs triage (New item requiring triage) label Apr 3, 2024
@cpheps
Contributor Author

cpheps commented Apr 4, 2024

So I'm not sure unregistering metrics is fully possible. I did a spike where I unregistered the service discovery metrics, then hit a snag in discovery.NewManager. It eventually ends up here, where it registers new metrics. Because those metrics are never unregistered, you get an error that the metrics are already registered. The error is logged and a nil manager is returned, resulting in a panic.

I don't see a way to access those metrics to unregister them. It does seem like this is designed to be run once per process.

@dashpole So I don't think there's a way to reuse a registry based on this. I wanted to propose adding a UUID or some random string onto the registry name to guarantee it's unique every time. The downside of this is that we are leaking registries over the lifetime of the process, though this may not be any worse than using a config with receivers with entirely new names, as the old ones would be lost too.
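For illustration only, here is roughly what that random-suffix workaround could look like (the uuid dependency and label shape are assumptions, and as noted it leaks one set of series per restart):

```go
package main

import (
	"fmt"

	"github.com/google/uuid"
	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	reg := prometheus.NewRegistry()

	// Register the "same" metric on every (re)start, but with a random
	// suffix on the receiver label so registrations never collide.
	for i := 0; i < 2; i++ {
		c := prometheus.NewCounter(prometheus.CounterOpts{
			Name: "sd_example_total",
			Help: "Example metric standing in for a service discovery metric.",
			ConstLabels: prometheus.Labels{
				"receiver": fmt.Sprintf("prometheus/%s", uuid.NewString()),
			},
		})
		// Both iterations print <nil>, i.e. registration succeeds, but the
		// earlier series is never cleaned up.
		fmt.Println(reg.Register(c))
	}
}
```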

@dashpole
Contributor

dashpole commented Apr 5, 2024

My attempt at a fix: prometheus/prometheus#13896

djaglowski pushed a commit that referenced this issue Apr 5, 2024
(#32202)

**Description:**

Prometheus libraries will currently fail to initialize if they are
called more than once for the same component name. This is because they
attempt to register the same self-observability metrics again, which isn't
allowed by the Prometheus client. To work around this, Unregister all
self-observability metrics during Shutdown.

This updates the prometheus dependency to an unreleased version to pull
in prometheus/prometheus#13897 and
prometheus/prometheus#13896.

**Link to tracking Issue:** Fixes #32123

**Testing:**

Unit test added.

**Documentation:**

N/A

cc @cpheps @djaglowski