
Prometheus receiver and exporter don't handle multiple targets with the same metric. #2216

Closed
dashpole opened this issue Nov 25, 2020 · 2 comments

@dashpole (Contributor)

Describe the bug
If the same metric exists in multiple prometheus receiver targets, only one of the metrics appears in the prometheus exporter endpoint.

Steps to reproduce
Run the OpenTelemetry Collector with the configuration below in a Kubernetes cluster with 2 or more nodes. The kubelet_running_pod_count metric is used to demonstrate the issue, although it applies to all metrics.

The OpenTelemetry Collector logs include the metric in question for all nodes, which shows that scraping succeeds for every target:

$ kubectl logs otel-collector-0 | grep kubelet_running_pod_count -A 20
     -> Name: kubelet_running_pod_count
     -> Description: [ALPHA] Number of pods currently running
...
     -> kubernetes_io_hostname: gke-cluster-2-default-pool-49009f21-0b32
...
     -> Name: kubelet_running_pod_count
     -> Description: [ALPHA] Number of pods currently running
...
     -> kubernetes_io_hostname: gke-cluster-2-default-pool-49009f21-3hae
...
     -> Name: kubelet_running_pod_count
     -> Description: [ALPHA] Number of pods currently running
...
     -> kubernetes_io_hostname: gke-cluster-2-default-pool-49009f21-91rf
...
     -> Name: kubelet_running_pod_count
     -> Description: [ALPHA] Number of pods currently running
...
     -> kubernetes_io_hostname: gke-cluster-2-default-pool-49009f21-arjo

What did you expect to see?
If I curl the Prometheus exporter endpoint, I should see one metric stream for that metric per node in my cluster.
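
With the four nodes above, the scraped output should look roughly like this (hostnames taken from the collector logs; the values are illustrative, not from an actual run):

collector_kubelet_running_pod_count{...kubernetes_io_hostname="gke-cluster-2-default-pool-49009f21-0b32"...} 8
collector_kubelet_running_pod_count{...kubernetes_io_hostname="gke-cluster-2-default-pool-49009f21-3hae"...} 6
collector_kubelet_running_pod_count{...kubernetes_io_hostname="gke-cluster-2-default-pool-49009f21-91rf"...} 7
collector_kubelet_running_pod_count{...kubernetes_io_hostname="gke-cluster-2-default-pool-49009f21-arjo"...} 5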

What did you see instead?
I only see a single metric stream.

$ kubectl get --raw /api/v1/namespaces/opentelemetry/pods/otel-collector-0:8889/proxy/metrics | grep running_pod
# HELP collector_kubelet_running_pod_count [ALPHA] Number of pods currently running
# TYPE collector_kubelet_running_pod_count gauge
collector_kubelet_running_pod_count{...kubernetes_io_hostname="gke-cluster-2-default-pool-49009f21-3hae"...} 6

What version did you use?
Version: f583f6e

What config did you use?
Config (processors omitted):

receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
      scrape_configs:
      # This is the example configuration for nodes from https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml
      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)

exporters:
  logging:
    logLevel: debug
  prometheus:
    endpoint: "0.0.0.0:8889"
    namespace: collector
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheus, logging]

Environment
OS: Container-Optimized OS (COS) from Google, Linux kernel 4.19.112
Compiler (if manually compiled): go 1.15

Additional context
The Prometheus receiver calls ConsumeMetrics on the sink for each Commit() call. In practice, this appears to occur once per scrape target.

The Prometheus exporter keeps a map[descriptor]metric (from orijtech/prometheus-go-metrics-exporter) and overwrites the metric for a descriptor each time data for that descriptor is "exported" to it. This means that if multiple scrape targets emit metrics with the same descriptor (name + labels), only the target that was written last shows up in the endpoint, since it overwrites previous scrapes.

It seems like there are two possible solutions:

  1. Make the receiver send all scrapes in a single "batched" ConsumeMetrics call.
  2. Make the exporter keep the latest metric for each unique combination of labels, instead of just the most recent scrape (see the sketch after this list).
    a. Note that we would also have to address the memory leak this would cause, e.g. with a TTL for each time series so stale series aren't kept around indefinitely.
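
A minimal Go sketch of option 2, keyed on the metric name plus its full label set and expired with a TTL (all type and function names here are hypothetical, not the actual exporter code):

package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
	"time"
)

// seriesKey identifies a time series by metric name plus its full, sorted label set.
// The current exporter keys only on the descriptor, so different targets overwrite each other.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(name)
	for _, k := range keys {
		fmt.Fprintf(&b, "|%s=%s", k, labels[k])
	}
	return b.String()
}

type point struct {
	value    float64
	lastSeen time.Time
}

// seriesCache keeps the latest point per unique label set and drops series that
// have not been updated within ttl, addressing the memory-leak concern in 2a.
type seriesCache struct {
	mu     sync.Mutex
	ttl    time.Duration
	series map[string]point
}

func (c *seriesCache) update(name string, labels map[string]string, value float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.series[seriesKey(name, labels)] = point{value: value, lastSeen: time.Now()}
}

// expire removes series not seen within the TTL; it would be called
// periodically, or on each scrape of the exporter's endpoint.
func (c *seriesCache) expire() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k, p := range c.series {
		if time.Since(p.lastSeen) > c.ttl {
			delete(c.series, k)
		}
	}
}

func main() {
	c := &seriesCache{ttl: 5 * time.Minute, series: map[string]point{}}
	// Two targets reporting the same metric name no longer overwrite each other,
	// because the hostname label is part of the key.
	c.update("kubelet_running_pod_count", map[string]string{"kubernetes_io_hostname": "node-a"}, 6)
	c.update("kubelet_running_pod_count", map[string]string{"kubernetes_io_hostname": "node-b"}, 8)
	c.expire()
	fmt.Println(len(c.series)) // 2
}

Keying on the full label set means the scrapes from different nodes land in separate map entries instead of replacing one another.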
@dashpole added the bug label Nov 25, 2020
@dashpole changed the title from "Prometheus receiver and exporter do handle multiple targets with the same metric." to "Prometheus receiver and exporter don't handle multiple targets with the same metric." Dec 17, 2020
@gillg (Contributor) commented Apr 15, 2021

I had the same issue. It's mainly because the receiver doesn't add labels to distinguish different jobs / instances (related to #2363). So I tried to create a relabel config to add a unique label per job, but I still had only one metric until I used a custom label name instead of job or instance (see the warning below).

Like this:

        - job_name: windows-exporter
          static_configs:
            - targets: ['localhost:9182']
          relabel_configs:
          # Trick because the otel collector does not expose the job label
          - action: replace
            replacement: windows-exporter
            target_label: job_name

Example result:

# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines{job_name="es-exporter"} 20
go_goroutines{job_name="windows-exporter"} 9
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{job_name="windows-exporter",version="go1.15.6"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes{job_name="es-exporter"} 3.126616e+06
go_memstats_alloc_bytes{job_name="windows-exporter"} 5.838984e+06

Be careful: for a reason unknown for now, don't use job or instance as the target_label, as they will be dropped. Use a custom name instead (and relabel it on the Prometheus server side).
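
For reference, a sketch of what that server-side relabel could look like when Prometheus scrapes the collector's exporter endpoint (the target address and job name here are placeholders, not from this issue):

scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']
    metric_relabel_configs:
      # Copy the custom job_name label back into job, only when it is present.
      - source_labels: [job_name]
        regex: '(.+)'
        target_label: job
        replacement: '$1'
        action: replace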

@dashpole (Contributor, Author) commented Jul 8, 2021

We now include instance and job labels, so this issue shouldn't be present anymore.

@dashpole dashpole closed this as completed Jul 8, 2021