
Prometheus receiver and exporter don't handle multiple targets with the same metric. #2216

Closed
dashpole opened this issue Nov 25, 2020 · 2 comments

@dashpole (Contributor)

Describe the bug
If the same metric exists in multiple prometheus receiver targets, only one of the metrics appears in the prometheus exporter endpoint.

Steps to reproduce
Run the OpenTelemetry Collector with the configuration below in a Kubernetes cluster with 2 or more nodes. The kubelet_running_pod_count metric is used to demonstrate the issue, although it applies to all metrics.

The OpenTelemetry Collector logs include the metric in question for all nodes, which shows that scraping succeeds for every target:

$ kubectl logs otel-collector-0 | grep kubelet_running_pod_count -A 20
     -> Name: kubelet_running_pod_count
     -> Description: [ALPHA] Number of pods currently running
...
     -> kubernetes_io_hostname: gke-cluster-2-default-pool-49009f21-0b32
...
     -> Name: kubelet_running_pod_count
     -> Description: [ALPHA] Number of pods currently running
...
     -> kubernetes_io_hostname: gke-cluster-2-default-pool-49009f21-3hae
...
     -> Name: kubelet_running_pod_count
     -> Description: [ALPHA] Number of pods currently running
...
     -> kubernetes_io_hostname: gke-cluster-2-default-pool-49009f21-91rf
...
     -> Name: kubelet_running_pod_count
     -> Description: [ALPHA] Number of pods currently running
...
     -> kubernetes_io_hostname: gke-cluster-2-default-pool-49009f21-arjo

What did you expect to see?
If I curl the Prometheus exporter endpoint, I should see one metric stream for that metric per node in my cluster.
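
With the four nodes above, the scraped output should look roughly like this (hostnames taken from the collector logs; the values are illustrative, not from an actual run):

collector_kubelet_running_pod_count{...kubernetes_io_hostname="gke-cluster-2-default-pool-49009f21-0b32"...} 8
collector_kubelet_running_pod_count{...kubernetes_io_hostname="gke-cluster-2-default-pool-49009f21-3hae"...} 6
collector_kubelet_running_pod_count{...kubernetes_io_hostname="gke-cluster-2-default-pool-49009f21-91rf"...} 7
collector_kubelet_running_pod_count{...kubernetes_io_hostname="gke-cluster-2-default-pool-49009f21-arjo"...} 5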

What did you see instead?
I only see a single metric stream.

$ kubectl get --raw /api/v1/namespaces/opentelemetry/pods/otel-collector-0:8889/proxy/metrics | grep running_pod
# HELP collector_kubelet_running_pod_count [ALPHA] Number of pods currently running
# TYPE collector_kubelet_running_pod_count gauge
collector_kubelet_running_pod_count{...kubernetes_io_hostname="gke-cluster-2-default-pool-49009f21-3hae"...} 6

What version did you use?
Version: f583f6e

What config did you use?
Config (processors omitted):

receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
      scrape_configs:
      # This is the example configuration for nodes from https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml
      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)

exporters:
  logging:
    logLevel: debug
  prometheus:
    endpoint: "0.0.0.0:8889"
    namespace: collector
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheus, logging]

Environment
OS: Container-Optimized OS (COS) from Google, Linux kernel 4.19.112
Compiler (if manually compiled): go 1.15

Additional context
The Prometheus receiver calls ConsumeMetrics on the sink for each Commit() call. In practice, this appears to occur once per scrape target.

The Prometheus exporter keeps a map[descriptor]metric (from orijtech/prometheus-go-metrics-exporter) and overwrites the metric for a descriptor each time data for that descriptor is "exported" to it. This means that if multiple scrape targets emit metrics with the same descriptor (name + labels), only the target that was written last shows up in the endpoint, since it overwrites previous scrapes.

It seems like there are two possible solutions:

  1. Make the receiver send all scrapes in a single "batched" ConsumeMetrics call.
  2. Make the exporter keep the latest metric for each unique combination of labels, instead of just the most recent scrape (see the sketch after this list).
    a. Note that we would also have to address the memory leak this would cause, e.g. with a TTL for each time series so stale series aren't kept around indefinitely.
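
A minimal Go sketch of option 2, keyed on the metric name plus its full label set and expired with a TTL (all type and function names here are hypothetical, not the actual exporter code):

package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
	"time"
)

// seriesKey identifies a time series by metric name plus its full, sorted label set.
// The current exporter keys only on the descriptor, so different targets overwrite each other.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(name)
	for _, k := range keys {
		fmt.Fprintf(&b, "|%s=%s", k, labels[k])
	}
	return b.String()
}

type point struct {
	value    float64
	lastSeen time.Time
}

// seriesCache keeps the latest point per unique label set and drops series that
// have not been updated within ttl, addressing the memory-leak concern in 2a.
type seriesCache struct {
	mu     sync.Mutex
	ttl    time.Duration
	series map[string]point
}

func (c *seriesCache) update(name string, labels map[string]string, value float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.series[seriesKey(name, labels)] = point{value: value, lastSeen: time.Now()}
}

// expire removes series not seen within the TTL; it would be called
// periodically, or on each scrape of the exporter's endpoint.
func (c *seriesCache) expire() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k, p := range c.series {
		if time.Since(p.lastSeen) > c.ttl {
			delete(c.series, k)
		}
	}
}

func main() {
	c := &seriesCache{ttl: 5 * time.Minute, series: map[string]point{}}
	// Two targets reporting the same metric name no longer overwrite each other,
	// because the hostname label is part of the key.
	c.update("kubelet_running_pod_count", map[string]string{"kubernetes_io_hostname": "node-a"}, 6)
	c.update("kubelet_running_pod_count", map[string]string{"kubernetes_io_hostname": "node-b"}, 8)
	c.expire()
	fmt.Println(len(c.series)) // 2
}

Keying on the full label set means the scrapes from different nodes land in separate map entries instead of replacing one another.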
@dashpole added the bug label Nov 25, 2020
@dashpole changed the title from "Prometheus receiver and exporter do handle multiple targets with the same metric." to "Prometheus receiver and exporter don't handle multiple targets with the same metric." Dec 17, 2020
@gillg (Contributor) commented Apr 15, 2021

I had the same issue. It's mainly because the receiver doesn't add labels to distinguish different jobs / instances (related to #2363). So I tried to create a relabel config to add a unique label per job, but I still had only one metric until I used a custom label name instead of job or instance (see the warning below).

Like this:

        - job_name: windows-exporter
          static_configs:
            - targets: ['localhost:9182']
          relabel_configs:
          # Trick because the otel collector does not expose the job label
          - action: replace
            replacement: windows-exporter
            target_label: job_name

Example result:

# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines{job_name="es-exporter"} 20
go_goroutines{job_name="windows-exporter"} 9
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{job_name="windows-exporter",version="go1.15.6"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes{job_name="es-exporter"} 3.126616e+06
go_memstats_alloc_bytes{job_name="windows-exporter"} 5.838984e+06

Be careful: for a reason unknown for now, don't use job or instance as the target_label, as they will be dropped. Use a custom name instead (and relabel it on the Prometheus server side).
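
For reference, a sketch of what that server-side relabel could look like when Prometheus scrapes the collector's exporter endpoint (the target address and job name here are placeholders, not from this issue):

scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']
    metric_relabel_configs:
      # Copy the custom job_name label back into job, only when it is present.
      - source_labels: [job_name]
        regex: '(.+)'
        target_label: job
        replacement: '$1'
        action: replace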

@dashpole (Contributor, Author) commented Jul 8, 2021

We now include instance and job labels, so this issue shouldn't be present anymore.

@dashpole dashpole closed this as completed Jul 8, 2021