Target Allocator not working as expected with ServiceMonitor CRDs #925

jaronoff97 · 2022-06-13T15:45:31Z

With the merging of #836, I was able to get the target allocator to successfully pull in PodMonitor and ServiceMonitor CRDs. Each collector then uses its generated http_sd_config, as shown below:

apiVersion: v1
data:
  collector.yaml: |
    receivers:
      prometheus:
        config:
          global:
            scrape_interval: 1m
            scrape_timeout: 10s
            evaluation_interval: 1m
          scrape_configs:
          - job_name: serviceMonitor/testapp/testapp/0
            honor_timestamps: true
            scrape_interval: 30s
            scrape_timeout: 30s
            metrics_path: /metrics
            scheme: http
            follow_redirects: true
            relabel_configs:
            ...
            http_sd_configs:
            - follow_redirects: false
              url: http://lightstep-collector-targetallocator:80/jobs/serviceMonitor/testapp/testapp/0/targets?collector_id=$POD_NAME

At first glance, this looks fine, but the logs from the collector-0 pod show an error:


2022-06-13T15:36:33.208Z	error	refresh/refresh.go:98	Unable to refresh target groups	{"kind": "receiver", "name": "prometheus", "discovery": "http", "err": "server returned HTTP status 404 Not Found"}

To figure out why the collector was getting a 404, I started up a small curl pod and ran:

> curl http://lightstep-collector-targetallocator:80/jobs/serviceMonitor/testapp/testapp/0/targets?collector_id=lightstep-collector-collector-0
404 page not found

Running curl against the jobs endpoint shows that the job exists, but it has a different link:

{
...
    "serviceMonitor/testapp/testapp/0":
    {
        "_link": "/jobs/serviceMonitor%2Ftestapp%2Ftestapp%2F0/targets"
    }
}

The issue in the collector configuration seems to be that we aren't encoding the ServiceMonitor CRD name's / as %2F.
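
As a rough illustration of the encoding described above, here is a minimal Python sketch that percent-encodes the job name before placing it in the URL path. The helper name targets_url is hypothetical and this is not the operator's actual code or the eventual fix; it only shows that escaping every / in the job name yields the same path the jobs endpoint advertises in _link.

```python
from urllib.parse import quote, urlencode


def targets_url(base_url: str, job_name: str, collector_id: str) -> str:
    """Build a target allocator URL for one job and one collector.

    Hypothetical helper for illustration only.
    """
    # safe="" makes quote() escape "/" as %2F; the default (safe="/")
    # would leave the slashes in place and produce the 404-ing path.
    encoded_job = quote(job_name, safe="")
    query = urlencode({"collector_id": collector_id})
    return f"{base_url}/jobs/{encoded_job}/targets?{query}"


if __name__ == "__main__":
    print(
        targets_url(
            "http://lightstep-collector-targetallocator:80",
            "serviceMonitor/testapp/testapp/0",
            "lightstep-collector-collector-0",
        )
    )
```

With the values from this issue, the path portion comes out as /jobs/serviceMonitor%2Ftestapp%2Ftestapp%2F0/targets, which matches the _link returned by the jobs endpoint.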

From there, I queried this link and got a correct response:

{
    "lightstep-collector-collector-0":
    {
        "_link": "/jobs/serviceMonitor%2Ftestapp%2Ftestapp%2F0/targets?collector_id=lightstep-collector-collector-0",
        "targets":
        [
            {
                "targets":
                [
                    "XXX.XXX.XXX.XXX:8080"
                ],
                "labels":
                {
                    ...
                }
            }
        ]
    }
}

Finally, I queried the _link provided to be sure that things will work as expected once we fix the encoding bug. However, I got an empty list back:

> curl http://lightstep-collector-targetallocator:80/jobs/serviceMonitor%2Ftestapp%2Ftestapp%2F0/targets?collector_id=lightstep-collector-collector-0
[]

So ultimately, it seems there are two issues:

1. The ServiceMonitor names need to be encoded in the scrape config's http_sd_config
2. The collector_id query parameter is broken (something should be fixed here)

jaronoff97 · 2022-06-13T15:45:47Z

I'm going to be looking into fixing these issues, but wanted them to be tracked.

pavolloffay added the area:target-allocator label Jun 13, 2022

jaronoff97 mentioned this issue Jun 14, 2022

Fix parameter encoding issue #930

Merged

pavolloffay closed this as completed in #930 Jun 23, 2022
