Target Allocator not working as expected with ServiceMonitor CRDs #925

Closed
jaronoff97 opened this issue Jun 13, 2022 · 1 comment · Fixed by #930
Labels
area:target-allocator Issues for target-allocator

Comments

@jaronoff97
Contributor

With the merging of #836, I was able to get the target allocator to successfully pull in PodMonitor and ServiceMonitor CRDs. Each collector then uses its generated http_sd_config, as shown below:

apiVersion: v1
data:
  collector.yaml: |
    receivers:
      prometheus:
        config:
          global:
            scrape_interval: 1m
            scrape_timeout: 10s
            evaluation_interval: 1m
          scrape_configs:
          - job_name: serviceMonitor/testapp/testapp/0
            honor_timestamps: true
            scrape_interval: 30s
            scrape_timeout: 30s
            metrics_path: /metrics
            scheme: http
            follow_redirects: true
            relabel_configs:
            ...
            http_sd_configs:
            - follow_redirects: false
              url: http://lightstep-collector-targetallocator:80/jobs/serviceMonitor/testapp/testapp/0/targets?collector_id=$POD_NAME

At first glance this looks fine, but the logs from the collector-0 pod show an error:


2022-06-13T15:36:33.208Z	error	refresh/refresh.go:98	Unable to refresh target groups	{"kind": "receiver", "name": "prometheus", "discovery": "http", "err": "server returned HTTP status 404 Not Found"}

To figure out why the collector was getting a 404, I started up a small curl pod and ran:

> curl http://lightstep-collector-targetallocator:80/jobs/serviceMonitor/testapp/testapp/0/targets?collector_id=lightstep-collector-collector-0
404 page not found

Running curl against the jobs endpoint shows that the job exists, but under a different link:

{
...
    "serviceMonitor/testapp/testapp/0":
    {
        "_link": "/jobs/serviceMonitor%2Ftestapp%2Ftestapp%2F0/targets"
    }
}

The issue in the collector configuration seems to be that we aren't encoding the / characters in the ServiceMonitor job name as %2F.
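As a rough illustration of the encoding the jobs endpoint appears to expect, the sketch below percent-encodes the job name as a single path segment before building the URL. It is written in Python purely for brevity; `build_sd_url` and its parameters are hypothetical names and not part of the operator or target allocator.

```python
from urllib.parse import quote


def build_sd_url(base: str, job_name: str, collector_id: str) -> str:
    """Build an http_sd URL with the job name encoded as one path segment."""
    # safe="" makes quote() escape "/" as %2F instead of treating it as a separator.
    encoded_job = quote(job_name, safe="")
    encoded_id = quote(collector_id, safe="")
    return f"{base}/jobs/{encoded_job}/targets?collector_id={encoded_id}"


url = build_sd_url(
    "http://lightstep-collector-targetallocator:80",
    "serviceMonitor/testapp/testapp/0",
    "lightstep-collector-collector-0",
)
print(url)
```

With the default `safe="/"`, `quote()` would leave the slashes untouched, which reproduces the unencoded path that returned the 404.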

From there, I queried the encoded _link from the jobs endpoint and got a correct response:

{
    "lightstep-collector-collector-0":
    {
        "_link": "/jobs/serviceMonitor%2Ftestapp%2Ftestapp%2F0/targets?collector_id=lightstep-collector-collector-0",
        "targets":
        [
            {
                "targets":
                [
                    "XXX.XXX.XXX.XXX:8080"
                ],
                "labels":
                {
                    ...
                }
            }
        ]
    }
}
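To make the shape of this response concrete, here is a small sketch that pulls out the target groups assigned to one collector. The sample body mirrors the response above, with the elided labels replaced by an empty object as a placeholder; `target_groups_for` is a hypothetical helper, not something the allocator provides.

```python
import json

# Sample body modeled on the response above; "labels" is a placeholder
# because the real labels were elided in the original output.
sample_body = """
{
    "lightstep-collector-collector-0": {
        "_link": "/jobs/serviceMonitor%2Ftestapp%2Ftestapp%2F0/targets?collector_id=lightstep-collector-collector-0",
        "targets": [
            {"targets": ["XXX.XXX.XXX.XXX:8080"], "labels": {}}
        ]
    }
}
"""


def target_groups_for(response_body: str, collector_id: str) -> list:
    """Return the target groups listed for collector_id, or [] if it is absent."""
    data = json.loads(response_body)
    return data.get(collector_id, {}).get("targets", [])


groups = target_groups_for(sample_body, "lightstep-collector-collector-0")
print(groups)
```

The inner list of objects with "targets" and "labels" keys has the same shape as a Prometheus HTTP service discovery response, which is a JSON list of target groups.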

Finally, I queried the _link from that response to confirm that things would work as expected once the encoding bug is fixed. However, I got an empty list back:

> curl http://lightstep-collector-targetallocator:80/jobs/serviceMonitor%2Ftestapp%2Ftestapp%2F0/targets?collector_id=lightstep-collector-collector-0
[]

So ultimately, it seems there are two issues:

  • The ServiceMonitor job names need to be percent-encoded in the URL of the scrape config's http_sd_config
  • The collector_id query parameter is broken: querying the per-collector _link returns an empty list instead of that collector's targets
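For the first issue, a quick check along these lines can confirm that a configured URL and the advertised _link differ only in percent-encoding. This is a minimal sketch; `encoding_only_mismatch` is a hypothetical helper, and the two inputs are copied from the scrape config and the jobs response above.

```python
from urllib.parse import unquote, urlsplit


def encoding_only_mismatch(configured_url: str, advertised_link: str) -> bool:
    """True when the paths differ as written but match once percent-decoded."""
    # urlsplit() keeps the path exactly as written; it does not decode %2F.
    configured_path = urlsplit(configured_url).path
    advertised_path = urlsplit(advertised_link).path
    return (
        configured_path != advertised_path
        and unquote(configured_path) == unquote(advertised_path)
    )


configured = (
    "http://lightstep-collector-targetallocator:80"
    "/jobs/serviceMonitor/testapp/testapp/0/targets?collector_id=$POD_NAME"
)
advertised = "/jobs/serviceMonitor%2Ftestapp%2Ftestapp%2F0/targets"
print(encoding_only_mismatch(configured, advertised))
```

The check only compares paths, so it says nothing about the second issue with the collector_id parameter.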
@jaronoff97
Contributor Author

I'm going to look into fixing these issues, but wanted them to be tracked.
