
add scraping for Prometheus endpoint in Kubernetes #3901

Closed (wants to merge 2 commits)

Conversation

@sorenmat commented Mar 18, 2018:

This will allow users to add Prometheus annotations to pods
in Kubernetes, then have telegraf scan for them and add them
to the list of endpoints to collect metrics from.

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@sorenmat force-pushed the promethus_scan branch 2 times, most recently from b9f5a0c to bf8f916 on March 19, 2018
@sorenmat (Author) commented:

While reading #272 @abraithwaite @danielnelson might be interested in this.

@abraithwaite left a comment:

This is so cool! I'm excited to try this. Thanks for putting in the time!

}
}

func scrapeURL(s *v1.Service) *string {


Why does this need to be *string and not just string?

@sorenmat (Author) replied:

So we can do a nil check on the value: if it is nil, then no prometheus.io/scrape annotation was present, or it was set to false.
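
For illustration, here is a minimal Go sketch of that pattern; the defaults follow the annotations documented later in this PR (/metrics, port 9102), but the helper takes a plain annotations map instead of the object from the diff, and its exact shape is an assumption, not the PR's code:

package main

import "fmt"

// scrapeURL returns nil when the prometheus.io/scrape annotation is
// absent or not "true", so callers can distinguish "no target" from
// "empty URL" with a simple nil check.
func scrapeURL(ip string, annotations map[string]string) *string {
	if annotations["prometheus.io/scrape"] != "true" {
		return nil // annotation missing or explicitly disabled
	}
	path := annotations["prometheus.io/path"]
	if path == "" {
		path = "/metrics" // default metrics path
	}
	port := annotations["prometheus.io/port"]
	if port == "" {
		port = "9102" // default port
	}
	u := fmt.Sprintf("http://%s:%s%s", ip, port, path)
	return &u
}

func main() {
	ann := map[string]string{"prometheus.io/scrape": "true"}
	if u := scrapeURL("10.0.0.5", ann); u != nil {
		fmt.Println("will scrape", *u)
	}
}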

@danielnelson added this to the 1.7.0 milestone Mar 19, 2018
@danielnelson added the feat label (Improvement on an existing feature, such as adding a new setting/mode to an existing plugin) Mar 19, 2018
@sorenmat (Author) commented Mar 20, 2018:

Created a docker image with the change in sorenmat/telegraf:2 if you need to test it easily :)

@lpic10 (Contributor) commented Mar 20, 2018:

@sorenmat I'm giving it a try, but the scrape fails to resolve services running in different namespaces, as the discovery is not adding the namespace name to the URL.

On another subject, shouldn't the URL for the k8s API server be configurable?

@sorenmat (Author) replied:

I will address the first issue 😃
I think the second one should be done in another PR. There are many options/features you could add if you think about it; once the basic feature is merged, people can add more, IMO.

Commit: This will allow users to add Prometheus annotations in services in Kubernetes, and have telegraf scan for them and add them to the list of endpoints to collect metrics from.
@sorenmat (Author) commented:

I've done a bit of a redesign of this:

  • Scraping pods instead of services, because that's what you want: it doesn't make sense to scrape services, since they are load balanced and you would then only get a sample of each pod under the service.
  • Added some tags to the metrics so you can aggregate them on the metric server (see the sketch below).

I tried adding scraping for 500 pods in our test system; it didn't add much load to the telegraf process, so it seems to scale quite well.
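
As a rough illustration of the tagging idea, here is a self-contained Go sketch; the tag keys ("pod_name", "namespace") and field names are assumptions, not necessarily the PR's exact set:

package main

import "fmt"

// Target pairs a scrape URL with identifying tags, mirroring the Target
// struct discussed in this review.
type Target struct {
	url  string
	tags map[string]string
}

// podTags builds tags that identify the scraped pod, so samples from
// many pods can still be aggregated per pod or namespace downstream.
func podTags(name, namespace string) map[string]string {
	return map[string]string{
		"pod_name":  name,
		"namespace": namespace,
	}
}

func main() {
	t := Target{
		url:  "http://10.0.0.5:9102/metrics",
		tags: podTags("web-0", "default"),
	}
	fmt.Println(t)
}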

@lpic10 (Contributor) commented Mar 22, 2018:

Yes, that makes sense; I thought that was the case already.

Do you have a docker image with these changes?

@sorenmat (Author) replied:

@lpic10 Yes: sorenmat/telegraf:4

@lpic10 (Contributor) commented Mar 22, 2018:

It seems something is not working: I don't see anything in the logs related to k8s/prometheus, and no collection is happening at all.

@sorenmat (Author) replied:

@lpic10 Did you move the annotation from the service object to the pod object in your yaml files?

@lpic10 (Contributor) commented Mar 22, 2018:

OK, maybe what I wanted was for the k8s discovery to find the endpoints related to a service and retrieve metrics from all of them, rather than discovering the pods directly.

I think Prometheus' k8s service discovery can do both; maybe someone with experience with it could comment.

@sorenmat (Author) replied:

@lpic10 Yeah, after googling kubernetes yaml files, it does seem like people are adding the annotation on the service, which I guess makes sense.
I will try to update it to scan for services with the scrape annotation, then find the service's pods and collect metrics from those.

@sorenmat (Author) commented:

Looking more into this and thinking about it, I think adding the annotations to the pods makes the most sense: it is the pods you are scraping data from, not the service. By adding it on the pod rather than the service, you also get the benefit of scraping pods without a service, which could be useful in many cases.

@abraithwaite replied:
I think that makes perfect sense as well.

@danielnelson (Contributor) left a comment:

Looks cool, here is my review:

Side note: these changes increase the size of the telegraf binary on Linux from 27M to 41M (with -w -s linker flags). Not sure if this is a problem; I need to think about it.

// Should we scrape Kubernetes services for prometheus annotations
KubernetesScraping bool `toml:"kubernetes_scraping"`
lock *sync.Mutex
KubernetesPods []Target

Can this be set in the config file? If not, make it unexported and move it to the bottom of the struct too, please.

@@ -96,6 +111,7 @@ type URLAndAddress struct {
OriginalURL *url.URL
URL *url.URL
Address string
Tags map[string]string
}

Might be a good idea to rename this struct... perhaps we merge this with the Target struct?

return nil
}

func (p *Prometheus) Stop() {}

Need to stop the kubernetes pod monitor here. The easiest way to test this is by sending a SIGHUP; the config could be completely different after the reload.
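
A minimal sketch of one way to do this, using context cancellation; the field and helper names here are assumptions, not the PR's code:

package main

import (
	"context"
	"sync"
)

// watchPods stands in for the kubernetes pod watcher loop; the real
// loop must return promptly once ctx is cancelled.
func watchPods(ctx context.Context) {
	<-ctx.Done()
}

type Prometheus struct {
	cancel context.CancelFunc
	wg     sync.WaitGroup
}

// Start launches the watcher under a cancellable context.
func (p *Prometheus) Start() error {
	ctx, cancel := context.WithCancel(context.Background())
	p.cancel = cancel
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		watchPods(ctx)
	}()
	return nil
}

// Stop cancels the context and waits for the watcher goroutine, so a
// SIGHUP reload tears the old monitor down before the new config starts.
func (p *Prometheus) Stop() {
	if p.cancel != nil {
		p.cancel()
	}
	p.wg.Wait()
}

func main() {
	p := &Prometheus{}
	p.Start()
	p.Stop()
}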

func init() {
inputs.Add("prometheus", func() telegraf.Input {
- return &Prometheus{ResponseTimeout: internal.Duration{Duration: time.Second * 3}}
+ return &Prometheus{ResponseTimeout: internal.Duration{Duration: time.Second * 3}, lock: &sync.Mutex{}}

Nitpick: can you wrap this line?

@@ -157,15 +187,6 @@ func (p *Prometheus) Gather(acc telegraf.Accumulator) error {
return nil
}

var tr = &http.Transport{

Thank you!

# prometheus.io/scrape: Enable scraping for this pod
# prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.
# prometheus.io/port: If port is not 9102 use this annotation
# kubernetes_scraping = true

kubernetes_scraping seems too ambiguous, maybe: monitor_kubernetes_pods?

for _, pod := range p.KubernetesPods {
URL, err := url.Parse(pod.url)
if err != nil {
log.Printf("prometheus: Could not parse url %s, skipping it. Error: %s", pod.url, err)

I see that this is based on some existing log messages, but could you do me a favor and update them to look more like:

"E! [inputs.prometheus] could not parse URL %q: %v"`

func registerPod(pod *v1.Pod, p *Prometheus) {
url := scrapeURL(pod)
if url != nil {
log.Printf("Will scrape metrics from %v\n", *url)

For these, log at debug level: "D! [inputs.prometheus] adding %q to scrape targets"

defer p.lock.Unlock()
log.Printf("Registred a delete request for %v in namespace '%v'\n", pod.Name, pod.Namespace)
var result []Target
for _, v := range p.KubernetesPods {

I think we should use a map[string]bool for the targets
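
For illustration, a sketch of a map-based registry; the reviewer's literal suggestion is map[string]bool keyed by the target, while this variant keys by namespace/name and keeps the URL as the value, which is an assumption of this sketch:

package main

import (
	"fmt"
	"sync"
)

// registry tracks scrape targets in a map so pod-delete events remove
// entries in O(1) instead of filtering a slice on every event.
type registry struct {
	mu      sync.Mutex
	targets map[string]string // "namespace/name" -> scrape URL
}

func (r *registry) add(namespace, name, url string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.targets[namespace+"/"+name] = url
}

func (r *registry) remove(namespace, name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.targets, namespace+"/"+name)
}

func main() {
	r := &registry{targets: map[string]string{}}
	r.add("default", "web-0", "http://10.0.0.5:9102/metrics")
	r.remove("default", "web-0")
	fmt.Println(len(r.targets)) // 0
}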

continue
}
podURL := p.AddressToURL(URL, URL.Hostname())
allURLs = append(allURLs, URLAndAddress{URL: podURL, Address: URL.Hostname(), OriginalURL: URL, Tags: pod.tags})

If we merge URLAndAddress with Target, then we can do this logic in kubernetes.go, and all we need to do here is add the items to the list.
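
A sketch of what such a merged struct could look like; the field set simply combines the URLAndAddress fields shown earlier with the per-pod tags, and the merge itself is only the reviewer's suggestion:

package prometheus

import "net/url"

// Target merges URLAndAddress with per-pod tags, so kubernetes.go can
// construct complete targets and Gather only has to iterate over them.
type Target struct {
	OriginalURL *url.URL
	URL         *url.URL
	Address     string
	Tags        map[string]string
}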

@danielnelson (Contributor) commented:

I've been thinking about this, and I feel the extra binary size is too much to pay for this feature; perhaps we can write a lightweight client that does only what we need.

@sorenmat (Author) commented May 1, 2018:

I'm not 100% sure why the binary size would matter that much. I can see that it 'feels' bad, but if it doesn't use more memory or CPU, does it really matter?

I do think that re-implementing a client that continuously watches the K8s API, knows about the auth features, and can parse the various JSON formats would be a huge undertaking.
I also doubt it would result in a smaller binary, to be honest, but it might :)
But we would have to play catch-up with the k8s API all the time.
IMHO, I don't think that's worth it.

@sorenmat (Author) commented May 1, 2018:

I just found https://github.com/ericchiang/k8s
It seems like it might solve the problem; I will try to re-implement this feature with that library to see if it helps with the size issue :)

@danielnelson (Contributor) replied:

Thanks @sorenmat. I know that for many the binary size is not an issue, but for some embedded use cases it is a growing problem.

@sorenmat force-pushed the promethus_scan branch 2 times, most recently from e959ef5 to b947d33 on May 8, 2018
@sorenmat (Author) commented:

@danielnelson This is ready for another review, when you have the time :)

[9 resolved review threads on plugins/inputs/prometheus/kubernetes.go]
in := make(chan payload)
go func() {
var pod corev1.Pod
watcher, err := client.Watch(context.Background(), "", &pod)
A contributor commented on the diff:

I wasn't able to find this in the library documentation, does the watch send all pods initially?

@sorenmat (Author) replied:

Ran a test on our cluster and it started registering all pods. So yes :)
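
For context, a minimal watch-loop sketch following the ericchiang/k8s README pattern; the import paths, NewInClusterClient, and the Next/Close usage are taken from that library's documentation, the event strings are the standard Kubernetes watch event types, and real code would re-establish the watch instead of calling log.Fatal:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ericchiang/k8s"
	corev1 "github.com/ericchiang/k8s/apis/core/v1"
)

func main() {
	client, err := k8s.NewInClusterClient()
	if err != nil {
		log.Fatal(err)
	}
	var pod corev1.Pod
	// An empty namespace watches pods in all namespaces; the watch first
	// replays existing pods as ADDED events, then streams later changes,
	// which is how newly created pods should get registered too.
	watcher, err := client.Watch(context.Background(), "", &pod)
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()
	for {
		p := new(corev1.Pod)
		eventType, err := watcher.Next(p)
		if err != nil {
			log.Fatal(err) // real code: re-establish the watch
		}
		switch eventType {
		case "ADDED", "MODIFIED":
			fmt.Println("register", *p.Metadata.Name)
		case "DELETED":
			fmt.Println("unregister", *p.Metadata.Name)
		}
	}
}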

Another user commented:

Hi, I think this patch is very useful, so I have tested it.
Existing pods are all good, but if I create a new pod, it does not work for the new pod.

@sorenmat (Author) replied:

OK, will investigate. Thanks for the feedback.

@danielnelson modified the milestones: 1.7.0 → 1.8.0 on Jun 3, 2018
@glinton (Contributor) commented Sep 6, 2018:

@sorenmat Do you think you'll have time soon to address the most recent feedback items?

@russorat (Contributor) commented:

Fixing this via #4920.

@russorat removed this from the 1.9.0 milestone Oct 29, 2018
Labels: area/k8s, area/prometheus, feat

7 participants