
add scraping for Prometheus endpoint in Kubernetes #3901

Closed (wants to merge 2 commits)

Conversation

@sorenmat commented Mar 18, 2018:

This will allow users to add Prometheus annotations to pods
in Kubernetes, then have telegraf scan for them and add them
to the list of endpoints to collect metrics from.

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@sorenmat force-pushed the promethus_scan branch 2 times, most recently from b9f5a0c to bf8f916 on March 19, 2018
@sorenmat (Author) commented:

While reading #272 @abraithwaite @danielnelson might be interested in this.

@abraithwaite left a comment:

This is so cool! I'm excited to try this. Thanks for putting in the time!

}
}

func scrapeURL(s *v1.Service) *string {


Why does this need to be *string and not just string?

@sorenmat (Author) replied:

So we can do a nil check on the value: if it is nil, then no prometheus.io/scrape annotation was present, or it was set to false.
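
For illustration, here is a minimal Go sketch of that pattern; the defaults follow the annotations documented later in this PR (/metrics, port 9102), but the helper takes a plain annotations map instead of the object from the diff, and its exact shape is an assumption, not the PR's code:

package main

import "fmt"

// scrapeURL returns nil when the prometheus.io/scrape annotation is
// absent or not "true", so callers can distinguish "no target" from
// "empty URL" with a simple nil check.
func scrapeURL(ip string, annotations map[string]string) *string {
	if annotations["prometheus.io/scrape"] != "true" {
		return nil // annotation missing or explicitly disabled
	}
	path := annotations["prometheus.io/path"]
	if path == "" {
		path = "/metrics" // default metrics path
	}
	port := annotations["prometheus.io/port"]
	if port == "" {
		port = "9102" // default port
	}
	u := fmt.Sprintf("http://%s:%s%s", ip, port, path)
	return &u
}

func main() {
	ann := map[string]string{"prometheus.io/scrape": "true"}
	if u := scrapeURL("10.0.0.5", ann); u != nil {
		fmt.Println("will scrape", *u)
	}
}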

@danielnelson added this to the 1.7.0 milestone Mar 19, 2018
@danielnelson added the feat label (Improvement on an existing feature, such as adding a new setting/mode to an existing plugin) Mar 19, 2018
@sorenmat (Author) commented Mar 20, 2018:

Created a docker image with the change in sorenmat/telegraf:2 if you need to test it easily :)

@lpic10 (Contributor) commented Mar 20, 2018:

@sorenmat I'm giving it a try, but the scrape fails to resolve services running in different namespaces, as the discovery is not adding the namespace name to the URL.

On another subject, shouldn't the URL for the k8s API server be configurable?

@sorenmat (Author) replied:

I will address the first issue 😃
I think the second one should be done in another PR. There are many options/features you could add if you think about it; once the basic feature is merged, people can add more, IMO.

Commit: This will allow users to add Prometheus annotations in services in Kubernetes, and have telegraf scan for them and add them to the list of endpoints to collect metrics from.
@sorenmat (Author) commented:

I've done a bit of a redesign of this:

  • Scraping pods instead of services, because that's what you want: it doesn't make sense to scrape services, since they are load balanced and you would then only get a sample of each pod under the service.
  • Added some tags to the metrics so you can aggregate them on the metric server (see the sketch below).

I tried adding scraping for 500 pods in our test system; it didn't add much load to the telegraf process, so it seems to scale quite well.
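
As a rough illustration of the tagging idea, here is a self-contained Go sketch; the tag keys ("pod_name", "namespace") and field names are assumptions, not necessarily the PR's exact set:

package main

import "fmt"

// Target pairs a scrape URL with identifying tags, mirroring the Target
// struct discussed in this review.
type Target struct {
	url  string
	tags map[string]string
}

// podTags builds tags that identify the scraped pod, so samples from
// many pods can still be aggregated per pod or namespace downstream.
func podTags(name, namespace string) map[string]string {
	return map[string]string{
		"pod_name":  name,
		"namespace": namespace,
	}
}

func main() {
	t := Target{
		url:  "http://10.0.0.5:9102/metrics",
		tags: podTags("web-0", "default"),
	}
	fmt.Println(t)
}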

@lpic10 (Contributor) commented Mar 22, 2018:

Yes, that makes sense; I thought that was the case already.

Do you have a docker image with these changes?

@sorenmat (Author) replied:

@lpic10 Yes: sorenmat/telegraf:4

@lpic10 (Contributor) commented Mar 22, 2018:

It seems something is not working: I don't see anything in the logs related to k8s/prometheus, and no collection is happening at all.

@sorenmat (Author) replied:

@lpic10 Did you move the annotation from the service object to the pod object in your yaml files?

@lpic10 (Contributor) commented Mar 22, 2018:

OK, maybe what I wanted was for the k8s discovery to find the endpoints related to a service and retrieve metrics from all of them, rather than discovering the pods directly.

I think Prometheus' k8s service discovery can do both; maybe someone with experience with it could comment.

@sorenmat (Author) replied:

@lpic10 Yeah, after googling kubernetes yaml files, it does seem like people are adding the annotation on the service, which I guess makes sense.
I will try to update it to scan for services with the scrape annotation, then find the service's pods and collect metrics from those.

@sorenmat (Author) commented:

Looking more into this and thinking about it, I think adding the annotations to the pods makes the most sense: it is the pods you are scraping data from, not the service. By adding it on the pod rather than the service, you also get the benefit of scraping pods without a service, which could be useful in many cases.

@abraithwaite replied:
I think that makes perfect sense as well.

@danielnelson (Contributor) left a comment:

Looks cool, here is my review:

Side note: these changes increase the size of the telegraf binary on Linux from 27M to 41M (with -w -s linker flags). Not sure if this is a problem; I need to think about it.

// Should we scrape Kubernetes services for prometheus annotations
KubernetesScraping bool `toml:"kubernetes_scraping"`
lock *sync.Mutex
KubernetesPods []Target

Can this be set in the config file? If not, make it unexported and move it to the bottom of the struct too, please.

@@ -96,6 +111,7 @@ type URLAndAddress struct {
OriginalURL *url.URL
URL *url.URL
Address string
Tags map[string]string
}

Might be a good idea to rename this struct... perhaps we merge this with the Target struct?

return nil
}

func (p *Prometheus) Stop() {}

Need to stop the kubernetes pod monitor here. The easiest way to test this is by sending a SIGHUP; the config could be completely different after the reload.
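
A minimal sketch of one way to do this, using context cancellation; the field and helper names here are assumptions, not the PR's code:

package main

import (
	"context"
	"sync"
)

// watchPods stands in for the kubernetes pod watcher loop; the real
// loop must return promptly once ctx is cancelled.
func watchPods(ctx context.Context) {
	<-ctx.Done()
}

type Prometheus struct {
	cancel context.CancelFunc
	wg     sync.WaitGroup
}

// Start launches the watcher under a cancellable context.
func (p *Prometheus) Start() error {
	ctx, cancel := context.WithCancel(context.Background())
	p.cancel = cancel
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		watchPods(ctx)
	}()
	return nil
}

// Stop cancels the context and waits for the watcher goroutine, so a
// SIGHUP reload tears the old monitor down before the new config starts.
func (p *Prometheus) Stop() {
	if p.cancel != nil {
		p.cancel()
	}
	p.wg.Wait()
}

func main() {
	p := &Prometheus{}
	p.Start()
	p.Stop()
}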

func init() {
inputs.Add("prometheus", func() telegraf.Input {
- return &Prometheus{ResponseTimeout: internal.Duration{Duration: time.Second * 3}}
+ return &Prometheus{ResponseTimeout: internal.Duration{Duration: time.Second * 3}, lock: &sync.Mutex{}}

Nitpick: can you wrap this line?

@@ -157,15 +187,6 @@ func (p *Prometheus) Gather(acc telegraf.Accumulator) error {
return nil
}

var tr = &http.Transport{

Thank you!

# prometheus.io/scrape: Enable scraping for this pod
# prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.
# prometheus.io/port: If port is not 9102 use this annotation
# kubernetes_scraping = true

kubernetes_scraping seems too ambiguous, maybe: monitor_kubernetes_pods?

for _, pod := range p.KubernetesPods {
URL, err := url.Parse(pod.url)
if err != nil {
log.Printf("prometheus: Could not parse url %s, skipping it. Error: %s", pod.url, err)

I see that this is based on some existing log messages, but could you do me a favor and update them to look more like:

"E! [inputs.prometheus] could not parse URL %q: %v"`

func registerPod(pod *v1.Pod, p *Prometheus) {
url := scrapeURL(pod)
if url != nil {
log.Printf("Will scrape metrics from %v\n", *url)

For these, log at debug level: "D! [inputs.prometheus] adding %q to scrape targets"

defer p.lock.Unlock()
log.Printf("Registred a delete request for %v in namespace '%v'\n", pod.Name, pod.Namespace)
var result []Target
for _, v := range p.KubernetesPods {

I think we should use a map[string]bool for the targets
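
For illustration, a sketch of a map-based registry; the reviewer's literal suggestion is map[string]bool keyed by the target, while this variant keys by namespace/name and keeps the URL as the value, which is an assumption of this sketch:

package main

import (
	"fmt"
	"sync"
)

// registry tracks scrape targets in a map so pod-delete events remove
// entries in O(1) instead of filtering a slice on every event.
type registry struct {
	mu      sync.Mutex
	targets map[string]string // "namespace/name" -> scrape URL
}

func (r *registry) add(namespace, name, url string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.targets[namespace+"/"+name] = url
}

func (r *registry) remove(namespace, name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.targets, namespace+"/"+name)
}

func main() {
	r := &registry{targets: map[string]string{}}
	r.add("default", "web-0", "http://10.0.0.5:9102/metrics")
	r.remove("default", "web-0")
	fmt.Println(len(r.targets)) // 0
}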

continue
}
podURL := p.AddressToURL(URL, URL.Hostname())
allURLs = append(allURLs, URLAndAddress{URL: podURL, Address: URL.Hostname(), OriginalURL: URL, Tags: pod.tags})

If we merge URLAndAddress with Target, then we can do this logic in kubernetes.go, and all we need to do here is add the items to the list.
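
A sketch of what such a merged struct could look like; the field set simply combines the URLAndAddress fields shown earlier with the per-pod tags, and the merge itself is only the reviewer's suggestion:

package prometheus

import "net/url"

// Target merges URLAndAddress with per-pod tags, so kubernetes.go can
// construct complete targets and Gather only has to iterate over them.
type Target struct {
	OriginalURL *url.URL
	URL         *url.URL
	Address     string
	Tags        map[string]string
}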

@danielnelson (Contributor) commented:

I've been thinking about this, and I feel the extra binary size is too much to pay for this feature; perhaps we can write a lightweight client that does only what we need.

@sorenmat (Author) commented May 1, 2018:

I'm not 100% sure why the binary size would matter that much. I can see that it 'feels' bad, but if it doesn't use more memory or CPU, does it really matter?

I do think that re-implementing a client that continuously watches the K8s API, knows about the auth features, and can parse the various JSON formats would be a huge undertaking.
I also doubt it would result in a smaller binary, to be honest, but it might :)
But we would have to play catch-up with the k8s API all the time.
IMHO, I don't think that's worth it.

@sorenmat (Author) commented May 1, 2018:

I just found https://github.com/ericchiang/k8s
It seems like it might solve the problem; I will try to re-implement this feature with that library to see if it helps with the size issue :)

@danielnelson (Contributor) replied:

Thanks @sorenmat. I know that for many the binary size is not an issue, but for some embedded use cases it is a growing problem.

@sorenmat force-pushed the promethus_scan branch 2 times, most recently from e959ef5 to b947d33 on May 8, 2018
@sorenmat (Author) commented:

@danielnelson This is ready for another review, when you have the time :)

[9 resolved review threads on plugins/inputs/prometheus/kubernetes.go]
in := make(chan payload)
go func() {
var pod corev1.Pod
watcher, err := client.Watch(context.Background(), "", &pod)
A contributor commented on the diff:

I wasn't able to find this in the library documentation, does the watch send all pods initially?

@sorenmat (Author) replied:

Ran a test on our cluster and it started registering all pods. So yes :)
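
For context, a minimal watch-loop sketch following the ericchiang/k8s README pattern; the import paths, NewInClusterClient, and the Next/Close usage are taken from that library's documentation, the event strings are the standard Kubernetes watch event types, and real code would re-establish the watch instead of calling log.Fatal:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ericchiang/k8s"
	corev1 "github.com/ericchiang/k8s/apis/core/v1"
)

func main() {
	client, err := k8s.NewInClusterClient()
	if err != nil {
		log.Fatal(err)
	}
	var pod corev1.Pod
	// An empty namespace watches pods in all namespaces; the watch first
	// replays existing pods as ADDED events, then streams later changes,
	// which is how newly created pods should get registered too.
	watcher, err := client.Watch(context.Background(), "", &pod)
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()
	for {
		p := new(corev1.Pod)
		eventType, err := watcher.Next(p)
		if err != nil {
			log.Fatal(err) // real code: re-establish the watch
		}
		switch eventType {
		case "ADDED", "MODIFIED":
			fmt.Println("register", *p.Metadata.Name)
		case "DELETED":
			fmt.Println("unregister", *p.Metadata.Name)
		}
	}
}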

Another user commented:

Hi, I think this patch is very useful, so I have tested it.
Existing pods are all good, but if I create a new pod, it does not work for the new pod.

@sorenmat (Author) replied:

OK, will investigate. Thanks for the feedback.

@danielnelson modified the milestones: 1.7.0 → 1.8.0 on Jun 3, 2018
@glinton (Contributor) commented Sep 6, 2018:

@sorenmat Do you think you'll have time soon to address the most recent feedback items?

@russorat (Contributor) commented:

Fixing this via #4920.

@russorat removed this from the 1.9.0 milestone Oct 29, 2018
Labels: area/k8s, area/prometheus, feat

7 participants