add scraping for Prometheus endpoint in Kubernetes #3901
Conversation
Force-pushed from b9f5a0c to bf8f916.
While reading #272 @abraithwaite @danielnelson might be interested in this.
This is so cool! I'm excited to try this. Thanks for putting in the time!
	}
}

func scrapeURL(s *v1.Service) *string {
Why does this need to be *string and not just string?
So we can do a nil check on the value: if it is nil, then no prometheus.io/scrape annotation was present, or it was set to false.
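A minimal sketch of that pattern, assuming current k8s.io/api/core/v1 types; the default port and path here are illustrative, not this PR's exact values:

```go
import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// scrapeURL returns nil when the prometheus.io/scrape annotation is
// absent or not "true", so callers can distinguish "no scrape target"
// from an empty URL string.
func scrapeURL(s *v1.Service) *string {
	if s.Annotations["prometheus.io/scrape"] != "true" {
		return nil
	}
	// Illustrative defaults; the prometheus.io/port and
	// prometheus.io/path annotations would override these.
	u := fmt.Sprintf("http://%s.%s:9102/metrics", s.Name, s.Namespace)
	return &u
}
```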
Created a docker image with the change in sorenmat/telegraf:2 if you need to test it easily :)
@sorenmat I'm giving it a try, but the scrape fails to resolve services running in different namespaces, as the discovery is not adding the namespace name to the URL. On another subject, shouldn't the URL for the k8s API server be configurable?
I will address the first issue 😃
This will allow users to add Prometheus annotations to services in Kubernetes, and have telegraf scan for them and add them to the list of endpoints to collect metrics from.
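For illustration only, a service manifest carrying these annotations might look like this (resource names hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                     # hypothetical service name
  annotations:
    prometheus.io/scrape: "true"   # opt this service in to scraping
    prometheus.io/path: "/metrics" # optional, defaults to /metrics
    prometheus.io/port: "9102"     # optional, defaults to 9102
spec:
  selector:
    app: my-app
  ports:
    - port: 9102
```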
Force-pushed from bf8f916 to 8d1a552.
I've done a bit of a redesign of this: I tried adding scraping for 500 pods in our test system, and it didn't add much load to the telegraf process, so it seems to scale quite well.
Yes, that makes sense; I thought that was the case already. Do you have a docker image with these changes?
@lpic10 Yes, sorenmat/telegraf:4
It seems something is not working; I don't see anything in the logs related to k8s/prometheus, and no collection is done at all.
@lpic10 Did you move the annotation from the service object to the pod object in your YAML files?
OK, maybe what I wanted was for the k8s discovery to find the endpoints related to a service and retrieve metrics from all of them, rather than discovering the pods directly. I think Prometheus k8s service discovery can do both; maybe someone with experience with it could comment.
@lpic10 Yeah, after googling Kubernetes YAML files, it does seem like people are adding the annotation on the service, which I guess makes sense.
Looking more into this, and thinking about it.
I think that makes perfect sense as well.
Looks cool, here is my review:
Side note: these changes increase the size of the telegraf binary on Linux from 27M to 41M (with -w -s linker flags). Not sure if this is a problem; I need to think about it.
	// Should we scrape Kubernetes services for prometheus annotations
	KubernetesScraping bool `toml:"kubernetes_scraping"`
	lock               *sync.Mutex
	KubernetesPods     []Target
Can this be set in the config file? If not, make it unexported and move it to the bottom of the struct too, please.
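One way the struct could be laid out after that change (a sketch, not the PR's final code):

```go
type Prometheus struct {
	// User-facing options, settable from the config file.
	KubernetesScraping bool `toml:"kubernetes_scraping"`

	// Internal state: unexported and kept at the bottom,
	// since it cannot be set from the config file.
	lock           *sync.Mutex
	kubernetesPods []Target
}
```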
@@ -96,6 +111,7 @@ type URLAndAddress struct {
	OriginalURL *url.URL
	URL         *url.URL
	Address     string
	Tags        map[string]string
}
Might be a good idea to rename this struct... perhaps we merge this with the Target struct?
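A sketch of what a merged type might look like, assuming it simply keeps the fields of both (the name is a placeholder):

```go
// Target describes a single scrape endpoint, whether it came from the
// config file or from Kubernetes pod discovery.
type Target struct {
	OriginalURL *url.URL
	URL         *url.URL
	Address     string
	Tags        map[string]string
}
```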
	return nil
}

func (p *Prometheus) Stop() {}
Need to stop the Kubernetes pod monitor. The easiest way to test this is by sending a SIGHUP; the config could be completely different after a reload.
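A rough sketch of one way to wire that up, using a cancelable context; the Start/watchPods split and the cancel field are assumptions, not code from this PR:

```go
import (
	"context"

	"github.com/influxdata/telegraf"
)

type Prometheus struct {
	// ...existing fields...
	cancel context.CancelFunc // hypothetical field added for shutdown
}

func (p *Prometheus) Start(acc telegraf.Accumulator) error {
	ctx, cancel := context.WithCancel(context.Background())
	p.cancel = cancel
	go p.watchPods(ctx) // hypothetical helper; must return once ctx is canceled
	return nil
}

// Stop halts the Kubernetes pod monitor so a SIGHUP reload
// does not leave the old watcher running.
func (p *Prometheus) Stop() {
	if p.cancel != nil {
		p.cancel()
	}
}
```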
func init() {
	inputs.Add("prometheus", func() telegraf.Input {
		return &Prometheus{ResponseTimeout: internal.Duration{Duration: time.Second * 3}}
		return &Prometheus{ResponseTimeout: internal.Duration{Duration: time.Second * 3}, lock: &sync.Mutex{}}
Nitpick: can you wrap this line?
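For example, wrapping the struct literal one field per line:

```go
func init() {
	inputs.Add("prometheus", func() telegraf.Input {
		return &Prometheus{
			ResponseTimeout: internal.Duration{Duration: time.Second * 3},
			lock:            &sync.Mutex{},
		}
	})
}
```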
@@ -157,15 +187,6 @@ func (p *Prometheus) Gather(acc telegraf.Accumulator) error {
	return nil
}

var tr = &http.Transport{
Thank you!
# prometheus.io/scrape: Enable scraping for this pod
# prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.
# prometheus.io/port: If port is not 9102, use this annotation.
# kubernetes_scraping = true
kubernetes_scraping seems too ambiguous; maybe monitor_kubernetes_pods?
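With that rename, the sample config block above might read as follows (the `[[inputs.prometheus]]` header is assumed from telegraf's usual config layout):

```toml
[[inputs.prometheus]]
  # prometheus.io/scrape: Enable scraping for this pod
  # prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.
  # prometheus.io/port: If port is not 9102, use this annotation.
  # monitor_kubernetes_pods = true
```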
for _, pod := range p.KubernetesPods {
	URL, err := url.Parse(pod.url)
	if err != nil {
		log.Printf("prometheus: Could not parse url %s, skipping it. Error: %s", pod.url, err)
I see that this is based on some existing log messages, but could you do me a favor and update them to look more like: "E! [inputs.prometheus] could not parse URL %q: %v"
func registerPod(pod *v1.Pod, p *Prometheus) {
	url := scrapeURL(pod)
	if url != nil {
		log.Printf("Will scrape metrics from %v\n", *url)
For these, log at debug level: "D! [inputs.prometheus] adding %q to scrape targets"
defer p.lock.Unlock()
log.Printf("Registered a delete request for %v in namespace '%v'\n", pod.Name, pod.Namespace)
var result []Target
for _, v := range p.KubernetesPods {
I think we should use a map[string]bool for the targets.
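A sketch of the suggested map-based registry; the field and function names are illustrative, not from the PR:

```go
// kubernetesPods is keyed by scrape URL, so removal is a single
// delete instead of rebuilding a slice under the lock.
func registerPod(url string, p *Prometheus) {
	p.lock.Lock()
	defer p.lock.Unlock()
	p.kubernetesPods[url] = true // hypothetical map[string]bool field
}

func unregisterPod(url string, p *Prometheus) {
	p.lock.Lock()
	defer p.lock.Unlock()
	delete(p.kubernetesPods, url)
}
```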
	continue
}
podURL := p.AddressToURL(URL, URL.Hostname())
allURLs = append(allURLs, URLAndAddress{URL: podURL, Address: URL.Hostname(), OriginalURL: URL, Tags: pod.tags})
If we merge URLAndAddress with Target, then we can do this logic in kubernetes.go, and all we need to do here is add the items to the lists.
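With that merge, the Gather-side loop could reduce to something like this (illustrative; kubernetesTargets is a hypothetical accessor):

```go
// kubernetes.go builds fully populated Target values, so the
// prometheus input only appends them to the scrape list.
for _, target := range p.kubernetesTargets() {
	allURLs = append(allURLs, target)
}
```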
I've been thinking about this, and I feel the extra binary size is too much to pay for this feature; perhaps we can write a lightweight client that does only what we need.
I'm not 100% sure why the binary size would matter that much. I can see that it 'feels' bad, but if it doesn't eat more memory or CPU, does it really matter that much? I do think that re-implementing a client that continuously watches the K8s API, knows about the auth features, and can parse the various JSON formats would be a huge undertaking.
I just found this: https://github.com/ericchiang/k8s
Thanks @sorenmat. I know that for many users the binary size is not an issue, but for some embedded use cases it is a growing problem.
Force-pushed from e959ef5 to b947d33.
@danielnelson This is ready for another review, when you have the time :)
in := make(chan payload)
go func() {
	var pod corev1.Pod
	watcher, err := client.Watch(context.Background(), "", &pod)
I wasn't able to find this in the library documentation; does the watch send all pods initially?
Ran a test on our cluster and it started registering all pods. So yes :)
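A rough sketch of such a watch loop with github.com/ericchiang/k8s; the Watch/Next API and the event constants are assumptions based on that library's README, and registerPod/unregisterPod stand in for the PR's handlers:

```go
import (
	"context"
	"log"

	"github.com/ericchiang/k8s"
	corev1 "github.com/ericchiang/k8s/apis/core/v1"
)

func watchPods(client *k8s.Client, p *Prometheus) {
	var pod corev1.Pod
	watcher, err := client.Watch(context.Background(), "", &pod)
	if err != nil {
		log.Printf("E! [inputs.prometheus] unable to watch pods: %v", err)
		return
	}
	defer watcher.Close()
	for {
		obj := new(corev1.Pod)
		eventType, err := watcher.Next(obj)
		if err != nil {
			// Watch closed or errored; a production version should
			// re-establish it so newly created pods keep being picked up.
			return
		}
		switch eventType {
		case k8s.EventAdded, k8s.EventModified:
			registerPod(obj, p) // hypothetical handler
		case k8s.EventDeleted:
			unregisterPod(obj, p) // hypothetical handler
		}
	}
}
```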
Hi, I think this patch is very useful, so I have tested it.
Existing pods are all good, but if I create a new pod, it does not work for the new pod.
OK, will investigate. Thanks for the feedback.
@sorenmat Do you think you'll have time soon to address the most recent feedback items?
Fixing this via #4920
This will allow users to add Prometheus annotations to pods
in Kubernetes, and then have telegraf scan for them and add them
to the list of endpoints to collect metrics from.