
Kubernetes input plugin not working (deprecated /stats/summary endpoint?) #6959

Closed
ghost opened this issue Jan 31, 2020 · 18 comments
Labels: area/k8s, docs

Comments

@ghost

ghost commented Jan 31, 2020

Relevant telegraf.conf:

[[inputs.kubernetes]]
      url = "https://kubernetes.default.svc"
      bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
      insecure_skip_verify = true

System info:

Ubuntu 18.04
k3s v1.17.2+k3s1
Telegraf image: telegraf:1.12.2

Steps to reproduce:

Configure the Kubernetes input plugin in a Telegraf container.

Expected behavior:

The plugin should collect the Kubernetes metrics.

Actual behavior:

The Telegraf plugin log shows that the Kubernetes API server returned a 403 Forbidden error code. After adding the following rules to the pod's RBAC ServiceAccount:

rules:
  - nonResourceURLs: ["/stats", "/stats/*"]
    verbs: ["get", "list"]

the error becomes 404. No metrics are being collected.

Additional info:

The kube_inventory input plugin seems to be working just fine, but the kubernetes plugin is not able to obtain any metrics, as described. Looking at the code, the kubernetes input plugin calls the /stats/summary Kubernetes API server endpoint.

The /stats/summary endpoint was planned to be deprecated (kubernetes/kubernetes#68522), but it seems that it has already been removed.

@danielnelson
Contributor

We should put together some documentation about what needs to be done to switch to the replacement and any way we can smooth the transition. I could definitely use some help from the community on this.

I am assuming similar metrics can be captured with the prometheus input plugin. It would be good to gather a listing of the new metrics because switching over will likely change all metrics and break dashboards/alerts.
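Something along these lines might be a starting point, though it is only an untested sketch; the kubelet port, the metrics paths, and the HOSTIP variable (from the downward API) are assumptions:

[[inputs.prometheus]]
  ## Sketch: scrape the kubelet's Prometheus endpoints instead of /stats/summary.
  ## HOSTIP would come from the downward API (status.hostIP); port and paths may differ per cluster.
  urls = ["https://$HOSTIP:10250/metrics", "https://$HOSTIP:10250/metrics/cadvisor"]
  bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
  insecure_skip_verify = true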

It also looks like it should be possible to use the --enable-cadvisor-endpoints flag to re-enable the endpoint; it would be good to describe how this can be set as well.
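For clusters where you control the kubelet, that would presumably be passed as an extra kubelet argument; a rough sketch (the flag name is taken from above, and the exact file and mechanism vary by distro/installer):

# e.g. /etc/default/kubelet on kubeadm-style installs, then restart the kubelet
KUBELET_EXTRA_ARGS=--enable-cadvisor-endpoints=true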

@danielnelson added the area/k8s and docs labels Jan 31, 2020
@masual

masual commented Feb 3, 2020

Hello @danielnelson, thank you for your reply. cAdvisor endpoint support will be removed in Kubernetes 1.19 (kubernetes/kubernetes#76660), so I would recommend using the --enable-cadvisor-endpoints flag only as a temporary fix. I think the way to go is to query the metrics-server API (https://github.com/kubernetes-sigs/metrics-server) through the standard Kubernetes API to obtain pod metrics.
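If metrics-server is deployed, those metrics are exposed through the aggregated metrics.k8s.io API on the regular API server, e.g.:

# quick check that the metrics API is available (requires metrics-server)
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods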

@nsteinmetz
Contributor

@danielnelson for managed Kubernetes, I'm not sure you can ask to have this flag added, so even as a temporary fix it won't work for many (most?) people.

@masual: so it would mean we need to deploy the metrics server first and then use this plugin? Or should we only use the kube_inventory plugin?

@nsteinmetz
Contributor

I could make it work with the help of @rawkode:

As the endpoint, you need:

[[inputs.kubernetes]]
    url = "https://kubernetes.default.svc.cluster.local/api/v1/nodes/$NODE_NAME/proxy/"
    bearer_token = "/run/secrets/kubernetes.io/serviceaccount/token"
    insecure_skip_verify = true

be sure to have:

env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

and as the ClusterRole (I use ClusterRole aggregation):

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: influx:stats:viewer
  labels:
    rbac.authorization.k8s.io/aggregate-view-telegraf-stats: "true"
rules:
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get", "watch", "list"]

Tested on k8s 1.17.0 on OVH K8S Managed Service
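For this to take effect, the role (or an aggregated role that selects the label above) still needs to be bound to the ServiceAccount that Telegraf runs as; a minimal direct-binding sketch (ServiceAccount name and namespace are illustrative, not copied from my setup):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: influx-stats-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: influx:stats:viewer
subjects:
  - kind: ServiceAccount
    name: telegraf       # illustrative: the ServiceAccount your DaemonSet uses
    namespace: telegraf  # illustrative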

@nsteinmetz
Contributor

... and soon available as a Helm chart for deploying Telegraf as a DaemonSet => influxdata/helm-charts#16

@jmorcar

jmorcar commented Apr 3, 2020

I have the same problem. I followed these recommendations, but get the same error:
Error:
2020-04-03T08:38:00Z E! [inputs.kubernetes] Error in plugin: https://kubernetes/stats/summary returned HTTP status 404 Not Found

Is there any solution or another documentation to fix the problem?

I checked that I have RBAC permissions configured; this is the output:

Name:         telegraf-cluster-reader
Labels:       rbac.authorization.k8s.io/aggregate-view-telegraf=true
rbac.authorization.k8s.io/aggregate-view-telegraf-stats=true
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"labels":{"rbac.authorization.k8s.io/aggreg...
PolicyRule:
Resources          Non-Resource URLs  Resource Names  Verbs

deployments        []                 []              [get watch list]
nodes/proxy        []                 []              [get watch list]
nodes              []                 []              [get watch list]
persistentvolumes  []                 []              [get watch list]
pods               []                 []              [get watch list]
statefulsets       []                 []              [get watch list]
                   [/stats/*]         []              [get]
                   [/stats]           []              [get]
                   [/stats/*]         []              [list]
                   [/stats]           []              [list]
                   [/stats/*]         []              [watch]
                   [/stats]           []              [watch]

I have this config applied in the YAML:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: telegraf-reader
  namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: telegraf-cluster-reader
  labels:
    rbac.authorization.k8s.io/aggregate-view-telegraf: "true"
    rbac.authorization.k8s.io/aggregate-view-telegraf-stats: "true"
rules:
  - nonResourceURLs: ["/stats", "/stats/*"]
    verbs: ["get", "watch", "list"]
  - apiGroups: [""]
    resources: ["persistentvolumes", "nodes", "pods", "deployments", "statefulsets", "nodes/proxy"]
    verbs: ["get", "watch", "list"]
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: telegraf-reader-role
aggregationRule:
  clusterRoleSelectors:
    - matchLabels:
        rbac.authorization.k8s.io/aggregate-view-telegraf-stats: "true"
    - matchLabels:
        rbac.authorization.k8s.io/aggregate-view-telegraf: "true"
    - matchLabels:
        rbac.authorization.k8s.io/aggregate-to-view: "true"
rules: [] # Rules are automatically filled in by the controller manager.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: telegraf-reader-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: telegraf-reader-role
subjects:
  - kind: ServiceAccount
    name: telegraf-reader
    namespace: default

My Pod uses this, plus the token via secrets applied in a ConfigMap; other plugins like kube_inventory work fine with this:

    spec:
      serviceAccountName: telegraf-reader
      containers:
        - env:
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName  

@nsteinmetz
Contributor

@jmorcar have a look at what we did for the telegraf-ds chart, as we got it working => https://github.com/influxdata/helm-charts/tree/master/charts/telegraf-ds

@ellieayla

[[inputs.kubernetes]]
      url = "https://kubernetes.default.svc"
      

I think the plugin is expecting a URL to the node's API, not the API server's API. So the Telegraf container runs on every node, in a DaemonSet, configured with something like url = "https://$NODEIP:10250", with the environment variable coming from the downward API.
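That variable would come from the downward API, something like (variable name assumed):

env:
  - name: NODEIP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP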

@jmorcar

jmorcar commented Apr 3, 2020

I have checked just now with the node IP variable, here HOSTIP, captured via fieldPath: status.hostIP, but the answer is Forbidden:

# curl https://$HOSTIP:10250/stats/summary --header "Authorization: Bearer $TOKEN" --insecure
Forbidden (user=system:serviceaccount:default:telegraf-reader, verb=get, resource=nodes, subresource=stats)

Whereas if I use the previous command I posted, the query is permitted and returns data:

# curl https://kubernetes/stats/summary --header "Authorization: Bearer $TOKEN" --insecure
{
  "paths": [
    "/apis",
    "/apis/",
    "/apis/apiextensions.k8s.io",
    "/apis/apiextensions.k8s.io/v1",
    "/apis/apiextensions.k8s.io/v1beta1",
    "/healthz",
    "/healthz/etcd",
    "/healthz/log",
    "/healthz/ping",
    "/healthz/poststarthook/crd-informer-synced",
    "/healthz/poststarthook/generic-apiserver-start-informers",
    "/healthz/poststarthook/start-apiextensions-controllers",
    "/healthz/poststarthook/start-apiextensions-informers",
    "/livez",
    "/livez/etcd",
    "/livez/log",
    "/livez/ping",
    "/livez/poststarthook/crd-informer-synced",
    "/livez/poststarthook/generic-apiserver-start-informers",
    "/livez/poststarthook/start-apiextensions-controllers",
    "/livez/poststarthook/start-apiextensions-informers",
    "/metrics",
    "/openapi/v2",
    "/readyz",
    "/readyz/etcd",
    "/readyz/log",
    "/readyz/ping",
    "/readyz/poststarthook/crd-informer-synced",
    "/readyz/poststarthook/generic-apiserver-start-informers",
    "/readyz/poststarthook/start-apiextensions-controllers",
    "/readyz/poststarthook/start-apiextensions-informers",
    "/readyz/shutdown",
    "/version"
  ]
}

(Both queries are executed inside the Telegraf container and use the service account created in the YAML definition.)

For the creation of the ServiceAccount, telegraf-reader, I followed the guide posted for the kube_inventory plugin on GitHub. I checked that telegraf-reader has privileges to query resources like /api/v1/namespaces/default/pods... for that I created the ClusterRole and role bindings.

Before that, every resource query was answered with Forbidden, but not anymore, so the URL should be the problem.

I checked "Kubernetes.default.svc" is same "kubernetes" short name, both are the ClusterIP for default to the Kubernetes cluster.

I will have to check the source code of the Telegraf kubernetes input plugin to find the exact query that returns the "404 Not Found".
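Note: the Forbidden message above names resource=nodes, subresource=stats, which would correspond to an RBAC rule roughly like this (sketch):

rules:
  - apiGroups: [""]
    resources: ["nodes/stats", "nodes/proxy"]
    verbs: ["get", "list", "watch"]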

@jmorcar

jmorcar commented Apr 3, 2020

@jmorcar have a look at what we did for the telegraf-ds chart, as we got it working => https://github.com/influxdata/helm-charts/tree/master/charts/telegraf-ds

I didn't find the ClusterRole or role binding definitions in the chart templates, so I think the deploy will have the Forbidden error. I posted a suggestion to include this documentation in the charts because the YAML definition referencing the service account is not sufficient if you haven't created the RBAC permissions beforehand.

@nsteinmetz
Contributor

@jmorcar,

here is the role and rolebinding

The telegraf-ds chart works fine for me - did you try it on your cluster?

@jmorcar

jmorcar commented Apr 3, 2020

Thanks! I have applied it now... and same problem:

2020-04-03T17:21:20Z E! [inputs.kubernetes] Error in plugin: https://kubernetes/stats/summary returned HTTP status 404 Not Found
2020-04-03T17:21:30Z E! [inputs.kubernetes] Error in plugin: https://kubernetes/stats/summary returned HTTP status 404 Not Found
2020-04-03T17:21:40Z E! [inputs.kubernetes] Error in plugin: https://kubernetes/stats/summary returned HTTP status 404 Not Found
2020-04-03T17:21:50Z E! [inputs.kubernetes] Error in plugin: https://kubernetes/stats/summary returned HTTP status 404 Not Found
2020-04-03T17:22:00Z E! [inputs.kubernetes] Error in plugin: https://kubernetes/stats/summary returned HTTP status 404 Not Found

@rawkode
Contributor

rawkode commented Apr 3, 2020

@jmorcar if you are going through the Kubernetes API, you need the proxy endpoint.

It's usually best to go through the NODEIP from the downward API.

I see mentions of that above, but I couldn't work out what problem you had with that approach.

By any chance are you on GKE? They do block access to the Kubelet this way (last time I checked)
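For reference, the two URL shapes discussed in this thread look roughly like this (values taken from the configs posted above):

[[inputs.kubernetes]]
  ## Option A: talk to the kubelet on the local node (NODEIP/HOSTIP from the downward API)
  url = "https://$HOSTIP:10250"

  ## Option B: go through the API server's node proxy endpoint instead
  # url = "https://kubernetes.default.svc.cluster.local/api/v1/nodes/$NODE_NAME/proxy/"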

@jmorcar

jmorcar commented Apr 6, 2020

Thanks all, I found the problem: I was using a Deployment definition instead of a DaemonSet. A related point when you change to a DaemonSet is, as @alanjcastonguay and @rawkode commented, that you have to use NODEIP:10250, like this:

[[inputs.kubernetes]]
  url = "https://$HOSTIP:10250"
  bearer_token = "/run/secrets/kubernetes.io/serviceaccount/token"
  insecure_skip_verify = true

So I have swapped my YAML for the official Helm chart, as @nsteinmetz recommended, because I had to change/add too many params in my YAML. The official chart is OK: deploy it in the namespace that you need and it collects all the metrics fine.

Conclusion:
If you need to monitor a Kubernetes cluster, the best option is to deploy the official telegraf-ds Helm chart. It monitors each node inside the cluster (deploying a Telegraf agent on each one via a DaemonSet) with only one deploy definition.

https://github.com/influxdata/helm-charts/tree/master/charts/telegraf-ds
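Installing it is roughly (sketch; the repo URL, release name, and namespace are assumptions):

# add the InfluxData charts repo and install telegraf-ds as a DaemonSet
helm repo add influxdata https://helm.influxdata.com/
helm repo update
helm upgrade --install telegraf-ds influxdata/telegraf-ds -n monitoring --create-namespace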

@hershdhillon

hershdhillon commented Sep 14, 2020

Try creating a ServiceAccount and ClusterRoleBinding for Telegraf using the YAML configuration below. Mind the namespace.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: telegraf
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: metric-scanner-kubelet-api-admin
subjects:
- kind: ServiceAccount
  name: telegraf
  namespace: influxdb
roleRef:
  kind: ClusterRole
  name: system:kubelet-api-admin
  apiGroup: rbac.authorization.k8s.io 

I faced a similar issue; after applying the YAML, Telegraf was able to authenticate in the cluster and scrape the metrics.
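To verify the binding took effect for the ServiceAccount, something like this can be used (namespace and name taken from the YAML above):

kubectl auth can-i get nodes --subresource=stats --as=system:serviceaccount:influxdb:telegraf
kubectl auth can-i get nodes --subresource=proxy --as=system:serviceaccount:influxdb:telegraf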

@manucloud9

I am using the telegraf-ds chart but getting the below error in the pod logs.

2021-02-11T17:32:50Z W! [inputs.kubernetes] Collection took longer than expected; not complete after interval of 10s
2021-02-11T17:33:00Z W! [inputs.kubernetes] Collection took longer than expected; not complete after interval of 10s

@JeongsikKang

JeongsikKang commented Dec 8, 2021

It worked fine for me.

#-----------------------------------------------
# 1. ServiceAccount
#-----------------------------------------------
apiVersion: v1
kind: ServiceAccount
metadata:
  name: telegraf-ds
  labels:
    app.kubernetes.io/name: telegraf-ds

---
#-----------------------------------------------
# 2. ClusterRole
#-----------------------------------------------
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: influx-stats-viewer
  labels:
    app.kubernetes.io/name: telegraf-ds
rules:
  - apiGroups: ["metrics.k8s.io"]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/proxy", "nodes/stats"]
    verbs: ["get", "list", "watch"]
---
#-----------------------------------------------
# 3. ClusterRoleBinding
#-----------------------------------------------
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: influx-telegraf-viewer
  labels:
    app.kubernetes.io/name: telegraf-ds
subjects:
  - kind: ServiceAccount
    name: telegraf-ds 
    namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: influx-stats-viewer

---
#-----------------------------------------------
# 4. ConfigMap
#-----------------------------------------------
apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf-ds
  labels:
    app.kubernetes.io/name: telegraf-ds
data:
  telegraf.conf: |+

    [agent]
      collection_jitter = "0s"
      debug = false
      flush_interval = "10s"
      flush_jitter = "0s"
      hostname = "$HOSTNAME"
      interval = "10s"
      logfile = ""
      metric_batch_size = 1000
      metric_buffer_limit = 10000
      omit_hostname = false
      precision = ""
      quiet = false
      round_interval = true


    [[outputs.influxdb]]
      database = "telegraf-ds"
      insecure_skip_verify = false
      password = "blahblah"
      retention_policy = ""
      timeout = "5s"
      urls = [
        "http://xxx.xxx.xxx.xxx:8086"
      ]
      user_agent = "telegraf"
      username = "k8s"

    [[inputs.diskio]]
    [[inputs.kernel]]
    [[inputs.mem]]
    [[inputs.net]]
    [[inputs.processes]]
    [[inputs.swap]]
    [[inputs.system]]

    [[inputs.cpu]]
    percpu = true
    totalcpu = true
    collect_cpu_time = false
    report_active = false

    [[inputs.disk]]
    ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

    [[inputs.docker]]
    endpoint = "unix:///var/run/docker.sock"

    [[inputs.kubernetes]]
    url = "https://$HOSTIP:10250"
    bearer_token = "/run/secrets/kubernetes.io/serviceaccount/token"
    insecure_skip_verify = true
---
#-----------------------------------------------
# 5. DaemonSet
#-----------------------------------------------
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: telegraf-ds
  labels:
    app.kubernetes.io/name: telegraf-ds
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: telegraf-ds
  template:
    metadata:
      labels:
        app.kubernetes.io/name: telegraf-ds
    spec:
      serviceAccountName: telegraf-ds
      containers:
      - name: telegraf-ds
        image: telegraf:1.20.2
        imagePullPolicy: "IfNotPresent"
        resources:
          limits:
            cpu: 1
            memory: 2Gi
          requests:
            cpu: 0.1
            memory: 256Mi
        env:
        - name: HOSTIP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: HOST_PROC
          value: /hostfs/proc
        - name: HOST_SYS
          value: /hostfs/sys
        - name: HOST_MOUNT_PREFIX
          value: /hostfs
        volumeMounts:
        - name: varrunutmpro
          mountPath: /var/run/utmp
          readOnly: true
        - name: hostfsro
          mountPath: /hostfs
          readOnly: true
        - name: docker-socket
          mountPath: /var/run/docker.sock
        - name: config
          mountPath: /etc/telegraf
      volumes:
      - name: hostfsro
        hostPath:
          path: /
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock
      - name: varrunutmpro
        hostPath:
          path: /var/run/utmp
      - name: config
        configMap:
          name:  telegraf-ds
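Assuming everything above is saved in a single file (filename assumed), apply it in kube-system so it matches the namespace referenced by the ClusterRoleBinding:

kubectl apply -n kube-system -f telegraf-ds.yaml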

@sspaink
Contributor

sspaink commented Jan 24, 2022

Closing: from the discussion it seems this issue is resolved (there have been significant changes to the k8s input plugin and its dependencies have been updated), and there is also a viable workaround using the official Helm chart. Please re-open if this isn't the case.

@sspaink sspaink closed this as completed Jan 24, 2022