
Alerts failing out of the box: K8SControllerManagerDown and K8SSchedulerDown (on Kargo) #61

Closed
rutsky opened this issue Mar 10, 2017 · 9 comments



rutsky commented Mar 10, 2017

I see behavior similar to #23: after deploying the Prometheus Operator on a Kubernetes cluster set up with Kargo, the K8SControllerManagerDown and K8SSchedulerDown alerts fire:

[screenshot: firing K8SControllerManagerDown and K8SSchedulerDown alerts]

Here are my targets:

[screenshot: Prometheus targets page]

Prometheus configuration:

alerting:
  alertmanagers:
  - kubernetes_sd_configs:
    - role: endpoints
    relabel_configs:
    - action: keep
      regex: alertmanager-main
      source_labels:
      - __meta_kubernetes_service_name
    - action: keep
      regex: monitoring
      source_labels:
      - __meta_kubernetes_namespace
    - action: keep
      regex: web
      source_labels:
      - __meta_kubernetes_endpoint_port_name
    scheme: http

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
- /etc/prometheus/rules/*.rules

scrape_configs:
- job_name: kubelets
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # Skip verification until we have resolved why the certificate validation
    # for the kubelet on API server nodes fail.
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - role: node

# Scrapes the endpoint lists for the Kubernetes API server, kube-state-metrics,
# and node-exporter, which we all consider part of a default setup.
- job_name: standard-endpoints
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # As for kubelets, certificate validation fails for the API server (node)
    # and we circumvent it for now.
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - role: endpoints

  relabel_configs:
  - action: keep
    source_labels: [__meta_kubernetes_service_name]
    regex: prometheus|node-exporter|kube-state-metrics
  - action: replace
    source_labels: [__meta_kubernetes_service_name]
    target_label: job

# Scrapes the endpoint lists for the kube-dns server, which we consider
# part of a default setup.
- job_name: kube-components
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - role: endpoints

  relabel_configs:
  - action: replace
    source_labels: [__meta_kubernetes_service_label_k8s_app]
    target_label: job
  - action: keep
    source_labels: [__meta_kubernetes_service_name]
    regex: ".*-prometheus-discovery"
  - action: keep
    source_labels: [__meta_kubernetes_endpoint_port_name]
    regex: "http-metrics.*|https-metrics.*"
  - action: replace
    source_labels: [__meta_kubernetes_endpoint_port_name]
    regex: "https-metrics.*"
    target_label: __scheme__
    replacement: https
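To make the `kube-components` job above easier to follow, here is a minimal sketch (in Python, with hypothetical target dicts) of how its `replace` and `keep` relabel actions select and rewrite discovered targets; the field names mirror the `__meta_*` labels used in the config:

```python
import re

# Hypothetical discovered targets, mirroring the __meta_* labels above.
targets = [
    {"__meta_kubernetes_service_name": "kube-scheduler-prometheus-discovery",
     "__meta_kubernetes_endpoint_port_name": "http-metrics",
     "__meta_kubernetes_service_label_k8s_app": "kube-scheduler"},
    {"__meta_kubernetes_service_name": "kubedns",
     "__meta_kubernetes_endpoint_port_name": "dns",
     "__meta_kubernetes_service_label_k8s_app": "kubedns"},
]

def relabel(target):
    """Apply the kube-components relabel_configs in order: copy the
    k8s-app service label into `job`, keep only *-prometheus-discovery
    services exposing an (http|https)-metrics port, and force HTTPS
    when the port name starts with https-metrics. Prometheus anchors
    relabel regexes, hence fullmatch."""
    t = dict(target)
    t["job"] = t["__meta_kubernetes_service_label_k8s_app"]
    if not re.fullmatch(r".*-prometheus-discovery",
                        t["__meta_kubernetes_service_name"]):
        return None  # dropped by the first keep rule
    if not re.fullmatch(r"http-metrics.*|https-metrics.*",
                        t["__meta_kubernetes_endpoint_port_name"]):
        return None  # dropped by the second keep rule
    if re.fullmatch(r"https-metrics.*",
                    t["__meta_kubernetes_endpoint_port_name"]):
        t["__scheme__"] = "https"
    return t

kept = [r for r in map(relabel, targets) if r is not None]
```

Running this keeps only the `kube-scheduler-prometheus-discovery` target (with `job` set to `kube-scheduler`) and drops `kubedns`, which is exactly why empty discovery Services leave these jobs with no targets.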

Labels:

$ kubectl -n kube-system get ep --show-labels
NAME                        ENDPOINTS                                                  AGE       LABELS
default-http-backend        10.233.85.8:8080                                           17d       k8s-app=default-http-backend
dnsmasq                     10.233.72.2:53,10.233.77.2:53,10.233.83.2:53 + 5 more...   17d       k8s-app=dnsmasq,kubernetes.io/cluster-service=true
elasticsearch-logging       10.233.77.7:9200,10.233.85.4:9200                          17d       k8s-app=elasticsearch-logging,kubernetes.io/cluster-service=true,kubernetes.io/name=Elasticsearch
heapster                    10.233.77.5:8082                                           17d       kubernetes.io/name=Heapster,task=monitoring
ingress-controller-leader   <none>                                                     17d       <none>
kibana-logging              10.233.85.11:5601                                          17d       k8s-app=kibana-logging,kubernetes.io/cluster-service=true,kubernetes.io/name=Kibana
kube-controller-manager     <none>                                                     17d       <none>
kube-scheduler              <none>                                                     17d       <none>
kubedns                     10.233.85.3:53,10.233.85.3:53                              17d       k8s-app=kubedns,kubernetes.io/cluster-service=true,kubernetes.io/name=kubedns
kubernetes-dashboard        10.233.85.6:9090                                           17d       app=kubernetes-dashboard
monitoring-grafana          10.233.72.6:3000                                           17d       kubernetes.io/name=monitoring-grafana
monitoring-influxdb         10.233.72.4:8086                                           17d       kubernetes.io/name=monitoring-influxdb,task=monitoring
$ kubectl -n kube-system get svc --show-labels
NAME                    CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE       LABELS
default-http-backend    10.233.50.202   <none>        80/TCP          17d       k8s-app=default-http-backend
dnsmasq                 10.233.0.2      <none>        53/TCP,53/UDP   17d       k8s-app=dnsmasq,kubernetes.io/cluster-service=true
elasticsearch-logging   10.233.17.19    <none>        9200/TCP        17d       k8s-app=elasticsearch-logging,kubernetes.io/cluster-service=true,kubernetes.io/name=Elasticsearch
heapster                10.233.20.165   <none>        80/TCP          17d       kubernetes.io/name=Heapster,task=monitoring
kibana-logging          10.233.7.135    <none>        5601/TCP        17d       k8s-app=kibana-logging,kubernetes.io/cluster-service=true,kubernetes.io/name=Kibana
kubedns                 10.233.0.3      <none>        53/UDP,53/TCP   17d       k8s-app=kubedns,kubernetes.io/cluster-service=true,kubernetes.io/name=kubedns
kubernetes-dashboard    10.233.27.199   <nodes>       80:31167/TCP    17d       app=kubernetes-dashboard
monitoring-grafana      10.233.38.226   <none>        80/TCP          17d       kubernetes.io/name=monitoring-grafana
monitoring-influxdb     10.233.27.33    <none>        8086/TCP        17d       kubernetes.io/name=monitoring-influxdb,task=monitoring
$ kubectl -n kube-system get pods --show-labels
NAME                                    READY     STATUS    RESTARTS   AGE       LABELS
default-http-backend-2657704409-f958m   1/1       Running   0          8d        k8s-app=default-http-backend,pod-template-hash=2657704409
dnsmasq-dfnq5                           1/1       Running   0          17d       k8s-app=dnsmasq
dnsmasq-f6gjj                           1/1       Running   0          17d       k8s-app=dnsmasq
dnsmasq-h0x93                           1/1       Running   0          17d       k8s-app=dnsmasq
dnsmasq-h8f72                           1/1       Running   0          17d       k8s-app=dnsmasq
elasticsearch-logging-v1-1zm9f          1/1       Running   2          8d        k8s-app=elasticsearch-logging,kubernetes.io/cluster-service=true,version=v2.4.1
elasticsearch-logging-v1-62t37          1/1       Running   0          17d       k8s-app=elasticsearch-logging,kubernetes.io/cluster-service=true,version=v2.4.1
flannel-wetta-kuber01                   1/1       Running   1          17d       app=flannel,version=v0.1
flannel-wetta-kuber02                   1/1       Running   1          17d       app=flannel,version=v0.1
flannel-wetta-kuber03                   1/1       Running   1          8d        app=flannel,version=v0.1
flannel-wetta-noaaweather               1/1       Running   1          17d       app=flannel,version=v0.1
fluentd-es-v1.22-5rcp8                  1/1       Running   0          17d       k8s-app=fluentd-es,kubernetes.io/cluster-service=true,version=v1.22
fluentd-es-v1.22-6pt68                  1/1       Running   0          17d       k8s-app=fluentd-es,kubernetes.io/cluster-service=true,version=v1.22
fluentd-es-v1.22-rp33r                  1/1       Running   0          17d       k8s-app=fluentd-es,kubernetes.io/cluster-service=true,version=v1.22
fluentd-es-v1.22-s3v4s                  1/1       Running   0          17d       k8s-app=fluentd-es,kubernetes.io/cluster-service=true,version=v1.22
heapster-564189836-szfbx                1/1       Running   0          15d       k8s-app=heapster,pod-template-hash=564189836,task=monitoring
kibana-logging-3982467769-hwbm0         1/1       Running   0          20h       k8s-app=kibana-logging,pod-template-hash=3982467769
kube-apiserver-wetta-kuber01            1/1       Running   0          17d       k8s-app=kube-apiserver,kargo=v2
kube-apiserver-wetta-kuber02            1/1       Running   0          17d       k8s-app=kube-apiserver,kargo=v2
kube-controller-manager-wetta-kuber01   1/1       Running   1          17d       k8s-app=kube-controller
kube-controller-manager-wetta-kuber02   1/1       Running   0          17d       k8s-app=kube-controller
kube-proxy-wetta-kuber01                1/1       Running   1          17d       k8s-app=kube-proxy
kube-proxy-wetta-kuber02                1/1       Running   1          17d       k8s-app=kube-proxy
kube-proxy-wetta-kuber03                1/1       Running   1          8d        k8s-app=kube-proxy
kube-proxy-wetta-noaaweather            1/1       Running   1          17d       k8s-app=kube-proxy
kube-scheduler-wetta-kuber01            1/1       Running   0          17d       k8s-app=kube-scheduler
kube-scheduler-wetta-kuber02            1/1       Running   1          17d       k8s-app=kube-scheduler
kubedns-m6x4j                           3/3       Running   0          17d       k8s-app=kubedns,kubernetes.io/cluster-service=true,version=v19
kubernetes-dashboard-3203831700-hb0qp   1/1       Running   0          17d       app=kubernetes-dashboard,pod-template-hash=3203831700
monitoring-grafana-1176657932-k2d53     1/1       Running   0          8d        k8s-app=grafana,pod-template-hash=1176657932,task=monitoring
monitoring-influxdb-957705310-6qdkn     1/1       Running   0          15d       k8s-app=influxdb,pod-template-hash=957705310,task=monitoring
nginx-ingress-controller-1jff3          1/1       Running   0          17d       k8s-app=nginx-ingress-controller
nginx-ingress-controller-5dz1x          1/1       Running   0          17d       k8s-app=nginx-ingress-controller
nginx-proxy-wetta-kuber03               1/1       Running   1          8d        k8s-app=kube-nginx
nginx-proxy-wetta-noaaweather           1/1       Running   1          17d       k8s-app=kube-nginx

brancz commented Mar 10, 2017

Actually your setup looks good; you just need to create the services in manifests/k8s/self-hosted/. That should work, but you may also want to update to the latest versions of the manifests, as the configuration is now generated by the Prometheus Operator, so no manual configuration maintenance is necessary anymore 🎉 🙂 .

Just as a reminder, the content of this repo has moved to the Prometheus Operator repository. We can solve this issue here, but future issues should be opened there (feel free to tag them with [kube-prometheus]).
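For reference, a discovery Service of the kind found in manifests/k8s/self-hosted/ looks roughly like this — a sketch, not the exact file from the repo; the metrics port number and label values are assumptions based on the scrape config above (the service name must match `.*-prometheus-discovery` and the port name must match `http-metrics.*` for the `kube-components` job to keep it):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-controller-manager-prometheus-discovery
  namespace: kube-system
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    k8s-app: kube-controller-manager
  clusterIP: None        # headless: endpoints resolve to the pod IPs
  ports:
  - name: http-metrics
    port: 10252          # assumed controller-manager insecure metrics port
```

A headless Service like this exists only to populate an endpoints list for Prometheus' `role: endpoints` service discovery; nothing routes traffic through it.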


rutsky commented Mar 10, 2017

Creating the services from manifests/k8s/self-hosted/ almost solves this issue, thanks @brancz!

"Almost", because in Kargo the controller manager has the k8s-app=kube-controller label instead of the k8s-app=kube-controller-manager label expected by the manifests:
https://github.com/kubernetes-incubator/kargo/blob/dbf1329/roles/kubernetes/master/templates/manifests/kube-controller-manager.manifest.j2#L7

Is this a Kargo bug? Or is there no "standard" for those labels?
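The mismatch can be illustrated with a tiny sketch of Kubernetes' equality-based label selection (the pod label is taken from the listing above; the selector is the one a kube-controller-manager discovery Service would use):

```python
def matches(selector, labels):
    """Equality-based label selector: every selector key must be
    present in the pod's labels with an identical value."""
    return all(labels.get(k) == v for k, v in selector.items())

# Label Kargo puts on its controller-manager pods (see listing above).
kargo_pod = {"k8s-app": "kube-controller"}
# Selector expected by the discovery Service manifests.
selector = {"k8s-app": "kube-controller-manager"}

print(matches(selector, kargo_pod))  # -> False: the endpoints stay empty
```

Because the selector never matches, the Service's endpoints list stays `<none>`, Prometheus finds no targets, and the down alerts keep firing.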


brancz commented Mar 10, 2017

As both bootkube and kops use the k8s-app=kube-controller-manager label, I think this should be adapted in Kargo (as far as I can tell, the manifests in kubernetes/kubernetes also use kube-controller-manager).


brancz commented Mar 10, 2017

Unfortunately there is no standard for those labels yet; it's on my list to start pushing for one.


brancz commented Mar 10, 2017

For reference: kubernetes-retired/contrib#2420 (comment)


rutsky commented Mar 10, 2017

@brancz thanks, I opened an issue on Kargo.
Have you opened a ticket on the Kubernetes repo to introduce a standard for these labels?


brancz commented Mar 10, 2017

Quite literally opened it seconds ago 🙂. Let's see how it goes.


rutsky commented Mar 10, 2017

Thanks! This issue is resolved.

@rutsky rutsky closed this as completed Mar 10, 2017

brancz commented Mar 10, 2017

Thanks for reminding me to open that issue! 😁 Let's hope we'll get something good out of it.
