Two alerts failing out of the box: K8SControllerManagerDown and K8SSchedulerDown #23
The
Or, if you cannot disclose all of that information, this should give us all the applicable information:
We typically test the content of this repository with clusters created with bootkube, but it would be great if we could get a section or guide for kops, as it's pretty widely adopted as well.
Thanks for the quick response.
I believe I have seen this before. The problem, I think, is that kops doesn't label the Kubernetes component pods correctly. If what I am guessing is correct, then we should push on the kops side to use upstream manifests, like bootkube does.
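For context, a rough sketch of what the discovery relies on: kube-prometheus scrapes the control-plane components through a headless Service in kube-system whose selector has to match labels on the component pods. The label key, Service name, and port below follow the upstream k8s-app convention and the controller manager's default metrics port; they are assumptions for illustration, not necessarily exactly what this repo or kops ships:

```yaml
# Sketch only - adjust the label key and port to whatever kops ends up using.
apiVersion: v1
kind: Service
metadata:
  name: kube-controller-manager-prometheus-discovery   # illustrative name
  namespace: kube-system
  labels:
    k8s-app: kube-controller-manager
spec:
  clusterIP: None                        # headless; only needed so endpoints can be discovered
  selector:
    k8s-app: kube-controller-manager     # must match the labels on the static pods
  ports:
  - name: http-metrics
    port: 10252                          # controller manager's default metrics port
    protocol: TCP
```

If the pods don't carry a matching label, the Service selects no endpoints, Prometheus never sees a target for that job, and K8SControllerManagerDown/K8SSchedulerDown fire even though the components are running (the scheduler case is analogous, on port 10251).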
This was a similar issue - to add that label to kube-proxy: kubernetes/kops#617
I didn't see that one, thanks for pointing it out! I opened kubernetes/kops#1226 to start a discussion on it, so hopefully we will get those labels soon. In the meantime, I think you'll have to either SSH onto those servers, change the templates, and restart them (which, IIRC, makes the objects be recreated from the templates; disclaimer: not super familiar with kops), or comment out/remove those alerts for now. (Also remember that changes to single machines will disappear when machines are recreated from the ASG, unless you make the changes to the ASG.)
I just noticed etcd is not appearing in my Prometheus targets either.
Oh, and kube-dns. Should we update kubernetes/kops#1226?
It seems like we won't have an answer before the holidays, so I'll keep pushing in the new year. But yes, I will keep pushing for a consistent labelling strategy, and we'll add the respective manifests for Prometheus to properly discover the components here once we have that consistent labelling. I don't mind maintaining a set of manifests for kops, bootkube, etc., as long as each of those labelling strategies makes sense and exists. So far so good :) Happy holidays!
I added the labels on the master (/etc/kubernetes/manifests/kube-controller-manager, kube-scheduler), and then ran
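(To check whether the pods actually picked up the labels after editing the static manifests - assuming the k8s-app label key from the discussion above, so adjust if your manifests use a different key:)

```sh
kubectl --namespace=kube-system get pods --show-labels
kubectl --namespace=kube-system get pods -l k8s-app=kube-controller-manager
kubectl --namespace=kube-system get pods -l k8s-app=kube-scheduler
```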
Yep, that's the plan as soon as we have consistent labeling in upstream kops.
Labelling added to kops: kubernetes/kops#1314
I don't have a v1.5.x kops cluster handy, but I'll create the manifests on a best-effort basis, and then it would be great if you could test them.
With pleasure. Thanks.
In fact, I think the manifests from
Actually, it seems that the
I'll create a 1.5.x K8s cluster with the latest KOPS soon to test, thanks. Right now I updated the labels on my 1.4.6 master and it looks good except:
@brancz I don't know whether we want to continue this thread on kops/kube-prometheus work - let me know if there's a better place. Maybe we should open a new issue.

github/kube-prometheus ‹master› » ./hack/cluster-monitoring/deploy
namespace "monitoring" created
deployment "prometheus-operator" created
the server doesn't have a resource type "servicemonitor"
the server doesn't have a resource type "servicemonitor"
the server doesn't have a resource type "servicemonitor"
No resources found.
No resources found.
No resources found.
deployment "kube-state-metrics" created
service "kube-state-metrics" created
daemonset "node-exporter" created
service "node-exporter" created
configmap "grafana-dashboards" created
deployment "grafana" created
service "grafana" created
configmap "prometheus-k8s" created
configmap "prometheus-k8s-rules" created
service "prometheus-k8s" created
prometheus "prometheus-k8s" created
configmap "alertmanager-main" created
service "alertmanager-main" created
alertmanager "alertmanager-main" created |
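(A note on the "doesn't have a resource type" lines above: ServiceMonitor is a ThirdPartyResource that the prometheus-operator registers after it starts, so the first attempts fail and appear to be retried by the deploy script. On a 1.4/1.5-era cluster you can check the registration manually, for example:)

```sh
# the ServiceMonitor third-party resource should appear here once the operator has registered it
kubectl get thirdpartyresources
# and any ServiceMonitor objects in the monitoring namespace should then be listable
kubectl --namespace=monitoring get servicemonitor
```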
Has this been solved in upstream kops? @dmcnaught
I'm going to start on the kops + kube-prometheus config when kops 1.5 has been released.
Great, thanks for the update! Are you aware of an ETA?
I've heard "soon" - it's currently in alpha4: https://github.com/kubernetes/kops/releases
Great! Looking forward to "soon" 🙂
Me too. I thought it would be "sooner" 😉
Getting close with kops 1.5.0-alpha2 and k8s 1.5.2 ^ Just the API cert issue to go. 😄
Looks like this is also the case with clusters created via acs-engine on Azure. The labels on the controller-manager pod are:
Same with a cluster created using kubeadm.
@yann-soubeyrand for kubeadm clusters you need to configure the controller manager and scheduler to listen on all interfaces, or at least on the pod networking interface/IP.
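A minimal sketch of that change on a kubeadm master, assuming the default static pod manifests under /etc/kubernetes/manifests; on 1.5/1.6-era components the flag is --address (newer releases use --bind-address), so check what your version supports:

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt, illustrative)
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    - --address=0.0.0.0   # default is 127.0.0.1, which Prometheus cannot reach from outside the host
    # ... keep the remaining flags as they are; make the same change in kube-scheduler.yaml
```

The kubelet watches that directory and recreates the static pods when a manifest changes, so no separate restart should be needed.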
@brancz Thanks for the tip on modifying the listening addresses, which saved me some time ;-) However, I was pointing out that the labeling done by kubeadm is like rocketraman wrote above, and therefore kube-prometheus was not able to discover the controller manager, the scheduler, or etcd.
@brancz Can confirm what @yann-soubeyrand and @rocketraman have said, |
I changed the bind address of the controller manager and scheduler to
Also, there is no data in Grafana.
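(If it helps with debugging the "no data" part: port-forwarding to the Prometheus pod and looking at the Targets page at http://localhost:9090/targets quickly shows whether anything is being scraped at all. The pod name below is illustrative - list the pods in the monitoring namespace first:)

```sh
kubectl --namespace=monitoring get pods
kubectl --namespace=monitoring port-forward prometheus-k8s-0 9090
```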
Installed with KOPS 1.4.1, K8s 1.4.6 on AWS.
It looks to me like the query is set to alert when there is one kube-scheduler (or kube-controller-manager), which I don't understand.
I'm pretty new to Prometheus queries and I'm not really sure how the BY (cluster) == 0) part relates. Any pointers appreciated. (See the sketch after this message.)
Thanks for the great project!
--Duncan
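For context on the question above, the alert expression is roughly the following (a sketch pieced together from the quoted fragment; the job name and exact form in the prometheus-k8s-rules configmap may differ):

```
count(up{job="kube-controller-manager"} == 1) BY (cluster) == 0
```

up is 1 for every target Prometheus successfully scrapes, so the inner comparison keeps only healthy targets, count ... BY (cluster) counts them per cluster, and the == 0 at the end makes the alert fire when that count is zero. So the alert isn't about there being one scheduler or controller manager; it fires when Prometheus can't see any, which is exactly what happens when the pods are missing the labels used for discovery.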