
Set ContinueOnError for prometheus http handler. #1679

Merged
merged 1 commit into from
Jun 20, 2017

Conversation

Random-Liu
Member

PR #1460 upgraded prometheus to v0.8.0.

However, prometheus v0.8.0 enforces consistency checks that our current metrics do not pass, which caused #1671 and kubernetes/kubernetes#47744.

This PR configures the prometheus http handler with ContinueOnError so that the metrics can be collected and exposed regardless of the error. This is also the solution suggested in prometheus/client_golang#214.
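The effect of the handler setting can be sketched with the standard library alone. This is a simplified model, not the client_golang implementation: `ErrorPolicy`, `GatherFunc`, `MetricsHandler`, `Scrape`, and `partialGather` are hypothetical names made up for illustration. In the real library the corresponding option is the `ErrorHandling` field of `promhttp.HandlerOpts`, set to `promhttp.ContinueOnError`.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// ErrorPolicy is a hypothetical stand-in for the handler's error-handling option.
type ErrorPolicy int

const (
	FailOnError     ErrorPolicy = iota // respond 500 if gathering reports any error
	ContinueOnError                    // serve whatever was gathered despite errors
)

// GatherFunc is a hypothetical gatherer: it returns the exposition lines it
// could collect, plus an error if some metrics were rejected.
type GatherFunc func() ([]string, error)

// MetricsHandler models how the policy changes the HTTP response.
func MetricsHandler(gather GatherFunc, policy ErrorPolicy) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		lines, err := gather()
		// Fail if the policy says so, or if nothing at all could be gathered.
		if err != nil && (policy == FailOnError || len(lines) == 0) {
			http.Error(w, "error gathering metrics: "+err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		fmt.Fprintln(w, strings.Join(lines, "\n"))
	})
}

// Scrape issues one in-memory request and returns the status code and body.
func Scrape(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Code, rec.Body.String()
}

// partialGather simulates a gatherer that collected one metric and rejected another.
func partialGather() ([]string, error) {
	return []string{"container_spec_cpu_quota 5000"}, errors.New("inconsistent label dimensions")
}

func main() {
	code, _ := Scrape(MetricsHandler(partialGather, FailOnError))
	fmt.Println("fail on error:", code)

	code, body := Scrape(MetricsHandler(partialGather, ContinueOnError))
	fmt.Println("continue on error:", code, strings.TrimSpace(body))
}
```

Under the strict policy a single rejected metric turns the whole scrape into an error response, while the lenient policy still serves the metrics that were gathered.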

This is only a quick fix for the Kubernetes 1.7 release. We should figure out what in our metrics causes the inconsistency and fix it.

@dchen1107 @google/cadvisor @grobie

dchen1107 self-requested a review June 20, 2017 19:04
@Random-Liu
Member Author

Without this PR:

* collected metric container_tasks_state label:<name:"container_label_annotation_io_kubernetes_container_hash" value:"228e8d0c" > label:<name:"container_label_annotation_io_kubernetes_container_ports" value:"[{\"name\":\"dns-local\",\"containerPort\":10053,\"protocol\":\"UDP\"},{\"name\":\"dns-tcp-local\",\"containerPort\":10053,\"protocol\":\"TCP\"},{\"name\":\"metrics\",\"containerPort\":10055,\"protocol\":\"TCP\"}]" > label:<name:"container_label_annotation_io_kubernetes_container_restartCount" value:"0" > label:<name:"container_label_annotation_io_kubernetes_container_terminationMessagePath" value:"/dev/termination-log" > label:<name:"container_label_annotation_io_kubernetes_container_terminationMessagePolicy" value:"File" > label:<name:"container_label_annotation_io_kubernetes_pod_terminationGracePeriod" value:"30" > label:<name:"container_label_io_kubernetes_container_logpath" value:"/var/log/pods/432b46fd-55e0-11e7-b764-42010af00002/kubedns_0.log" > label:<name:"container_label_io_kubernetes_container_name" value:"kubedns" > label:<name:"container_label_io_kubernetes_docker_type" value:"container" > label:<name:"container_label_io_kubernetes_pod_name" value:"kube-dns-2673147055-32wcs" > label:<name:"container_label_io_kubernetes_pod_namespace" value:"kube-system" > label:<name:"container_label_io_kubernetes_pod_uid" value:"432b46fd-55e0-11e7-b764-42010af00002" > label:<name:"container_label_io_kubernetes_sandbox_id" value:"20c8e397df74c045ed8297bc53b91271dd84e9fbd23ba85d9007e9c21abba526" > label:<name:"id" value:"/kubepods/burstable/pod432b46fd-55e0-11e7-b764-42010af00002/893eb7e95242dd196284f7e10d5b04277c423ac2ff69c26284494b4706a92c21" > label:<name:"image" value:"sha256:ca8759c215c9c2377bee9425bb3ba547ebf85e511759652bdb5d2d980e4f4a21" > label:<name:"name" value:"k8s_kubedns_kube-dns-2673147055-32wcs_kube-system_432b46fd-55e0-11e7-b764-42010af00002_0" > label:<name:"state" value:"stopped" > gauge:<value:0 >  has label dimensions inconsistent with previously collected metrics in the same metric family
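"Label dimensions" in the message above refers to the set of label names attached to a metric: every metric in one family is expected to carry the same names. A minimal sketch of such a comparison follows. The `Sample` type and both functions are hypothetical and are not taken from client_golang. The example uses a shortened `ports` label as a stand-in, since the kubedns sample above carries a `container_label_annotation_io_kubernetes_container_ports` label that the heapster-nanny sample in this thread does not; whether that label is the actual cause of the inconsistency is not established here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is a simplified, hypothetical view of one collected metric.
type Sample struct {
	Family string
	Labels map[string]string
	Value  float64
}

// labelDims returns the sample's label names, sorted and joined into one key.
// Label values are ignored: only the set of names defines the dimensions.
func labelDims(s Sample) string {
	names := make([]string, 0, len(s.Labels))
	for name := range s.Labels {
		names = append(names, name)
	}
	sort.Strings(names)
	return strings.Join(names, ",")
}

// SameDims reports whether two samples belong to the same family and
// carry exactly the same label names.
func SameDims(a, b Sample) bool {
	return a.Family == b.Family && labelDims(a) == labelDims(b)
}

func main() {
	plain := Sample{
		Family: "container_tasks_state",
		Labels: map[string]string{"id": "/kubepods/a", "state": "stopped"},
	}
	other := Sample{
		Family: "container_tasks_state",
		Labels: map[string]string{"id": "/kubepods/b", "state": "running"},
	}
	// Hypothetical sample with one extra label name ("ports").
	withPorts := Sample{
		Family: "container_tasks_state",
		Labels: map[string]string{"id": "/kubepods/c", "state": "stopped", "ports": "x"},
	}

	fmt.Println(SameDims(plain, other))     // same names, different values
	fmt.Println(SameDims(plain, withPorts)) // extra label name
}
```

Differing label values are fine; an extra or missing label name is what makes two samples of one family inconsistent in this model.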

With this PR:

container_spec_cpu_quota{container_label_annotation_io_kubernetes_container_hash="d7c8bd4a",container_label_annotation_io_kubernetes_container_restartCount="0",container_label_annotation_io_kubernetes_container_terminationMessagePath="/dev/termination-log",container_label_annotation_io_kubernetes_container_terminationMessagePolicy="File",container_label_annotation_io_kubernetes_pod_terminationGracePeriod="30",container_label_io_kubernetes_container_logpath="/var/log/pods/51605191-55e0-11e7-b764-42010af00002/heapster-nanny_0.log",container_label_io_kubernetes_container_name="heapster-nanny",container_label_io_kubernetes_docker_type="container",container_label_io_kubernetes_pod_name="heapster-v1.3.0-1514755676-z24f5",container_label_io_kubernetes_pod_namespace="kube-system",container_label_io_kubernetes_pod_uid="51605191-55e0-11e7-b764-42010af00002",container_label_io_kubernetes_sandbox_id="89d4ce90493e28474622dc057c5132262198868b789e233e5a983632db580248",id="/kubepods/pod51605191-55e0-11e7-b764-42010af00002/457a9ce101c77b6bfa3a7f03c5072375853e11c7613a6f5bf1952abbcba2e64f",image="sha256:9b0815c8711889802a3081d1a609cc4251357e6ec0a28ac5963aac72bec67691",name="k8s_heapster-nanny_heapster-v1.3.0-1514755676-z24f5_kube-system_51605191-55e0-11e7-b764-42010af00002_0"} 5000

@dchen1107
Collaborator

/lgtm

dchen1107 merged commit f3d900b into google:master Jun 20, 2017
Random-Liu deleted the fix-metrics branch June 20, 2017 19:09
dchen1107 referenced this pull request Jun 20, 2017
Set ContinueOnError for prometheus http handler.
@grobie
Contributor

grobie commented Jun 21, 2017

Thanks @Random-Liu. We should fix these metrics indeed.

@bboreham
Contributor

For the sake of anyone coming across this, I'll note this resulted in the output being reduced to some random subset of all metrics: it "continues" but it still drops the ones it doesn't like. See #1704 for more discussion.
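One way to picture why the surviving subset can look arbitrary is sketched below. It rests on an assumption that is not confirmed in this thread: that each metric is compared against the first metric collected in its family, so the collection order decides which label set wins. `Metric` and `GatherLenient` are hypothetical names, not client_golang APIs.

```go
package main

import "fmt"

// Metric is a hypothetical, simplified metric: a family name plus its
// label names, already sorted and joined (for example "id,state").
type Metric struct {
	Family string
	Dims   string
}

// GatherLenient keeps every metric whose label names match the first metric
// seen in its family and drops the rest instead of failing the whole gather.
func GatherLenient(collected []Metric) (kept, dropped []Metric) {
	firstDims := map[string]string{}
	for _, m := range collected {
		want, seen := firstDims[m.Family]
		if !seen {
			// The first metric of a family defines the expected label names.
			firstDims[m.Family] = m.Dims
			want = m.Dims
		}
		if m.Dims == want {
			kept = append(kept, m)
		} else {
			dropped = append(dropped, m)
		}
	}
	return kept, dropped
}

func main() {
	a := Metric{"container_tasks_state", "id,state"}
	b := Metric{"container_tasks_state", "id,ports,state"}

	kept, dropped := GatherLenient([]Metric{a, a, b})
	fmt.Println(len(kept), len(dropped)) // prints "2 1"

	// Same metrics, different collection order: a different subset survives.
	kept, dropped = GatherLenient([]Metric{b, a, a})
	fmt.Println(len(kept), len(dropped)) // prints "1 2"
}
```

In this model, if the collection order varies between scrapes, the set of exposed metrics varies with it, which would match the "random subset" described above.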
