kubernetes.default.svc endpoint ip 172.31.x.x does not respond #552

Closed
markusdresch opened this issue Jul 24, 2018 · 14 comments

@markusdresch

What happened:
I installed the CoreOS prometheus-operator and fixed the connection issues to kube-dns, but connecting to the apiserver still fails with the error "context deadline exceeded".

What you expected to happen:
Connection to apiserver should succeed.

How to reproduce it (as minimally and precisely as possible):
Install the CoreOS prometheus-operator Helm chart, add an ingress rule to view kube-prometheus, and check /targets.

Anything else we need to know?:
Continuing the issue from prometheus-operator/prometheus-operator#1522: Prometheus tries to scrape the endpoint https://172.31.31.195:443/metrics, but requests to that endpoint never complete.

Environment:

  • Kubernetes version (use kubectl version): v1.10.5
  • Size of cluster (how many worker nodes are in the cluster?) 3
  • General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.) microservices
@markusdresch
Author

An even simpler way to reproduce it is to get the apiserver endpoint:

> kubectl get endpoints kubernetes
NAME         ENDPOINTS           AGE
kubernetes   172.31.31.195:443   139d

Curl the endpoint from a pod:
> kubectl exec [somepod] -- curl https://172.31.31.195/metrics -k

An auth header can be passed too, but it doesn't make a difference, since the endpoint never responds.
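
For reference, passing the service account token looks roughly like this (a sketch, run from a shell inside the pod; the token is mounted at the standard service account path):

> curl -k -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://172.31.31.195/metrics

As noted above, though, the request hangs either way, so authentication is not the problem.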

@markusdresch
Author

markusdresch commented Jul 25, 2018

This issue is a consequence of the endpoint not responding: #504

@markusdresch
Author

markusdresch commented Jul 25, 2018

I can't find a 172.* network anywhere in the AKS configuration. Where does this endpoint IP come from? Is there a way to change this IP to something accessible from within the AKS vnet?

@weinong
Contributor

weinong commented Aug 7, 2018

@markusdresch Can you send details such as the subscription ID, resource group, resource name, and region to [email protected] so we can take a look?

@weinong weinong self-assigned this Aug 7, 2018
@markusdresch
Author

Done. Well, the endpoint belongs to the Kubernetes master node, but the endpoint shown by kubectl should still be accessible, IMHO. No big issue though.

@weinong
Contributor

weinong commented Aug 8, 2018

Hi @markusdresch

The endpoint IP of the API server is effectively bogus on AKS.
What happens is that the kube-svc-redirect pods modify iptables on each node to route 10.0.0.1 traffic to azure-proxy, which sends the traffic on to the master.
Hence, accessing the endpoint IP directly will not work.

Always use the service IP!

Thanks
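
For anyone verifying this on their own cluster: compare the service IP with the endpoint IP, then curl the service DNS name from a pod instead of the endpoint IP. A minimal sketch (the pod name is a placeholder; 10.0.0.1 assumes the default AKS service CIDR):

> kubectl get svc kubernetes        # the ClusterIP (10.0.0.1 by default) is the routable address
> kubectl exec [somepod] -- curl -k https://kubernetes.default.svc/metrics

Unlike the endpoint IP, the service DNS name should respond immediately (typically with a 401/403 unless a bearer token is passed) rather than hanging.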

@JoshCollinsMSFT

In case anyone is wondering how to get this to work and is a novice at configuring Prometheus (like I am): update the values.yaml of the stable/prometheus Helm chart as follows:

serverFiles:
  prometheus.yml:
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
          # replace the address with the kubernetes service cluster DNS address for AKS
          # since the endpoint address is not reachable
          - target_label: __address__
            replacement: kubernetes.default.svc:443
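
To apply this, something like the following should work (a sketch; the release name and values file path are placeholders):

> helm upgrade --install prometheus stable/prometheus -f values.yaml

After the upgrade, the kubernetes-apiservers job on /targets should report UP, since it now scrapes kubernetes.default.svc:443 instead of the unreachable endpoint IP.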

@andig

andig commented Nov 12, 2018

@JoshCollinsMSFT is there a way to do this after prometheus-operator has been deployed? The Helm chart at https://github.com/helm/charts/blob/master/stable/prometheus-operator/values.yaml does not seem to have this in its values.yaml.
It would be nice not to have to monkey-patch the source files.

@ams0

ams0 commented Nov 18, 2018

I tried to pass this snippet (during or after helm install; it doesn't matter):

prometheusOperator:
  prometheus:
    prometheusSpec:
      additionalScrapeConfigs: 
      - job_name: 'kubernetes-apiservers'
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
          - target_label: __address__
            replacement: kubernetes.default.svc:443

but it doesn't seem to have any tangible effect. @JoshCollinsMSFT, can you paste your values.yaml?
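
One likely reason the snippet above has no effect: in the stable/prometheus-operator chart, prometheusSpec sits under a top-level prometheus key, not under prometheusOperator. A sketch of the nesting the chart expects (assuming your chart version supports additionalScrapeConfigs):

prometheus:
  prometheusSpec:
    # additionalScrapeConfigs per the stable/prometheus-operator chart's values.yaml
    additionalScrapeConfigs:
      - job_name: 'kubernetes-apiservers'
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
          - target_label: __address__
            replacement: kubernetes.default.svc:443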

@reddare

reddare commented Jan 24, 2019

@ams0 you need to remove the installation of prometheus-operator and kube-prometheus and install the new chart from the Helm repo; the CoreOS prometheus-operator repo is deprecated now.

Then you can use the relabeling from @JoshCollinsMSFT and create a new ServiceMonitor for the API server, or edit the current one.

This is my working example:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: prometheus-operator-apiserver
    release: prometheus-operator
  name: prometheus-operator-apiserver-relabeled
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: false
      serverName: kubernetes
    relabelings:
    - sourceLabels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https
      # replace the address with the kubernetes service cluster DNS address for AKS
      # since the endpoint address is not reachable
    - targetLabel: __address__
      replacement: kubernetes.default.svc:443
  jobLabel: component
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      component: apiserver
      provider: kubernetes

[Screenshot: the original and the relabeled API server targets]
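
To use this manifest, apply it and confirm the ServiceMonitor exists (a sketch; the filename is a placeholder):

> kubectl apply -f servicemonitor-apiserver-relabeled.yaml
> kubectl get servicemonitors -n monitoring

Both the original and the relabeled kubernetes-apiservers jobs will then appear on /targets; only the relabeled one, which scrapes kubernetes.default.svc:443, should be UP.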

dave-read pushed a commit to dave-read/prometheus-operator that referenced this issue Feb 14, 2019
Default configuration cannot connect to the AKS control plane. The fix is to use an explicit address. See: Azure/AKS#552
@levilugato

Same problem here. Any updates?

@encyphered

@levilugato In my case, @JoshCollinsMSFT's approach works without reinstallation, using an upgrade command, but some keys have changed:

kubeApiServer:
  relabelings:
    - sourceLabels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      action: keep
      regex: default;kubernetes;https
    - targetLabel: __address__
      replacement: kubernetes.default.svc:443

@levilugato

@encyphered where do I put this?

@AliYusuf95

@levilugato put it in your Helm values.
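
For example (a sketch; the release name and chart may differ in your setup):

> helm upgrade prometheus-operator stable/prometheus-operator -f values.yaml

where values.yaml contains the kubeApiServer block from @encyphered's comment above.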

@ghost ghost locked as resolved and limited conversation to collaborators Aug 3, 2020