kubernetes.default.svc endpoint ip 172.31.x.x does not respond #552

Closed
markusdresch opened this issue Jul 24, 2018 · 14 comments

@markusdresch

What happened:
I installed the CoreOS prometheus-operator and fixed the connection issues to kube-dns, but connecting to the apiserver still fails with the error "context deadline exceeded".

What you expected to happen:
Connection to apiserver should succeed.

How to reproduce it (as minimally and precisely as possible):
Install the CoreOS prometheus-operator Helm chart, add an ingress rule to view kube-prometheus, and check /targets.

Anything else we need to know?:
Continuing the issue from prometheus-operator/prometheus-operator#1522: Prometheus tries to scrape the endpoint https://172.31.31.195:443/metrics, but requests to that endpoint never complete.

Environment:

  • Kubernetes version (use kubectl version): v1.10.5
  • Size of cluster (how many worker nodes are in the cluster?) 3
  • General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.) microservices
@markusdresch
Author

An even simpler way to reproduce it is to get the apiserver endpoint:

> kubectl get endpoints kubernetes
NAME         ENDPOINTS           AGE
kubernetes   172.31.31.195:443   139d

Curl the endpoint from a pod:
> kubectl exec [somepod] -- curl https://172.31.31.195/metrics -k

An auth header can be passed too, but it doesn't make a difference, since the endpoint never responds.
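
For reference, passing the service account token looks roughly like this (a sketch, run from a shell inside the pod; the token is mounted at the standard service account path):

> curl -k -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://172.31.31.195/metrics

As noted above, though, the request hangs either way, so authentication is not the problem.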

@markusdresch
Author

markusdresch commented Jul 25, 2018

This issue is a consequence of the endpoint not responding: #504

@markusdresch
Author

markusdresch commented Jul 25, 2018

I can't find a 172.* network anywhere in the AKS configuration. Where does this endpoint IP come from? Is there a way to change this IP to something accessible from within the AKS vnet?

@weinong
Contributor

weinong commented Aug 7, 2018

@markusdresch Can you send details such as the subscription ID, resource group, resource name, and region to [email protected] so we can take a look?

@weinong weinong self-assigned this Aug 7, 2018
@markusdresch
Author

Done. Well, the endpoint belongs to the Kubernetes master node, but the endpoint shown by kubectl should still be accessible, IMHO. No big issue though.

@weinong
Contributor

weinong commented Aug 8, 2018

Hi @markusdresch

The endpoint IP of the API server is effectively bogus on AKS.
What happens is that the kube-svc-redirect pods modify iptables on each node to route 10.0.0.1 traffic to azure-proxy, which sends the traffic on to the master.
Hence, accessing the endpoint IP directly will not work.

Always use the service IP!

Thanks
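
For anyone verifying this on their own cluster: compare the service IP with the endpoint IP, then curl the service DNS name from a pod instead of the endpoint IP. A minimal sketch (the pod name is a placeholder; 10.0.0.1 assumes the default AKS service CIDR):

> kubectl get svc kubernetes        # the ClusterIP (10.0.0.1 by default) is the routable address
> kubectl exec [somepod] -- curl -k https://kubernetes.default.svc/metrics

Unlike the endpoint IP, the service DNS name should respond immediately (typically with a 401/403 unless a bearer token is passed) rather than hanging.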

@JoshCollinsMSFT

In case anyone is wondering how to get this to work and is a novice at configuring Prometheus (like I am): update the values.yaml of the stable/prometheus Helm chart as follows:

serverFiles:
  prometheus.yml:
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
          # replace the address with the kubernetes service cluster DNS address for AKS
          # since the endpoint address is not reachable
          - target_label: __address__
            replacement: kubernetes.default.svc:443
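
To apply this, something like the following should work (a sketch; the release name and values file path are placeholders):

> helm upgrade --install prometheus stable/prometheus -f values.yaml

After the upgrade, the kubernetes-apiservers job on /targets should report UP, since it now scrapes kubernetes.default.svc:443 instead of the unreachable endpoint IP.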

@andig

andig commented Nov 12, 2018

@JoshCollinsMSFT is there a way to do this after prometheus-operator has been deployed? The Helm chart at https://github.com/helm/charts/blob/master/stable/prometheus-operator/values.yaml does not seem to have this in its values.yaml.
It would be nice not to have to monkey-patch the source files.

@ams0

ams0 commented Nov 18, 2018

I tried to pass this snippet (during or after helm install; it doesn't matter):

prometheusOperator:
  prometheus:
    prometheusSpec:
      additionalScrapeConfigs: 
      - job_name: 'kubernetes-apiservers'
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
          - target_label: __address__
            replacement: kubernetes.default.svc:443

but it doesn't seem to have any tangible effect. @JoshCollinsMSFT, can you paste your values.yaml?
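
One likely reason the snippet above has no effect: in the stable/prometheus-operator chart, prometheusSpec sits under a top-level prometheus key, not under prometheusOperator. A sketch of the nesting the chart expects (assuming your chart version supports additionalScrapeConfigs):

prometheus:
  prometheusSpec:
    # additionalScrapeConfigs per the stable/prometheus-operator chart's values.yaml
    additionalScrapeConfigs:
      - job_name: 'kubernetes-apiservers'
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
          - target_label: __address__
            replacement: kubernetes.default.svc:443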

@reddare

reddare commented Jan 24, 2019

@ams0 you need to remove the installation of prometheus-operator and kube-prometheus and install the new chart from the Helm repo; the CoreOS prometheus-operator repo is deprecated now.

Then you can use the relabeling from @JoshCollinsMSFT and create a new ServiceMonitor for the API server, or edit the current one.

This is my working example:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: prometheus-operator-apiserver
    release: prometheus-operator
  name: prometheus-operator-apiserver-relabeled
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: false
      serverName: kubernetes
    relabelings:
    - sourceLabels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https
      # replace the address with the kubernetes service cluster DNS address for AKS
      # since the endpoint address is not reachable
    - targetLabel: __address__
      replacement: kubernetes.default.svc:443
  jobLabel: component
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      component: apiserver
      provider: kubernetes

[Screenshot: the original and the relabeled API server targets]
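
To use this manifest, apply it and confirm the ServiceMonitor exists (a sketch; the filename is a placeholder):

> kubectl apply -f servicemonitor-apiserver-relabeled.yaml
> kubectl get servicemonitors -n monitoring

Both the original and the relabeled kubernetes-apiservers jobs will then appear on /targets; only the relabeled one, which scrapes kubernetes.default.svc:443, should be UP.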

dave-read pushed a commit to dave-read/prometheus-operator that referenced this issue Feb 14, 2019
Default configuration cannot connect to the AKS control plane. The fix is to use an explicit address. See: Azure/AKS#552
@levilugato

Same problem here. Any updates?

@encyphered

@levilugato In my case, @JoshCollinsMSFT's approach works without reinstallation, using an upgrade command, but some keys have changed:

kubeApiServer:
  relabelings:
    - sourceLabels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      action: keep
      regex: default;kubernetes;https
    - targetLabel: __address__
      replacement: kubernetes.default.svc:443

@levilugato

@encyphered where do I put this?

@AliYusuf95

@levilugato put it in your Helm values.
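
For example (a sketch; the release name and chart may differ in your setup):

> helm upgrade prometheus-operator stable/prometheus-operator -f values.yaml

where values.yaml contains the kubeApiServer block from @encyphered's comment above.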

@ghost ghost locked as resolved and limited conversation to collaborators Aug 3, 2020