
Servicemonitor for prometheus exporter is referring to cluster port instead of metrics pod port #483

Closed
sanjay3290 opened this issue Oct 21, 2022 · 20 comments · Fixed by #539

@sanjay3290

I followed the Solr Operator documentation to configure the SolrPrometheusExporter. However, after creating the ServiceMonitor, the service endpoint goes inactive. After further troubleshooting, I realized that Prometheus is trying to connect to port 80, whereas the metrics server is running on port 8080. Is it possible to pass a port to the ServiceMonitor?

Get "http://x.x.x.x:80/metrics": dial tcp x.x.x.x:80: connect: connection refused

@HoustonPutman
Contributor

Can you provide the yaml for the service monitor you created?

@HoustonPutman HoustonPutman added question Further information is requested metrics labels Oct 21, 2022
@sanjay3290
Author

Hello @HoustonPutman, below is the ServiceMonitor YAML. I used the default provided in the Solr Operator documentation.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: solr-metrics
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      solr-prometheus-exporter: solr-dev-prom-exporter
  namespaceSelector:
    matchNames:
    - solr-dev
  endpoints:
  - port: solr-metrics
    interval: 20s

@HoustonPutman
Contributor

So you are using a serviceMonitor, and the Solr metrics service is listening on port 80, or at least it should be... The pod is listening on port 8080, but the service forwards that 80 -> 8080 when sending the request to the pod.

I have almost the exact same thing working correctly.

What version of the prometheus stack are you running? Also can you provide information on your Kube cluster? (version, vendor, etc) I have a feeling there's an issue with your networking.

@sanjay3290
Author

You are right, that's how it's supposed to work. However, the service endpoint in the Prometheus targets references http://podIP:80/metrics, so the connection is refused. My other default service endpoints for Prometheus work as expected.

Prometheus:
  Chart: prometheus-15.16.1
  Version: 2.39.1
Kubernetes:
  AWS EKS, Version: 1.22

@HoustonPutman
Contributor

Are you sure you don't have a podMonitor defined as well?

It looks like there might be a bug in the Prometheus operator. In the meantime, you can use targetPort instead of port and set it to 8080. Here are the available options under endpoints.

@koboltmarky

koboltmarky commented Dec 1, 2022

We have the same problem here. We are using the solr-operator 0.6 and prometheus 2.39.1 hosted in gke version 1.21.
We aren't using the prometheus operator.
I deployed the solr prometheus exporter with the following snippet:

apiVersion: solr.apache.org/v1beta1
kind: SolrPrometheusExporter
metadata:
  name: solr-prom-exporter
spec:
  customKubeOptions:
      resources:
        requests:
          cpu: 300m
          memory: 900Mi
  solrReference:
    basicAuthSecret: solr-cloud-k8s-oper-secret 
    cloud:
      name: "apache-solr"
  numThreads: 6

As you can see in the screenshot prometheus tries to connect to the pod on port 80 which is the wrong port.

Screenshot from 2022-12-01 16-17-22

Our workaround is to add a prometheus scraping annotation to the exporter pod:

spec:
  customKubeOptions:
    podOptions:
      annotations:
        prometheus.io/port: "8080"
        prometheus.io/path: /metrics
        prometheus.io/scrape: "true"
        prometheus.io/scheme: http

@HoustonPutman
Contributor

In that screenshot, is the 10.110.6.70 IP address the service ClusterIP or the pod IP? If it's the service's, then there is something wrong with Kubernetes. If it's the pod's, then Prometheus shouldn't be trying to contact the pod at all; it should be contacting the service IP...

@sanjay3290
Author

sanjay3290 commented Dec 1, 2022

Even after adding the pod annotations, Prometheus is still looking at port 80 on the pod IP in my case. Something is seriously wrong with this. Below is my exporter config.

apiVersion: solr.apache.org/v1beta1
kind: SolrPrometheusExporter
metadata:
  name: solr-prom-exporter
spec:
  customKubeOptions:
    podOptions:
      annotations:
        prometheus.io/port: "8080"
        prometheus.io/path: /metrics
        prometheus.io/scrape: "true"
        prometheus.io/scheme: http
      resources:
        requests:
          cpu: 300m
          memory: 900Mi
  solrReference:
    cloud:
      name: "eks"
  numThreads: 6

Screenshot 2022-12-01 at 3 54 04 PM

@koboltmarky

In that screenshot, is the 10.110.6.70 IP address the service ClusterIP or the pod IP? If it's the service's, then there is something wrong with Kubernetes. If it's the pod's, then Prometheus shouldn't be trying to contact the pod at all; it should be contacting the service IP...

It is the pod IP.

@koboltmarky

Even after adding the pod annotations, Prometheus is still looking at port 80 on the pod IP in my case. Something is seriously wrong with this. Below is my exporter config.

The old failed target will still exist, but there should be a new target that works.

@HoustonPutman
Contributor

Can you share your prometheus scraping config? This seems to be a prometheus issue...

@tiimbz

tiimbz commented Dec 21, 2022

We are having the same issue. The prometheus.io/port annotation is set to port 80, which doesn't correspond with the port of the pod. This causes Prometheus to fail to scrape the service endpoint.

We've also bypassed the problem by enabling scraping of the pods directly:

  customKubeOptions:
    podOptions:
      annotations:
        prometheus.io/port: "8080"
        prometheus.io/path: /metrics
        prometheus.io/scrape: "true"
        prometheus.io/scheme: http

The Prometheus scraping config we use is the default kubernetes-service-endpoints job from the default config.
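To illustrate why the service annotation breaks this job, here is a minimal Python sketch of the __address__ relabel rule from the default kubernetes-service-endpoints job (regex (.+?)(?::\d+)?;(\d+), replacement $1:$2). It assumes the endpoints role discovers the exporter as pod IP plus container port, and it reuses the pod IP 10.110.6.70 reported earlier in this thread. The function name rewrite_address is made up for this example, and Python's re module stands in for Prometheus's RE2 engine, so treat it as an approximation rather than Prometheus's actual code.

```python
import re

# Regex and replacement copied from the kubernetes-service-endpoints job;
# Prometheus's "$1:$2" is written as "\1:\2" for Python's re module.
ADDRESS_REGEX = r"(.+?)(?::\d+)?;(\d+)"
ADDRESS_REPLACEMENT = r"\1:\2"


def rewrite_address(discovered_address: str, port_annotation: str) -> str:
    """Approximate the `replace` action on __address__ (hypothetical helper).

    discovered_address: what the endpoints role found (assumed "<pod IP>:<container port>").
    port_annotation: value of prometheus.io/port on the Service ("" if absent).
    """
    # Prometheus joins the source labels with ';' before matching.
    joined = f"{discovered_address};{port_annotation}"
    # Prometheus anchors the regex at both ends, hence fullmatch.
    match = re.fullmatch(ADDRESS_REGEX, joined)
    if match is None:
        # No match: the target label is left untouched.
        return discovered_address
    return match.expand(ADDRESS_REPLACEMENT)


if __name__ == "__main__":
    print(rewrite_address("10.110.6.70:8080", "80"))  # annotation set by the operator
    print(rewrite_address("10.110.6.70:8080", ""))    # annotation absent
```

With the annotation at 80 the scrape address becomes 10.110.6.70:80, which nothing on the pod listens on. With no annotation the rule does not match and the discovered 10.110.6.70:8080 is kept.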

@tiimbz

tiimbz commented Dec 21, 2022

Looking at the code, it looks like the prometheus.io/port value is set from ExtSolrMetricsPort rather than SolrMetricsPort; using SolrMetricsPort would have avoided the problem.

Any attempt to override this with custom serviceAnnotations does not work, because custom annotations can only supplement the default ones, not overwrite them: https://github.com/apache/solr-operator/blob/main/controllers/util/prometheus_exporter_util.go#L400
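A rough sketch of the merge behaviour described above, written in Python rather than the operator's Go: defaults win, and custom keys are only added when they are absent. The function name merge_annotations, the team key, and the exact set of default annotations are assumptions for illustration; only the prometheus.io/port value of 80 comes from this thread. This is not the operator's code.

```python
from typing import Dict


def merge_annotations(defaults: Dict[str, str], custom: Dict[str, str]) -> Dict[str, str]:
    """Supplement-only merge (hypothetical helper): custom never overwrites defaults."""
    merged = dict(defaults)
    for key, value in custom.items():
        # setdefault only writes the key when it is not already present.
        merged.setdefault(key, value)
    return merged


if __name__ == "__main__":
    # Illustrative defaults; the thread only confirms that the port annotation is 80.
    defaults = {"prometheus.io/scrape": "true", "prometheus.io/port": "80"}
    # What a user might try under serviceAnnotations; "team" is a made-up key.
    custom = {"prometheus.io/port": "8080", "team": "search"}
    print(merge_annotations(defaults, custom))
```

Under these assumptions the custom prometheus.io/port of 8080 is dropped and the default of 80 survives, while the unrelated team key is added.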

@samuelverstraete
Contributor

We have exactly the same issue.

@coolstim

coolstim commented Mar 8, 2023

Indeed, enabling scraping of the pods directly with the prometheus.io pod annotations, as @tiimbz described above, is a valid workaround.

@HoustonPutman
Contributor

So it seems like everyone is using kubernetes-service-endpoints, could you try using kubernetes-services and see if the problem is fixed?

I think the issue is that this feature was designed with the kubernetes-services usage in mind. It looks like it should work with kubernetes-service-endpoints as well, but it breaks in this way. I don't think there's a way to get both to work at the same time, unless we remove the prometheus.io/port annotation altogether.
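As a toy model of that conflict, the Python sketch below applies the same port annotation to the two Prometheus discovery roles. It relies on the documented behaviour that the service role yields the service DNS name with the service port, while the endpoints role yields the pod IP with the container port. The class and function names are invented for this example, the service name follows the <name>-solr-metrics pattern with solr-prom-exporter assumed as the name, the solr-dev namespace is borrowed from the ServiceMonitor earlier in the thread, and real networking (kube-proxy, DNS) is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricsService:
    name: str
    namespace: str
    port: int          # port the Service listens on (80 in this thread)
    target_port: int   # container port it forwards to (8080 in this thread)
    pod_ips: List[str]


def with_port(address: str, port: int) -> str:
    """Replace the port part of host:port, as the annotation-based relabeling does."""
    host = address.rsplit(":", 1)[0]
    return f"{host}:{port}"


def service_role_target(svc: MetricsService, annotation_port: int) -> str:
    # role: service discovers "<name>.<namespace>.svc:<service port>".
    return with_port(f"{svc.name}.{svc.namespace}.svc:{svc.port}", annotation_port)


def endpoints_role_targets(svc: MetricsService, annotation_port: int) -> List[str]:
    # role: endpoints discovers "<pod IP>:<container port>" for each backing pod.
    return [with_port(f"{ip}:{svc.target_port}", annotation_port) for ip in svc.pod_ips]


def is_listening(svc: MetricsService, address: str) -> bool:
    """Toy reachability check: pods answer on target_port, the Service on port."""
    host, port = address.rsplit(":", 1)
    expected = svc.target_port if host in svc.pod_ips else svc.port
    return int(port) == expected


if __name__ == "__main__":
    svc = MetricsService("solr-prom-exporter-solr-metrics", "solr-dev", 80, 8080, ["10.110.6.70"])
    for target in [service_role_target(svc, 80)] + endpoints_role_targets(svc, 80):
        print(target, "reachable" if is_listening(svc, target) else "refused")
```

In this model an annotation of 80 is correct for the service address and wrong for the pod address, while 8080 would be the reverse, which is why a single annotation value cannot satisfy both jobs.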

I will try to test this locally but it might be difficult. I'm happy to create a test docker image for anyone else to try out (based on v0.6.0) and see if it fixes things for them.

@coolstim

coolstim commented Mar 15, 2023

Situation before Solr:

  • We aren't using the Prometheus operator.
  • All our other services are scraped correctly.

Situation after Solr:
We installed the solr-exporter using

apiVersion: solr.apache.org/v1beta1
kind: SolrPrometheusExporter
metadata:
  name: solr-prom-exporter
spec:
  customKubeOptions:
    podOptions:
      resources:
        requests:
          cpu: 300m
          memory: 900Mi
  solrReference:
    cloud:
      name: "eks"
  numThreads: 6

No metrics are scraped from Solr as, by default, it seems Prometheus is using the endpoints?
Default Prometheus configuration:

global:
  evaluation_interval: 1m
  scrape_interval: 1m
  scrape_timeout: 10s
remote_write:
- queue_config:
    capacity: 2500
    max_samples_per_send: 1000
    max_shards: 200
  sigv4:
    region: east-us-1
  url: https://aps-workspaces.east-us-1.amazonaws.com/workspaces/XXX/api/v1/remote_write
rule_files:
- /etc/config/recording_rules.yml
- /etc/config/alerting_rules.yml
- /etc/config/rules
- /etc/config/alerts
scrape_configs:
- job_name: prometheus
  static_configs:
  - targets:
    - localhost:9090
- bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  job_name: kubernetes-apiservers
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: default;kubernetes;https
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_service_name
    - __meta_kubernetes_endpoint_port_name
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
- bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  job_name: kubernetes-nodes
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - replacement: kubernetes.default.svc:443
    target_label: __address__
  - regex: (.+)
    replacement: /api/v1/nodes/$1/proxy/metrics
    source_labels:
    - __meta_kubernetes_node_name
    target_label: __metrics_path__
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
- bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  job_name: kubernetes-nodes-cadvisor
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - replacement: kubernetes.default.svc:443
    target_label: __address__
  - regex: (.+)
    replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
    source_labels:
    - __meta_kubernetes_node_name
    target_label: __metrics_path__
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
- honor_labels: true
  job_name: kubernetes-service-endpoints
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scrape
  - action: drop
    regex: true
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
  - action: replace
    regex: (https?)
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scheme
    target_label: __scheme__
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: (.+?)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_service_annotation_prometheus_io_port
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
    replacement: __param_$1
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_service_name
    target_label: service
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_node_name
    target_label: node
- honor_labels: true
  job_name: kubernetes-service-endpoints-slow
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
  - action: replace
    regex: (https?)
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scheme
    target_label: __scheme__
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: (.+?)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_service_annotation_prometheus_io_port
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
    replacement: __param_$1
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_service_name
    target_label: service
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_node_name
    target_label: node
  scrape_interval: 5m
  scrape_timeout: 30s
- honor_labels: true
  job_name: prometheus-pushgateway
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  - action: keep
    regex: pushgateway
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_probe
- honor_labels: true
  job_name: kubernetes-services
  kubernetes_sd_configs:
  - role: service
  metrics_path: /probe
  params:
    module:
    - http_2xx
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_probe
  - source_labels:
    - __address__
    target_label: __param_target
  - replacement: blackbox
    target_label: __address__
  - source_labels:
    - __param_target
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: service
- honor_labels: true
  job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scrape
  - action: drop
    regex: true
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
  - action: replace
    regex: (https?)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scheme
    target_label: __scheme__
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
    replacement: '[$2]:$1'
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    - __meta_kubernetes_pod_ip
    target_label: __address__
  - action: replace
    regex: (\d+);((([0-9]+?)(\.|$)){4})
    replacement: $2:$1
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    - __meta_kubernetes_pod_ip
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
    replacement: __param_$1
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - action: drop
    regex: Pending|Succeeded|Failed|Completed
    source_labels:
    - __meta_kubernetes_pod_phase
- honor_labels: true
  job_name: kubernetes-pods-slow
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
  - action: replace
    regex: (https?)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scheme
    target_label: __scheme__
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
    replacement: '[$2]:$1'
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    - __meta_kubernetes_pod_ip
    target_label: __address__
  - action: replace
    regex: (\d+);((([0-9]+?)(\.|$)){4})
    replacement: $2:$1
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    - __meta_kubernetes_pod_ip
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
    replacement: __param_$1
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - action: drop
    regex: Pending|Succeeded|Failed|Completed
    source_labels:
    - __meta_kubernetes_pod_phase
  scrape_interval: 5m
  scrape_timeout: 30s
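For comparison, the pod-annotation workaround mentioned earlier in this thread goes through the kubernetes-pods job in the config above, which builds the address from the pod's own prometheus.io/port annotation and the pod IP. The Python sketch below replays the two __address__ rules of that job in order. The helper name pod_scrape_address is hypothetical, the discovered address is assumed to be pod IP plus container port, and Python's re module is only an approximation of Prometheus's RE2 matching.

```python
import re
from typing import List, Tuple

# (regex, replacement) pairs copied from the kubernetes-pods job, in order.
# Prometheus's "$1"/"$2" references are written as "\1"/"\2" for Python.
POD_ADDRESS_RULES: List[Tuple[str, str]] = [
    (r"(\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})", r"[\2]:\1"),  # IPv6 pod IP
    (r"(\d+);((([0-9]+?)(\.|$)){4})", r"\2:\1"),                          # IPv4 pod IP
]


def pod_scrape_address(discovered: str, port_annotation: str, pod_ip: str) -> str:
    """Approximate the two `replace` actions on __address__ (hypothetical helper)."""
    address = discovered
    # Source labels are joined with ';' in the order listed: port annotation, pod IP.
    joined = f"{port_annotation};{pod_ip}"
    for regex, replacement in POD_ADDRESS_RULES:
        match = re.fullmatch(regex, joined)  # Prometheus anchors the regex
        if match is not None:
            address = match.expand(replacement)
    return address


if __name__ == "__main__":
    # Pod annotated with prometheus.io/port: "8080", pod IP taken from the thread.
    print(pod_scrape_address("10.110.6.70:8080", "8080", "10.110.6.70"))
```

Because the port here comes from the pod annotation (8080) rather than the service annotation (80), the resulting target is 10.110.6.70:8080, which matches the port the exporter pod listens on.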

@HoustonPutman
Contributor

HoustonPutman commented Mar 31, 2023

I have a patch that I think should work: #539.
Would someone be willing to try out this fix in their cluster?

Steps to try it:

  1. Check out the v0.6.0 release
  2. Copy this one-line change
  3. Run make docker-build, then upload the image to a Docker registry somewhere
  4. Update your Solr Operator to use this new image
  5. Delete the prometheus exporter service just to make sure the annotation is removed:
    kubectl delete service <name>-solr-metrics
  6. Wait for it to come back and see if Prometheus is happier!

If it does work we can get this into the v0.7.0 release that should be coming soon!

@coolstim

coolstim commented Apr 6, 2023

It seems to be working

@HoustonPutman
Contributor

Cool, I will go ahead and merge then!

@HoustonPutman HoustonPutman added this to the main (v0.7.0) milestone Apr 6, 2023