Update steps for setting up metrics on openshift, focusing on single … #953

Merged
merged 2 commits into from Oct 24, 2024
Changes from 1 commit
5 changes: 1 addition & 4 deletions config/observability/kustomization.yaml
@@ -13,10 +13,7 @@ resources:
# See https://github.com/prometheus-operator/prometheus-operator/issues/3071#issuecomment-763746836
- prometheus/additional-scrape-configs.yaml
#https://istio.io/latest/docs/reference/config/telemetry/#MetricSelector-IstioMetric
- prometheus/monitors/service-monitor-limitador-operator.yaml
- prometheus/monitors/service-monitor-kuadrant-operator.yaml
- prometheus/monitors/service-monitor-authorino-operator.yaml
- prometheus/monitors/service-monitor-dns-operator.yaml
- prometheus/monitors/operators.yaml


patchesStrategicMerge:
73 changes: 73 additions & 0 deletions config/observability/prometheus/monitors/operators.yaml
@@ -0,0 +1,73 @@

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    control-plane: controller-manager
  name: authorino-operator-metrics
  namespace: kuadrant-system
spec:
  endpoints:
    - path: /metrics
      port: metrics
      scheme: http
  selector:
    matchLabels:
      control-plane: authorino-operator
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    control-plane: controller-manager
    app.kubernetes.io/name: servicemonitor
    app.kubernetes.io/instance: controller-manager-metrics-monitor
    app.kubernetes.io/component: metrics
    app.kubernetes.io/created-by: dns-operator
    app.kubernetes.io/part-of: dns-operator
    app.kubernetes.io/managed-by: kustomize
  name: dns-operator-metrics-monitor
  namespace: kuadrant-system
spec:
  endpoints:
    - path: /metrics
      port: metrics
      scheme: http
  selector:
    matchLabels:
      control-plane: dns-operator-controller-manager
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    control-plane: controller-manager
  name: kuadrant-operator-metrics
  namespace: kuadrant-system
spec:
  endpoints:
    - path: /metrics
      port: metrics
      scheme: http
  selector:
    matchLabels:
      control-plane: controller-manager
      app: kuadrant
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    control-plane: controller-manager
  name: limitador-operator-metrics
  namespace: kuadrant-system
spec:
  endpoints:
    - path: /metrics
      port: metrics
      scheme: http
  selector:
    matchLabels:
      control-plane: controller-manager
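
If you consume this directory with kustomize directly (an assumption about your workflow; the Prometheus Operator CRDs must already be present for the ServiceMonitor resources to apply), the overlay can be applied in one go:

```bash
kubectl apply -k config/observability/
```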


The four per-operator ServiceMonitor files were deleted: config/observability/prometheus/monitors/service-monitor-limitador-operator.yaml, service-monitor-kuadrant-operator.yaml, service-monitor-authorino-operator.yaml, and service-monitor-dns-operator.yaml.

34 changes: 20 additions & 14 deletions doc/install/install-openshift.md
@@ -172,36 +172,42 @@ Wait for Envoy Gateway to become available:
kubectl wait --timeout=5m -n envoy-gateway-system deployment/envoy-gateway --for=condition=Available
```

### Step 6 - Optional: Configure observability and metrics

Kuadrant provides a set of example dashboards that use known metrics exported by Kuadrant and Gateway components to provide insight into different components of your APIs and Gateways. While not essential, it is best to set up an OpenShift monitoring stack. This section provides links to OpenShift and Thanos documentation on configuring monitoring and metrics storage.

You can set up user-facing monitoring by following the steps in the OpenShift documentation on [configuring the monitoring stack](https://docs.openshift.com/container-platform/latest/observability/monitoring/configuring-the-monitoring-stack.html).

If you have user workload monitoring enabled, it is best to configure remote writes to a central storage system such as Thanos:

- [OpenShift remote write configuration](https://docs.openshift.com/container-platform/latest/observability/monitoring/configuring-the-monitoring-stack.html#configuring_remote_write_storage_configuring-the-monitoring-stack)
- [Kube Thanos](https://github.com/thanos-io/kube-thanos)
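
A minimal sketch of that remote write configuration on OpenShift, assuming user workload monitoring is enabled and using a placeholder Thanos receive endpoint:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      remoteWrite:
        # Placeholder endpoint; point this at your own Thanos receive
        # (or other remote-write compatible) service.
        - url: "https://thanos-receive.example.com/api/v1/receive"
```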
### Step 6 - Optional: Configure observability and metrics (Istio only)

Kuadrant provides a set of example dashboards that use known metrics exported by Kuadrant and Gateway components to provide insight into different components of your APIs and Gateways. While not essential, it is recommended to set these up.
First, enable [monitoring for user-defined projects](https://docs.openshift.com/container-platform/4.17/observability/monitoring/enabling-monitoring-for-user-defined-projects.html#enabling-monitoring-for-user-defined-projects_enabling-monitoring-for-user-defined-projects).
This will allow the scraping of metrics from the gateway and Kuadrant components.
The [example dashboards and alerts](https://docs.kuadrant.io/latest/kuadrant-operator/doc/observability/examples/) for observing Kuadrant functionality use low-level CPU metrics and network metrics available from the user monitoring stack in OpenShift. They also use resource state metrics from Gateway API and Kuadrant resources.
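
Enabling monitoring for user-defined projects is done through the cluster monitoring ConfigMap, as described in the OpenShift documentation linked above; a minimal sketch:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # Enables the user workload monitoring stack that scrapes ServiceMonitors
    # and PodMonitors in user namespaces such as kuadrant-system
    enableUserWorkload: true
```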

To scrape these additional metrics, you can install a `kube-state-metrics instance`, with a custom resource configuration as follows:
To scrape these additional metrics, you can install a `kube-state-metrics` instance, with a custom resource configuration as follows:

```bash
kubectl apply -f https://raw.githubusercontent.com/Kuadrant/kuadrant-operator/main/config/observability/openshift/kube-state-metrics.yaml
kubectl apply -k https://github.com/Kuadrant/gateway-api-state-metrics?ref=main
```
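
Optionally, confirm the exporters came up (the grep pattern assumes the default workload names; adjust if yours differ):

```bash
kubectl get pods -A | grep -E 'kube-state-metrics|gateway-api-state-metrics'
```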

To enable request metrics in Istio, you must create a `telemetry` resource as follows:
To enable request metrics in Istio and scrape them, create the following resources:

```bash
kubectl apply -f https://raw.githubusercontent.com/Kuadrant/kuadrant-operator/main/config/observability/openshift/telemetry.yaml
kubectl apply -f https://raw.githubusercontent.com/Kuadrant/kuadrant-operator/refs/heads/main/config/observability/prometheus/monitors/istio/service-monitor-istiod.yaml
```

Reviewer: This is not strictly necessary to have request metrics in Istio; those are enabled by default. This Telemetry configuration adds the request path as a label to the request metrics, which is not always desirable, as it is a high-cardinality label that can flood your Prometheus instance if you have a big API. For example, each resource in an API would generate a different Prometheus time series. We should probably warn about this at least.

Author: Good point. I'll split this out and explain better with a warning.
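
To make the cardinality concern concrete, here is a hypothetical sketch of the kind of Telemetry override being discussed (the actual contents of the referenced telemetry.yaml may differ; the `request_url_path` tag and `request.url_path` attribute are assumptions for illustration):

```yaml
# Hypothetical sketch, not necessarily the manifest applied above: adds the
# request path as a label on Istio's request count metric. Every distinct URL
# path then becomes its own Prometheus time series, hence the cardinality warning.
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: namespace-metrics
  namespace: istio-system
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - match:
            metric: REQUEST_COUNT
          tagOverrides:
            request_url_path:
              value: "request.url_path"
```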

You also need to configure scraping of metrics from the various Kuadrant operators.

```bash
kubectl apply -f https://raw.githubusercontent.com/Kuadrant/kuadrant-operator/refs/heads/main/config/observability/prometheus/monitors/operators.yaml
```
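
Optionally, you can verify that the operator ServiceMonitors were created (names as defined in the operators.yaml file shown earlier in this PR):

```bash
kubectl get servicemonitors -n kuadrant-system
```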

If you have Grafana installed in your cluster, you can import the [example dashboards and alerts](https://docs.kuadrant.io/latest/kuadrant-operator/doc/observability/examples).
!!! note

    There is one more metrics configuration that needs to be applied so that all relevant metrics are scraped.
    That configuration depends on where you deploy your Gateway.
    The steps are detailed in the follow-on 'Secure, protect, and connect' guide.
Reviewer: Maybe we should provide a link to the 'Secure, protect, and connect' guide here.

Author: I thought about it, and held off, as the link is at the end of the guide as a follow-on. However, there's no harm in linking here too for easier navigation.


For example installation details, see [installing Grafana on OpenShift](https://cloud.redhat.com/experts/o11y/ocp-grafana/). When installed, you must add your Thanos instance as a data source to Grafana. Alternatively, if you are using only the user workload monitoring stack in your OpenShift cluster, and not writing metrics to an external Thanos instance, you can [set up a data source to the thanos-querier route in the OpenShift cluster](https://docs.openshift.com/container-platform/4.15/observability/monitoring/accessing-third-party-monitoring-apis.html#accessing-metrics-from-outside-cluster_accessing-monitoring-apis-by-using-the-cli).
The [example Grafana dashboards and alerts](https://docs.kuadrant.io/latest/kuadrant-operator/doc/observability/examples/) for observing Kuadrant functionality use low-level CPU metrics and network metrics available from the user monitoring stack in OpenShift. They also use resource state metrics from Gateway API and Kuadrant resources.

For Grafana installation details, see [installing Grafana on OpenShift](https://cloud.redhat.com/experts/o11y/ocp-grafana/). When installed, you must [set up a data source to the thanos-querier route in the OpenShift cluster](https://docs.openshift.com/container-platform/4.15/observability/monitoring/accessing-third-party-monitoring-apis.html#accessing-metrics-from-outside-cluster_accessing-monitoring-apis-by-using-the-cli).
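
As a rough sketch of what that wiring can involve (the service account and namespace names below are placeholders, and this assumes the default openshift-monitoring stack):

```bash
# Locate the in-cluster Thanos querier route exposed by the monitoring stack
THANOS_HOST=$(kubectl get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')

# Mint a token for a service account permitted to read metrics
# (replace grafana-sa/grafana with your own service account and namespace)
TOKEN=$(kubectl create token grafana-sa -n grafana)

echo "Prometheus data source URL: https://${THANOS_HOST}"
echo "Send the token as an 'Authorization: Bearer ...' header from Grafana"
```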
Reviewer: The Thanos data source setup is also described in the 'installing Grafana on OpenShift' guide.

Author: Good catch. I think I'll call that out here, but keep the second link as well, as it's more of a 'more details' and official way of accessing thanos-querier.


### Step 7 - Setup the catalogsource

40 changes: 40 additions & 0 deletions doc/user-guides/secure-protect-connect-single-multi-cluster.md
@@ -162,6 +162,46 @@ kubectl get gateway ${gatewayName} -n ${gatewayNS} -o=jsonpath='{.status.listene

Kuadrant can help with this by using a TLSPolicy.

### Step 4a - (Optional) Configure metrics to be scraped from the Gateway instance

If you have Prometheus in your cluster, set up a metrics proxy Service and a ServiceMonitor so that metrics are scraped directly from the Gateway pod.
This must be done in the namespace where the Gateway is running.
The metrics proxy Service is needed because the metrics port (15020) is not exposed by the default Service of the Gateway.
This configuration is required for metrics such as `istio_requests_total`.

```bash
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-gateway
  namespace: ${gatewayNS}
spec:
  selector:
    matchLabels:
      istio.io/gateway-name: ${gatewayName}
  endpoints:
    - port: metrics
      path: /stats/prometheus
---
apiVersion: v1
kind: Service
metadata:
  name: ingress-metrics-proxy
  namespace: ${gatewayNS}
  labels:
    istio.io/gateway-name: ${gatewayName}
spec:
  selector:
    istio.io/gateway-name: ${gatewayName}
  ports:
    - name: metrics
      protocol: TCP
      port: 15020
      targetPort: 15020
EOF
```
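
A quick, optional check that the proxy Service reaches the Envoy stats endpoint (assumes `kubectl port-forward` access; `istio_requests_total` only appears once some traffic has passed through the Gateway):

```bash
# Forward the proxy Service locally and look for Istio request metrics
kubectl -n ${gatewayNS} port-forward service/ingress-metrics-proxy 15020:15020 &
sleep 2
curl -s http://localhost:15020/stats/prometheus | grep istio_requests_total | head
```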
Reviewer: This can be done using a single PodMonitor that targets all your Gateways, because Istio annotates the gateway pods with the port where the metrics are served. It's slightly convoluted, but it has the advantage of targeting all gateways in the namespace with a single PodMonitor:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: istio-proxies-monitor
spec:
  selector:
    matchExpressions:
      - key: istio-prometheus-ignore
        operator: DoesNotExist
  podMetricsEndpoints:
    - path: /stats/prometheus
      interval: 30s
      relabelings:
        - action: keep
          sourceLabels: ["__meta_kubernetes_pod_container_name"]
          regex: "istio-proxy"
        - action: keep
          sourceLabels:
            ["__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape"]
        - action: replace
          regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
          replacement: "[$2]:$1"
          sourceLabels:
            [
              "__meta_kubernetes_pod_annotation_prometheus_io_port",
              "__meta_kubernetes_pod_ip",
            ]
          targetLabel: "__address__"
        - action: replace
          regex: (\d+);((([0-9]+?)(\.|$)){4})
          replacement: "$2:$1"
          sourceLabels:
            [
              "__meta_kubernetes_pod_annotation_prometheus_io_port",
              "__meta_kubernetes_pod_ip",
            ]
          targetLabel: "__address__"
        - action: labeldrop
          regex: "__meta_kubernetes_pod_label_(.+)"
        - sourceLabels: ["__meta_kubernetes_namespace"]
          action: replace
          targetLabel: namespace
        - sourceLabels: ["__meta_kubernetes_pod_name"]
          action: replace
          targetLabel: pod_name
```

I recall getting this from the Istio documentation, but I can't find it now ...


Author: I have a similar-looking 'additionalScrapeConfig' here, but it doesn't work with user workload monitoring on OpenShift due to restrictions on what can be configured.

If this single PodMonitor approach works with UWM, I think that would be more robust than the Service/ServiceMonitor approach.

Reviewer: Yes, I have tested this in OCP with UWM and it works as expected.

Reviewer: I should note that I tested it using sail-operator to install Istio, but it should work the same for other Istio install methods.


### Step 5 - Secure and protect the Gateway with auth, TLS, rate limit, and DNS policies

While your Gateway is now deployed, it has no exposed endpoints and your listener is not programmed. Next, you can set up a `TLSPolicy` that leverages your CertificateIssuer to set up your listener certificates.