Update monitoring docs to use OpenShift built-in monitoring stack

Signed-off-by: David Kwon <[email protected]>
eclipse-che · Apr 17, 2023 · 66db837
1 parent 7aa89dd

Showing 22 changed files with 191 additions and 321 deletions.
diff --git a/...uide/images/monitoring/monitoring-che-che-server-jvm-dashboard-buffer-pools.png b/...uide/images/monitoring/monitoring-che-che-server-jvm-dashboard-buffer-pools.png
diff --git a/...uide/images/monitoring/monitoring-che-che-server-jvm-dashboard-classloading.png b/...uide/images/monitoring/monitoring-che-che-server-jvm-dashboard-classloading.png
diff --git a/...mages/monitoring/monitoring-che-che-server-jvm-dashboard-garbage-collection.png b/...mages/monitoring/monitoring-che-che-server-jvm-dashboard-garbage-collection.png
diff --git a/...es/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-heap.png b/...es/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-heap.png
diff --git a/...onitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-non-heap.png b/...onitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-non-heap.png
diff --git a/...-guide/images/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory.png b/...-guide/images/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory.png
diff --git a/...on-guide/images/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-misc.png b/...on-guide/images/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-misc.png
diff --git a/...guide/images/monitoring/monitoring-che-che-server-jvm-dashboard-quick-facts.png b/...guide/images/monitoring/monitoring-che-che-server-jvm-dashboard-quick-facts.png
diff --git a/...inistration-guide/images/monitoring/monitoring-che-che-server-jvm-dashboard.png b/...inistration-guide/images/monitoring/monitoring-che-che-server-jvm-dashboard.png
diff --git a/...ministration-guide/images/monitoring/monitoring-dev-workspace-metrics-panel.png b/...ministration-guide/images/monitoring/monitoring-dev-workspace-metrics-panel.png
diff --git a/...n-guide/images/monitoring/monitoring-dev-workspace-operator-metrics-panel-1.png b/...n-guide/images/monitoring/monitoring-dev-workspace-operator-metrics-panel-1.png
diff --git a/...n-guide/images/monitoring/monitoring-dev-workspace-operator-metrics-panel-2.png b/...n-guide/images/monitoring/monitoring-dev-workspace-operator-metrics-panel-2.png
diff --git a/...ion-guide/images/monitoring/monitoring-dev-workspace-operator-metrics-panel.png b/...ion-guide/images/monitoring/monitoring-dev-workspace-operator-metrics-panel.png
diff --git a/modules/administration-guide/nav.adoc b/modules/administration-guide/nav.adoc
@@ -49,8 +49,6 @@
 **** xref:creating-a-telemetry-plugin.adoc[]
 *** xref:configuring-server-logging.adoc[]
 *** xref:collecting-logs-using-chectl.adoc[]
-*** xref:monitoring-with-prometheus-and-grafana.adoc[]
-**** xref:installing-prometheus-and-grafana.adoc[]
 **** xref:monitoring-the-dev-workspace-operator.adoc[]
 **** xref:monitoring-che.adoc[]
 ** xref:configuring-networking.adoc[]

diff --git a/modules/administration-guide/pages/installing-prometheus-and-grafana.adoc b/modules/administration-guide/pages/installing-prometheus-and-grafana.adoc
diff --git a/modules/administration-guide/pages/monitoring-the-dev-workspace-operator.adoc b/modules/administration-guide/pages/monitoring-the-dev-workspace-operator.adoc
@@ -8,8 +8,7 @@
 [id="monitoring-the-dev-workspace-operator"]
 = Monitoring the {devworkspace} Operator
 
-
-You can configure an example monitoring stack to process metrics exposed by the {devworkspace} Operator.
+You can configure the OpenShift in-cluster monitoring stack to scrape metrics exposed by the {devworkspace} Operator.
 
 include::partial$proc_collecting-dev-workspace-operator-metrics-with-prometheus.adoc[leveloffset=+1]
 

diff --git a/modules/administration-guide/pages/monitoring-with-prometheus-and-grafana.adoc b/modules/administration-guide/pages/monitoring-with-prometheus-and-grafana.adoc
diff --git a/.../administration-guide/partials/proc_collecting-che-metrics-with-prometheus.adoc b/.../administration-guide/partials/proc_collecting-che-metrics-with-prometheus.adoc
@@ -7,62 +7,127 @@ To use Prometheus to collect, store, and query JVM metrics for {prod-short} Serv
 
 .Prerequisites
 
-* {prod-short} is exposing metrics on port `8087`. See xref:enabling-and-exposing-{prod-id-short}-metrics[Enabling and exposing {prod-short} server JVM metrics].
+* An active `{orch-cli}` session with administrative permissions to the destination {orch-name} cluster. See {orch-cli-link}.
+
+* An instance of {prod-short} running in {orch-name}.
 
-* Prometheus 2.26.0 or later is running. The Prometheus console is running on port `9090` with a corresponding Service. See link:https://prometheus.io/docs/introduction/first_steps/[First steps with Prometheus].
+* {prod-short} is exposing metrics on port `8087`. See xref:enabling-and-exposing-{prod-id-short}-metrics[Enabling and exposing {prod-short} server JVM metrics].
 
 .Procedure
 
-. Configure Prometheus to scrape metrics from port `8087`.
-+
-NOTE: The xref:installing-prometheus-and-grafana.adoc[example monitoring stack] already creates the `prometheus-config` ConfigMap with an empty configuration. To provide the Prometheus configuration details, edit the `data` field of the ConfigMap.
+. Create a ServiceMonitor to detect the {prod-short} JVM metrics service.
 +
-.Prometheus configuration
+.ServiceMonitor
 ====
 [source,yaml,subs="+quotes,+attributes,+macros"]
 ----
-apiVersion: v1
-kind: ConfigMap
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
 metadata:
-  name: prometheus-config
-data:
-  prometheus.yml: |-
-      global:
-        scrape_interval:     5s             <1>
-        evaluation_interval: 5s             <2>
-      scrape_configs:                       <3>
-        - job_name: '{prod-short} Server'
-          static_configs:
-            - targets: ['che-host.__<{prod-short}_{orch-namespace}>__:8087']  <4>
+  name: che-host
+  namespace: {prod-namespace} <1>
+spec:
+  endpoints:
+    - interval: 10s <2>
+      port: metrics
+      scheme: http
+  namespaceSelector:
+    matchNames:
+      - {prod-namespace} <1>
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: devspaces
 ----
-<1> The rate at which a target is scraped.
-<2> The rate at which the recording and alerting rules are re-checked.
-<3> The resources that Prometheus monitors. In the default configuration, a single job, `{prod-short} Server`, scrapes the time series data exposed by {prod-short} Server.
-<4> The scrape target for the metrics from port `8087`. Replace `__<{prod-short}_{orch-namespace}>__` with the {prod-short} {orch-namespace}. The default {prod-short} {orch-namespace} is `{prod-namespace}`.
+<1> The {prod-short} namespace. The default is `{prod-namespace}`.
+<2> The rate at which a target is scraped.
 ====
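++
+To create the ServiceMonitor, you can save the manifest above to a file and apply it. The file name `che-host-servicemonitor.yaml` is only an example:
++
+[source,terminal,subs="+attributes,quotes"]
+----
+$ {orch-cli} apply -f che-host-servicemonitor.yaml
+----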
 
-. Scale the `Prometheus` Deployment down and up to read the updated ConfigMap from the previous step.
+. Create a Role and a RoleBinding to allow Prometheus to view the metrics.
+
 +
-[source,terminal,subs="+attributes,quotes"]
+.Role
+====
+[source,yaml,subs="+quotes,+attributes,+macros"]
 ----
-$ {orch-cli} scale --replicas=0 deployment/prometheus -n monitoring && {orch-cli} scale --replicas=1 deployment/prometheus -n monitoring
+kind: Role
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: prometheus-k8s
+  namespace: {prod-namespace} <1>
+rules:
+  - verbs:
+      - get
+      - list
+      - watch
+    apiGroups:
+      - ''
+    resources:
+      - services
+      - endpoints
+      - pods
 ----
+<1> The {prod-short} namespace. The default is `{prod-namespace}`.
+====
 
-.Verification
++
+.RoleBinding
+====
+[source,yaml,subs="+quotes,+attributes,+macros"]
+----
+kind: RoleBinding
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: view-openshift-monitoring-prometheus-k8s
+  namespace: {prod-namespace} <1>
+subjects:
+  - kind: ServiceAccount
+    name: prometheus-k8s
+    namespace: openshift-monitoring
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: prometheus-k8s
+----
+<1> The {prod-short} namespace. The default is `{prod-namespace}`.
+====
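++
+One way to create both objects is to save each manifest to a file and apply them together. The file names `prometheus-k8s-role.yaml` and `prometheus-k8s-rolebinding.yaml` are only examples:
++
+[source,terminal,subs="+attributes,quotes"]
+----
+$ {orch-cli} apply -f prometheus-k8s-role.yaml -f prometheus-k8s-rolebinding.yaml
+----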
 
-. Use port forwarding to access the `Prometheus` Service locally:
+. Allow the in-cluster Prometheus instance to detect the ServiceMonitor in the {prod-short} namespace.
 +
+====
 [source,terminal,subs="+attributes,quotes"]
 ----
-$ {orch-cli} port-forward svc/prometheus 9090:9090 -n monitoring
+$ {orch-cli} label namespace {prod-namespace} openshift.io/cluster-monitoring=true
 ----
-. Verify that all targets are up by viewing the `targets` endpoint at `localhost:9090/targets`.
-. Use the Prometheus console to view and query metrics:
-** View metrics at `localhost:9090/metrics`.
-** Query metrics from `localhost:9090/graph`.
+====
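++
+To confirm that the label is set, you can list the labels of the {prod-short} namespace and check that `openshift.io/cluster-monitoring=true` is present:
++
+[source,terminal,subs="+attributes,quotes"]
+----
+$ {orch-cli} get namespace {prod-namespace} --show-labels
+----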
+
+.Verification
+
+. In the *Administrator* view of the OpenShift web console, navigate to *Observe* -> *Metrics* from the sidebar menu on the left.
+
+. Run a PromQL query to confirm that the metrics are available. For example, enter `process_uptime_seconds{job="che-host"}` and click *Run queries*.
 +
 For more information, see link:https://prometheus.io/docs/introduction/first_steps/#using-the-expression-browser[Using the expression browser].
 
+[TIP]
+====
+
+If the metrics are not available, check the Prometheus container logs for RBAC-related errors.
+
+Get the Prometheus pod name:
+
+[source,terminal,subs="+attributes"]
+----
+$ {orch-cli} get pods -l app.kubernetes.io/name=prometheus -n openshift-monitoring -o=jsonpath='{.items[*].metadata.name}'
+----
+
+.Example output
+[source,terminal]
+----
+prometheus-k8s-0
+----
+
+The output lists the names of the Prometheus pods. In this example, the pod is named `prometheus-k8s-0`.
+
+Print the last 20 lines of the `prometheus` container logs in the `prometheus-k8s-0` pod:
+
+[source,terminal,subs="+attributes"]
+----
+$ {orch-cli} logs --tail=20 prometheus-k8s-0 -c prometheus -n openshift-monitoring
+----
+
+====
+
+[role="_additional-resources"]
 .Additional resources
 
 * link:https://prometheus.io/docs/prometheus/latest/configuration/configuration/[Configuring Prometheus]