Monitoring, Observability and HPA doc improvements (#531)
* Drop obsolete Gotchas section from monitoring doc

Prometheus nowadays uses ClusterRole/Binding for accessing metrics
from all namespaces, so there is no need to update RBAC rules.

Signed-off-by: Eero Tamminen <[email protected]>

* Slightly improve HPA doc CPU notes

Signed-off-by: Eero Tamminen <[email protected]>

* Link Helm monitoring and k8s observability addon docs

Signed-off-by: Eero Tamminen <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Eero Tamminen <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
eero-t and pre-commit-ci[bot] authored Nov 13, 2024
1 parent 66de41c commit 14198fe
Showing 3 changed files with 12 additions and 12 deletions.
helm-charts/HPA.md (9 changes: 5 additions & 4 deletions)
@@ -26,7 +26,7 @@ Read [post-install](#post-install) steps before installation!

### Resource requests

-HPA controlled CPU pods SHOULD have appropriate resource requests or affinity rules (enabled in their
+HPA controlled _CPU_ pods SHOULD have appropriate resource requests or affinity rules (enabled in their
subcharts and tested to work) so that k8s scheduler does not schedule too many of them on the same
node(s). Otherwise they never reach ready state.
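If scaled-up replicas do get stuck in a not-ready state, checking the pending pods and their scheduling events is a quick way to confirm the problem described above. A minimal sketch, assuming the `default` namespace; the namespace and pod name are placeholders:

```console
# Placeholders: adjust the namespace and pod name to your deployment.
kubectl get pods -n default | grep -v Running
kubectl describe pod <pending-pod-name> -n default | tail -n 20
# Scheduling failures show up as events such as "Insufficient cpu" or
# "didn't match pod affinity/anti-affinity rules".
```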

@@ -79,7 +79,7 @@ Why HPA is opt-in:
- Top level chart name needs to conform to Prometheus metric naming conventions,
as it is also used as a metric name prefix (with dashes converted to underscores)
- Unless pod resource requests, affinity rules, scheduling topology constraints and/or cluster NRI
-policies are used to better isolate service inferencing pods from each other, instances
+policies are used to better isolate _CPU_ inferencing pods from each other, service instances
scaled up on same node may never get to ready state
- Current HPA rules are just examples, for efficient scaling they need to be fine-tuned for given setup
performance (underlying HW, used models and data types, OPEA version etc)
@@ -94,8 +94,9 @@ ChatQnA includes pre-configured values files for scaling the services.
To enable HPA, add `-f chatqna/hpa-values.yaml` option to your `helm install` command line.

If **CPU** versions of TGI (and TEI) services are being scaled, resource requests and probe timings
-suitable for CPU usage need to be used. Add `-f chatqna/cpu-values.yaml` option to your `helm install`
-line. If you need to change model specified there, update the resource requests accordingly.
+suitable for CPU usage need to be used. `chatqna/cpu-values.yaml` provides an example of such constraints
+which can be added (with the `-f` option) to your Helm install. As those values depend on the underlying HW,
+used model, data type and image versions, the specified resource values may need to be updated.

### Post-install

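Putting the HPA notes above together, a CPU-only ChatQnA install might look roughly as follows. This is a sketch, not a command from the docs: the release name, chart path and namespace are placeholders, only the two `-f` values files are taken from the text above.

```console
# Sketch only: "chatqna" release name, "./chatqna" chart path and namespace are placeholders.
helm install chatqna ./chatqna --namespace chatqna --create-namespace \
  -f chatqna/hpa-values.yaml \
  -f chatqna/cpu-values.yaml
# cpu-values.yaml (CPU resource requests and probe timings) is needed only when
# scaling CPU versions of TGI/TEI; remember the post-install steps noted above.
```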
helm-charts/monitoring.md (11 changes: 4 additions & 7 deletions)
@@ -6,7 +6,6 @@
- [Pre-conditions](#pre-conditions)
- [Prometheus install](#prometheus-install)
- [Helm options](#helm-options)
-- [Gotchas](#gotchas)
- [Install](#install)
- [Verify](#verify)

@@ -17,6 +16,10 @@ which can be visualized e.g. in [Grafana](https://grafana.com/).

Scaling the services automatically based on their usage with [HPA](HPA.md) also relies on these metrics.

+[Observability documentation](../kubernetes-addons/Observability/README.md)
+explains how to install additional monitoring for node and device metrics,
+and Grafana for visualizing those metrics.

## Pre-conditions

### Prometheus install
@@ -42,12 +45,6 @@ provide that as `global.prometheusRelease` value for the OPEA service Helm install
or in its `values.yaml` file. Otherwise Prometheus ignores the installed
`serviceMonitor` objects.

-## Gotchas
-
-By default Prometheus adds [k8s RBAC rules](https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/prometheus-roleBindingSpecificNamespaces.yaml)
-for detecting `serviceMonitor`s and querying metrics from `default`, `kube-system` and `monitoring` namespaces.
-If Helm is asked to install OPEA service to some other namespace, those rules need to be updated accordingly.

## Install

Install Helm chart with `global.monitoring:true` option.
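For reference, the monitoring options touched in this file could be used roughly as follows. Only the `global.monitoring` and `global.prometheusRelease` value names come from the documentation; the release names, chart path and namespace are placeholders.

```console
# Find the name of the installed Prometheus release (see the pre-conditions above).
helm list --all-namespaces | grep -i prometheus

# Sketch only: "chatqna", "./chatqna" and "prometheus-stack" are placeholders.
helm install chatqna ./chatqna \
  --set global.monitoring=true \
  --set global.prometheusRelease=prometheus-stack
```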
kubernetes-addons/Observability/README.md (4 changes: 3 additions & 1 deletion)
@@ -1,6 +1,8 @@
# How-To Setup Observability for OPEA Workload in Kubernetes

-This guide provides a step-by-step approach to setting up observability for the OPEA workload in a Kubernetes environment. We will cover the setup of Prometheus and Grafana, as well as the collection of metrics for Gaudi hardware, OPEA/chatqna including TGI,TEI-Embedding,TEI-Reranking and other microservies, and PCM.
+This guide provides a step-by-step approach to setting up observability for the OPEA workload in a Kubernetes environment. We will cover the setup of Prometheus and Grafana, as well as the collection of metrics for Gaudi hardware, OPEA/chatqna including TGI, TEI-Embedding, TEI-Reranking and other microservices, and PCM.

+For monitoring Helm installed OPEA applications, see [Helm monitoring option](../../helm-charts/monitoring.md).

## Prepare

