Monitoring, Observability and HPA doc improvements #531

Merged: 4 commits, Nov 13, 2024
9 changes: 5 additions & 4 deletions helm-charts/HPA.md
@@ -26,7 +26,7 @@ Read [post-install](#post-install) steps before installation!

### Resource requests

HPA controlled CPU pods SHOULD have appropriate resource requests or affinity rules (enabled in their
HPA controlled _CPU_ pods SHOULD have appropriate resource requests or affinity rules (enabled in their
subcharts and tested to work) so that k8s scheduler does not schedule too many of them on the same
node(s). Otherwise they never reach ready state.
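For illustration only, a minimal sketch of passing such resource requests at install time; the subchart key paths, release/chart names and sizes below are assumptions, not values taken from the charts:

```bash
# Illustrative sketch: give the CPU TGI subchart explicit requests so the
# scheduler spreads HPA-scaled replicas across nodes instead of packing them.
# Key paths ("tgi.resources.*"), names and sizes are assumptions; check the
# subchart's values.yaml for the real ones.
helm install chatqna ./chatqna \
  --set tgi.resources.requests.cpu=8 \
  --set tgi.resources.requests.memory=32Gi
```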

@@ -79,7 +79,7 @@ Why HPA is opt-in:
- Top level chart name needs to conform to Prometheus metric naming conventions,
as it is also used as a metric name prefix (with dashes converted to underscores)
- Unless pod resource requests, affinity rules, scheduling topology constraints and/or cluster NRI
policies are used to better isolate service inferencing pods from each other, instances
policies are used to better isolate _CPU_ inferencing pods from each other, service instances
scaled up on same node may never get to ready state
- Current HPA rules are just examples, for efficient scaling they need to be fine-tuned for given setup
performance (underlying HW, used models and data types, OPEA version etc)
@@ -94,8 +94,9 @@ ChatQnA includes pre-configured values files for scaling the services.
To enable HPA, add `-f chatqna/hpa-values.yaml` option to your `helm install` command line.

If **CPU** versions of TGI (and TEI) services are being scaled, resource requests and probe timings
suitable for CPU usage need to be used. Add `-f chatqna/cpu-values.yaml` option to your `helm install`
line. If you need to change model specified there, update the resource requests accordingly.
suitable for CPU usage need to be used. `chatqna/cpu-values.yaml` provides example of such constraints
which can be added (with `-f` option) to your Helm install. As those values depend on the underlying HW,
used model, data type and image versions, the specified resource values may need to be updated.
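As a rough sketch, the two values files named above can be combined on one command line; the release name and chart reference here are placeholders:

```bash
# Enable HPA and apply CPU-suitable resource requests / probe timings together.
# "chatqna" release name and "./chatqna" chart path are placeholders.
helm install chatqna ./chatqna \
  -f chatqna/hpa-values.yaml \
  -f chatqna/cpu-values.yaml
```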

### Post-install

11 changes: 4 additions & 7 deletions helm-charts/monitoring.md
@@ -6,7 +6,6 @@
- [Pre-conditions](#pre-conditions)
- [Prometheus install](#prometheus-install)
- [Helm options](#helm-options)
- [Gotchas](#gotchas)
- [Install](#install)
- [Verify](#verify)

@@ -17,6 +16,10 @@ which can be visualized e.g. in [Grafana](https://grafana.com/).

Scaling the services automatically based on their usage with [HPA](HPA.md) also relies on these metrics.

[Observability documentation](../kubernetes-addons/Observability/README.md)
explains how to install additional monitoring for node and device metrics,
and Grafana for visualizing those metrics.

## Pre-conditions

### Prometheus install
@@ -42,12 +45,6 @@ provide that as `global.prometheusRelease` value for the OPEA service Helm insta
or in its `values.yaml` file. Otherwise Prometheus ignores the installed
`serviceMonitor` objects.
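If Prometheus was installed with Helm, its release name can be checked with `helm list`; the `monitoring` namespace below is an assumed example:

```bash
# The NAME column of the matching release is the value to provide as
# global.prometheusRelease; the namespace here is an assumption.
helm list -n monitoring
```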

## Gotchas

By default Prometheus adds [k8s RBAC rules](https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/prometheus-roleBindingSpecificNamespaces.yaml)
for detecting `serviceMonitor`s and querying metrics from `default`, `kube-system` and `monitoring` namespaces.
If Helm is asked to install OPEA service to some other namespace, those rules need to be updated accordingly.

## Install

Install Helm chart with `global.monitoring:true` option.
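For example, a hedged sketch where the release name, chart reference and Prometheus release name are placeholders, not values taken from the charts:

```bash
# Enable metric export and point the serviceMonitor at an existing Prometheus
# Helm release; all names here are illustrative placeholders.
helm install chatqna ./chatqna \
  --set global.monitoring=true \
  --set global.prometheusRelease=prometheus-stack
```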
4 changes: 3 additions & 1 deletion kubernetes-addons/Observability/README.md
@@ -1,6 +1,8 @@
# How-To Setup Observability for OPEA Workload in Kubernetes

This guide provides a step-by-step approach to setting up observability for the OPEA workload in a Kubernetes environment. We will cover the setup of Prometheus and Grafana, as well as the collection of metrics for Gaudi hardware, OPEA/chatqna including TGI,TEI-Embedding,TEI-Reranking and other microservies, and PCM.
This guide provides a step-by-step approach to setting up observability for the OPEA workload in a Kubernetes environment. We will cover the setup of Prometheus and Grafana, as well as the collection of metrics for Gaudi hardware, OPEA/chatqna including TGI, TEI-Embedding, TEI-Reranking and other microservices, and PCM.

For monitoring Helm installed OPEA applications, see [Helm monitoring option](../../helm-charts/monitoring.md).

## Prepare
