Monitoring support

Introduction

Monitoring provides service component usage metrics for Prometheus, which can be visualized e.g. in Grafana.

Scaling the services automatically based on their usage with HPA also relies on these metrics.

The observability documentation explains how to install additional monitoring for node and device metrics, and how to install Grafana for visualizing those metrics.

Pre-conditions

Prometheus install

If the cluster does not run the Prometheus operator yet, it SHOULD be installed before enabling monitoring, e.g. by using its Helm chart: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack

To install an (older) version of Prometheus:

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ prom_ns=monitoring  # namespace for Prometheus
$ kubectl create ns $prom_ns
$ helm install prometheus-stack prometheus-community/kube-prometheus-stack --version 55.5.2 -n $prom_ns

Helm options

If Prometheus is installed under a release name other than prometheus-stack, provide that name as the global.prometheusRelease value for the OPEA service Helm install, or in its values.yaml file. Otherwise Prometheus ignores the installed serviceMonitor objects.

Install

Install the OPEA service Helm chart with the global.monitoring value set to true.
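As a minimal sketch of how the two Helm values from this document could be passed on the command line, the hypothetical helper below prints the `--set` flags for a given Prometheus release name. The function name `monitoring_flags` is an illustration only; `--set` is the standard Helm flag for overriding a chart value.

```shell
# Print the Helm flags that enable monitoring.
# $1 is the Prometheus Helm release name; global.prometheusRelease is
# added only when that name differs from the prometheus-stack default.
monitoring_flags() {
  printf '%s' '--set global.monitoring=true'
  if [ "$1" != prometheus-stack ]; then
    printf ' --set global.prometheusRelease=%s' "$1"
  fi
  printf '\n'
}
```

It could then be used as `helm install $chart ./$chart $(monitoring_flags my-prom)`, where `my-prom` is a made-up release name and the `./$chart` chart directory is an assumption about where the chart sources are located.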

Verify

Check installed Prometheus service names:

$ prom_ns=monitoring  # Prometheus namespace
$ kubectl -n $prom_ns get svc

(Object names depend on whether Prometheus was installed from manifests, or Helm, and the release name given for its Helm install.)

Use the service name matching your Prometheus installation:

$ prom_svc=prometheus-stack-kube-prom-prometheus  # Metrics service

Verify that Prometheus found metric endpoints for the chart services, i.e. that the last number on the matching curl output lines is non-zero:

$ chart=chatqna # OPEA chart release name
$ prom_url=http://$(kubectl -n $prom_ns get -o jsonpath="{.spec.clusterIP}:{.spec.ports[0].port}" svc/$prom_svc)
$ curl --no-progress-meter $prom_url/metrics | grep "scrape_pool_targets.*$chart"
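The non-zero check can also be scripted instead of read by eye. The sketch below assumes the target count is the last whitespace-separated field on each matching line, as in the Prometheus text exposition format; `has_targets` is a hypothetical helper name, not part of any chart.

```shell
# Read Prometheus metrics text on stdin and succeed only if at least one
# scrape_pool_targets line mentioning release name $1 ends in a non-zero value.
has_targets() {
  grep "scrape_pool_targets.*$1" |
    awk '$NF > 0 { found = 1 } END { exit !found }'
}
```

For example, `curl --no-progress-meter $prom_url/metrics | has_targets $chart && echo "targets found"` prints the message only when Prometheus has discovered endpoints for the release.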

Then check that Prometheus metrics from a relevant LLM inferencing service are available.

For vLLM:

$ curl --no-progress-meter $prom_url/api/v1/query? \
  --data-urlencode 'query=vllm:cache_config_info{service="'$chart'-vllm"}' | jq

Or TGI:

$ curl --no-progress-meter $prom_url/api/v1/query? \
  --data-urlencode 'query=tgi_queue_size{service="'$chart'-tgi"}' | jq
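Both queries follow the same pattern: a metric name plus a `service` label built from the chart release name and the backend. A small sketch of that pattern, with `metric_query` as a hypothetical helper name and covering only the two metrics used in this document:

```shell
# Print the PromQL selector for backend $1 ("vllm" or "tgi") and
# OPEA chart release name $2. The body runs in a subshell so that it
# does not overwrite variables in the calling shell.
metric_query() (
  case $1 in
    vllm) metric=vllm:cache_config_info ;;
    tgi)  metric=tgi_queue_size ;;
    *)    echo "unknown backend: $1" >&2; exit 1 ;;
  esac
  printf '%s{service="%s-%s"}\n' "$metric" "$2" "$1"
)
```

The result can be passed to the same endpoint, e.g. `curl --no-progress-meter $prom_url/api/v1/query --data-urlencode "query=$(metric_query vllm $chart)" | jq`.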

NOTE: Services provide metrics only after they have processed their first request. In addition, ChatQnA uses the (TEI) reranking service only after query context data has been uploaded, so reranking metrics are not available before that.