Prometheus - custom metrics support + other improvements (BerriAI#7489)
* fix(prometheus.py): refactor litellm_input_tokens_metric to use label factory

makes adding new metrics easier

* feat(prometheus.py): add 'request_model' to 'litellm_input_tokens_metric'

* refactor(prometheus.py): refactor 'litellm_output_tokens_metric' to use label factory

makes adding new metrics easier

* feat(prometheus.py): emit requested model in 'litellm_output_tokens_metric'

* feat(prometheus.py): support tracking success events with custom metrics

* refactor(prometheus.py): refactor '_set_latency_metrics' to just use the initially created enum values dictionary

reduces scope for missing values

* feat(prometheus.py): refactor all tags to support custom metadata tags

enables metadata tags to be used for e2e tracking

* fix(prometheus.py): fix requested model on success event enum_values

* test: fix test

* test: fix test

* test: handle FileNotFoundError

* docs(prometheus.md): add new values to prometheus

* docs(prometheus.md): document adding custom metrics on prometheus

* bump: version 1.56.5 → 1.56.6
krrishdholakia authored and rajatvig committed Jan 15, 2025
1 parent 82d7e19 commit 25ac6bd
Showing 5 changed files with 266 additions and 98 deletions.
61 changes: 56 additions & 5 deletions docs/my-website/docs/proxy/prometheus.md
@@ -64,9 +64,9 @@ Use this for tracking per [user, key, team, etc.](virtual_keys)
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_spend_metric` | Total Spend, per `"user", "key", "model", "team", "end-user"` |
| `litellm_total_tokens` | input + output tokens per `"user", "key", "model", "team", "end-user"` |
| `litellm_input_tokens` | input tokens per `"user", "key", "model", "team", "end-user"` |
| `litellm_output_tokens` | output tokens per `"user", "key", "model", "team", "end-user"` |
| `litellm_total_tokens` | input + output tokens per `"end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model"` |
| `litellm_input_tokens` | input tokens per `"end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model"` |
| `litellm_output_tokens` | output tokens per `"end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model"` |
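
These counters can be pre-aggregated with Prometheus recording rules. A minimal sketch, assuming the label names above; the rule names are illustrative, and depending on your exposition format the counters may be surfaced with a `_total` suffix (e.g. `litellm_total_tokens_total`):

```yaml
groups:
  - name: litellm_token_usage
    rules:
      # Per-model input token rate over the last 5 minutes
      - record: litellm:input_tokens:rate5m
        expr: sum by (model) (rate(litellm_input_tokens[5m]))
      # Per-team combined (input + output) token rate over the last 5 minutes
      - record: litellm:total_tokens:rate5m
        expr: sum by (team) (rate(litellm_total_tokens[5m]))
```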

## Proxy Level Tracking Metrics

@@ -134,8 +134,8 @@ Use this for LLM API Error monitoring and tracking remaining rate limits and tokens

| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_request_total_latency_metric` | Total latency (seconds) for a request to LiteLLM Proxy Server - tracked for labels `model`, `hashed_api_key`, `api_key_alias`, `team`, `team_alias` |
| `litellm_llm_api_latency_metric` | Latency (seconds) for just the LLM API call - tracked for labels `model`, `hashed_api_key`, `api_key_alias`, `team`, `team_alias` |
| `litellm_request_total_latency_metric` | Total latency (seconds) for a request to LiteLLM Proxy Server - tracked for labels `"end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model"` |
| `litellm_llm_api_latency_metric` | Latency (seconds) for just the LLM API call - tracked for labels `"model", "hashed_api_key", "api_key_alias", "team", "team_alias", "requested_model", "end_user", "user"` |
| `litellm_llm_api_time_to_first_token_metric` | Time to first token for LLM API call - tracked for labels `model`, `hashed_api_key`, `api_key_alias`, `team`, `team_alias` [Note: only emitted for streaming requests] |
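
A minimal recording-rule sketch for p95 request latency, assuming these latency metrics are exported as Prometheus histograms (so `_bucket` series exist); the rule name is illustrative:

```yaml
groups:
  - name: litellm_latency
    rules:
      # p95 end-to-end proxy latency per model over the last 5 minutes;
      # assumes litellm_request_total_latency_metric is a histogram
      - record: litellm:request_total_latency_seconds:p95
        expr: >
          histogram_quantile(0.95,
            sum by (le, model) (rate(litellm_request_total_latency_metric_bucket[5m])))
```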

## Virtual Key - Budget, Rate Limit Metrics
@@ -149,6 +149,55 @@ Metrics used to track LiteLLM Proxy Budgeting and Rate limiting logic
| `litellm_remaining_api_key_requests_for_model` | Remaining Requests for a LiteLLM virtual API key, only if a model-specific rate limit (rpm) has been set for that virtual key. Labels: `"hashed_api_key", "api_key_alias", "model"`|
| `litellm_remaining_api_key_tokens_for_model` | Remaining Tokens for a LiteLLM virtual API key, only if a model-specific token limit (tpm) has been set for that virtual key. Labels: `"hashed_api_key", "api_key_alias", "model"`|
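
These gauges pair naturally with alerting. A sketch of a Prometheus alert rule, using an arbitrary example threshold of 10 remaining requests:

```yaml
groups:
  - name: litellm_virtual_key_limits
    rules:
      # Fire when a virtual key is close to its model-specific rpm limit
      # (the threshold of 10 is an arbitrary example value)
      - alert: LiteLLMKeyNearRequestLimit
        expr: litellm_remaining_api_key_requests_for_model < 10
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Key {{ $labels.api_key_alias }} has fewer than 10 requests left for {{ $labels.model }}"
```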

## [BETA] Custom Metrics

Track custom metrics on Prometheus for all the events mentioned above.

1. Define the custom metrics in the `config.yaml`

```yaml
model_list:
  - model_name: openai/gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  callbacks: ["prometheus"]
  custom_prometheus_metadata_labels: ["metadata.foo", "metadata.bar"]
```
2. Make a request with the custom metadata labels
```bash
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <LITELLM_API_KEY>' \
-d '{
  "model": "openai/gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What'\''s in this image?"
        }
      ]
    }
  ],
  "max_tokens": 300,
  "metadata": {
    "foo": "hello world"
  }
}'
```

3. Check your `/metrics` endpoint for the custom metrics

```
... "tag": "hello world" ...
```
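
To collect any of the metrics above, point Prometheus at the proxy's `/metrics` endpoint. A minimal scrape config, assuming the proxy runs on `localhost:4000`:

```yaml
scrape_configs:
  - job_name: litellm-proxy
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:4000"]  # adjust to wherever the proxy is exposed
```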


## Monitor System Health
@@ -170,6 +219,7 @@ litellm_settings:
| `litellm_redis_fails` | Number of failed redis calls |
| `litellm_self_latency` | Histogram latency for successful litellm api call |


## **🔥 LiteLLM Maintained Grafana Dashboards**

Link to Grafana Dashboards maintained by LiteLLM
@@ -194,6 +244,7 @@ Here is a screenshot of the metrics you can monitor with the LiteLLM Grafana Dashboard
| `litellm_requests_metric` | **deprecated** - use `litellm_proxy_total_requests_metric` instead |

## FAQ

### What are `_created` vs. `_total` metrics?
1 change: 1 addition & 0 deletions litellm/__init__.py
@@ -308,6 +308,7 @@
max_end_user_budget: Optional[float] = None
disable_end_user_cost_tracking: Optional[bool] = None
disable_end_user_cost_tracking_prometheus_only: Optional[bool] = None
custom_prometheus_metadata_labels: Optional[List[str]] = None
#### REQUEST PRIORITIZATION ####
priority_reservation: Optional[Dict[str, float]] = None
#### RELIABILITY ####
