Prometheus - custom metrics support + other improvements (BerriAI#7489)
* fix(prometheus.py): refactor litellm_input_tokens_metric to use label factory

makes adding new metrics easier

* feat(prometheus.py): add 'request_model' to 'litellm_input_tokens_metric'

* refactor(prometheus.py): refactor 'litellm_output_tokens_metric' to use label factory

makes adding new metrics easier

* feat(prometheus.py): emit requested model in 'litellm_output_tokens_metric'

* feat(prometheus.py): support tracking success events with custom metrics

* refactor(prometheus.py): refactor '_set_latency_metrics' to just use the initially created enum values dictionary

reduces scope for missing values

* feat(prometheus.py): refactor all tags to support custom metadata tags

enables metadata tags to be used for e2e tracking

* fix(prometheus.py): fix requested model on success event enum_values

* test: fix test

* test: fix test

* test: handle FileNotFoundError

* docs(prometheus.md): add new values to prometheus

* docs(prometheus.md): document adding custom metrics on prometheus

* bump: version 1.56.5 → 1.56.6
krrishdholakia authored and rajatvig committed Jan 15, 2025
1 parent 82d7e19 commit 25ac6bd
Showing 5 changed files with 266 additions and 98 deletions.
61 changes: 56 additions & 5 deletions docs/my-website/docs/proxy/prometheus.md
@@ -64,9 +64,9 @@ Use this for tracking per [user, key, team, etc.](virtual_keys)
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_spend_metric` | Total Spend, per `"user", "key", "model", "team", "end-user"` |
| `litellm_total_tokens` | input + output tokens per `"user", "key", "model", "team", "end-user"` |
| `litellm_input_tokens` | input tokens per `"user", "key", "model", "team", "end-user"` |
| `litellm_output_tokens` | output tokens per `"user", "key", "model", "team", "end-user"` |
| `litellm_total_tokens` | input + output tokens per `"end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model"` |
| `litellm_input_tokens` | input tokens per `"end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model"` |
| `litellm_output_tokens` | output tokens per `"end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model"` |
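
These counters can be pre-aggregated with Prometheus recording rules. A minimal sketch, assuming the label names above; the rule names are illustrative, and depending on your exposition format the counters may be surfaced with a `_total` suffix (e.g. `litellm_total_tokens_total`):

```yaml
groups:
  - name: litellm_token_usage
    rules:
      # Per-model input token rate over the last 5 minutes
      - record: litellm:input_tokens:rate5m
        expr: sum by (model) (rate(litellm_input_tokens[5m]))
      # Per-team combined (input + output) token rate over the last 5 minutes
      - record: litellm:total_tokens:rate5m
        expr: sum by (team) (rate(litellm_total_tokens[5m]))
```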

## Proxy Level Tracking Metrics

@@ -134,8 +134,8 @@ Use this for LLM API Error monitoring and tracking remaining rate limits and tokens

| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_request_total_latency_metric` | Total latency (seconds) for a request to LiteLLM Proxy Server - tracked for labels `model`, `hashed_api_key`, `api_key_alias`, `team`, `team_alias` |
| `litellm_llm_api_latency_metric` | Latency (seconds) for just the LLM API call - tracked for labels `model`, `hashed_api_key`, `api_key_alias`, `team`, `team_alias` |
| `litellm_request_total_latency_metric` | Total latency (seconds) for a request to LiteLLM Proxy Server - tracked for labels `"end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model"` |
| `litellm_llm_api_latency_metric` | Latency (seconds) for just the LLM API call - tracked for labels `"model", "hashed_api_key", "api_key_alias", "team", "team_alias", "requested_model", "end_user", "user"` |
| `litellm_llm_api_time_to_first_token_metric` | Time to first token for LLM API call - tracked for labels `model`, `hashed_api_key`, `api_key_alias`, `team`, `team_alias` [Note: only emitted for streaming requests] |
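
A minimal recording-rule sketch for p95 request latency, assuming these latency metrics are exported as Prometheus histograms (so `_bucket` series exist); the rule name is illustrative:

```yaml
groups:
  - name: litellm_latency
    rules:
      # p95 end-to-end proxy latency per model over the last 5 minutes;
      # assumes litellm_request_total_latency_metric is a histogram
      - record: litellm:request_total_latency_seconds:p95
        expr: >
          histogram_quantile(0.95,
            sum by (le, model) (rate(litellm_request_total_latency_metric_bucket[5m])))
```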

## Virtual Key - Budget, Rate Limit Metrics
@@ -149,6 +149,55 @@ Metrics used to track LiteLLM Proxy Budgeting and Rate limiting logic
| `litellm_remaining_api_key_requests_for_model` | Remaining Requests for a LiteLLM virtual API key, only if a model-specific rate limit (rpm) has been set for that virtual key. Labels: `"hashed_api_key", "api_key_alias", "model"`|
| `litellm_remaining_api_key_tokens_for_model` | Remaining Tokens for a LiteLLM virtual API key, only if a model-specific token limit (tpm) has been set for that virtual key. Labels: `"hashed_api_key", "api_key_alias", "model"`|
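
These gauges pair naturally with alerting. A sketch of a Prometheus alert rule, using an arbitrary example threshold of 10 remaining requests:

```yaml
groups:
  - name: litellm_virtual_key_limits
    rules:
      # Fire when a virtual key is close to its model-specific rpm limit
      # (the threshold of 10 is an arbitrary example value)
      - alert: LiteLLMKeyNearRequestLimit
        expr: litellm_remaining_api_key_requests_for_model < 10
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Key {{ $labels.api_key_alias }} has fewer than 10 requests left for {{ $labels.model }}"
```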

## [BETA] Custom Metrics

Track custom metrics on Prometheus for all the events mentioned above.

1. Define the custom metrics in the `config.yaml`

```yaml
model_list:
  - model_name: openai/gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  callbacks: ["prometheus"]
  custom_prometheus_metadata_labels: ["metadata.foo", "metadata.bar"]
```
2. Make a request with the custom metadata labels
```bash
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <LITELLM_API_KEY>' \
-d '{
  "model": "openai/gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What'\''s in this image?"
        }
      ]
    }
  ],
  "max_tokens": 300,
  "metadata": {
    "foo": "hello world"
  }
}'
```

3. Check your `/metrics` endpoint for the custom metrics

```
... "tag": "hello world" ...
```
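
To collect any of the metrics above, point Prometheus at the proxy's `/metrics` endpoint. A minimal scrape config, assuming the proxy runs on `localhost:4000`:

```yaml
scrape_configs:
  - job_name: litellm-proxy
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:4000"]  # adjust to wherever the proxy is exposed
```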


## Monitor System Health
@@ -170,6 +219,7 @@ litellm_settings:
| `litellm_redis_fails` | Number of failed redis calls |
| `litellm_self_latency` | Histogram latency for successful litellm api call |


## **🔥 LiteLLM Maintained Grafana Dashboards**

Link to Grafana Dashboards maintained by LiteLLM
@@ -194,6 +244,7 @@ Here is a screenshot of the metrics you can monitor with the LiteLLM Grafana Dashboard
| `litellm_requests_metric` | **deprecated** - use `litellm_proxy_total_requests_metric` instead |

## FAQ

### What are `_created` vs. `_total` metrics?
1 change: 1 addition & 0 deletions litellm/__init__.py
@@ -308,6 +308,7 @@
max_end_user_budget: Optional[float] = None
disable_end_user_cost_tracking: Optional[bool] = None
disable_end_user_cost_tracking_prometheus_only: Optional[bool] = None
custom_prometheus_metadata_labels: Optional[List[str]] = None
#### REQUEST PRIORITIZATION ####
priority_reservation: Optional[Dict[str, float]] = None
#### RELIABILITY ####
