Realtime analytics server for AI DIAL. The service consumes the log stream from AI DIAL Core, analyzes the conversations, and writes the analytics to InfluxDB.
Refer to the Documentation to learn how to configure AI DIAL Core and the other necessary components.
Check the AI DIAL Core documentation to configure how logs are sent to the instance of the realtime analytics server.
The realtime analytics server analyzes the log stream provided by Vector in real time and writes metrics to InfluxDB.
The logs for the `/chat/completions` and `/embeddings` endpoints are saved to the `analytics` measurement with the following tags and fields:
| Tag | Description |
|---|---|
| `model` | The model name for the request. |
| `deployment` | The deployment name of the model or application for the request. |
| `parent_deployment` | The deployment name of the model or application that called the current deployment. |
| `execution_path` | A list of deployment calls representing the call stack of the request. E.g. `['app1', 'app2', 'model1']` means that `app1` called `app2` and `app2` called `model1`. The last element of the list equals the `deployment` tag. The penultimate element of the list (when present) equals the `parent_deployment` tag. |
| `trace_id` | OpenTelemetry trace ID. |
| `core_span_id` | OpenTelemetry span ID generated by DIAL Core. |
| `core_parent_span_id` | OpenTelemetry span ID of the DIAL Core span that called the span `core_span_id`. |
| `project_id` | The project ID for the request. |
| `language` | The language detected for the content of the request. |
| `upstream` | The upstream endpoint used by the DIAL model. |
| `topic` | The topic detected for the content of the request. |
| `title` | The title of the person making the request. |
| `response_id` | The unique ID of the response. |
| Field | Description |
|---|---|
| `user_hash` | The unique hash identifying the user. |
| `deployment_price` | The cost of this specific request, excluding the cost of any requests it directly or indirectly initiated. |
| `price` | The total cost of the request, including the cost of this request and all related requests it directly or indirectly triggered. |
| `number_request_messages` | The total number of messages in the request. For chat completion requests, it is the number of messages in the chat history. For embedding requests, it is the number of inputs. |
| `chat_id` | The unique identifier of the conversation that this request is part of. |
| `prompt_tokens` | The number of tokens in the request. |
| `completion_tokens` | The number of tokens in the response. |
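As a quick sanity check, you can query the collected metrics with the official `influxdb-client` Python package. The sketch below is illustrative only: the bucket name and connection values are placeholders that must match your `INFLUX_*` configuration (described below).

```python
from influxdb_client import InfluxDBClient

# Connection settings must match your INFLUX_* environment variables.
client = InfluxDBClient(
    url="http://localhost:8086",  # INFLUX_URL (placeholder)
    token="my-api-token",         # INFLUX_API_TOKEN (placeholder)
    org="my-org",                 # INFLUX_ORG (placeholder)
)

# Total prompt and completion tokens per model over the last 24 hours.
flux = """
from(bucket: "analytics")
  |> range(start: -1d)
  |> filter(fn: (r) => r._measurement == "analytics")
  |> filter(fn: (r) => r._field == "prompt_tokens" or r._field == "completion_tokens")
  |> group(columns: ["model", "_field"])
  |> sum()
"""

for table in client.query_api().query(flux):
    for record in table.records:
        print(record.values["model"], record.get_field(), record.get_value())
```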
The logs for the `/rate` endpoint are saved to the `rate_analytics` measurement:
| Tag | Description |
|---|---|
| `deployment` | The deployment name of the model or application for the request. |
| `project_id` | The project ID for the request. |
| `title` | The title of the person making the request. |
| `response_id` | The unique ID of the response. |
| `user_hash` | The unique hash identifying the user. |
| `chat_id` | The unique identifier of the conversation that this request is part of. |
| Field | Description |
|---|---|
| `dislike_count` | 1 for a thumbs down request, otherwise 0. |
| `like_count` | 1 for a thumbs up request, otherwise 0. |
Copy `.env.example` to `.env` and customize it for your environment.
Specify the connection options for the InfluxDB instance using the following environment variables:
| Variable | Description |
|---|---|
| `INFLUX_URL` | URL of the InfluxDB instance to write the analytics data to |
| `INFLUX_ORG` | Name of the InfluxDB organization to write the analytics data to |
| `INFLUX_BUCKET` | Name of the bucket to write the analytics data to |
| `INFLUX_API_TOKEN` | InfluxDB API token |
You can follow the InfluxDB documentation to set up InfluxDB locally and acquire the required configuration parameters.
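For example, a `.env` for a local InfluxDB instance might look like the following; all of the values below are placeholders to replace with your own settings:

```
INFLUX_URL=http://localhost:8086
INFLUX_ORG=my-org
INFLUX_BUCKET=analytics
INFLUX_API_TOKEN=my-api-token
```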
Also, the following environment variables can be used to configure the service behavior:
| Variable | Default | Description |
|---|---|---|
| `MODEL_RATES` | `{}` | Specifies per-token price rates for models in JSON format |
| `TOPIC_MODEL` | `./topic_model` | Specifies the name or path of the topic model. If the model is specified by name, it will be downloaded from Hugging Face. |
| `TOPIC_EMBEDDINGS_MODEL` | `None` | Specifies the name or path of the embeddings model used with the topic model. If the model is specified by name, it will be downloaded from Hugging Face. If `None`, the name from the topic model config will be used. |
An example of the `MODEL_RATES` configuration:
```json
{
  "gpt-4": {
    "unit": "token",
    "prompt_price": "0.00003",
    "completion_price": "0.00006"
  },
  "gpt-35-turbo": {
    "unit": "token",
    "prompt_price": "0.0000015",
    "completion_price": "0.000002"
  },
  "gpt-4-32k": {
    "unit": "token",
    "prompt_price": "0.00006",
    "completion_price": "0.00012"
  },
  "text-embedding-ada-002": {
    "unit": "token",
    "prompt_price": "0.0000001"
  },
  "chat-bison@001": {
    "unit": "char_without_whitespace",
    "prompt_price": "0.0000005",
    "completion_price": "0.0000005"
  }
}
```
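The `price` and `deployment_price` fields are presumably derived from these rates and the reported token counts. Below is a minimal sketch of such a per-request calculation for the `token` unit; the `request_price` helper is hypothetical and not part of the service's API:

```python
from decimal import Decimal

# Hypothetical helper: compute the cost of one request from MODEL_RATES.
# Prices are strings in the config, so Decimal avoids float rounding errors.
def request_price(rates: dict, model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    rate = rates[model]
    price = Decimal(rate["prompt_price"]) * prompt_tokens
    # Embedding models define no completion price, so it is optional here.
    if "completion_price" in rate:
        price += Decimal(rate["completion_price"]) * completion_tokens
    return price

rates = {"gpt-4": {"unit": "token", "prompt_price": "0.00003", "completion_price": "0.00006"}}
# 1000 prompt tokens and 500 completion tokens: 0.03 + 0.03 = 0.06
print(request_price(rates, "gpt-4", 1000, 500))
```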
This project uses Python>=3.11 and Poetry>=1.6.1 as a dependency manager. Check out Poetry's documentation on how to install it on your system before proceeding.
To install the requirements:

```sh
poetry install
```
This will install all requirements for running the package, linting, formatting and tests.
To build the wheel packages, run:

```sh
make build
```

To run the development server locally, run:

```sh
make serve
```

The server will be available at http://localhost:5001.

To build the Docker image, run:

```sh
make docker_build
```

To run the server locally from the Docker image, run:

```sh
make docker_serve
```

The server will be available at http://localhost:5001.
Run the linting before committing:

```sh
make lint
```

To auto-fix formatting issues, run:

```sh
make format
```

Run unit tests locally:

```sh
make test
```

To remove the virtual environment and build artifacts:

```sh
make clean
```