Merge pull request #2 from drewby/ai

Adding OpenAI metrics and fixing markdown errors
lmolkova · Jan 29, 2024 · fdb3ba4
2 parents bdc1982 + c80b80c
commit fdb3ba4

Showing 8 changed files with 630 additions and 133 deletions.
diff --git a/docs/ai/README.md b/docs/ai/README.md
@@ -19,6 +19,7 @@ Semantic conventions for LLM operations are defined for the following signals:
 
 Technology specific semantic conventions are defined for the following LLM providers:
 
-* [OpenAI](openai.md): Semantic Conventions for *OpenAI*.
+* [OpenAI](openai.md): Semantic Conventions for *OpenAI* spans.
+* [OpenAI Metrics](openai-metrics.md): Semantic Conventions for *OpenAI* metrics.
 
-[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.26.0/specification/document-status.md
+[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.26.0/specification/document-status.md
diff --git a/docs/ai/llm-spans.md b/docs/ai/llm-spans.md
@@ -10,9 +10,9 @@ linkTitle: LLM Calls
 
 <!-- toc -->
 
-- [LLM Request attributes](#llm-request-attributes)
 - [Configuration](#configuration)
-- [Semantic Conventions for specific LLM technologies](#semantic-conventions-for-specific-llm-technologies)
+- [LLM Request attributes](#llm-request-attributes)
+- [Events](#events)
 
 <!-- tocstop -->
 
@@ -35,65 +35,52 @@ By default, these configurations SHOULD NOT capture prompts and completions.
 
 These attributes track input data and metadata for a request to an LLM. Each attribute represents a concept that is common to most LLMs.
 
-<!-- semconv ai(tag=llm-request) -->
+<!-- semconv llm.request -->
 | Attribute  | Type | Description  | Examples  | Requirement Level |
 |---|---|---|---|---|
-| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. | `openai` | Recommended |
-| `llm.request.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required |
-| `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended |
-| `llm.temperature` | float | The temperature setting for the LLM request. | `0.0` | Recommended |
-| `llm.top_p` | float | The top_p sampling setting for the LLM request. | `1.0` | Recommended |
-| `llm.stream` | bool | Whether the LLM responds with a stream. | `false` | Recommended |
-| `llm.stop_sequences` | array | Array of strings the LLM uses as a stop sequence. | `["stop1"]` | Recommended |
-
-`llm.model` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.
-
-| Value  | Description |
-|---|---|
-| `gpt-4` | GPT-4 |
-| `gpt-4-32k` | GPT-4 with 32k context window |
-| `gpt-3.5-turbo` | GPT-3.5-turbo |
-| `gpt-3.5-turbo-16k` | GPT-3.5-turbo with 16k context window|
-| `claude-instant-1` | Claude Instant (latest version) |
-| `claude-2` | Claude 2 (latest version) |
-| `other-llm` | Any LLM not listed in this table. Use for any fine-tuned version of a model. |
+| [`llm.request.is_stream`](../attributes-registry/llm.md) | boolean | Whether the LLM responds with a stream. | `False` | Recommended |
+| [`llm.request.max_tokens`](../attributes-registry/llm.md) | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended |
+| [`llm.request.model`](../attributes-registry/llm.md) | string | The name of the LLM a request is being made to. [1] | `gpt-4` | Required |
+| [`llm.request.stop_sequences`](../attributes-registry/llm.md) | string | Array of strings the LLM uses as a stop sequence. | `stop1` | Recommended |
+| [`llm.request.temperature`](../attributes-registry/llm.md) | double | The temperature setting for the LLM request. | `0.0` | Recommended |
+| [`llm.request.top_p`](../attributes-registry/llm.md) | double | The top_p sampling setting for the LLM request. | `1.0` | Recommended |
+| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens. | `stop` | Recommended |
+| [`llm.response.id`](../attributes-registry/llm.md) | string[] | The unique identifier for the completion. | `[chatcmpl-123]` | Recommended |
+| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. [2] | `gpt-4-0613` | Required |
+| [`llm.system`](../attributes-registry/llm.md) | string | The name of the LLM foundation model vendor, if applicable. [3] | `openai` | Recommended |
+| [`llm.usage.completion_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM response (completion). | `180` | Recommended |
+| [`llm.usage.prompt_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM prompt. | `100` | Recommended |
+| [`llm.usage.total_tokens`](../attributes-registry/llm.md) | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended |
+
+**[1]:** The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned.
+
+**[2]:** The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned.
+
+**[3]:** The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank.
 <!-- endsemconv -->
 
-## LLM Response attributes
+## Events
+
+In the lifetime of an LLM span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation.
 
-These attributes track output data and metadata for a response from an LLM. Each attribute represents a concept that is common to most LLMs.
+<!-- semconv llm.content.prompt -->
+The event name MUST be `llm.content.prompt`.
 
-<!-- semconv ai(tag=llm-response) -->
 | Attribute  | Type | Description  | Examples  | Requirement Level |
 |---|---|---|---|---|
-| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended |
-| `llm.response.model` | string | The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4-0613` | Required |
-| `llm.response.finish_reason` | string | The reason the model stopped generating tokens | `stop` | Recommended |
-| `llm.usage.prompt_tokens` | int | The number of tokens used in the LLM prompt. | `100` | Recommended |
-| `llm.usage.completion_tokens` | int | The number of tokens used in the LLM response (completion). | `180` | Recommended |
-| `llm.usage.total_tokens` | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended |
-
-`llm.response.finish_reason` MUST be one of the following:
-
-| Value  | Description |
-|---|---|
-| `stop` | If the model hit a natural stop point or a provided stop sequence. |
-| `max_tokens` | If the maximum number of tokens specified in the request was reached. |
-| `tool_call` | If a function / tool call was made by the model (for models that support such functionality). |
-<!-- endsemconv -->
+| [`llm.prompt`](../attributes-registry/llm.md) | string | The full prompt string sent to an LLM in a request. [1] | `\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:` | Recommended |
 
-## Events
+**[1]:** The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object, this field is blank, and the response is instead captured in an event determined by the specific LLM technology semantic convention.
+<!-- endsemconv -->
 
-In the lifetime of an LLM span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation.
+<!-- semconv llm.content.completion -->
+The event name MUST be `llm.content.completion`.
 
-<!-- semconv ai(tag=llm-prompt) -->
 | Attribute  | Type | Description  | Examples  | Requirement Level |
-| `llm.prompt` | string | The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object made up of several pieces (such as OpenAI's different message types), this field is blank, and the response is instead captured in an event determined by the specific LLM technology semantic convention. | `\n\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\n\nAssistant:` | Recommended |
-<!-- endsemconv -->
+|---|---|---|---|---|
+| [`llm.completion`](../attributes-registry/llm.md) | string | The full response string from an LLM in a response. [1] | `Why did the developer stop using OpenTelemetry? Because they couldnt trace their steps!` | Recommended |
 
-<!-- semconv ai(tag=llm-completion) -->
-| Attribute  | Type | Description  | Examples  | Requirement Level |
-| `llm.completion` | string | The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention.| `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Recommended |
+**[1]:** The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention.
 <!-- endsemconv -->
 
-[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md
+[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md
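
For readers of this diff, here is a minimal sketch of how an instrumentation might assemble the span attributes and content events named in the updated `llm-spans.md` tables. It uses only the Python standard library. The helper names `build_llm_span_attributes` and `build_content_events` and the `capture_content` flag are hypothetical and are not part of this pull request or of any OpenTelemetry API. Only the attribute keys and event names are taken from the diff above.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_llm_span_attributes(
    request_model: str,
    response_model: str,
    prompt_tokens: int,
    completion_tokens: int,
    system: Optional[str] = None,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    finish_reason: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect span attributes using the keys from the diff."""
    # The two model attributes are marked Required in the table.
    attrs: Dict[str, Any] = {
        "llm.request.model": request_model,
        "llm.response.model": response_model,
        "llm.usage.prompt_tokens": prompt_tokens,
        "llm.usage.completion_tokens": completion_tokens,
        # Total is the sum of prompt and completion tokens.
        "llm.usage.total_tokens": prompt_tokens + completion_tokens,
    }
    # Recommended attributes are only set when a value is available.
    optional = {
        "llm.system": system,
        "llm.request.max_tokens": max_tokens,
        "llm.request.temperature": temperature,
        "llm.response.finish_reason": finish_reason,
    }
    attrs.update({key: value for key, value in optional.items() if value is not None})
    return attrs


def build_content_events(
    prompt: str,
    completion: str,
    capture_content: bool = False,  # assumed opt-in switch; off by default
) -> List[Tuple[str, Dict[str, str]]]:
    """Hypothetical helper: return (event name, attributes) pairs, or nothing."""
    # Prompts and completions SHOULD NOT be captured by default.
    if not capture_content:
        return []
    return [
        ("llm.content.prompt", {"llm.prompt": prompt}),
        ("llm.content.completion", {"llm.completion": completion}),
    ]


attributes = build_llm_span_attributes(
    request_model="gpt-4",
    response_model="gpt-4-0613",
    prompt_tokens=100,
    completion_tokens=180,
    system="openai",
    finish_reason="stop",
)
events = build_content_events("Tell me a joke.", "Why did...", capture_content=True)
```

With a real tracing SDK, the returned dictionary and event pairs would be passed to whatever span API the instrumentation uses. The sketch deliberately stops short of that so it does not assume a particular library.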