Add POST _unified for the inference API #3313

jonathan-buttner · 2024-12-16T19:44:54Z

This PR adds the _unified route for the Inference API. It coincides with the Elasticsearch PR elastic/elasticsearch#117589.

github-actions · 2024-12-16T19:46:15Z

Following you can find the validation results for the APIs you have changed.

API	Status	Request	Response
`inference.delete`	⚪	Missing test	Missing test
`inference.get`	🟢	1/1	1/1
`inference.inference`	⚪	Missing test	Missing test
`inference.put`	⚪	Missing test	Missing test
`inference.stream_inference`	🟠	Missing type	Missing type
`inference.unified_inference`	⚪	Missing test	Missing test

You can validate these APIs yourself by using the make validate target.

jonathan-buttner · 2024-12-16T19:46:26Z

specification/_json_spec/inference.unified_inference.json

+    "stability": "stable",
+    "visibility": "public",
+    "headers": {
+      "accept": ["text/event-stream"],


This is what we have for the streaming api here: https://github.com/elastic/elasticsearch-specification/blob/main/specification/_json_spec/inference.stream_inference.json#L10

Not sure if this is correct: does it indicate that the POST request accepts an event stream, or that the response returns one? 🤔

Both HTTP headers are used in client requests:

Accept tells Elasticsearch that returning event streams is OK

Content-Type is about the request's body, which is indeed JSON here.

In other words, this is correct, thank you!

jonathan-buttner · 2024-12-16T19:47:18Z

specification/inference/unified_inference/UnifiedRequest.ts

+
+/**
+ * Perform inference on the service using the Unified Schema
+ * @rest_spec_name inference.unified_inference


unified_inference seemed to follow stream_inference. I'm open to other ideas though.

No strong opinions, this seems OK.

jonathan-buttner · 2024-12-16T19:47:39Z

specification/inference/unified_inference/UnifiedRequest.ts

+/**
+ * Perform inference on the service using the Unified Schema
+ * @rest_spec_name inference.unified_inference
+ * @availability stack since=8.18.0 stability=stable visibility=public


@davidkyle Double check that this is what we want

This looks good to me.

github-actions · 2024-12-16T19:50:25Z

Following you can find the validation results for the APIs you have changed.

API	Status	Request	Response
`inference.delete`	⚪	Missing test	Missing test
`inference.get`	🟢	1/1	1/1
`inference.inference`	⚪	Missing test	Missing test
`inference.put`	⚪	Missing test	Missing test
`inference.stream_inference`	🟠	Missing type	Missing type
`inference.unified_inference`	⚪	Missing test	Missing test

You can validate these APIs yourself by using the make validate target.

pquentin

Thanks! I answered your questions, but did not look at the contents yet. Do we have YAML tests in Elasticsearch for this feature? It would help to validate the requests.

pquentin · 2024-12-17T06:36:34Z

specification/_json_spec/inference.unified_inference.json

+    "stability": "stable",
+    "visibility": "public",
+    "headers": {
+      "accept": ["text/event-stream"],


Both HTTP headers are used in client requests:

Accept tells Elasticsearch that returning event streams is OK

Content-Type is about the request's body, which is indeed JSON here.

In other words, this is correct, thank you!
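To make the header distinction above concrete, here is a minimal sketch of how a client might assemble a request for the _unified route. The helper name `buildUnifiedRequestInit` and the request body shape are assumptions for illustration only; they are not taken from the spec or from any Elasticsearch client.

```typescript
// Hypothetical helper, not part of any client library.
// It only builds the request options; it does not send anything.
interface RequestOptionsSketch {
  method: string;
  headers: Record<string, string>;
  body: string;
}

function buildUnifiedRequestInit(payload: unknown): RequestOptionsSketch {
  return {
    method: "POST",
    headers: {
      // Accept: the client tells the server that an event stream response is OK.
      Accept: "text/event-stream",
      // Content-Type: describes the request body, which is JSON.
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}

// Example payload; the field names here are assumed for illustration.
const init = buildUnifiedRequestInit({
  messages: [{ role: "user", content: "Hello" }],
});
```

Both headers travel on the request: one describes what the client sends, the other what it is willing to receive.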

pquentin · 2024-12-17T06:37:17Z

specification/inference/unified_inference/UnifiedRequest.ts

+
+/**
+ * Perform inference on the service using the Unified Schema
+ * @rest_spec_name inference.unified_inference


No strong opinions, this seems OK.

pquentin · 2024-12-17T06:37:18Z

specification/inference/unified_inference/UnifiedRequest.ts

+/**
+ * Perform inference on the service using the Unified Schema
+ * @rest_spec_name inference.unified_inference
+ * @availability stack since=8.18.0 stability=stable visibility=public


This looks good to me.

maxhniebergall

LGTM

l-trotta · 2024-12-18T09:47:51Z

specification/inference/unified_inference/UnifiedRequest.ts

@@ -0,0 +1,214 @@
+/*


~~we usually try to have only the request class in the corresponding Request file, and all other types we put in the types folder above (this is just to make it easier to maintain)~~

nevermind that, could we just move the main Request type to the top of the file?

l-trotta · 2024-12-18T09:53:43Z

specification/inference/unified_inference/UnifiedRequest.ts

+/**
+ * An object representing part of the conversation.
+ */
+export interface Message {


I think this is missing name

name?: string

l-trotta · 2024-12-18T09:54:12Z

specification/inference/unified_inference/UnifiedRequest.ts

+  /**
+   * The tool call that this message is responding to.
+   */
+  tool_call_id?: string


Suggested change

-  tool_call_id?: string
+  tool_call_id?: Id

l-trotta · 2024-12-18T09:55:17Z

specification/inference/unified_inference/UnifiedRequest.ts

+  /**
+   * The identifier of the tool call.
+   */
+  id: string


Suggested change

-  id: string
+  id: Id

l-trotta · 2024-12-18T11:47:36Z

specification/inference/unified_inference/UnifiedRequest.ts

+    /**
+     * The upper bound limit for the number of tokens that can be generated for a completion request.
+     */
+    max_completion_tokens?: number


Suggested change

-    max_completion_tokens?: number
+    max_completion_tokens?: long

l-trotta · 2024-12-18T11:51:27Z

specification/inference/unified_inference/UnifiedRequest.ts

+    /**
+     * The sampling temperature to use.
+     */
+    temperature?: number


Suggested change

-    temperature?: number
+    temperature?: float

l-trotta · 2024-12-18T11:55:31Z

specification/inference/unified_inference/UnifiedRequest.ts

+    /**
+     * Nucleus sampling, an alternative to sampling with temperature.
+     */
+    top_p?: number


Suggested change

-    top_p?: number
+    top_p?: float

l-trotta · 2024-12-18T12:04:07Z

specification/inference/_types/Results.ts

+  /**
+   * A unique identifier for the chat completion
+   */
+  id: string


Suggested change

-  id: string
+  id: Id
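The suggestions above swap generic `string` and `number` for the spec's more specific `Id`, `long`, and `float` types. As a rough sketch, assuming these are plain aliases (declared locally here as stand-ins, not imported from the specification), the affected fields would read as follows. The interface names `CompletionSettingsSketch` and `ToolCallSketch` are hypothetical.

```typescript
// Local stand-ins for the spec's shared aliases; assumed to be simple aliases.
type long = number;
type float = number;
type Id = string;

// Hypothetical interface collecting the numeric fields from the suggestions.
interface CompletionSettingsSketch {
  max_completion_tokens?: long; // integer token budget
  temperature?: float; // fractional sampling temperature
  top_p?: float; // fractional nucleus sampling threshold
}

// Hypothetical interface for the identifier fields.
interface ToolCallSketch {
  id: Id;
}

const settings: CompletionSettingsSketch = {
  max_completion_tokens: 256,
  temperature: 0.2,
  top_p: 0.9,
};
const toolCall: ToolCallSketch = { id: "call_1" };
```

TypeScript treats the aliases as identical to `number` and `string`; the distinction presumably matters to generators for statically typed clients, which can pick an integer or floating-point type per field.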

l-trotta · 2024-12-18T12:32:03Z

specification/inference/unified_inference/UnifiedRequest.ts

+  /**
+   * The content of the message.
+   */
+  content: string | Array<ContentObject>


Unions such as this one have to be handled in a separate type for them to be understandable by the static client, so here we need a new type:

/**
 * @codegen_names string, object
 */
export type MessageContent = string | Array<ContentObject>

content: MessageContent
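As an illustration of what a named union gives consumers, here is a small sketch that narrows `MessageContent` at runtime. The fields of `ContentObject` are not shown in this thread, so the `text` and `type` fields below are assumptions, and `contentToText` is a hypothetical helper.

```typescript
// Assumed shape: the real ContentObject fields are not visible in this thread.
interface ContentObject {
  text: string;
  type: string;
}

// The union from the suggestion, declared locally for the sketch.
type MessageContent = string | Array<ContentObject>;

// Hypothetical helper: collapses either variant into plain text.
function contentToText(content: MessageContent): string {
  if (typeof content === "string") {
    // "string" variant: already plain text.
    return content;
  }
  // "object" variant: concatenate the text of each part.
  return content.map((part) => part.text).join("");
}
```

For example, `contentToText("hi")` returns the string unchanged, while an array of parts is joined in order.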

l-trotta · 2024-12-18T12:32:57Z

specification/inference/unified_inference/UnifiedRequest.ts

+    /**
+     * Controls which tool is called by the model.
+     */
+    tool_choice?: string | CompletionToolChoice


Same as content above, new type needed to define union:

/**
 * @codegen_names string, object
 */
export type CompletionToolType = string | CompletionToolChoice

tool_choice?: CompletionToolType

l-trotta · 2024-12-18T12:36:24Z

@jonathan-buttner I have a question about the response format: is there a way to get it in json format, or is it just text/stream?

jonathan-buttner · 2024-12-18T13:28:54Z

> @jonathan-buttner I have a question about the response format: is there a way to get it in json format, or is it just text/stream?

Thanks for the review! Currently it's only text/stream but in the near future we'd like to add a non-streaming version. Is that possible to do using the same route (.../_unified)? Or do we need to create a new one?

jonathan-buttner · 2024-12-18T13:30:14Z

> Thanks! I answered your questions, but did not look at the contents yet. Do we have YAML tests in Elasticsearch for this feature? It would help to validate the requests.

Thanks for the review! Not yet, let me see if I can add some.

l-trotta · 2024-12-18T13:49:29Z

> > @jonathan-buttner I have a question about the response format: is there a way to get it in json format, or is it just text/stream?
>
> Thanks for the review! Currently it's only text/stream but in the near future we'd like to add a non-streaming version. Is that possible to do using the same route (.../_unified)? Or do we need to create a new one?

the thing is: if the return type is just text then the response body should be just string, because the whole body structure is needed just for json, not other types of responses. @pquentin like we were talking about hot threads remember?

jonathan-buttner · 2024-12-18T13:53:26Z

> > > @jonathan-buttner I have a question about the response format: is there a way to get it in json format, or is it just text/stream?
> >
> > Thanks for the review! Currently it's only text/stream but in the near future we'd like to add a non-streaming version. Is that possible to do using the same route (.../_unified)? Or do we need to create a new one?
>
> the thing is: if the return type is just text then the response body should be just string, because the whole body structure is needed just for json, not other types of responses. @pquentin like we were talking about hot threads remember?

Oh sorry I might have misunderstood. Here's what Postman parses out:

Where each one of those messages is valid json:

{
    "id": "chatcmpl-AfVcn7De1ibDPKB7iBy83oEhpHxC9",
    "choices": [],
    "model": "gpt-4o-2024-08-06",
    "object": "chat.completion.chunk",
    "usage": {
        "completion_tokens": 6,
        "prompt_tokens": 63,
        "total_tokens": 69
    }
}

Except for the [DONE] message I suppose.

Do we still want the response to be only text?

pquentin · 2024-12-18T14:34:08Z

Postman makes it look mainly like JSON, but those are server-sent events. Is this unified API going to be used for multiple providers? OpenAI, Anthropic Claude, and Google Gemini all use server-sent events but then provide really different structures: https://til.simonwillison.net/llms/streaming-llm-apis.

If yes, then clients would need to know what provider this is (in the content-type header?). There would not be a need to map the options in the spec; it's OK to treat it as an opaque string or buffer. Also, I'm afraid that more complete discussions around this will have to wait for next year.
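To illustrate the difference between the raw event stream and the JSON that Postman displays, here is a minimal sketch of extracting the JSON chunks from server-sent event text. It assumes each event carries one complete JSON document on a single `data:` line and that the stream ends with a `[DONE]` sentinel, as mentioned earlier in this thread. The function name `parseSseChunks` and the `CompletionChunk` interface are hypothetical; the interface lists only the fields visible in the example message above.

```typescript
// Fields taken from the example chunk in this thread; nothing else is assumed.
interface ChunkUsage {
  completion_tokens: number;
  prompt_tokens: number;
  total_tokens: number;
}

interface CompletionChunk {
  id: string;
  choices: unknown[];
  model: string;
  object: string;
  usage?: ChunkUsage;
}

// Hypothetical helper: pulls JSON payloads out of raw SSE text.
function parseSseChunks(raw: string): CompletionChunk[] {
  const chunks: CompletionChunk[] = [];
  // In the SSE wire format, events are separated by a blank line.
  for (const event of raw.split(/\r?\n\r?\n/)) {
    for (const line of event.split(/\r?\n/)) {
      // Only "data:" lines carry a payload in this simplified sketch.
      if (!line.startsWith("data:")) continue;
      const payload = line.slice("data:".length).trim();
      // Skip empty payloads and the end-of-stream sentinel, which is not JSON.
      if (payload === "" || payload === "[DONE]") continue;
      chunks.push(JSON.parse(payload) as CompletionChunk);
    }
  }
  return chunks;
}
```

A full SSE parser would also join multi-line `data:` fields and handle `event:`, `id:`, and `retry:` fields; this sketch deliberately leaves those out.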

jonathan-buttner · 2024-12-18T14:46:50Z

> Postman makes it look mainly like JSON, but those are server-sent events.

Yeah that makes sense

> Is this unified API going to be used for multiple providers? OpenAI, Anthropic Claude, and Google Gemini all use server-sent events but then provide really different structures: https://til.simonwillison.net/llms/streaming-llm-apis.

Sorry I should have provided more details when opening the PR. The hope with this API is to provide a consistent request and response schema that will be the same across multiple providers. Internally we'll handle transforming the request and response to match the individual providers and our schema.

When we get a request for Anthropic we'll translate our "unified" schema into a request for Anthropic specifically and then as we get responses from Anthropic we'll translate it back into the format that fits the response schema we're proposing here.
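The translate-out, translate-back idea described above can be sketched as a small adapter interface. Everything in this block is hypothetical: the interface names, the toy provider format, and the field names are invented for illustration and do not reflect how Elasticsearch or any real provider implements this.

```typescript
// Hypothetical, simplified unified shapes.
interface UnifiedRequestSketch {
  messages: Array<{ role: string; content: string }>;
}

interface UnifiedChunkSketch {
  id: string;
  object: string;
  text: string;
}

// One adapter per provider: unified request in, unified response out.
interface ProviderAdapter<PReq, PRes> {
  toProvider(request: UnifiedRequestSketch): PReq;
  fromProvider(response: PRes): UnifiedChunkSketch;
}

// Made-up provider format, not modeled on any real API.
interface ToyProviderRequest {
  prompt: string;
}

interface ToyProviderResponse {
  responseId: string;
  output: string;
}

const toyAdapter: ProviderAdapter<ToyProviderRequest, ToyProviderResponse> = {
  // Flatten the unified messages into the toy provider's single prompt string.
  toProvider: (request) => ({
    prompt: request.messages.map((m) => `${m.role}: ${m.content}`).join("\n"),
  }),
  // Map the toy provider's response back onto the unified chunk shape.
  fromProvider: (response) => ({
    id: response.responseId,
    object: "chat.completion.chunk",
    text: response.output,
  }),
};
```

With this arrangement, callers only ever see the unified shapes, and supporting another provider means writing another adapter rather than changing the request or response schema.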

Adding the unified api

ea37f16

jonathan-buttner added the specification label Dec 16, 2024

jonathan-buttner requested review from davidkyle and maxhniebergall December 16, 2024 19:44

jonathan-buttner requested a review from a team as a code owner December 16, 2024 19:44

Fixing formatting

5b80fae

pquentin reviewed Dec 17, 2024

maxhniebergall reviewed Dec 17, 2024

l-trotta reviewed Dec 18, 2024