Add POST _unified for the inference API #3313

Open: 2 commits to be merged into base: main

jonathan-buttner
Contributor

This PR adds the _unified route for the Inference API. It accompanies the Elasticsearch PR elastic/elasticsearch#117589.

Contributor

Below are the validation results for the APIs you have changed.

| API | Status | Request | Response |
| --- | --- | --- | --- |
| inference.delete | | Missing test | Missing test |
| inference.get | 🟢 | 1/1 | 1/1 |
| inference.inference | | Missing test | Missing test |
| inference.put | | Missing test | Missing test |
| inference.stream_inference | 🟠 | Missing type | Missing type |
| inference.unified_inference | | Missing test | Missing test |

You can validate these APIs yourself by using the make validate target.

"stability": "stable",
"visibility": "public",
"headers": {
"accept": ["text/event-stream"],
Contributor Author

This is what we have for the streaming API here: https://github.com/elastic/elasticsearch-specification/blob/main/specification/_json_spec/inference.stream_inference.json#L10

I'm not sure whether this is correct: does it indicate that the POST request accepts an event stream, or that the response returns one? 🤔

Member

Both HTTP headers are used in client requests:

  • Accept tells Elasticsearch that returning event streams is OK
  • Content-Type is about the request's body, which is indeed JSON here.

In other words, this is correct, thank you!
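
As a rough illustration of the header split described above, here is a minimal TypeScript sketch. The helper name `buildUnifiedRequest` and the payload shape (a `messages` array with `role` and `content`) are assumptions made for illustration; they are not taken from this PR or from any client library.

```typescript
// Hypothetical container for the parts of an HTTP request we care about here.
interface UnifiedRequestOptions {
  method: 'POST'
  headers: Record<string, string>
  body: string
}

// Hypothetical helper: builds the options for a streaming _unified request.
function buildUnifiedRequest(payload: unknown): UnifiedRequestOptions {
  return {
    method: 'POST',
    headers: {
      // Accept describes what the client is willing to receive in the response.
      Accept: 'text/event-stream',
      // Content-Type describes the request body the client is sending.
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(payload)
  }
}

// Assumed payload shape, for illustration only.
const options = buildUnifiedRequest({
  messages: [{ role: 'user', content: 'Hello' }]
})
```

Both headers travel with the request, which is why listing `text/event-stream` under `accept` in the JSON spec is consistent with sending a JSON body.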


/**
* Perform inference on the service using the Unified Schema
* @rest_spec_name inference.unified_inference
Contributor Author

unified_inference seemed to follow stream_inference. I'm open to other ideas though.

Member

No strong opinions, this seems OK.

/**
* Perform inference on the service using the Unified Schema
* @rest_spec_name inference.unified_inference
* @availability stack since=8.18.0 stability=stable visibility=public
Contributor Author

@davidkyle Double check that this is what we want

Member

This looks good to me.

Member

pquentin left a comment

Thanks! I answered your questions, but did not look at the contents yet. Do we have YAML tests in Elasticsearch for this feature? It would help to validate the requests.

Member

maxhniebergall left a comment

LGTM

@@ -0,0 +1,214 @@
/*
Contributor

l-trotta, Dec 18, 2024

We usually try to keep only the request class in the corresponding Request file and put all other types in the types folder above (this just makes it easier to maintain).

Never mind that; could we just move the main Request type to the top of the file?

/**
* An object representing part of the conversation.
*/
export interface Message {
Contributor

I think this is missing name

name?: string

/**
* The tool call that this message is responding to.
*/
tool_call_id?: string
Contributor

Suggested change:
- tool_call_id?: string
+ tool_call_id?: Id

/**
* The identifier of the tool call.
*/
id: string
Contributor

Suggested change:
- id: string
+ id: Id

/**
* The upper bound limit for the number of tokens that can be generated for a completion request.
*/
max_completion_tokens?: number
Contributor

Suggested change:
- max_completion_tokens?: number
+ max_completion_tokens?: long

/**
* The sampling temperature to use.
*/
temperature?: number
Contributor

Suggested change:
- temperature?: number
+ temperature?: float

/**
* Nucleus sampling, an alternative to sampling with temperature.
*/
top_p?: number
Contributor

Suggested change:
- top_p?: number
+ top_p?: float

/**
* A unique identifier for the chat completion
*/
id: string
Contributor

Suggested change:
- id: string
+ id: Id
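
For context on the type suggestions above, the sketch below shows how named aliases such as `Id`, `long`, and `float` can stand in for the TypeScript primitives so that code generators can pick a precise type in statically typed clients. The aliases are re-declared locally on the assumption that they are simple aliases over `string` and `number`; the interface names `CompletionSettings` and `ToolCallRef` are hypothetical and only group the fields discussed in the review.

```typescript
// Assumed to be simple aliases over the primitives; re-declared here so the
// sketch is self-contained.
type Id = string
type long = number
type float = number

// Hypothetical grouping of the numeric fields discussed above.
interface CompletionSettings {
  max_completion_tokens?: long // whole number of tokens
  temperature?: float // fractional sampling temperature
  top_p?: float // fractional nucleus-sampling threshold
}

// Hypothetical grouping for an identifier field.
interface ToolCallRef {
  id: Id
}

const settings: CompletionSettings = {
  max_completion_tokens: 256,
  temperature: 0.2,
  top_p: 0.9
}
const call: ToolCallRef = { id: 'call_1' }
```

At the TypeScript level nothing changes at runtime; the aliases only carry intent for downstream generators.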

/**
* The content of the message.
*/
content: string | Array<ContentObject>
Contributor

Unions such as this one have to be handled in a separate type for them to be understandable by the static client, so here we need a new type:

/**
 * @codegen_names string, object
 */
export type MessageContent = string | Array<ContentObject>
content: MessageContent
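
To show why a named union is convenient on the consuming side, here is a minimal sketch of narrowing `MessageContent` with a type guard. The fields of `ContentObject` (`text` and `type`) and the helper `contentToText` are assumptions for illustration; the actual `ContentObject` definition lives in the PR.

```typescript
// Hypothetical shape; the real ContentObject is defined in the PR.
interface ContentObject {
  text: string
  type: string
}

// The named union suggested in the review comment above.
type MessageContent = string | Array<ContentObject>

// Flatten either variant of the union into plain text.
function contentToText(content: MessageContent): string {
  if (typeof content === 'string') {
    return content // string variant
  }
  // object variant: concatenate the text of each part
  return content.map((part) => part.text).join('')
}

const fromString = contentToText('hi')
const fromParts = contentToText([
  { text: 'a', type: 'text' },
  { text: 'b', type: 'text' }
])
```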

/**
* Controls which tool is called by the model.
*/
tool_choice?: string | CompletionToolChoice
Contributor

Same as content above, new type needed to define union:

/**
 * @codegen_names string, object
 */
export type CompletionToolType = string | CompletionToolChoice
tool_choice?: CompletionToolType

@l-trotta
Contributor

@jonathan-buttner I have a question about the response format: is there a way to get it in json format, or is it just text/stream?

@jonathan-buttner
Contributor Author

> @jonathan-buttner I have a question about the response format: is there a way to get it in json format, or is it just text/stream?

Thanks for the review! Currently it's only text/stream but in the near future we'd like to add a non-streaming version. Is that possible to do using the same route (.../_unified)? Or do we need to create a new one?

@jonathan-buttner
Contributor Author

> Thanks! I answered your questions, but did not look at the contents yet. Do we have YAML tests in Elasticsearch for this feature? It would help to validate the requests.

Thanks for the review! Not yet, let me see if I can add some.

@l-trotta
Contributor

> > @jonathan-buttner I have a question about the response format: is there a way to get it in json format, or is it just text/stream?
>
> Thanks for the review! Currently it's only text/stream but in the near future we'd like to add a non-streaming version. Is that possible to do using the same route (.../_unified)? Or do we need to create a new one?

the thing is: if the return type is just text then the response body should be just string, because the whole body structure is needed just for json, not other types of responses. @pquentin like we were talking about hot threads remember?

@jonathan-buttner
Contributor Author

jonathan-buttner commented Dec 18, 2024

> > > @jonathan-buttner I have a question about the response format: is there a way to get it in json format, or is it just text/stream?
> >
> > Thanks for the review! Currently it's only text/stream but in the near future we'd like to add a non-streaming version. Is that possible to do using the same route (.../_unified)? Or do we need to create a new one?
>
> the thing is: if the return type is just text then the response body should be just string, because the whole body structure is needed just for json, not other types of responses. @pquentin like we were talking about hot threads remember?

Oh sorry I might have misunderstood. Here's what Postman parses out:

[image: screenshot of the messages as parsed by Postman]

Each of those messages is valid JSON:

{
    "id": "chatcmpl-AfVcn7De1ibDPKB7iBy83oEhpHxC9",
    "choices": [],
    "model": "gpt-4o-2024-08-06",
    "object": "chat.completion.chunk",
    "usage": {
        "completion_tokens": 6,
        "prompt_tokens": 63,
        "total_tokens": 69
    }
}

Except for the [DONE] message I suppose.

Do we still want the response to be only text?
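
To make the framing concrete, here is a minimal sketch of how a client might pull the JSON payloads out of such a stream. It assumes each event is a single `data: <payload>` line and that the stream ends with a `data: [DONE]` sentinel; the function name `parseEventStream` and the trimmed-down `CompletionChunk` type are hypothetical.

```typescript
// Subset of the chunk fields shown in the example above.
interface CompletionChunk {
  id: string
  model: string
  object: string
  choices: unknown[]
}

// Parse the raw text of an event stream into JSON chunks.
// Assumes one `data:` line per event and a final `data: [DONE]` sentinel.
function parseEventStream(raw: string): CompletionChunk[] {
  const chunks: CompletionChunk[] = []
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith('data:')) continue // skip blank lines and other fields
    const payload = line.slice('data:'.length).trim()
    if (payload === '[DONE]') break // the sentinel is not JSON
    chunks.push(JSON.parse(payload) as CompletionChunk)
  }
  return chunks
}

// Hand-written sample stream for illustration.
const sample = [
  'data: {"id":"a","choices":[],"model":"m","object":"chat.completion.chunk"}',
  '',
  'data: [DONE]',
  ''
].join('\n')

const chunks = parseEventStream(sample)
```

The body as a whole is text in event-stream framing; only the individual `data:` payloads are JSON.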

@pquentin
Member

Postman makes it look mainly like JSON, but those are server-sent events. Is this unified API going to be used for multiple providers? OpenAI, Anthropic Claude, and Google Gemini all use server-sent events but then provide really different structures: https://til.simonwillison.net/llms/streaming-llm-apis.

If yes, then clients would need to know what provider this is (in the content-type header?). There would not be a need to map the options in the spec; it's OK to treat it as an opaque string or buffer. Also, I'm afraid that more complete discussions around this will have to wait for next year.

@jonathan-buttner
Contributor Author

> Postman makes it look mainly like JSON, but those are server-sent events. Is this unified API going to be used for multiple providers? OpenAI, Anthropic Claude, and Google Gemini all use server-sent events but then provide really different structures: https://til.simonwillison.net/llms/streaming-llm-apis.
>
> If yes, then clients would need to know what provider this is (in the content-type header?). There would not be a need to map the options in the spec; it's OK to treat it as an opaque string or buffer. Also, I'm afraid that more complete discussions around this will have to wait for next year.

> Postman makes it look mainly like JSON, but those are server-sent events.

Yeah that makes sense

> Is this unified API going to be used for multiple providers? OpenAI, Anthropic Claude, and Google Gemini all use server-sent events but then provide really different structures: https://til.simonwillison.net/llms/streaming-llm-apis.

Sorry, I should have provided more details when opening the PR. The hope with this API is to provide a consistent request and response schema that is the same across multiple providers. Internally we'll handle transforming the request and response between the individual providers' formats and our schema.

When we get a request for Anthropic, we'll translate our "unified" schema into an Anthropic-specific request; as responses come back from Anthropic, we'll translate them into the response schema we're proposing here.
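
The translation described above could be sketched roughly as follows. This is an illustration only: `ExampleProviderRequest` and `ExampleProviderEvent` are invented shapes that do not describe any real provider's API, and the `role` field on messages is an assumption.

```typescript
// Request in the unified schema (subset; `role` is an assumed field).
interface UnifiedRequest {
  messages: Array<{ role: string; content: string }>
  max_completion_tokens?: number
}

// Response chunk in the unified schema (subset of the fields shown earlier).
interface UnifiedChunk {
  id: string
  model: string
  object: 'chat.completion.chunk'
}

// Hypothetical provider shapes; field names are invented for illustration.
interface ExampleProviderRequest {
  turns: Array<{ speaker: string; text: string }>
  token_limit?: number
}
interface ExampleProviderEvent {
  event_id: string
  engine: string
}

// Unified request -> provider-specific request.
function toProviderRequest(request: UnifiedRequest): ExampleProviderRequest {
  const out: ExampleProviderRequest = {
    turns: request.messages.map((m) => ({ speaker: m.role, text: m.content }))
  }
  if (request.max_completion_tokens !== undefined) {
    out.token_limit = request.max_completion_tokens
  }
  return out
}

// Provider-specific event -> unified response chunk.
function toUnifiedChunk(event: ExampleProviderEvent): UnifiedChunk {
  return { id: event.event_id, model: event.engine, object: 'chat.completion.chunk' }
}

const providerRequest = toProviderRequest({
  messages: [{ role: 'user', content: 'Hello' }],
  max_completion_tokens: 10
})
const unifiedChunk = toUnifiedChunk({ event_id: 'evt-1', engine: 'example-model' })
```

With one such pair of functions per provider, clients only ever see the unified shapes, which is what lets the spec describe a single response schema.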

4 participants