api: initial skeleton of LLMRoute and LLMBackend #20
Conversation
Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
cc @arkodg
cc @sanjeewa-malalgoda (I would appreciate it if you could ping other folks from your org)
Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
@mathetake We reviewed this and it looks good to us.
Reviewed myself🏃
// APISchema specifies the API schema of the input that the target Gateway(s) will receive.
// Based on this schema, the ai-gateway will perform the necessary transformation to the
// output schema specified in the selected LLMBackend during the routing process.
APISchema LLMAPISchema `json:"inputSchema"`
Suggested change:
- APISchema LLMAPISchema `json:"inputSchema"`
+ APISchema LLMAPISchema `json:"apiSchema"`
Are we going to allow a different APISchema for each LLM route if we define it at the route level?
If my understanding is correct, this indicates which vendor and model the route belongs to. Therefore, a single LLMRoute can correspond to only one vendor and model.
> Are we going to allow a different APISchema for each LLM route if we define it at the route level?

I think the answer would be no at the moment, since I am not sure why a user would want to access the API like that - clients would need to use different sets of API clients and switch between them depending on the path.
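To make that concrete, here is a rough Go sketch of how a single route-level input schema could relate to a per-backend output schema. Only the APISchema field and LLMAPISchema type appear in this PR; everything else (field names and JSON tags on the backend side) is an assumption for illustration, not the final API:

```go
package v1alpha1

// LLMRouteSpec carries the client-facing (input) schema for the whole route,
// so a single LLMRoute corresponds to exactly one input API schema.
type LLMRouteSpec struct {
	// APISchema specifies the API schema of the input that the target
	// Gateway(s) will receive, e.g. "OpenAI".
	APISchema LLMAPISchema `json:"inputSchema"`
}

// LLMBackendSpec carries the provider-facing (output) schema; the ai-gateway
// transforms requests from the route's input schema into this schema.
type LLMBackendSpec struct {
	// APISchema specifies the API schema the backend natively accepts.
	APISchema LLMAPISchema `json:"outputSchema"`
}

// LLMAPISchema names a concrete LLM API schema such as "OpenAI".
type LLMAPISchema string
```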
Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
On second thought, I kept the field name.
Signed-off-by: Takeshi Yoneda <[email protected]>
drive-by notes
// Based on this schema, the ai-gateway will perform the necessary transformation to the
// output schema specified in the selected LLMBackend during the routing process.
//
// Currently, the only supported schema is OpenAI as the input schema.
Would it be less problematic to remove this text constraint and introduce Bedrock later once it is supported?
The constraint will be enforced by a CEL validation rule at the k8s API server - will do the follow-up soon.
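For illustration, a minimal sketch of how such a rule could look as a kubebuilder marker; the exact rule, and the assumption that LLMAPISchema is a string-like type, are mine rather than part of this PR:

```go
// Hypothetical sketch only: enforcing the "OpenAI only" input schema constraint
// at the Kubernetes API server via a CEL validation rule. The actual follow-up
// may use a different rule or an enum marker instead.
type LLMRouteSpec struct {
	// +kubebuilder:validation:XValidation:rule="self == 'OpenAI'",message="currently only OpenAI is supported as the input schema"
	APISchema LLMAPISchema `json:"inputSchema"`
}
```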
Good comment as usual, Adrian.
Co-authored-by: Adrian Cole <[email protected]> Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
This commit is a follow-up on #20. Basically, it makes LLMRoute a pure "addition" to the existing standardized HTTPRoute. This makes it possible to configure something like

```
kind: LLMRoute
metadata:
  name: llm-route
spec:
  inputSchema: OpenAI
  httpRouteRef:
    name: my-llm-route
---
kind: HTTPRoute
metadata:
  name: my-llm-route
spec:
  matches:
    - headers:
        key: x-envoy-ai-gateway-llm-model
        value: llama3-70b
  backendRefs:
    - kserve:
        weight: 20
    - aws-bedrock:
        weight: 80
```

where LLMRoute purely references the HTTPRoute, and users can configure whatever routing conditions they want in a standardized way via HTTPRoute while leveraging the LLM-specific information, in this case the x-envoy-ai-gateway-llm-model header.

In the implementation, though it's not merged yet, we have to do the routing calculation in the extproc by actually analyzing the referenced HTTPRoute and emulating its behavior in order to do the transformation. The reason is that the routing decision is generally made at the very end of the filter chain, and by the time we invoke extproc, we don't have that information. Furthermore, `x-envoy-ai-gateway-llm-model` is not available before extproc.

As a bonus, we no longer need TargetRef at the LLMRoute level since that lives within the HTTPRoute resources. This will really simplify the PoC implementation.

---------

Signed-off-by: Takeshi Yoneda <[email protected]>
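For reference, a rough Go-side sketch of the "pure addition" shape described in that commit message; the httpRouteRef field name comes from the YAML above, while the Go type names are assumptions for illustration:

```go
// Sketch based on the commit message above; the Go type names are assumptions.
type LLMRouteSpec struct {
	// APISchema specifies the client-facing input schema, e.g. "OpenAI".
	APISchema LLMAPISchema `json:"inputSchema"`
	// HTTPRouteRef points at a standard Gateway API HTTPRoute in the same
	// namespace that carries all routing rules (header matches such as
	// x-envoy-ai-gateway-llm-model, backendRefs, and weights). LLMRoute only
	// layers LLM-specific information on top of it.
	HTTPRouteRef LocalHTTPRouteRef `json:"httpRouteRef"`
}

// LocalHTTPRouteRef is a hypothetical same-namespace reference to an HTTPRoute.
type LocalHTTPRouteRef struct {
	Name string `json:"name"`
}
```

With this shape, LLMRoute no longer needs a TargetRef of its own, because the attachment point already lives on the referenced HTTPRoute.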
This adds the skeleton API of LLMRoute and LLMBackend.
These two resources will be the foundation for future
iterations, such as authn/z, token-based rate limiting,
schema transformation, and more advanced features like #10.
Note: we might (and will) break APIs as the need arises until
the initial release.
part of #13
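As a visual aid, a minimal sketch of what the two skeleton resources might look like as Kubernetes objects; the CRD boilerplate and spec wiring are assumptions, and only the resource names and their intent come from this PR:

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// LLMRoute is the LLM-specific routing resource; future iterations are expected
// to build authn/z, token-based rate limiting, and schema transformation on it.
type LLMRoute struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec is the LLMRouteSpec sketched in the review thread above.
	Spec LLMRouteSpec `json:"spec,omitempty"`
}

// LLMBackend describes a single LLM backend that LLMRoutes can send traffic to.
type LLMBackend struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec is the LLMBackendSpec sketched in the review thread above.
	Spec LLMBackendSpec `json:"spec,omitempty"`
}
```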
cc @yuzisun