api: make LLMRoute reference HTTPRoute #39
Merged
Conversation
Signed-off-by: Takeshi Yoneda <[email protected]>
mathetake
commented
Dec 9, 2024
mathetake
requested review from
aabchoo,
missBerg and
wengyao04
as code owners
December 9, 2024 23:20
mathetake
changed the title
api: make LLMRoute reference HTTPRef
api: make LLMRoute reference HTTPRoute
Dec 9, 2024
Krishanx92
reviewed
Dec 10, 2024
yuzisun
reviewed
Dec 10, 2024
/lgtm
thanks @Krishanx92 @yuzisun for the reviews!
aabchoo
pushed a commit
that referenced
this pull request
Dec 12, 2024
This commit is a follow-up on #20. Basically, it makes LLMRoute a pure "addition" to the existing standardized HTTPRoute. This makes it possible to configure something like:

```
kind: LLMRoute
metadata:
  name: llm-route
spec:
  inputSchema: OpenAI
  httpRouteRef:
    name: my-llm-route
---
kind: HTTPRoute
metadata:
  name: my-llm-route
spec:
  matches:
    - headers:
        key: x-envoy-ai-gateway-llm-model
        value: llama3-70b
  backendRefs:
    - kserve:
        weight: 20
    - aws-bedrock:
        weight: 80
```

where LLMRoute purely references HTTPRoute, and users can configure whatever routing conditions they need in a standardized way via HTTPRoute while leveraging the LLM-specific information, in this case the x-envoy-ai-gateway-llm-model header.

In the implementation (not merged yet), we have to do the routing calculation in the extproc by analyzing the referenced HTTPRoute and emulating its behavior in order to do the transformation. The reason is that the routing decision is generally made at the very end of the filter chain, and by the time we invoke the extproc, we don't have that information. Furthermore, `x-envoy-ai-gateway-llm-model` is not available before the extproc runs.

As a bonus, we no longer need TargetRef at the LLMRoute level since that lives within the HTTPRoute resources. This will really simplify the PoC implementation.

---------

Signed-off-by: Takeshi Yoneda <[email protected]>
mathetake
added a commit
that referenced
this pull request
Dec 21, 2024
This is a follow-up on #39, and stops embedding HTTPRoute as-is. The rationale is somewhat complicated: we have to know which backend we route traffic to **before** Envoy decides the routing, because we need to perform the transformation, etc. Hence, we need to implement the routing logic (e.g. header matching, weights) ourselves, and cannot rely on Envoy's native router code path. As a result, we can only support a small subset of HTTPRoute functionality, and we cannot simply use the embedded HTTPRoute as a "base" in the translation. If we embedded HTTPRoute here, it would create the impression that all HTTPRoute functionality is supported, no matter how much we document otherwise.

We discussed this issue and reached the consensus that we should have our own LLMRoute rule definition containing only the fields that can actually be supported. The example API would look like:

```
kind: LLMRoute
metadata:
  name:
  namespace: default
spec:
  inputSchema:
    schema: OpenAI
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-envoy-ai-gateway-llm-model
              value: llama3-70b
      backendRefs:
        - name: kserve
          weight: 20
        - name: aws-bedrock
          weight: 80
```

---------

Signed-off-by: Takeshi Yoneda <[email protected]>