
api: make LLMRoute reference HTTPRoute #39

Merged

mathetake merged 7 commits into main from httprouteconfiguration on Dec 10, 2024

Conversation


@mathetake (Member) commented Dec 9, 2024

This commit is a follow-up on #20. It makes LLMRoute a pure "addition"
to the existing standardized HTTPRoute, which makes it possible to
configure something like

```
kind: LLMRoute
metadata:
  name: llm-route
spec:
  inputSchema: OpenAI
  httpRouteRef:
    name: my-llm-route
---
kind: HTTPRoute
metadata:
  name: my-llm-route
spec:
  matches:
    - headers:
        key: x-envoy-ai-gateway-llm-model
        value: llama3-70b
      backendRefs:
        - kserve:
          weight: 20
        - aws-bedrock:
          weight: 80
```

where LLMRoute purely references an HTTPRoute, and users can configure
whatever routing conditions they want in a standardized way via HTTPRoute
while leveraging LLM-specific information, in this case the
`x-envoy-ai-gateway-llm-model` header.
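For illustration, here is a minimal sketch, assuming the extproc derives that header from the `model` field of an OpenAI-style request body (the package and `modelHeaderFromBody` are hypothetical names, not the merged code):

```go
// Hypothetical sketch: derive the x-envoy-ai-gateway-llm-model header
// from an OpenAI-style request body so that HTTPRoute can match on it.
package extproc

import "encoding/json"

// openAIRequest captures only the field we need from the request body.
type openAIRequest struct {
	Model string `json:"model"` // e.g. "llama3-70b"
}

// modelHeaderFromBody parses the request body and returns the header
// key/value pair that the HTTPRoute above can match on.
func modelHeaderFromBody(body []byte) (key, value string, err error) {
	var req openAIRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return "", "", err
	}
	return "x-envoy-ai-gateway-llm-model", req.Model, nil
}
```

With the HTTPRoute above, a request whose body contains `"model": "llama3-70b"` would then match the header rule.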

In the implementation (not merged yet), we have to do the routing
calculation in the extproc by analyzing the referenced HTTPRoute and
emulating its behavior in order to perform the transformation.
The reason is that the routing decision is generally made at the very end
of the filter chain, so by the time we invoke the extproc, that information
is not yet available. Furthermore, `x-envoy-ai-gateway-llm-model` is not
available before the extproc runs.
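A minimal sketch of what that emulation could look like, assuming the rules (header key/value and weighted backends) have already been extracted from the referenced HTTPRoute; `rule`, `backendRef`, and `selectBackend` are hypothetical names, not the actual implementation:

```go
// Hypothetical routing emulation inside the extproc: match request headers
// against rules derived from the referenced HTTPRoute, then pick a backend
// according to the configured weights, mimicking what Envoy's router would do.
package routing

import "math/rand"

type backendRef struct {
	name   string
	weight int
}

type rule struct {
	headerKey   string // e.g. "x-envoy-ai-gateway-llm-model"
	headerValue string // e.g. "llama3-70b"
	backends    []backendRef
}

// selectBackend returns the backend chosen for the request, or false if no
// rule matches the given headers.
func selectBackend(rules []rule, headers map[string]string) (string, bool) {
	for _, r := range rules {
		if headers[r.headerKey] != r.headerValue {
			continue
		}
		total := 0
		for _, b := range r.backends {
			total += b.weight
		}
		if total == 0 {
			continue
		}
		// Weighted random choice: with weights 20 and 80, n falls in 0..99.
		n := rand.Intn(total)
		for _, b := range r.backends {
			if n < b.weight {
				return b.name, true
			}
			n -= b.weight
		}
	}
	return "", false
}
```

With the example above, a matching request would select kserve roughly 20% of the time and aws-bedrock roughly 80% of the time.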

As a bonus, we no longer need TargetRef at the LLMRoute level, since that
lives within the HTTPRoute resources. This significantly simplifies the PoC
implementation.

@mathetake requested a review from a team on December 9, 2024 23:14
Review comment on api/v1alpha1/api.go (outdated, resolved)
Signed-off-by: Takeshi Yoneda <[email protected]>
@mathetake marked this pull request as ready for review on December 9, 2024 23:20
Signed-off-by: Takeshi Yoneda <[email protected]>
@mathetake changed the title from "api: make LLMRoute reference HTTPRef" to "api: make LLMRoute reference HTTPRoute" on Dec 9, 2024
Review comment on api/v1alpha1/api.go (outdated, resolved)
Signed-off-by: Takeshi Yoneda <[email protected]>
@mathetake requested a review from yuzisun on December 10, 2024 23:09
@yuzisun (Contributor) commented Dec 10, 2024

/lgtm

Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
@mathetake (Member, Author) commented

thanks @Krishanx92 @yuzisun for the reviews!

@mathetake merged commit 8299827 into main on Dec 10, 2024
6 checks passed
@mathetake deleted the httprouteconfiguration branch on December 10, 2024 23:21
aabchoo pushed a commit that referenced this pull request Dec 12, 2024
mathetake added a commit that referenced this pull request Dec 21, 2024
This is a follow-up on #39, and stops embedding HTTPRoute as-is.
The rationale is somewhat complicated: we have to know which backend we
route traffic to **before** Envoy decides the routing, since we need to
perform the transformation, etc. Hence, we have to implement the routing
logic (e.g. header matching, weights) ourselves and cannot rely on the
native router code path in Envoy. As a result, we can support only a small
subset of HTTPRoute functionality, and we cannot simply use the embedded
HTTPRoute as a "base" in the translation. If we embedded HTTPRoute here,
it would create the impression that all HTTPRoute functionality is
supported, no matter how much we document otherwise.

We discussed this issue and reached a consensus that we should have our
own LLMRoute rule definition with only the fields that can actually be
supported.

The example API would look like:
```
kind: LLMRoute
metadata:
  name: 
  namespace: default
spec:
  inputSchema:
    schema: OpenAI
  rules:
    - matches:
      - headers:
        - type: Exact
          name: x-envoy-ai-gateway-llm-model
          value: llama3-70b
      backendRefs:
        - name: kserve
          weight: 20
        - name: aws-bedrock
          weight: 80
```

---------

Signed-off-by: Takeshi Yoneda <[email protected]>
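For illustration, the corresponding Go API types might look roughly like the following; this is a sketch inferred from the YAML example above, not the actual contents of `api/v1alpha1/api.go`:

```go
// Sketch of LLMRoute rule types inferred from the example above; only
// fields the extproc can actually emulate (header matches and weighted
// backends) are included.
package v1alpha1

type LLMRouteRule struct {
	Matches     []LLMRouteMatch `json:"matches,omitempty"`
	BackendRefs []LLMBackendRef `json:"backendRefs,omitempty"`
}

type LLMRouteMatch struct {
	Headers []HeaderMatch `json:"headers,omitempty"`
}

type HeaderMatch struct {
	Type  string `json:"type"` // e.g. "Exact"
	Name  string `json:"name"`
	Value string `json:"value"`
}

type LLMBackendRef struct {
	Name   string `json:"name"`
	Weight int    `json:"weight,omitempty"`
}
```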