Consider adding support for routing to LLMServerPool as a valid backendRef #4423
Hoping end users, as well as vendors using Envoy Gateway today, can chime in and share whether they would be interested in using this feature if it existed natively. If you're not yet an Envoy Gateway user but would adopt it if this feature were added, please leave a comment too 😄
Current workaround
So far, we have refrained from supporting specific backends (e.g., S3, EC2, ...). This API is not yet a widely adopted resource like …
The alternative, as I understand it, is to have a backend resource define portions of the downstream filter chain. In general (not specifically for the LLM use case), this can cause unexpected side effects and conflicts between filters contributed by different backends. This might be mitigated by scoping the filters to specific routes, or even by using upstream filters, and by detecting and resolving conflicts during IR translation.
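A minimal sketch of the conflict-detection idea above, assuming each backend contributes named filters to a listener's shared downstream filter chain. The `FilterContribution` type, `find_conflicts` function, and the route-scoping rule are hypothetical illustrations, not Envoy Gateway's IR or API. Two contributions are treated as conflicting when they target the same listener and filter name with different configs, unless each is scoped to a different route.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FilterContribution:
    """A filter some backend wants on a listener's filter chain (hypothetical model)."""
    backend: str                 # backend that requested the filter
    listener: str                # listener whose downstream filter chain is affected
    filter_name: str             # name of the filter being added
    config: str                  # serialized filter config, compared for equality
    route: Optional[str] = None  # None = listener-wide; otherwise scoped to one route


def _overlaps(a: FilterContribution, b: FilterContribution) -> bool:
    """Contributions can interfere if they share a listener and filter name and
    at least one is listener-wide, or both are scoped to the same route."""
    if a.listener != b.listener or a.filter_name != b.filter_name:
        return False
    return a.route is None or b.route is None or a.route == b.route


def find_conflicts(contribs: list[FilterContribution]) -> list[tuple[str, str, str]]:
    """Return (backend_a, backend_b, filter_name) for every conflicting pair."""
    conflicts = []
    for i, a in enumerate(contribs):
        for b in contribs[i + 1:]:
            # Identical configs are harmless duplicates; differing ones clash.
            if _overlaps(a, b) and a.config != b.config:
                conflicts.append((a.backend, b.backend, a.filter_name))
    return conflicts
```

In this model, scoping is what makes route-level filters attractive: two backends can use the same filter with different configs as long as each is confined to its own route, while a listener-wide filter overlaps with every other contribution of that name on the listener.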
This can be improved (somewhat) by supporting backend reference extensibility, as proposed here: #4373 (comment).
EG can't directly support …

The current workaround, a dummy backend approach, is a bit of a hack. It produces an HTTPRoute that can confuse anyone inspecting it, because the destination cluster is just a placeholder. This can be improved by adding support for custom Backend types, as @guydc suggested. EG will need to invoke an "LLM Gateway extension" to translate the …

EG delegates the translation of …

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
    - name: inference-gateway
      sectionName: llm-gw
  rules:
    - backendRefs:
        - group: llm-gateway.k8s.io
          kind: LLMServerPool
          name: llm-backend
```

This Backend resource is only used by the LLM Gateway controller; EG doesn't care about it.

```yaml
apiVersion: llm-gateway.k8s.io
kind: LLMServerPool
metadata:
  name: llm-backend
spec:
  # .... omitted, EG doesn't care
```

This mechanism can also be used to support other vendor-specific or private Backend types as out-of-tree extensions, such as AWS S3, EC2, Lambda, etc.
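A minimal sketch of the delegation idea, assuming the gateway keys backendRef handling on `(group, kind)`: core kinds are translated in-tree, and any other kind is handed to a registered out-of-tree extension that alone understands the custom resource. `BackendRef`, `BackendTranslator`, `register_extension`, and the returned `cluster/...` strings are hypothetical; this is not Envoy Gateway's extension-server API. The `BackendRef` defaults (empty core group, kind `Service`) mirror the Gateway API defaults.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BackendRef:
    """Simplified HTTPRoute backendRef; defaults mirror Gateway API (core group, Service)."""
    name: str
    group: str = ""
    kind: str = "Service"


# A translator turns a backendRef into an opaque upstream description (hypothetical).
Translator = Callable[[BackendRef], str]


class BackendTranslator:
    """Hypothetical dispatcher: in-tree kinds first, then registered extensions."""

    def __init__(self) -> None:
        self._in_tree: dict[tuple[str, str], Translator] = {
            ("", "Service"): lambda ref: f"cluster/service/{ref.name}",
        }
        self._extensions: dict[tuple[str, str], Translator] = {}

    def register_extension(self, group: str, kind: str, fn: Translator) -> None:
        key = (group, kind)
        if key in self._in_tree:
            raise ValueError(f"{group}/{kind} is handled in-tree")
        self._extensions[key] = fn

    def translate(self, ref: BackendRef) -> str:
        key = (ref.group, ref.kind)
        if key in self._in_tree:
            return self._in_tree[key](ref)
        if key in self._extensions:
            # The gateway never reads the custom resource's spec; the extension does.
            return self._extensions[key](ref)
        raise LookupError(f"unsupported backendRef kind {ref.group}/{ref.kind}")
```

With an extension registered for `llm-gateway.k8s.io`/`LLMServerPool`, the `llm-backend` reference from the HTTPRoute above is translated by the extension, while an unregistered kind fails with `LookupError` instead of silently routing to a placeholder cluster.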
To clarify, we're checking with Envoy-based Gateway API implementations to understand which ones would be open to adding native support for the new LLMServerPool API that wg-serving is working on.
Completely agree. This is a bit of a chicken-and-egg problem, though. We want to see Gateway API implementations support this new k8s API as a backend, but that requires one implementation to go first. Ideally that's an OSS implementation that can then serve as a reference implementation for how this integration can work.
The point here is that this is a new Kubernetes API, not a third-party extension. Deciding whether to support it should be closer to deciding whether this project should support TLSRoute or ServiceImport, OSS Kubernetes APIs that are still only in alpha. Instead of continuing to work on the rather fragile workaround in #4423 (comment), I've suggested that the WG would do better to support this resource natively in an OSS + CNCF Gateway API implementation. Envoy Gateway seems like a great option for this, but we're also open to any other interested projects.
@robscott Thanks for the clarification! I initially thought this was being proposed as an EG-specific API. If it's going to be a Kubernetes API like …
This issue has been automatically marked as stale because it has not had activity in the last 30 days.
Description:
The kubernetes-sigs/llm-instance-gateway project has introduced a new backendRef kind called LLMServerPool, which represents a collection of model servers inside Kubernetes that an HTTPRoute can route to. The project is looking for Envoy-proxy-based implementations to support routing to this backendRef natively. More details in kubernetes-sigs/gateway-api-inference-extension#19.
Creating this issue to decide whether Envoy Gateway should add support for this.