Consider adding support for routing to LLMServerPool as a valid backendRef #4423

Open
arkodg opened this issue Oct 10, 2024 · 6 comments
Labels
kind/decision A record of a decision made by the community. stale

Comments

@arkodg
Contributor

arkodg commented Oct 10, 2024

Description:

The kubernetes-sigs/llm-instance-gateway project has introduced a new backendRef kind called LLMServerPool, which represents a collection of model servers inside Kubernetes that an HTTPRoute can route to. The project is looking for Envoy Proxy based implementations that support routing to this backendRef natively. More in kubernetes-sigs/gateway-api-inference-extension#19.

I'm creating this issue to decide whether Envoy Gateway should add support for this.

@arkodg added the kind/decision label Oct 10, 2024
@arkodg
Contributor Author

arkodg commented Oct 11, 2024

Hoping end users as well as vendors using Envoy Gateway today can chime in and share whether they would be interested in using this feature if it existed natively.

Please also leave a comment if you're not yet an Envoy Gateway user but would adopt it if this feature were added 😄

Current workaround

@guydc
Contributor

guydc commented Oct 11, 2024

So far, we refrained from supporting specific backends (e.g. S3, EC2, ... ). This API is not yet a widely adopted resource like Service, ServiceImport.

Create an EnvoyExtensionPolicy to configure the ext proc service

The alternative, as I understand it, is to have a backend resource define portions of the downstream filter chain. In general (not for the LLM use case), that can create some issues around unexpected side effects and conflicts from different backends. Maybe this can be mitigated by scoping the filters to specific routes or even using upstream filters and by detecting/resolving conflicts in IR translation.

  • Would this significantly complicate existing translation in EG?
  • Are there other examples in EG/GW-API space for backends having this "implicit" impact on downstream traffic processing?

Edit xDS, using EnvoyPatchPolicy or the Extension Server, to add the original_destination_cluster xDS Cluster config

This can be improved (somewhat) by supporting backend reference extensibility, as proposed here: #4373 (comment).

  • Users may still reference the LLMServerPool in their HTTPRoutes, but EG is not responsible for the translation.
  • The extension server required for LLM resource translation may be delivered as part of an extended EG "contrib" chart, to simplify LCM.

@zhaohuabing
Member

zhaohuabing commented Oct 15, 2024

EG can't directly support LLMServerPool as a Backend type because it lacks the logic to handle LLM-specific configurations, such as how to set up the filter chain and routes properly. This responsibility falls to a standalone component, the "LLM Gateway controller".

The current workaround, using a dummy backend approach, is a bit of a hack. It results in an HTTPRoute that can be confusing to anyone inspecting it, as the destination cluster is just a placeholder. This can be improved by adding support for custom Backend types, as @guydc suggested.
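
For illustration, the xDS patch step of the workaround might look roughly like the sketch below. It is not taken from this thread: the policy name, the Gateway name, the timeout, and the `target-pod` header are assumptions, and EnvoyPatchPolicy has to be enabled in the EnvoyGateway configuration before a resource like this takes effect.

```yaml
# Sketch only: resource names, timeout, and header name are illustrative assumptions.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyPatchPolicy
metadata:
  name: llm-original-dst-patch          # hypothetical name
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway             # assumed Gateway name
  type: JSONPatch
  jsonPatches:
    # Add a new xDS Cluster that forwards to whatever address the header carries.
    - type: "type.googleapis.com/envoy.config.cluster.v3.Cluster"
      name: original_destination_cluster
      operation:
        op: add
        path: ""
        value:
          name: original_destination_cluster
          type: ORIGINAL_DST
          lb_policy: CLUSTER_PROVIDED
          connect_timeout: 5s
          original_dst_lb_config:
            use_http_header: true
            http_header_name: target-pod   # assumed header set by the ext proc service
```

The cluster has no static endpoints. Envoy takes the destination from the named request header, which is why the backend referenced in the HTTPRoute ends up being only a placeholder.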

EG will need to invoke an "LLM Gateway extension" to translate the llm-backend to an original_destination_cluster. This extension will also insert an ExtProc filter into the HTTP filter chain to retrieve the IP of the LLM pod. The filter can be added via an EnvoyExtensionPolicy or through an xDS mutation extension point such as the Extension Server.
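
As a rough sketch of the EnvoyExtensionPolicy option, the policy below attaches an external processor to a route. The policy name, the `llm-ext-proc` Service, its port, the target route name, and the buffered request body mode are all assumptions made for illustration and are not specified in this thread.

```yaml
# Sketch only: names, port, and processing mode are illustrative assumptions.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
metadata:
  name: llm-ext-proc-policy             # hypothetical name
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: llm-route                   # assumed name of the route to attach to
  extProc:
    - backendRefs:
        - name: llm-ext-proc            # hypothetical Service running the ext proc server
          port: 9002                    # hypothetical port
      processingMode:
        request:
          body: Buffered                # assumes the processor needs the request body
```

Scoping the policy to a single HTTPRoute keeps the filter from affecting unrelated traffic on the same Gateway.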

EG delegates the translation of llm-gateway.k8s.io\LLMServerPool Backend type to a third-party extension.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
    - name: inference-gateway
      sectionName: llm-gw
  rules:
  - backendRefs:
      - group: llm-gateway.k8s.io
        kind: LLMServerPool
        name: llm-backend

This Backend resource is only used by the LLM Gateway controller; EG doesn't care about it.

apiVersion: llm-gateway.k8s.io
kind: LLMServerPool
metadata:
  name: llm-backend
spec:
  .... omitted, EG doesn't care

This mechanism can also be used to support other vendor-specific or private Backend types as out-of-tree extensions, such as AWS S3, EC2, Lambda, etc.

@robscott

To clarify, we're checking with Envoy-based Gateway API implementations to understand which ones would be open to adding native support for the new LLMServerPool API that wg-serving is working on.

This API is not yet a widely adopted resource like Service, ServiceImport.

Completely agree. This is a bit of a chicken and egg problem though. We want to see Gateway API implementations support this new k8s API as a backend, but that requires one implementation to be first. Ideally that's an OSS implementation that can then be used as a reference implementation for how this integration can work.

EG delegates the translation of llm-gateway.k8s.io\LLMServerPool Backend type to a third-party extension.

The point here is that this is a new Kubernetes API, not a third-party extension. Deciding on whether or not to support this should be more related to whether or not this project should support TLSRoute or ServiceImport - OSS Kubernetes APIs that are still only in alpha.

I've suggested that instead of continuing to work on the rather fragile workaround in #4423 (comment), it would be better for the WG to work to support this resource natively in an OSS + CNCF Gateway API implementation. Envoy Gateway seems like a great option for this, but we'll also be open to any other projects that are interested.

@zhaohuabing
Member

zhaohuabing commented Oct 16, 2024

@robscott Thanks for the clarification! I initially thought this was being proposed as an EG-specific API. If it's going to be a Kubernetes API like TCPRoute, then EG would be happy to support it. EG already supports all the experimental Gateway APIs, so supporting this API would be in line with that.


This issue has been automatically marked as stale because it has not had activity in the last 30 days.
