Sync vLLM support from Examples repo k8s manifests to Helm charts #608

Closed
eero-t opened this issue Nov 22, 2024 · 4 comments · Fixed by #610

eero-t commented Nov 22, 2024

Several k8s app manifests and docker compose files in the Examples repo support vLLM:

GenAIExamples$ find -iname '*vllm*.yaml'
./EdgeCraftRAG/docker_compose/intel/gpu/arc/compose_vllm.yaml
./ChatQnA/docker_compose/intel/cpu/xeon/compose_vllm.yaml
./ChatQnA/docker_compose/intel/hpu/gaudi/compose_vllm.yaml
./ChatQnA/kubernetes/intel/hpu/gaudi/manifest/chatqna-vllm-remote-inference.yaml
./ChatQnA/kubernetes/intel/hpu/gaudi/manifest/chatqna-vllm.yaml
./WorkflowExecAgent/docker_compose/intel/cpu/xeon/compose_vllm.yaml

The k8s manifests specify vLLM options in a configMap:
https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/kubernetes/intel/hpu/gaudi/manifest/chatqna-vllm.yaml#L178

However, such vLLM support is missing from the application Helm charts here, and the vLLM options are missing from the vLLM chart's configMap:
https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/vllm/templates/configmap.yaml
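
For illustration, a minimal sketch of what exposing such options through the vllm chart's configMap could look like (the key names, value names and naming helper below are assumptions, not the exact contents of either linked file):

```yaml
# Hypothetical sketch only -- not the actual templates/configmap.yaml contents.
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "vllm.fullname" . }}-config   # assumes the chart's usual naming helper
data:
  LLM_MODEL_ID: {{ .Values.LLM_MODEL_ID | quote }}   # model served by vLLM (assumed value name)
  # vLLM options that have environment-variable equivalents could be added here, e.g.:
  VLLM_CPU_KVCACHE_SPACE: {{ .Values.VLLM_CPU_KVCACHE_SPACE | default "40" | quote }}
```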

Some of those options are specified with the extraCmdArgs chart value in a couple of other charts.

The latter chart uses those args only in its CI values file.
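
For reference, a hedged sketch of how such options look when passed via an extraCmdArgs-style chart value instead of the configMap (the value name and the chosen flags are illustrative; the flags themselves are standard vLLM CLI options):

```yaml
# Hypothetical values.yaml excerpt -- which options a deployment needs is deployment-specific.
extraCmdArgs:
  - "--max-model-len"
  - "4096"
  - "--tensor-parallel-size"
  - "1"
```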


=> I think:

  • vLLM options should be specified only in the configMap
  • vLLM support should be added to the app Helm charts, for the same apps as in the Examples repo
  • the Examples repo manifests should be generated from the Helm charts

poussa commented Nov 25, 2024

I think we should only have Helm charts. The GenAIExamples repository manifests should be deleted and the Helm charts enhanced to support all the use cases. Right now the project maintains the K8s deployment files in two different places, which makes no sense: it is duplicated effort, error-prone, and leads to confusion.


eero-t commented Nov 25, 2024

I'm working on adding similar vLLM support to ChatQnA & DocSum Helm charts: https://github.com/eero-t/GenAIInfra/commits/helm-vllm/

As I've never used EdgeCraftRAG or WorkflowExecAgent, I'll leave those to somebody else.

Note that this support will require the user to specify which LLM backend is to be used with the application, e.g. by using a gaudi-tgi-values.yaml or gaudi-vllm-values.yaml file. I.e. it will increase the Helm charts' messiness a bit...
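
As a rough sketch of what such a per-backend values file might contain (key names and image are assumptions, not the final chart layout):

```yaml
# Hypothetical gaudi-vllm-values.yaml: select vLLM instead of TGI for the app.
tgi:
  enabled: false
vllm:
  enabled: true
  image:
    repository: opea/vllm-gaudi   # assumed image name
  resources:
    limits:
      habana.ai/gaudi: 1          # Habana device plugin resource
```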

poussa assigned poussa and yongfengdu and unassigned poussa Nov 25, 2024

lianhao commented Nov 26, 2024

> I think we should only have Helm charts. The GenAIExamples repository manifests should be deleted and the Helm charts enhanced to support all the use cases. Right now the project maintains the K8s deployment files in two different places, which makes no sense: it is duplicated effort, error-prone, and leads to confusion.

Yes. We're planning to move the helm charts from here to GenAIExamples and GenAIComps separately, and after that, we should delete those GenAIExamples static k8s manifest files.

yongfengdu commented

@eero-t Yes, this is what we should complete in the 1.2 release cycle. Once this is done, #403 can be closed.

  1. For examples like chatqna and others, we should support more flavors. Currently we only have the guardrails-usvc.enabled condition; we should change it to a more flexible one. For each configurable component, use a tgi.enabled=[true|false] style condition and use a values.yaml file to control enablement. So far, the flavors I know of are: guardrails, vllm/tgi, rerank/no_rerank (and wrapper/no_wrapper for 1.0, but I think we don't need that after 1.1).
  2. For the vllm service, it's a little more complicated (see the sketch after this list):
    1). Not all options are supported via environment variables, so we can't put everything into the configMap. Of course, we can sync with the chatqna docker compose first.
    2). The entrypoints of the vLLM docker images are not consistent: some use "/bin/bash" and some use "python3 -m vllm.entrypoints.openai.api_server". It's necessary to keep "extraCmdArgs", in different forms, to support the different entrypoints.
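
A hedged sketch of how points 1 and 2 could come together in a deployment template: a per-component enabled flag plus extraCmdArgs, with the container command overridden only for images whose entrypoint is a plain shell (all value and helper names below are assumptions, not the current chart code):

```yaml
# Hypothetical deployment template fragment.
{{- if .Values.vllm.enabled }}
      containers:
        - name: vllm
          image: "{{ .Values.vllm.image.repository }}:{{ .Values.vllm.image.tag }}"
          {{- if .Values.vllm.overrideEntrypoint }}
          # Images whose entrypoint is e.g. /bin/bash need the server started explicitly.
          command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
          {{- end }}
          args:
            {{- range .Values.vllm.extraCmdArgs }}
            - {{ . | quote }}
            {{- end }}
          envFrom:
            - configMapRef:
                name: {{ include "vllm.fullname" . }}-config   # options with env-var equivalents
{{- end }}
```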
