Sync vLLM support from Examples repo k8s manifests to Helm charts #608

Closed
eero-t opened this issue Nov 22, 2024 · 4 comments · Fixed by #610

eero-t commented Nov 22, 2024

Several k8s app manifests and docker compose files in the Examples repo support vLLM:

GenAIExamples$ find -iname '*vllm*.yaml'
./EdgeCraftRAG/docker_compose/intel/gpu/arc/compose_vllm.yaml
./ChatQnA/docker_compose/intel/cpu/xeon/compose_vllm.yaml
./ChatQnA/docker_compose/intel/hpu/gaudi/compose_vllm.yaml
./ChatQnA/kubernetes/intel/hpu/gaudi/manifest/chatqna-vllm-remote-inference.yaml
./ChatQnA/kubernetes/intel/hpu/gaudi/manifest/chatqna-vllm.yaml
./WorkflowExecAgent/docker_compose/intel/cpu/xeon/compose_vllm.yaml

The k8s manifests specify vLLM options in a configMap:
https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/kubernetes/intel/hpu/gaudi/manifest/chatqna-vllm.yaml#L178

However, such vLLM support is missing from the application Helm charts here, and the vLLM options are missing from the vLLM chart's configMap:
https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/vllm/templates/configmap.yaml
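
For illustration, a minimal sketch of what exposing such options through the vllm chart's configMap could look like (the key names, value names and naming helper below are assumptions, not the exact contents of either linked file):

```yaml
# Hypothetical sketch only -- not the actual templates/configmap.yaml contents.
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "vllm.fullname" . }}-config   # assumes the chart's usual naming helper
data:
  LLM_MODEL_ID: {{ .Values.LLM_MODEL_ID | quote }}   # model served by vLLM (assumed value name)
  # vLLM options that have environment-variable equivalents could be added here, e.g.:
  VLLM_CPU_KVCACHE_SPACE: {{ .Values.VLLM_CPU_KVCACHE_SPACE | default "40" | quote }}
```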

Some of those options are specified with the extraCmdArgs chart value in a couple of other charts.

The latter chart uses those args only in its CI values file.
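
For reference, a hedged sketch of how such options look when passed via an extraCmdArgs-style chart value instead of the configMap (the value name and the chosen flags are illustrative; the flags themselves are standard vLLM CLI options):

```yaml
# Hypothetical values.yaml excerpt -- which options a deployment needs is deployment-specific.
extraCmdArgs:
  - "--max-model-len"
  - "4096"
  - "--tensor-parallel-size"
  - "1"
```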


=> I think:

  • vLLM options should be specified only in the configMap
  • vLLM support should be added to the app Helm charts, for the same apps as in the Examples repo
  • the Examples repo manifests should be generated from the Helm charts

poussa commented Nov 25, 2024

I think we should only have Helm charts. The GenAIExamples repository manifests should be deleted and the Helm charts enhanced to support all the use cases. Right now the project maintains the K8s deployment files in two different places, which makes no sense: it is duplicated effort, error-prone, and leads to confusion.


eero-t commented Nov 25, 2024

I'm working on adding similar vLLM support to ChatQnA & DocSum Helm charts: https://github.com/eero-t/GenAIInfra/commits/helm-vllm/

As I've never used EdgeCraftRAG or WorkflowExecAgent, I'll leave those to somebody else.

Note that this support will require the user to specify which LLM backend is to be used with the application, e.g. by using a gaudi-tgi-values.yaml or gaudi-vllm-values.yaml file. I.e. it will increase the Helm charts' messiness a bit...
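
As a rough sketch of what such a per-backend values file might contain (key names and image are assumptions, not the final chart layout):

```yaml
# Hypothetical gaudi-vllm-values.yaml: select vLLM instead of TGI for the app.
tgi:
  enabled: false
vllm:
  enabled: true
  image:
    repository: opea/vllm-gaudi   # assumed image name
  resources:
    limits:
      habana.ai/gaudi: 1          # Habana device plugin resource
```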

poussa assigned poussa and yongfengdu and unassigned poussa Nov 25, 2024

lianhao commented Nov 26, 2024

> I think we should only have Helm charts. The GenAIExamples repository manifests should be deleted and the Helm charts enhanced to support all the use cases. Right now the project maintains the K8s deployment files in two different places, which makes no sense: it is duplicated effort, error-prone, and leads to confusion.

Yes. We're planning to move the helm charts from here to GenAIExamples and GenAIComps separately, and after that, we should delete those GenAIExamples static k8s manifest files.

yongfengdu commented

@eero-t Yes, this is what we should complete in the 1.2 release cycle. Once this is done, #403 can be closed.

  1. For examples like chatqna and others, we should support more flavors. Currently we only have the guardrails-usvc.enabled condition; we should change it to a more flexible one. For each configurable component, use a tgi.enabled=[true|false] style condition and use a values.yaml file to control enablement. So far, the flavors I know of are: guardrails, vllm/tgi, rerank/no_rerank (and wrapper/no_wrapper for 1.0, but I think we don't need that after 1.1).
  2. For the vllm service, it's a little more complicated (see the sketch after this list):
    1). Not all options are supported via environment variables, so we can't put everything into the configMap. Of course, we can sync with the chatqna docker compose first.
    2). The entrypoints of the vLLM docker images are not consistent: some use "/bin/bash" and some use "python3 -m vllm.entrypoints.openai.api_server". It's necessary to keep "extraCmdArgs", in different forms, to support the different entrypoints.
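
A hedged sketch of how points 1 and 2 could come together in a deployment template: a per-component enabled flag plus extraCmdArgs, with the container command overridden only for images whose entrypoint is a plain shell (all value and helper names below are assumptions, not the current chart code):

```yaml
# Hypothetical deployment template fragment.
{{- if .Values.vllm.enabled }}
      containers:
        - name: vllm
          image: "{{ .Values.vllm.image.repository }}:{{ .Values.vllm.image.tag }}"
          {{- if .Values.vllm.overrideEntrypoint }}
          # Images whose entrypoint is e.g. /bin/bash need the server started explicitly.
          command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
          {{- end }}
          args:
            {{- range .Values.vllm.extraCmdArgs }}
            - {{ . | quote }}
            {{- end }}
          envFrom:
            - configMapRef:
                name: {{ include "vllm.fullname" . }}-config   # options with env-var equivalents
{{- end }}
```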
