
[feature-request] Update vLLM library in LMI containers to v0.6.0 #4240

Open
CoolFish88 opened this issue Sep 16, 2024 · 1 comment

Concise Description:

vLLM v0.6.0 provides a 2.7x throughput improvement and a 5x latency reduction over the previous version (v0.5.3).

DLC image/dockerfile:
763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-lmi11.0.0-cu124
763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-neuronx-sdk2.19.1

Is your feature request related to a problem? Please describe.
Improve the performance of LMI containers.

Describe the solution you'd like
Update vLLM library in LMI containers to v0.6.0

siddvenk (Contributor) commented Oct 1, 2024

We are planning a release that will include vLLM 0.6.2 within the next 2 weeks. In the meantime, you can try providing a requirements.txt with vllm==0.6.x to pull in a later version of vLLM that way. If you go this route, you should also set the OPTION_ROLLING_BATCH=vllm environment variable to force usage of vLLM. A sketch of this workaround is shown below.
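A minimal sketch of that workaround, assuming the LMI container is deployed through the SageMaker Python SDK: the S3 path, role ARN, and instance type below are placeholders, and the requirements.txt pinning vllm (e.g. a single line `vllm==0.6.2`) is assumed to be packaged inside the model artifact referenced by `model_data` so the container installs it at startup.

```python
# Sketch of the suggested workaround, assuming a SageMaker deployment of the
# djl-inference (LMI) container. The S3 URI, role ARN, and instance type are
# placeholders; the model.tar.gz is assumed to contain a requirements.txt
# with a line such as "vllm==0.6.2".
from sagemaker.model import Model

image_uri = (
    "763104351884.dkr.ecr.us-west-2.amazonaws.com/"
    "djl-inference:0.29.0-lmi11.0.0-cu124"
)

model = Model(
    image_uri=image_uri,
    # Placeholder artifact: model files plus requirements.txt pinning vllm.
    model_data="s3://my-bucket/lmi-model/model.tar.gz",
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder
    env={
        # Force the vLLM rolling-batch backend, as suggested above.
        "OPTION_ROLLING_BATCH": "vllm",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # placeholder instance type
)
```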
