Added the microservice of vLLM #78
Conversation
Please help review; I look forward to your feedback. Thanks. @Jian-Zhang @xuechendi
This PR has addressed the above comments and is ready to merge. Please help review. Thanks. @lvliang-intel @hshen14 @Jian-Zhang
Force-pushed from 055ae40 to 23d3b63
From the PR title, only the LLM microservice based on vLLM was added, but the code in fact also includes a Ray Serve version, right? It would also be better to move 'text-generation/ray_serve/docker' up a level, as we did for vLLM and the other examples.
Another question: why is there a requirements.txt in the 'text-generation/ray_serve/docker' folder? Would it be feasible to move it into the Dockerfile?
@ftian1 Thanks for your comments. I only renamed the Ray Serve component rather than changing its implementation; the Ray Serve microservice itself is not part of this PR and will be added in a follow-up PR. The code in the 'docker' folder under ray_serve is a self-built Docker image whose only purpose is to launch the Ray Serve LLM engine, unlike vLLM/TGI, whose images can be built from their official GitHub repositories. The requirements.txt under ray_serve is needed to build the Ray Serve engine image (not the Ray LangChain microservice), so I kept it under the docker folder as before.
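For reference, folding the engine dependencies into the image build, as suggested above, could look roughly like the sketch below. This is a hypothetical illustration only: the base image, package list, and entrypoint script are assumptions, not the actual contents of the requirements.txt or Dockerfile in this PR.

```dockerfile
# Hypothetical sketch only (not the code in this PR): replacing a COPY'd
# requirements.txt with an inline pip install during the image build.
FROM python:3.10-slim

# Install the Ray Serve engine dependencies directly in the image
# (package list is an illustrative assumption).
RUN pip install --no-cache-dir "ray[serve]" transformers torch

WORKDIR /app
# serve.py is a placeholder name for the script that launches the Ray Serve LLM engine.
COPY serve.py /app/serve.py
CMD ["python", "serve.py"]
```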
Force-pushed from 21b51da to 122560a
This PR is ready to merge. Would you please help review it? Thanks. @ftian1 @lvliang-intel
looks good to me
Force-pushed from af2a15b to 1782801
* refine the vllm microservice
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* rename the rayllm to ray_serve
* refactor the ray service code structure
* refine the vllm and readme
* update the readme with correct ray service name
* update the readme
* refine the readme
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)

Signed-off-by: tianyil1 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: V, Ganesan <[email protected]>
Description
This PR mainly updates the microservice wrapper for vLLM to align it with the TGI one, renames the Ray backend service, and adds vLLM and Ray introductions to the README.
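For context, a LangChain-based wrapper that forwards requests to a vLLM server exposing the OpenAI-compatible API, in the spirit of the existing TGI microservice, could look roughly like the sketch below. The port, model name, route, and request schema are illustrative assumptions, not the actual code added in this PR.

```python
# Minimal sketch (not the code in this PR): a FastAPI wrapper that forwards
# requests to a vLLM backend through LangChain's VLLMOpenAI client.
# The port, model name, and route are illustrative assumptions.
from fastapi import FastAPI
from langchain_community.llms import VLLMOpenAI
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    query: str
    max_new_tokens: int = 128

@app.post("/v1/chat/completions")
def generate(req: GenerateRequest):
    # Point LangChain at the vLLM serving endpoint (assumed to run on port 8008).
    llm = VLLMOpenAI(
        openai_api_key="EMPTY",
        openai_api_base="http://localhost:8008/v1",
        model_name="meta-llama/Llama-2-7b-chat-hf",
        max_tokens=req.max_new_tokens,
    )
    return {"text": llm.invoke(req.query)}
```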
Issues
n/a
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
No newly introduced 3rd party dependency.
Tests
This PR was tested on a Gaudi2 server with:
2 sockets of Intel(R) Xeon(R) Platinum 8368 CPU @ 2.40GHz
8 Gaudi nodes, HL-SMI Version: hl-1.14.0-fw-48.0.1.0 Driver Version: 1.14.0-9e8ecf8
Both of the following serving paths were tested successfully in this environment (see the sketch after the list):
vLLM backend serving
Langchain vLLM serving
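For reproducibility, the two serving paths listed above can be smoke-tested roughly as in the sketch below. The host, port, and model name are assumptions; substitute the values used in the actual deployment.

```python
# Hypothetical smoke test for the two serving paths listed above.
# Host, port, and model name are assumptions, not values from this PR.
import requests
from langchain_community.llms import VLLMOpenAI

VLLM_ENDPOINT = "http://localhost:8008"
MODEL = "meta-llama/Llama-2-7b-chat-hf"

# 1) vLLM backend serving: call the OpenAI-compatible completions API directly.
resp = requests.post(
    f"{VLLM_ENDPOINT}/v1/completions",
    json={"model": MODEL, "prompt": "What is deep learning?", "max_tokens": 32},
    timeout=60,
)
print(resp.json()["choices"][0]["text"])

# 2) LangChain vLLM serving: go through LangChain's VLLMOpenAI wrapper.
llm = VLLMOpenAI(
    openai_api_key="EMPTY",
    openai_api_base=f"{VLLM_ENDPOINT}/v1",
    model_name=MODEL,
    max_tokens=32,
)
print(llm.invoke("What is deep learning?"))
```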