Loading Model through Multi-Node Ray Cluster Fails #881
Comments
Seems it's just not enough RAM.
@VarunSreenivasan16 did you figure out how to fix this? I ran into the same issue recently.
Hi @WoosukKwon, I encountered the same issue when connecting to a remote Ray cluster with vLLM v0.2.6. Any idea or direction? Thank you!
@VarunSreenivasan16 @masnec did either of you ever figure out what the issue was or resolve it?
@hmellor it worked after adding those two flags.
@masnec would you mind contributing this information to https://docs.vllm.ai/en/latest/serving/distributed_serving.html?
By the way, using tensor parallelism across multiple nodes can hurt performance considerably because it requires heavy network communication.
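To make that trade-off concrete, here is a hedged sketch (not a command from this thread); the model path and GPU layout are placeholders:

```bash
# Hedged sketch, not from this issue: when one node has enough GPUs for the
# desired tensor-parallel degree, keep all shards on that node so all-reduces
# stay on the local interconnect (NVLink/PCIe) instead of the network.
python3 -m vllm.entrypoints.api_server \
    --host 0.0.0.0 --port 8000 \
    --model /home/model \
    --tensor-parallel-size 4   # all 4 shards on one 4-GPU node

# The identical command on a Ray cluster whose 4 GPUs sit on different nodes
# sends every all-reduce over the network, which is the penalty noted above.
```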
Problem Description
I'm trying to spin up the vLLM API server inside a Docker container (which has vllm and all requirements installed) on the head node of a Ray cluster (the cluster contains 4 nodes and has access to 4 T4 GPUs) with the following command in the container:
python3 -m vllm.entrypoints.api_server --host 0.0.0.0 --port 8000 --model /home/model --tensor-parallel-size 4 --swap-space 2
I get back the following error (NOTE: I x'd out the IP):
How to Reproduce
Hardware
Model
- llama-2-13B model (obtained from merging Llama-2-13b-hf and a LoRA adapter using peft).
- `tokenizer.model`, `tokenizer.json`, `tokenizer_config.json` are present in the model directory as well (obtained from the base HF model).

Cluster Setup
- I ran `ray status --address <head_node_ip:6379>` and I can confirm that all four nodes are accessible from the docker container running on the head node (see the sketch after this list).
- I set `RAY_ADDRESS` to `ray://<head_node_ip:10001>` and then ran the command to spin up the API server.
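For anyone reproducing this, a minimal sketch of the Ray bring-up implied by the steps above; the ports and whether the commands run on the host or in the container are assumptions, not details from the report:

```bash
# On the head node: start the Ray head process (6379 is Ray's default port).
ray start --head --port=6379

# On each of the other three nodes: join the cluster.
ray start --address=<head_node_ip>:6379

# From the head node (or the container on it): confirm all four nodes and
# their GPUs are registered before launching vLLM.
ray status --address <head_node_ip>:6379
```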
Docker Container Setup on Head Node
The following was the Dockerfile used to build the container:
The `requirements.txt` is:
I can confirm the installation succeeded, and `nvidia-smi` inside the container correctly shows the CUDA version to be 11.8.
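As a hedged aside (these exact commands are not in the report), the same verification can be scripted inside the container:

```bash
# Sanity checks before launching vLLM; output depends on the image and driver.
nvidia-smi          # GPUs visible, CUDA 11.8 reported
python3 -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
```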
I also used the `--gpus all` flag when running the `docker run` command.
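For context, a minimal sketch of such an invocation; only `--gpus all` is confirmed by the report, while the image name, container name, and `--network host` are assumptions:

```bash
# Hypothetical docker run line; only --gpus all comes from the report.
docker run --gpus all --network host --name vllm-head -it <vllm-image> bash
```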
Request
I'd appreciate any advice or help on this issue. I'm happy to provide any additional information needed to address it.