How many rpc-server instances should I start on a remote server? #11859
You can start multiple rpc-server instances. The maximum number of devices that can be used for offloading is currently 16.
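"Start multiple" here means one rpc-server process per GPU, each pinned to a single device via CUDA_VISIBLE_DEVICES and listening on its own port. A minimal dry-run sketch (the binary path, `-H`/`-p` flags, and base port 50052 are taken from the log below; the script only prints the commands you would run on each GPU host):

```shell
#!/bin/sh
# Dry-run sketch: print one rpc-server launch command per GPU.
# Each instance sees exactly one device and gets a unique port.
ports=""
for i in 0 1 2 3; do
  port=$((50052 + i))
  ports="$ports $port"
  # The actual command (remove the echo to really launch them):
  echo "CUDA_VISIBLE_DEVICES=$i ./rpc-server -H 0.0.0.0 -p $port &"
done
```

Because each process is restricted to one device by CUDA_VISIBLE_DEVICES, its log will report "found 1 CUDA device" instead of 4, and all four GPUs end up usable for offloading.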
Thank you for the reply, @rgerganov. I tried to start 4 rpc-server instances on each GPU host with CUDA_VISIBLE_DEVICES and ran llama-server on server A, but it failed with an error:
Discussed in #11858
Originally posted by vino5211 February 14, 2025
I have 4 GPU servers A, B, C, D, each with 4 NVIDIA A800 80GB PCIe cards. I start rpc-server on B, C, and D. Here is the output of the rpc-server command; it seems to have found 4 CUDA devices, but only Device 0 is used on each server. So the question is: how many rpc-server instances should I start on each remote server if there are 4 CUDA devices?
```
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: Host ('0.0.0.0') is != '127.0.0.1'
Never expose the RPC server to an open network!
This is an experimental feature and is not secure!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
create_backend: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 CUDA devices:
  Device 0: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
  Device 1: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
  Device 2: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
  Device 3: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
Starting RPC server on 0.0.0.0:50052, backend memory: 80614 MB
```
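On the client side, llama-server takes a comma-separated list of all remote endpoints via `--rpc`. A sketch that builds that list for the setup described above (hostnames `hostB`/`hostC`/`hostD`, the model path, and `-ngl 99` are placeholders; the script prints the command rather than running it):

```shell
#!/bin/sh
# Build the --rpc endpoint list: 3 hosts x 4 rpc-server instances each,
# ports 50052-50055 per host (matching the per-GPU launch scheme).
endpoints=""
for host in hostB hostC hostD; do
  for i in 0 1 2 3; do
    endpoints="$endpoints,$host:$((50052 + i))"
  done
done
endpoints="${endpoints#,}"   # strip the leading comma

# The actual command (remove the echo to really launch it):
echo "./llama-server -m model.gguf --rpc $endpoints -ngl 99"
```

Note that this yields 12 endpoints, which stays under the 16-device offloading limit mentioned above; with a fourth 4-GPU host you would hit exactly 16.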