How many rpc-server instances should I start on a remote server? #11859

Open
vino5211 opened this issue Feb 14, 2025 · 3 comments

@vino5211

Discussed in #11858

Originally posted by vino5211 February 14, 2025
I have 4 GPU servers (A, B, C, D), each with 4 NVIDIA A800 80GB PCIe GPUs. I started rpc-server on B, C, and D. Below is the output of the rpc-server command: it finds 4 CUDA devices, but only Device 0 is used on each server. So the question is: how many rpc-server instances should I start on each remote server if it has 4 CUDA devices?
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: Host ('0.0.0.0') is != '127.0.0.1'
Never expose the RPC server to an open network!
This is an experimental feature and is not secure!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

create_backend: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 CUDA devices:
Device 0: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
Device 1: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
Device 2: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
Device 3: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
Starting RPC server on 0.0.0.0:50052, backend memory: 80614 MB

@rgerganov
Collaborator

rgerganov commented Feb 14, 2025

You can start multiple rpc-server instances on each host, binding each one to a different CUDA device, like this:

$ CUDA_VISIBLE_DEVICES=0 ./rpc-server -H 0.0.0.0 -p 50052
$ CUDA_VISIBLE_DEVICES=1 ./rpc-server -H 0.0.0.0 -p 50053
$ CUDA_VISIBLE_DEVICES=2 ./rpc-server -H 0.0.0.0 -p 50054
$ CUDA_VISIBLE_DEVICES=3 ./rpc-server -H 0.0.0.0 -p 50055

The maximum number of devices that can be used for offloading is currently 16.
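
For completeness, the client side would then list every remote instance in the --rpc flag, which takes a comma-separated host:port list. This is a sketch, assuming llama-server is the frontend; hostB/hostC/hostD, the ports, and model.gguf are placeholders:

# build the list of all 12 remote rpc-server instances
$ RPC=hostB:50052,hostB:50053,hostB:50054,hostB:50055
$ RPC=$RPC,hostC:50052,hostC:50053,hostC:50054,hostC:50055
$ RPC=$RPC,hostD:50052,hostD:50053,hostD:50054,hostD:50055
# offload all layers across the local and remote devices
$ ./llama-server -m model.gguf -ngl 99 --rpc $RPC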

@vino5211
Author

vino5211 commented Feb 14, 2025

Thank you for the reply, @rgerganov. I tried starting 4 rpc-server instances on each GPU host with CUDA_VISIBLE_DEVICES and ran llama-server on server A, but it failed with this error:
llama.cpp/ggml/src/ggml-backend.cpp:1455: GGML_ASSERT(n_backends <= GGML_SCHED_MAX_BACKENDS) failed
I printed n_backends = 17: that is 4 local CUDA + 12 remote CUDA + 1 CPU, while GGML_SCHED_MAX_BACKENDS is set to 16 in the source code.
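
One way to get back under the limit (a sketch, not an official fix: CUDA_VISIBLE_DEVICES is a standard CUDA runtime variable, so llama-server should honor it just like rpc-server does) is to expose only 3 of the 4 local GPUs on server A, giving 3 local CUDA + 12 remote RPC + 1 CPU = 16 backends:

# reusing the $RPC list of 12 remote instances from above; model.gguf is a placeholder
$ CUDA_VISIBLE_DEVICES=0,1,2 ./llama-server -m model.gguf -ngl 99 --rpc $RPC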


@lexasub
Contributor

lexasub commented Feb 19, 2025

@vino5211, see #11361 (tensor split).
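
If tensor splitting is what is meant there, note that llama.cpp already exposes a --tensor-split flag that sets the proportion of the model placed on each device. A sketch of an even split across the 4 local GPUs; whether and how this interacts with RPC devices is what that discussion covers, and model.gguf is a placeholder:

# hypothetical: even 1:1:1:1 split across the 4 local devices
$ ./llama-server -m model.gguf -ngl 99 --tensor-split 1,1,1,1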
