How many rpc-server instances should I start on a remote server? #11859

Open
vino5211 opened this issue Feb 14, 2025 · 3 comments

@vino5211

Discussed in #11858

Originally posted by vino5211 February 14, 2025
I have 4 GPU servers (A, B, C, D), each with 4 NVIDIA A800 80GB PCIe GPUs. I started rpc-server on B, C, and D. Below is the output of the rpc-server command: it finds 4 CUDA devices, but only Device 0 is used on each server. So the question is: how many rpc-server instances should I start on each remote server if it has 4 CUDA devices?
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: Host ('0.0.0.0') is != '127.0.0.1'
Never expose the RPC server to an open network!
This is an experimental feature and is not secure!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

create_backend: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 CUDA devices:
Device 0: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
Device 1: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
Device 2: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
Device 3: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
Starting RPC server on 0.0.0.0:50052, backend memory: 80614 MB

@rgerganov
Collaborator

rgerganov commented Feb 14, 2025

You can start multiple rpc-server instances on each host, binding each one to a different CUDA device, like this:

$ CUDA_VISIBLE_DEVICES=0 ./rpc-server -H 0.0.0.0 -p 50052
$ CUDA_VISIBLE_DEVICES=1 ./rpc-server -H 0.0.0.0 -p 50053
$ CUDA_VISIBLE_DEVICES=2 ./rpc-server -H 0.0.0.0 -p 50054
$ CUDA_VISIBLE_DEVICES=3 ./rpc-server -H 0.0.0.0 -p 50055

The maximum number of devices that can be used for offloading is currently 16.
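
For completeness, the client side would then list every remote instance in the --rpc flag, which takes a comma-separated host:port list. This is a sketch, assuming llama-server is the frontend; hostB/hostC/hostD, the ports, and model.gguf are placeholders:

# build the list of all 12 remote rpc-server instances
$ RPC=hostB:50052,hostB:50053,hostB:50054,hostB:50055
$ RPC=$RPC,hostC:50052,hostC:50053,hostC:50054,hostC:50055
$ RPC=$RPC,hostD:50052,hostD:50053,hostD:50054,hostD:50055
# offload all layers across the local and remote devices
$ ./llama-server -m model.gguf -ngl 99 --rpc $RPC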

@vino5211
Author

vino5211 commented Feb 14, 2025

Thank you for the reply, @rgerganov. I tried starting 4 rpc-server instances on each GPU host with CUDA_VISIBLE_DEVICES and ran llama-server on server A, but it failed with this error:
llama.cpp/ggml/src/ggml-backend.cpp:1455: GGML_ASSERT(n_backends <= GGML_SCHED_MAX_BACKENDS) failed
I printed n_backends = 17: that is 4 local CUDA + 12 remote CUDA + 1 CPU, while GGML_SCHED_MAX_BACKENDS is set to 16 in the source code.
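
One way to get back under the limit (a sketch, not an official fix: CUDA_VISIBLE_DEVICES is a standard CUDA runtime variable, so llama-server should honor it just like rpc-server does) is to expose only 3 of the 4 local GPUs on server A, giving 3 local CUDA + 12 remote RPC + 1 CPU = 16 backends:

# reusing the $RPC list of 12 remote instances from above; model.gguf is a placeholder
$ CUDA_VISIBLE_DEVICES=0,1,2 ./llama-server -m model.gguf -ngl 99 --rpc $RPC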


@lexasub
Contributor

lexasub commented Feb 19, 2025

@vino5211, see #11361 (tensor split).
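
If tensor splitting is what is meant there, note that llama.cpp already exposes a --tensor-split flag that sets the proportion of the model placed on each device. A sketch of an even split across the 4 local GPUs; whether and how this interacts with RPC devices is what that discussion covers, and model.gguf is a placeholder:

# hypothetical: even 1:1:1:1 split across the 4 local devices
$ ./llama-server -m model.gguf -ngl 99 --tensor-split 1,1,1,1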
