feat: add nccl support for multi-gpu tensor parallelism #91

jorgeantonio21 · 2024-10-01T10:27:48Z

No description provided.

eureka-cpu · 2024-10-04T06:55:01Z

Looks fine to me, though in general it's probably best to avoid using the `panic!` macro in production code. If the function returns a `Result`, you may be better off finding a way of propagating the error and handling it.
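To illustrate the suggestion, here is a minimal sketch of returning an error instead of panicking. The `DeviceError` type and the `validate_device_ids` and `tensor_parallel_size` functions are hypothetical names made up for this example; they are not taken from the PR.

```rust
use std::fmt;

/// Hypothetical error type, for illustration only (not from the PR).
#[derive(Debug, PartialEq)]
enum DeviceError {
    NoDevices,
    InvalidDeviceId { id: usize, available: usize },
}

impl fmt::Display for DeviceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DeviceError::NoDevices => write!(f, "no device ids were provided"),
            DeviceError::InvalidDeviceId { id, available } => {
                write!(f, "device id {id} is out of range ({available} devices available)")
            }
        }
    }
}

impl std::error::Error for DeviceError {}

/// Hypothetical check: reports a bad configuration as an `Err`
/// where a `panic!` might otherwise have been used.
fn validate_device_ids(device_ids: &[usize], available: usize) -> Result<(), DeviceError> {
    if device_ids.is_empty() {
        return Err(DeviceError::NoDevices);
    }
    for &id in device_ids {
        if id >= available {
            return Err(DeviceError::InvalidDeviceId { id, available });
        }
    }
    Ok(())
}

/// The caller forwards the error with `?`, so the decision about how to
/// handle it is left to whoever sits higher up the call stack.
fn tensor_parallel_size(device_ids: &[usize], available: usize) -> Result<usize, DeviceError> {
    validate_device_ids(device_ids, available)?;
    Ok(device_ids.len())
}
```

With this shape, a service can log the error or send it back to the client, rather than having the whole process abort.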

backends/vllm/src/models/llama.rs

backends/vllm/src/models/llama_nccl.rs

backends/vllm/src/models/llama.rs

backends/vllm/src/model_executor.rs

Cifko · 2024-10-04T13:56:29Z

backends/vllm/src/tests/mod.rs

 mod llama;
+#[cfg(all(test, feature = "nccl"))]


The `test` check is not necessary, as this whole folder is included only when running tests.
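A condensed sketch of the point above, assuming the parent module declares the folder as test-only (for example with `#[cfg(test)] mod tests;`). The module name `llama_nccl` and the two helper functions are placeholders for illustration, not code from the PR.

```rust
// Hypothetical layout squeezed into one file; in a real crate the
// `tests` module would live in its own folder.
#[cfg(test)]
mod tests {
    mod llama {}

    // The enclosing module is already compiled only for tests, so the
    // feature check alone is enough here. `llama_nccl` is a placeholder.
    #[cfg(feature = "nccl")]
    mod llama_nccl {}
}

/// The condition as written in the diff.
fn gated_with_test_check() -> bool {
    cfg!(all(test, feature = "nccl"))
}

/// The same condition without the `test` check.
fn gated_on_feature_only() -> bool {
    cfg!(feature = "nccl")
}
```

Inside a test build, `cfg!(test)` is always true, so the two conditions evaluate identically whether or not the `nccl` feature is enabled.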

jorgeantonio21 · 2024-10-04T15:16:14Z

Closes #101.

jorgeantonio21 · 2024-10-04T15:32:28Z

Addresses #24 (llama).

jorgeantonio21 added 30 commits September 26, 2024 21:39

first commit

6bac598

first commit

b8c1acb

add tests mod

4ec93f7

Merge branch 'main' into openai-api-revision

9bf7864

first commit

c05f5ba

refactor repository

7615979

refactor the llm service logic to be able to communicate with the axu…

73a781f

…m service

fmt

a701134

message content format and parse RequestBody into GenerateRequest

e67814d

merge main and resolve conflicts

ca4231b

config comments

3b37f7b

minor mods

6223f08

add unit tests for messages to prompt

2768d3b

improve docs, resolve clippy, add remaining logic to handle responses…

75b7c14

… back to the user

refactor tests

d41011f

refactor tests for llm

5138aa9

first commit

e675069

llama-nccl

528479e

add clap args

f19902e

add features derive to clap

e98682b

remove comments

1522072

resolve few issues with finished reason parsing

0dc3b60

address PR comments

29f92f8

add llama models enums

977f242

add llama models enums

fd4860a

add llama models enums

eeee203

correct meta hf string

fdaac7d

correct meta hf string

a7dbee5

Merge branch 'vllm-openai-api-integration' into ja-nccl

0f1f9d2

add changes

05ccda0

jorgeantonio21 marked this pull request as ready for review October 4, 2024 01:18

jorgeantonio21 requested review from Cifko, francis2tm and eureka-cpu October 4, 2024 01:18

jorgeantonio21 added 4 commits October 4, 2024 02:21

update features on server crate

343934d

update features dependencies on backends with nccl

e8db2e5

add changes

f6f3385

Merge branch 'main' into ja-nccl

65a8eb1

Cifko reviewed Oct 4, 2024

jorgeantonio21 added 12 commits October 4, 2024 13:11

address PR comments

f6f0317

add small changes

6a41d0e

add small changes

12fbff8

add small changes

832d020

remove unnecessary feature

8915587

remove unnecessary feature flags from code

19ced8d

remove unnecessary feature flags from code

31e7279

add changes

232db33

add imports

6859e5a

add imports

b8697e1

add imports

0312c8a

add feature gating to llama tests

c7160cf

Cifko approved these changes Oct 4, 2024

jorgeantonio21 added 2 commits October 4, 2024 16:12

only allocate device ids memory

6e7886d

only allocate device ids memory

07f92d1

jorgeantonio21 merged commit 036cc85 into main Oct 4, 2024
1 check failed

jorgeantonio21 mentioned this pull request Oct 6, 2024

Cache config calculation of number of gpu blocks available should depend only on the available device ids for the model #101

Closed

jorgeantonio21 mentioned this pull request Oct 17, 2024

Distributed inference and tensor parallelism plans EricLBuehler/mistral.rs#675

Open
