Describe the bug
I built mistral.rs with
cargo build --release --features "cuda flash-attn"
and ran the model with
./target/release/mistralrs-server --port 1234 -n 8 plain -m ./Qwen/Qwen2-72B-Instruct/ -a qwen2
on a machine with 8 A100 GPUs. nvitop shows memory growing on only one GPU, and the process then runs out of memory (OOM).
Latest commit
3a79137