Name and Version
build: 4465 (9a48399) with gcc (conda-forge gcc 13.3.0-1) 13.3.0 for x86_64-conda-linux-gnu
Operating systems
Linux
GGML backends
CPU
Hardware
2x Intel Xeon 24 core (Kaggle)
Models
DeepSeek-V2.5: https://huggingface.co/bartowski/DeepSeek-V2.5-GGUF/tree/main/DeepSeek-V2.5-Q4_0
Problem description & steps to reproduce
The problem is that part of the memory is used for a "CPU_AARCH64 model buffer". Normally the model takes only about 150 GB of RAM; now it takes 260 GB and loads much more slowly. Command line:
/root/llama.cpp/build/bin/llama-server -m /dev/shm/DeepSeek-V2.5-Q4_0-00001-of-00004.gguf -t 72
This does not happen when using Q4_K_M.
Compile commands:
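A possible workaround sketch, under the assumption (inferred only from the buffer name in the report, not confirmed) that this build of llama.cpp exposes a `GGML_CPU_AARCH64` CMake option controlling runtime repacking of Q4_0 weights into CPU-optimized layouts:

```shell
# Hypothetical workaround: rebuild with the assumed GGML_CPU_AARCH64
# option turned off so Q4_0 weights are not repacked into a separate
# "CPU_AARCH64 model buffer" at load time.
# Verify the option actually exists in your checkout first, e.g.:
#   cmake -B build && cmake -LA build | grep -i AARCH64
cmake -B build -DGGML_CPU_AARCH64=OFF
cmake --build build --config Release -j
```

If the option is present and the extra buffer disappears after rebuilding, that would confirm the repacking path as the source of the additional ~110 GB of RAM; Q4_K_M is presumably unaffected because only Q4_0 is repacked.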
First Bad Commit
No response
Relevant log output