
0.10.2

Released by @b4rtaz on 29 Jul 12:23 · 71135e6

This version introduces a new CLI argument: `--max-seq-len <n>`. It lets you reduce the context size and, with it, the memory consumption. The argument works with the following commands: `dllama inference`, `dllama chat`, and `dllama-api`. You don't need to set it on workers; the root node distributes this setting to them.

Example:

./dllama chat --model ... --nthreads 8 --max-seq-len 1024
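For a distributed setup, the flag is set only on the root node. A sketch of such a setup (the model/tokenizer paths and the worker address are placeholders; the worker invocation follows the project's usual `dllama worker` form):

```sh
# Worker node: no --max-seq-len needed; the root node sends the
# sequence length to workers when it connects.
./dllama worker --port 9998 --nthreads 4

# Root node: --max-seq-len 1024 caps the context, which shrinks the
# memory used for the context on the root and on every worker.
./dllama chat --model dllama_model.m --tokenizer dllama_tokenizer.t \
  --nthreads 8 --max-seq-len 1024 --workers 10.0.0.2:9998
```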