0.10.2
This version introduces a new CLI argument: `--max-seq-len <n>`. It allows you to reduce the context size and, with it, the memory consumption. This argument works with the following commands: `dllama inference`, `dllama chat`, and `dllama-api`. You don't need to set it on worker nodes; the root node distributes this setting to the workers.
Example:

```
./dllama chat --model ... --nthreads 8 --max-seq-len 1024
```
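
As a sketch, the flag should be passed the same way to `dllama-api`, assuming it accepts the same model and thread arguments as `dllama chat` (the model path is elided here just as in the example above):

```
./dllama-api --model ... --nthreads 8 --max-seq-len 1024
```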