Releases: ggerganov/llama.cpp
b4174
vulkan: Fix a vulkan-shaders-gen argument parsing error (#10484) vulkan-shaders-gen was not parsing the --no-clean argument correctly: the previous code only handled arguments that take a value, and --no-clean takes none. This commit makes the parser handle value-less arguments as well.
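For context, here is a minimal sketch of the parsing pattern this fix describes: flags that take no value must be recognized explicitly, otherwise a parser that always consumes the following token as a "value" misreads them. The option table is illustrative, not the actual vulkan-shaders-gen code.

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>

int main(int argc, char ** argv) {
    // Flags without a value must be listed explicitly; otherwise the
    // value-consuming branch below would swallow the next token.
    const std::set<std::string> boolean_flags = { "--no-clean" };

    std::map<std::string, std::string> options;
    for (int i = 1; i < argc; ++i) {
        std::string arg = argv[i];
        if (boolean_flags.count(arg)) {
            options[arg] = "1";            // flag without a value
        } else if (i + 1 < argc) {
            options[arg] = argv[++i];      // option followed by its value
        } else {
            std::cerr << "missing value for " << arg << "\n";
            return 1;
        }
    }
    std::cout << "no-clean set: " << options.count("--no-clean") << "\n";
    return 0;
}
```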
b4173
Introduce llama-run (#10291) It's like simple-chat, but it uses smart pointers to avoid manual memory cleanup, so there are fewer memory leaks in the code. It also avoids printing multiple dots, splits the code into smaller functions, and uses no exception handling. Signed-off-by: Eric Curtin <[email protected]>
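As a rough sketch of the smart-pointer pattern described here, the llama.cpp C handles can be wrapped in std::unique_ptr with custom deleters so they are released automatically; the llama_* calls below follow the C API of this era, and the error handling is deliberately minimal.

```cpp
#include <memory>
#include "llama.h"

// Deleters that forward to the C API's free functions.
struct llama_model_deleter   { void operator()(llama_model * m)   { llama_free_model(m); } };
struct llama_context_deleter { void operator()(llama_context * c) { llama_free(c); } };

using llama_model_ptr   = std::unique_ptr<llama_model,   llama_model_deleter>;
using llama_context_ptr = std::unique_ptr<llama_context, llama_context_deleter>;

int main() {
    llama_backend_init();
    llama_model_ptr model(llama_load_model_from_file("model.gguf", llama_model_default_params()));
    if (model) {
        llama_context_ptr ctx(llama_new_context_with_model(model.get(), llama_context_default_params()));
        // ... generation loop ...
    }   // ctx, then model, are freed here automatically, no manual cleanup
    llama_backend_free();
    return 0;
}
```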
b4171
server : add more information about error (#10455)
b4170
server : enable cache_prompt by default (#10501) ggml-ci
b4169
metal : enable mat-vec kernels for bs <= 4 (#10491)
b4168
Rename Olmo1124 to Olmo2 (#10500)
b4167
llama : accept a list of devices to use to offload a model (#10497)
* llama : accept a list of devices to use to offload a model
* accept `--dev none` to completely disable offloading
* fix dev list with dl backends
* rename env parameter to LLAMA_ARG_DEVICE for consistency
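A hedged sketch of what the device list looks like from the C API side, assuming the NULL-terminated devices field this change adds to llama_model_params and the ggml-backend device enumeration functions; the model path is a placeholder.

```cpp
#include "llama.h"
#include "ggml-backend.h"

int main() {
    llama_backend_init();

    // Restrict offloading to the first registered device only (e.g. one GPU).
    ggml_backend_dev_t devices[2] = { ggml_backend_dev_get(0), nullptr };

    llama_model_params mparams = llama_model_default_params();
    mparams.devices = devices;   // NULL-terminated list; leave unset to use all devices

    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    // ... use the model ...
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```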
b4164
server : add speculative decoding support (#10455)
* server : add speculative decoding support ggml-ci
* server : add helper function slot.can_speculate() ggml-ci
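Conceptually, speculative decoding has a cheap draft model propose a short run of tokens which the target model then verifies, accepting the matching prefix. The toy below illustrates only that accept/reject loop with stand-in functions; it is not the server's actual implementation (which adds helpers such as slot.can_speculate()).

```cpp
#include <cstdio>
#include <vector>

using token = int;

// Stand-in draft model: fast but approximate next-token prediction.
token draft_next(const std::vector<token> & ctx) {
    return (ctx.back() * 7 + 1) % 100;
}
// Stand-in target model: the ground truth the output must match.
token target_next(const std::vector<token> & ctx) {
    return (ctx.back() * 7 + 1) % 100 + (ctx.size() % 5 == 0);
}

int main() {
    std::vector<token> ctx = { 42 };
    const int n_draft = 4;   // tokens speculated per step

    while (ctx.size() < 16) {
        // 1. draft n_draft tokens cheaply
        std::vector<token> draft = ctx;
        for (int i = 0; i < n_draft; ++i) draft.push_back(draft_next(draft));

        // 2. verify against the target model, accepting until the first mismatch
        size_t n_accepted = 0;
        std::vector<token> verify = ctx;
        for (size_t i = ctx.size(); i < draft.size(); ++i) {
            token t = target_next(verify);
            verify.push_back(t);             // target's token is always kept
            if (t != draft[i]) break;        // first wrong draft token ends the run
            ++n_accepted;
        }
        std::printf("accepted %zu/%d drafted tokens\n", n_accepted, n_draft);
        ctx = verify;
    }
    return 0;
}
```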
b4163
ggml : add support for dynamic loading of backends (#10469)
* ggml : add support for dynamic loading of backends
Co-authored-by: Georgi Gerganov <[email protected]>
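A minimal sketch of what runtime loading might look like from application code, assuming the ggml_backend_load entry point this change introduces; the shared-library path is platform-specific and purely illustrative.

```cpp
#include <cstdio>
#include "ggml-backend.h"

int main() {
    // Path is illustrative; the actual file name depends on platform and build.
    ggml_backend_reg_t reg = ggml_backend_load("./libggml-cuda.so");
    if (!reg) {
        std::fprintf(stderr, "backend not loaded, continuing with built-in backends\n");
    }

    // Enumerate whatever devices are now registered.
    for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        std::printf("device %zu: %s\n", i, ggml_backend_dev_name(dev));
    }
    return 0;
}
```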
b4162
tests : fix compile warning