Releases: ggerganov/llama.cpp
b4174
vulkan: Fix a vulkan-shaders-gen argument parsing error (#10484) vulkan-shaders-gen was not parsing the --no-clean argument correctly: the previous code only handled arguments that take a value, and --no-clean takes none. This commit makes the parser handle value-less arguments as well.
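For context, here is a minimal sketch of the parsing pattern this fix describes: flags that take no value must be recognized explicitly, otherwise a parser that always consumes the following token as a "value" misreads them. The option table is illustrative, not the actual vulkan-shaders-gen code.

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>

int main(int argc, char ** argv) {
    // Flags without a value must be listed explicitly; otherwise the
    // value-consuming branch below would swallow the next token.
    const std::set<std::string> boolean_flags = { "--no-clean" };

    std::map<std::string, std::string> options;
    for (int i = 1; i < argc; ++i) {
        std::string arg = argv[i];
        if (boolean_flags.count(arg)) {
            options[arg] = "1";            // flag without a value
        } else if (i + 1 < argc) {
            options[arg] = argv[++i];      // option followed by its value
        } else {
            std::cerr << "missing value for " << arg << "\n";
            return 1;
        }
    }
    std::cout << "no-clean set: " << options.count("--no-clean") << "\n";
    return 0;
}
```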
b4173
Introduce llama-run (#10291) It's like simple-chat, but it uses smart pointers to avoid manual memory cleanup, so there are fewer memory leaks in the code. It also avoids printing multiple dots, splits the code into smaller functions, and uses no exception handling. Signed-off-by: Eric Curtin <[email protected]>
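As a rough sketch of the smart-pointer pattern described here, the llama.cpp C handles can be wrapped in std::unique_ptr with custom deleters so they are released automatically; the llama_* calls below follow the C API of this era, and the error handling is deliberately minimal.

```cpp
#include <memory>
#include "llama.h"

// Deleters that forward to the C API's free functions.
struct llama_model_deleter   { void operator()(llama_model * m)   { llama_free_model(m); } };
struct llama_context_deleter { void operator()(llama_context * c) { llama_free(c); } };

using llama_model_ptr   = std::unique_ptr<llama_model,   llama_model_deleter>;
using llama_context_ptr = std::unique_ptr<llama_context, llama_context_deleter>;

int main() {
    llama_backend_init();
    llama_model_ptr model(llama_load_model_from_file("model.gguf", llama_model_default_params()));
    if (model) {
        llama_context_ptr ctx(llama_new_context_with_model(model.get(), llama_context_default_params()));
        // ... generation loop ...
    }   // ctx, then model, are freed here automatically, no manual cleanup
    llama_backend_free();
    return 0;
}
```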
b4171
server : add more information about error (#10455)
b4170
server : enable cache_prompt by default (#10501) ggml-ci
b4169
metal : enable mat-vec kernels for bs <= 4 (#10491)
b4168
Rename Olmo1124 to Olmo2 (#10500)
b4167
llama : accept a list of devices to use to offload a model (#10497)
* llama : accept a list of devices to use to offload a model
* accept `--dev none` to completely disable offloading
* fix dev list with dl backends
* rename env parameter to LLAMA_ARG_DEVICE for consistency
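A hedged sketch of what the device list looks like from the C API side, assuming the NULL-terminated devices field this change adds to llama_model_params and the ggml-backend device enumeration functions; the model path is a placeholder.

```cpp
#include "llama.h"
#include "ggml-backend.h"

int main() {
    llama_backend_init();

    // Restrict offloading to the first registered device only (e.g. one GPU).
    ggml_backend_dev_t devices[2] = { ggml_backend_dev_get(0), nullptr };

    llama_model_params mparams = llama_model_default_params();
    mparams.devices = devices;   // NULL-terminated list; leave unset to use all devices

    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    // ... use the model ...
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```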
b4164
server : add speculative decoding support (#10455)
* server : add speculative decoding support ggml-ci
* server : add helper function slot.can_speculate() ggml-ci
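Conceptually, speculative decoding has a cheap draft model propose a short run of tokens which the target model then verifies, accepting the matching prefix. The toy below illustrates only that accept/reject loop with stand-in functions; it is not the server's actual implementation (which adds helpers such as slot.can_speculate()).

```cpp
#include <cstdio>
#include <vector>

using token = int;

// Stand-in draft model: fast but approximate next-token prediction.
token draft_next(const std::vector<token> & ctx) {
    return (ctx.back() * 7 + 1) % 100;
}
// Stand-in target model: the ground truth the output must match.
token target_next(const std::vector<token> & ctx) {
    return (ctx.back() * 7 + 1) % 100 + (ctx.size() % 5 == 0);
}

int main() {
    std::vector<token> ctx = { 42 };
    const int n_draft = 4;   // tokens speculated per step

    while (ctx.size() < 16) {
        // 1. draft n_draft tokens cheaply
        std::vector<token> draft = ctx;
        for (int i = 0; i < n_draft; ++i) draft.push_back(draft_next(draft));

        // 2. verify against the target model, accepting until the first mismatch
        size_t n_accepted = 0;
        std::vector<token> verify = ctx;
        for (size_t i = ctx.size(); i < draft.size(); ++i) {
            token t = target_next(verify);
            verify.push_back(t);             // target's token is always kept
            if (t != draft[i]) break;        // first wrong draft token ends the run
            ++n_accepted;
        }
        std::printf("accepted %zu/%d drafted tokens\n", n_accepted, n_draft);
        ctx = verify;
    }
    return 0;
}
```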
b4163
ggml : add support for dynamic loading of backends (#10469)
* ggml : add support for dynamic loading of backends
Co-authored-by: Georgi Gerganov <[email protected]>
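A minimal sketch of what runtime loading might look like from application code, assuming the ggml_backend_load entry point this change introduces; the shared-library path is platform-specific and purely illustrative.

```cpp
#include <cstdio>
#include "ggml-backend.h"

int main() {
    // Path is illustrative; the actual file name depends on platform and build.
    ggml_backend_reg_t reg = ggml_backend_load("./libggml-cuda.so");
    if (!reg) {
        std::fprintf(stderr, "backend not loaded, continuing with built-in backends\n");
    }

    // Enumerate whatever devices are now registered.
    for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        std::printf("device %zu: %s\n", i, ggml_backend_dev_name(dev));
    }
    return 0;
}
```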
b4162
tests : fix compile warning