Releases: leixy76/llama.cpp
b3454
Build Llama SYCL Intel with static libs (#8668)
Ensure SYCL CI builds both static & dynamic libs for testing purposes.
Signed-off-by: Joe Todd <[email protected]>
b3439
*.py: Stylistic adjustments for python (#8233)
* Removed superfluous parens in conditionals.
* Removed unused args in functions.
* Replaced unused `idx` var with `_`.
* Initialized file_format and format_version attributes.
* Renamed constant to capitals.
* Prevented redefinition of the `f` var.
Signed-off-by: Jiri Podivin <[email protected]>
b3432
flake.lock: Update (#8610)
b3414
server: use relative routes for static files in new UI (#8552)
* server: public: fix api_url on non-index pages
* server: public: use relative routes for static files in new UI
b3409
CONTRIBUTING.md : remove mention of noci (#8541)
b3405
make/cmake: add missing force MMQ/cuBLAS for HIP (#8515)
b3389
llama : fix Gemma-2 Query scaling factors (#8473)
* 9B: query_pre_attn_scalar should be 256, not 224 (self.config.hidden_size // self.config.num_attention_heads). See https://github.com/google/gemma_pytorch/commit/03e657582d17cb5a8617ebf333c1c16f3694670e
* llama : fix Gemma-2 Query scaling factor
Co-authored-by: Daniel Han <[email protected]>
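A minimal sketch of what this fix changes, assuming the Gemma-2 9B config values from the upstream gemma_pytorch commit referenced above (hidden_size = 3584, num_attention_heads = 16, query_pre_attn_scalar = 256); the variable names here are illustrative, not llama.cpp internals:

```python
# Gemma-2 9B attention-logit scaling: the buggy value 224 came from
# hidden_size // num_attention_heads; the fix uses the model's
# query_pre_attn_scalar = 256 instead.
hidden_size = 3584
num_attention_heads = 16
query_pre_attn_scalar = 256

wrong_divisor = hidden_size // num_attention_heads  # 224
wrong_scale = wrong_divisor ** -0.5                 # 1/sqrt(224)
fixed_scale = query_pre_attn_scalar ** -0.5         # 1/sqrt(256) == 0.0625

print(wrong_divisor, fixed_scale)
```

The attention scores are multiplied by this scale before softmax, so a ~7% difference in the divisor subtly degrades 9B output quality rather than failing outright.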
b3384
server : handle content array in chat API (#8449)
* server : handle content array in chat API
* Update examples/server/utils.hpp
Co-authored-by: Xuan Son Nguyen <[email protected]>
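For illustration, a request the server's OpenAI-compatible chat endpoint can now accept: a message whose `content` is an array of text parts rather than a plain string. The part schema follows the OpenAI chat format; the model name and exact endpoint behavior are assumptions, not taken from this changelog:

```python
import json

# Message `content` given as an array of {"type": "text", ...} parts,
# as supported after #8449, instead of a single string.
payload = {
    "model": "llama",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Hello, "},
                {"type": "text", "text": "world!"},
            ],
        }
    ],
}
body = json.dumps(payload)
print(body)
```

Before this change, only string-valued `content` was handled, which broke clients that always send the array form.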
b3372
gitignore : deprecated binaries
b3369
Initialize default slot sampling parameters from the global context. …