forked from ggml-org/llama.cpp
[pull] master from ggerganov:master #121
Closed
…el variants (#5763)
* gguf-py : add T5 model architecture
* gguf-py : add separate tensors for encoder and decoder
* gguf-py : add new model header parameters: decoder_start_token_id, attention.relative_buckets_count, tokenizer.ggml.remove_extra_whitespaces, tokenizer.ggml.precompiled_charsmap
* convert-hf : add model conversion support for T5ForConditionalGeneration and T5WithLMHeadModel

Co-authored-by: Stanisław Szymczyk <[email protected]>
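The new header parameters land as GGUF key/value metadata. A minimal sketch, assuming gguf-py's `GGUFWriter` API, of how such keys could be written; the key names are the ones listed in the commit above, while the arch string, values, and file name are illustrative placeholders:

```python
# Hypothetical sketch: writing the new T5 header parameters as GGUF KV
# metadata with gguf-py. Values and output path are placeholders, not
# taken from a real model.
from gguf import GGUFWriter

writer = GGUFWriter("t5-example.gguf", arch="t5")
writer.add_uint32("t5.decoder_start_token_id", 0)
writer.add_uint32("t5.attention.relative_buckets_count", 32)
writer.add_bool("tokenizer.ggml.remove_extra_whitespaces", True)
# tokenizer.ggml.precompiled_charsmap holds a binary blob extracted from
# the SentencePiece tokenizer; omitted in this sketch.

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()  # no tensors in this sketch
writer.close()
```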
* add parameters for embeddings (--embd-normalize, --embd-output-format, --embd-separator) and describe them in the README.md
* Update README.md: fix typo
* Trailing whitespace
* fix json generation, use " not '
* fix merge master
* fix code formatting: group of parameters // embedding
* print usage for embedding parameters

Co-authored-by: Brian <[email protected]>
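For context, a hedged usage sketch of the new flags; the binary name, flag values, separator token, and output shape are assumptions based on the commit message, not verified against the README:

```python
# Hypothetical invocation of the embedding example with the new flags.
import json
import subprocess

result = subprocess.run(
    [
        "./llama-embedding", "-m", "model.gguf",
        "--embd-normalize", "2",         # assumed: selects L2 normalization
        "--embd-output-format", "json",  # assumed: emit JSON instead of raw floats
        "--embd-separator", "<#sep#>",   # assumed: splits the prompt into several inputs
        "-p", "first sentence<#sep#>second sentence",
    ],
    capture_output=True, text=True, check=True,
)
print(json.loads(result.stdout))
```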
* support splits in convert.py
* Support split by size and dry run to write estimated shards/filesizes
* Move split functionality to new GGUFManager class
* fix improper function signature
* tentative push of convert-hf-to-gguf support
* resolve merge + SplitArguments for easier parsing
* Fix eager tensor memory leak and remove convert.py changes: removed a memory leak caused by unexpected reference retention to eager tensors, and removed the GGUFManager functionality in convert.py in favor of specializing for convert-hf-to-gguf.py
* refactor SplitStrategy to be a deque: instead of giving SplitStrategy a `data` field that is a deque, make SplitStrategy a subclass of deque itself
* fix Q8 quantization
* remove unnecessary imports in gguf_manager
* fix final? merge issue
* fix gguf_writer placement and remove comments
* oops, actually fix gguf_writer placement
* reduce duplicated code from gguf_writer
* further simplify GGUFManager
* simplify even further and standardize with GGUFWriter
* reduce diffs with master
* form shards while adding tensors; SHA256 sums agree with master
* re-add type hint
* GGUFWriter compatibility fix
* Shard dataclass and un-negative dont_add_architecture
* type consistency in format_n_bytes_to_str
* move kv keys to constants.py
* make pathlib explicit
* base-1024 bytes to base-1000
* rename GGUFManager to GGUFWriterSplit
* Update gguf-py/gguf/constants.py
* fix convert-hf-to-gguf.py permissions
* fix line endings
* Update gguf-py/gguf/gguf_writer_split.py
* convert-hf : restore executable file permission
* examples/convert-legacy-llama.py: restore executable file permission
* reinstate original gguf package import and fix type annotation
* attempt to appease the linter
* attempt 2 to appease the linter
* attempt 3 to appease the linter
* comma consistency
* Update convert-hf-to-gguf.py
* edit cmd line args
* use simplification from #7827
* kv/ti data are still wrong
* try to refactor kv data (still fails)
* fix ti data messiness
* tidy up
* fix linting
* actually make the linter happy
* cleanup round 1
* remove SplitStrategy, SplitArguments
* appease linter
* fix typing and clean up
* fix linting
* Update gguf-py/gguf/gguf_writer.py (repeated review-suggestion commits)
* progress bar, fix split logic
* catch oversights
* swap bar orders
* compatibility fix
* Update convert-hf-to-gguf.py

Co-authored-by: Brian <[email protected]>
Co-authored-by: compilade <[email protected]>
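The splitting idea itself is simple to illustrate. A self-contained sketch, not the actual gguf-py logic, of grouping tensors into shards by a maximum size and producing the usual `-00001-of-00003`-style names:

```python
# Standalone illustration of size-based shard assignment; the naming
# follows the common "-NNNNN-of-NNNNN.gguf" convention.
def plan_shards(tensor_sizes: list[int], max_shard_bytes: int) -> list[list[int]]:
    shards: list[list[int]] = [[]]
    used = 0
    for size in tensor_sizes:
        # start a new shard when the current one would overflow
        if shards[-1] and used + size > max_shard_bytes:
            shards.append([])
            used = 0
        shards[-1].append(size)
        used += size
    return shards

sizes = [300, 700, 500, 900, 100]  # pretend tensor byte sizes
shards = plan_shards(sizes, max_shard_bytes=1000)
total = len(shards)
for i, shard in enumerate(shards, start=1):
    print(f"model-{i:05d}-of-{total:05d}.gguf: {sum(shard)} bytes")
```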
* CUDA: optimize MMQ int8 tensor core performance
* only a single get_mma_tile_x_k function
* simplify code, make functions constexpr
#8090)

Co-authored-by: Stanisław Szymczyk <[email protected]>
Co-authored-by: Brian <[email protected]>
…rompt (#7950)
* SimpleChat: Allow for chat req bool options to be user controlled
* SimpleChat: Allow user to control cache_prompt flag in request
* SimpleChat: Add sample GUI images to readme file, showing the chat screen and the settings screen
* SimpleChat:Readme: Add quickstart block, title to image, cleanup
* SimpleChat: Reposition contents of the Info and Settings UI to make them more logically structured and easy to flow through
* SimpleChat: Rename chatRequestOptions to apiRequestOptions, so it is not wrongly assumed that these request options are used only for the chat/completions endpoint; they are used for both endpoints, so the new name matches the semantics better
* SimpleChat: Update image included with readme wrt settings UI
* SimpleChat:ReadMe: Switch to webp screen image to reduce size
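What the cache_prompt toggle changes on the wire is a single boolean in the request body. A hedged sketch against llama.cpp's HTTP server; the URL and the other fields are assumptions:

```python
# Hypothetical request showing the cache_prompt field SimpleChat now exposes.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/completion",  # assumed default server address
    json={
        "prompt": "You are a helpful assistant.\nUser: hi\nAssistant:",
        "n_predict": 64,
        "cache_prompt": True,  # ask the server to reuse the cached prompt prefix
    },
)
print(resp.json().get("content"))
```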
* add chat template support for llama-cli
* add help message
* server: simplify format_chat
* more consistent naming
* improve
* add llama_chat_format_example
* fix server
* code style
* code style
* Update examples/main/main.cpp

Co-authored-by: Georgi Gerganov <[email protected]>
* remove completions file
* fix inverted vector
* add mean method
* code style
* remove inverted pca hotfix
* clip : suppress unused variable warnings
This commit suppresses unused variable warnings for the variables `e` in the catch blocks. The motivation for this change is to suppress the warnings that are generated on Windows when using the MSVC compiler; the warnings are not displayed when using GCC because GCC marks all catch parameters as used.
* squash! clip : suppress unused variable warnings
Remove `e` (/*e*/) instead of using GGML_UNUSED.

Signed-off-by: Daniel Bevenius <[email protected]>
- The path to the common.h header in llama-android.cpp seems to be wrong. Fix the path so the Android build doesn't fail with the error "There is no file common/common.h".
* account for space prefix character
* use find instead
Co-authored-by: kustaaya <[email protected]>
* Add Qwen2MoE 57B-A14B
* Add Qwen2MoE 57B-A14B
* Delete examples/llama.android/llama/CMakeLists.txt (#8145 (comment))
This file is not being used for building on Android; `llama.cpp/examples/llama.android/llama/src/main/cpp/CMakeLists.txt` is used instead.
* Update CMakeLists.txt: pick local llama.cpp files instead of fetching content from git
* Fixed leak in llama_control_vector_load_one() and allow llama_control_vector_load() to grow
* refactored `llama_control_vector_load_one()`
* allow multiple directions for same layer in same file
* llama_control_vector_load_one() and llama_control_vector_load() now break on error
* removed unnecessary ggml_free() call
Flake lock file updates:
• Updated input 'nixpkgs':
  'github:NixOS/nixpkgs/e9ee548d90ff586a6471b4ae80ae9cfcbceb3420?narHash=sha256-4Zu0RYRcAY/VWuu6awwq4opuiD//ahpc2aFHg2CWqFY%3D' (2024-06-13)
  → 'github:NixOS/nixpkgs/d603719ec6e294f034936c0d0dc06f689d91b6c3?narHash=sha256-k3JqJrkdoYwE3fHE6xGDY676AYmyh4U2Zw%2B0Bwe5DLU%3D' (2024-06-20)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Philip Taron <[email protected]>
* add chatml fallback for cpp `llama_chat_apply_template`
* remove redundant code
* cmake : fix deprecated option names not working
* remove LlAMA_OPENMP
* CI: fix release build (Ubuntu)
PR #8006 changed the defaults to build shared libs; however, CI for releases expects static builds.
* CI: fix release build (Mac)

Co-authored-by: loonerin <[email protected]>
…perties (#8132)
* json: update grammars/README
* mention broken prefixItems
* add mention to llama-gbnf-validator
* json: explicit type: object for nested items object in cli example
* Inference support for Gemma 2 model family
* Update convert-hf-to-gguf.py, constants, and tensor mappings
* cleanup
* format fix
* Fix special token vocab bug
* Don't add space prefix
* fix deleted lines
* Update src/llama.cpp
* Add model type names
* Add control vector
* Fix model type identification

Co-authored-by: Andrei Betlen <[email protected]>
Co-authored-by: slaren <[email protected]>
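A hedged example of converting one of the newly supported checkpoints with the script named in the commit; the model directory and output names are placeholders:

```python
# Hypothetical conversion run for a Gemma 2 checkpoint.
import subprocess

subprocess.run(
    [
        "python", "convert-hf-to-gguf.py",
        "models/gemma-2-9b",                 # placeholder HF model directory
        "--outfile", "gemma-2-9b-f16.gguf",  # placeholder output name
        "--outtype", "f16",
    ],
    check=True,
)
```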
…rn escapes (#8180)
* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset
* json: revert default of additionalProperties to false
* Update README.md
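The practical effect of the revert: a schema that does not set additionalProperties behaves as if it were false again, so the generated grammar only admits the listed keys. A hedged sketch driving the converter script; the script path reflects the repo layout at the time and the file names are placeholders:

```python
# Hypothetical run of the JSON-schema-to-grammar converter.
import json
import pathlib
import subprocess

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    # "additionalProperties" omitted: with the reverted default it acts as
    # false, so keys other than "name" are rejected by the grammar.
}
pathlib.Path("schema.json").write_text(json.dumps(schema))
grammar = subprocess.run(
    ["python", "examples/json_schema_to_grammar.py", "schema.json"],
    capture_output=True, text=True, check=True,
).stdout
print(grammar)
```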
* add --spm-infill option
* support --spm-infill
* support --spm-infill
…emplate_internal` (#8172)
* tmp_contains
* minicpm chat template
* add DeepSeek Lite template
* change deepseek-lite to deepseek2
* correct code comment
* correct code from master branch
See Commits and Changes for more details.
Created by pull[bot]
Can you help keep this open source service alive? 💖 Please sponsor :)