Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pull] master from ggerganov:master #121

Closed
wants to merge 55 commits into from
Closed

Conversation

pull[bot]
Copy link

@pull pull bot commented Jun 24, 2024

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

fairydreaming and others added 3 commits June 24, 2024 07:06
…el variants (#5763)

* gguf-py : add T5 model architecture

* gguf-py : add separate tensors for encoder and decoder

* gguf-py : add new model header parameters: decoder_start_token_id, attention.relative_buckets_count, tokenizer.ggml.remove_extra_whitespaces, tokenizer.ggml.precompiled_charsmap

* convert-hf : add model conversion support for T5ForConditionalGeneration and T5WithLMHeadModel

---------

Co-authored-by: Stanisław Szymczyk <[email protected]>
* add parameters for embeddings
--embd-normalize
--embd-output-format
--embd-separator
description in the README.md

* Update README.md

fix tipo

* Trailing whitespace

* fix json generation, use " not '

* fix merge master

* fix code formating
group of parameters // embedding
print usage for embedding parameters

---------

Co-authored-by: Brian <[email protected]>
* support splits in convert.py

* Support split by size and dry run to write estimated shards/filesizes

* Move split functionality to new GGUFManager class

* fix improper function signature

* tentative push of convert-hf-to-gguf support

* resolve merge + SplitArguments for easier parsing

* Fix eager tensor memory leak and remove convert.py changes

Removed a memory leak caused by unexpected reference retention to eager tensors.

Also removed GGUFManager functionality in convert.py in favor of specializing for convert-hf-to-gguf.py.

* refactor SplitStrategy to be a deque

Instead of having SplitStrategy have a `data` field that is a deque, just have SplitStrategy be a subclass of deque itself.

* fix Q8 quantization

* remove unnecessary imports in gguf_manager

* fix final? merge issue

* fix gguf_writer placement and remove comments

* oops, actually fix gguf_writer placement

* reduce duplicated code from gguf_writer

* further simplify GGUFManager

* simplify even further and standardize with GGUFWriter

* reduce diffs with master

* form shards while adding tensors, SHA256 sums agree with master

* re-add type hint

Co-authored-by: compilade <[email protected]>

* GGUFWriter compatibility fix

Co-authored-by: compilade <[email protected]>

* Shard dataclass and un-negative dont_add_architecture

* type consistency in format_n_bytes_to_str

* move kv keys to constants.py

* make pathlib explicit

* base-1024 bytes to base-1000

* rename GGUFManager to GGUFWriterSplit

* Update gguf-py/gguf/constants.py

Co-authored-by: compilade <[email protected]>

* fix convert-hf-to-gguf.py permissions

* fix line endings

* Update gguf-py/gguf/gguf_writer_split.py

Co-authored-by: compilade <[email protected]>

* convert-hf : restore executable file permission

* examples/convert-legacy-llama.py: restore executable file permission

* reinstate original gguf package import and fix type annotation

* attempt to appease the linter

* attempt 2 to appease the linter

* attempt 3 to appease the linter

* comma consistency

* Update convert-hf-to-gguf.py

Co-authored-by: compilade <[email protected]>

* edit cmd line args

* use simplification from #7827

* kv/ti data are still wrong

* try to refactor kv data (still fails)

* fix ti data messiness

* tidy up

* fix linting

* actually make the linter happy

* cleanup round 1

* remove SplitStrategy, SplitArguments

* appease linter

* fix typing and clean up

* fix linting

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <[email protected]>

* progress bar, fix split logic

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <[email protected]>

* catch oversights

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <[email protected]>

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <[email protected]>

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <[email protected]>

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <[email protected]>

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <[email protected]>

* swap bar orders

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <[email protected]>

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <[email protected]>

* compatibility fix

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <[email protected]>

* Update convert-hf-to-gguf.py

Co-authored-by: compilade <[email protected]>

---------

Co-authored-by: Brian <[email protected]>
Co-authored-by: compilade <[email protected]>
* CUDA: optimize MMQ int8 tensor core performance

* only a single get_mma_tile_x_k function

* simplify code, make functions constexpr
@github-actions github-actions bot added the build label Jun 24, 2024
@github-actions github-actions bot added the SYCL label Jun 25, 2024
HatsuneMikuUwU33 and others added 2 commits June 25, 2024 10:44
…rompt (#7950)

* SimpleChat: Allow for chat req bool options to be user controlled

* SimpleChat: Allow user to control cache_prompt flag in request

* SimpleChat: Add sample GUI images to readme file

Show the chat screen and the settings screen

* SimpleChat:Readme: Add quickstart block, title to image, cleanup

* SimpleChat: RePosition contents of the Info and Settings UI

Make it more logically structured and flow through.

* SimpleChat: Rename to apiRequestOptions from chatRequestOptions

So that it is not wrongly assumed that these request options are
used only for chat/completions endpoint. Rather these are used
for both the end points, so rename to match semantic better.

* SimpleChat: Update image included with readme wrt settings ui

* SimpleChat:ReadMe: Switch to webp screen image to reduce size
* add chat template support for llama-cli

* add help message

* server: simplify format_chat

* more consistent naming

* improve

* add llama_chat_format_example

* fix server

* code style

* code style

* Update examples/main/main.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
* remove completions file

* fix inverted vector

* add mean method

* code style

* remove inverted pca hotfix
ggerganov and others added 10 commits June 26, 2024 19:26
* clip : suppress unused variable warnings

This commit suppresses unused variable warnings for the variables e in
the catch blocks.

The motivation for this change is to suppress the warnings that are
generated on Windows when using the MSVC compiler. The warnings are
not displayed when using GCC because GCC will mark all catch parameters
as used.

Signed-off-by: Daniel Bevenius <[email protected]>

* squash! clip : suppress unused variable warnings

Remove e (/*e*/) instead instead of using GGML_UNUSED.

---------

Signed-off-by: Daniel Bevenius <[email protected]>
- Path seems to be wrong for the common.h header file in llama-android.cpp file. Fixing the path so the Android Build doesn't fail with the error "There is no file common/common.h"
CISC and others added 18 commits June 27, 2024 10:46
* account for space prefix character

* use find instead
* Add Qwen2MoE 57B-A14B

* Add Qwen2MoE 57B-A14B
* Delete examples/llama.android/llama/CMakeLists.txt

#8145 (comment)

This file is not being used for building on Android. `llama.cpp/examples/llama.android/llama/src/main/cpp/CMakeLists.txt` is being used instead.

* Update CMakeLists.txt

Pick local llama.cpp files instead of fetching content from git
* Fixed leak in llama_control_vector_load_one() and allow llama_control_vector_load() to grow

* refactored `llama_control_vector_load_one()`

* allow multiple directions for same layer in same file

* llama_control_vector_load_one() and llama_control_vector_load() now break on error

* removed unnecessary ggml_free() call
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/e9ee548d90ff586a6471b4ae80ae9cfcbceb3420?narHash=sha256-4Zu0RYRcAY/VWuu6awwq4opuiD//ahpc2aFHg2CWqFY%3D' (2024-06-13)
  → 'github:NixOS/nixpkgs/d603719ec6e294f034936c0d0dc06f689d91b6c3?narHash=sha256-k3JqJrkdoYwE3fHE6xGDY676AYmyh4U2Zw%2B0Bwe5DLU%3D' (2024-06-20)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Philip Taron <[email protected]>
* add chatml fallback for cpp `llama_chat_apply_template`

* remove redundant code
* cmake : fix deprecated option names not working

* remove LlAMA_OPENMP
* CI: fix release build (Ubuntu)

PR #8006 changes defaults to build shared libs. However, CI for releases
expects static builds.

* CI: fix release build (Mac)

---------

Co-authored-by: loonerin <[email protected]>
…perties (#8132)

* json: update grammars/README

* mention broken prefixItems

* add mention to llama-gbnf-validator

* json: explicit type: object for nested items object in cli example
* Inference support for Gemma 2 model family

* Update convert-hf-to-gguf.py, constants, and tensor mappings

* cleanup

* format fix

* Fix special token vocab bug

* Don't add space prefix

* fix deleted lines

* Update src/llama.cpp

Co-authored-by: slaren <[email protected]>

* Add model type names

* Add control vector

* Fix model type identification

---------

Co-authored-by: Andrei Betlen <[email protected]>
Co-authored-by: slaren <[email protected]>
…rn escapes (#8180)

* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset

* json: revert default of additionalProperties to false

* Update README.md
* add --spm-infill option

* support --spm-infill

* support --spm-infill
…emplate_internal` (#8172)

* tmp_contains

* minicpm chat template

* add DeepSeek Lite template

* change deepseek-lite to deepseek2

* correct code comment

* correct code from master branch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.