[pull] master from ggerganov:master #118

pull · 2024-06-17T15:32:00Z

See Commits and Changes for more details.


* gguf-dump.py: add --markdown dump output
* gguf-dump.py: Add toc
* gguf-dump.py: use standard tensor name lookup. Also add tensor ID field
* gguf-dump.py: Add tensor overview count
* gguf-dump.py: fix array preview
* gguf-dump.py: markdownTableWithAlignmentSupport() added
* Add type hints and spacing
  Co-authored-by: compilade <[email protected]>
* gguf-dump.py: prettyfy dimention
* gguf-dump: right align element count
* gguf-dump.py: element count autosizing
* Apply suggestions from code review
  Co-authored-by: compilade <[email protected]>

---------

Co-authored-by: compilade <[email protected]>
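The alignment-aware table, right-aligned element count, and autosizing mentioned in the commits above can be sketched roughly as follows. This is a minimal illustration and not the code from gguf-dump.py: the function name `markdown_table`, its parameters, and the sample values are assumptions made for this example.

```python
def markdown_table(headers, rows, aligns):
    """Render a Markdown table with per-column alignment.

    Hypothetical helper for illustration; `aligns` holds "left",
    "right" or "center" for each column.
    """
    cells = [[str(c) for c in row] for row in rows]
    # Autosize: each column is as wide as its longest cell (minimum 3,
    # so that every separator keeps at least one dash).
    widths = [
        max(len(h), 3, *(len(r[i]) for r in cells))
        for i, h in enumerate(headers)
    ]

    def pad(text, width, align):
        if align == "right":
            return text.rjust(width)
        if align == "center":
            return text.center(width)
        return text.ljust(width)

    def separator(width, align):
        # Markdown marks alignment with colons in the separator row.
        if align == "right":
            return "-" * (width - 1) + ":"
        if align == "center":
            return ":" + "-" * (width - 2) + ":"
        return ":" + "-" * (width - 1)

    def line(parts):
        return "| " + " | ".join(parts) + " |"

    out = [line(pad(h, w, a) for h, w, a in zip(headers, widths, aligns))]
    out.append(line(separator(w, a) for w, a in zip(widths, aligns)))
    for row in cells:
        out.append(line(pad(c, w, a) for c, w, a in zip(row, widths, aligns)))
    return "\n".join(out)


if __name__ == "__main__":
    # Made-up sample values, only to show the layout.
    print(markdown_table(
        ["T_ID", "Tensor Name", "Elements"],
        [(0, "token_embd.weight", 131072000), (1, "output_norm.weight", 4096)],
        ["right", "left", "right"],
    ))
```

Right-aligning numeric columns such as an element count keeps the digits lined up in both the raw text and the rendered table.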

* Implement non-mapped async IO for CUDA on Windows. On a fast Gen5 NVMe drive this change improves model load time by >3x while it should be the same (or slightly faster) on any other drive.
* Free resources except for backend.
* Change assertions to exceptions in llama_file, find correct cuda backend to create CUDA resources and respect the use_mmap flag again for CUDA.
* Apply suggestions from code review
  Co-authored-by: slaren <[email protected]>
* Fix editorconfig and unused variable
* Fix issues with Windows build

---------

Co-authored-by: slaren <[email protected]>

Signed-off-by: thxCode <[email protected]>

* update: convert-hf-to-gguf.py to support Qwen2-57B-A14B
* fix: QWEN2MOE support for expert_feed_forward_length
  previously, expert ff was taken from n_ff (intermediate size) but it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH
  n_ff_exp and n_ff_shared_exp are now properly calculated
* update: convert-hf-to-gguf.py cleanup for Qwen2MoeForCausalLM
* fix: QWEN2MOE support for expert_feed_forward_length
  previously, expert ff was taken from n_ff (intermediate size) but it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH
  n_ff_exp and n_ff_shexp are now properly calculated

* whisper : use ggml_backend_sched (wip)
* use sched in whisper_allocr
* whisper : single backend in whisper_context
* whisper : remove whisper_state->backends_used
* whisper : remove whisper_context->backend
* whisper : reset scheduler after init
* whisper : fix external encoder (e.g. CoreML)
* whisper : cleanup
* whisper : handle null GPU buffer types + fix sycl

---------

Co-authored-by: slaren <[email protected]>

Signed-off-by: thxCode <[email protected]>

On hosts that are not set up to run CUDA code, llama.cpp can still be compiled with CUDA support by installing only the development packages. The runtime libraries, such as /usr/lib64/libcuda.so*, are then missing, and the link step currently fails. The CUDA development environment provides for this situation: stub libraries for all CUDA libraries are available in the $(CUDA_PATH)/lib64/stubs directory. Adding this directory to the end of the search path changes nothing for environments that already work, but allows llama.cpp to be compiled when the runtime libraries are not available.
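The reasoning behind placing the stubs directory last can be illustrated with a toy model of a first-match-wins library search. This is only a sketch under stated assumptions: the function `resolve_library`, the concrete directory paths, and the dictionary standing in for the filesystem are all hypothetical, and a real linker's search rules are more involved.

```python
def resolve_library(name, search_dirs, available):
    """Return the first directory in search_dirs that provides `name`.

    `available` maps a directory to the set of library file names it
    contains (a stand-in for the filesystem). Returns None when no
    directory provides the library, i.e. the link step would fail.
    """
    for directory in search_dirs:
        if name in available.get(directory, set()):
            return directory
    return None


# Assumed example paths; the stubs directory comes last in the search order.
STUBS = "/usr/local/cuda/lib64/stubs"
SEARCH = ["/usr/local/cuda/lib64", "/usr/lib64", STUBS]

# Host with the CUDA runtime installed: the real library is found first.
runtime_host = {"/usr/lib64": {"libcuda.so"}, STUBS: {"libcuda.so"}}

# Build-only host: only the stub library exists.
build_host = {STUBS: {"libcuda.so"}}

if __name__ == "__main__":
    print(resolve_library("libcuda.so", SEARCH, runtime_host))
    print(resolve_library("libcuda.so", SEARCH, build_host))
    # Without the stubs directory in the search order, nothing is found.
    print(resolve_library("libcuda.so", SEARCH[:-1], build_host))
```

Because earlier directories win, a host that already resolves the real library never reaches the stubs entry, while a build-only host falls through to it instead of failing.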

* Only use FIM middle if it exists * Only use FIM middle if it exists

* Random test: add_bos_token, add_eos_token
* Random test: add BPE models for testing
* Custom regex split fails with codepoint 0
* Fix falcon punctuation regex
* Refactor llm_tokenizer_bpe: move code to constructor
* Move 'add_special_bos/eos' logic to llm_tokenizer_bpe
* Move tokenizer flags to vocab structure.
* Default values for special_add_bos/eos
* Build vocab.special_tokens_cache using vocab token types
* Generalize 'jina-v2' per token attributes
* Fix unicode whitespaces (deepseek-coder, deepseek-llm)
* Skip missing byte tokens (falcon)
* Better unicode data generation
* Replace char32_t with uint32_t

* seperate lower precision GEMM from the main files
* fix workgroup size hardcode

mofosyne and others added 5 commits June 17, 2024 15:25

rpc : fix load/store misaligned addresses (#7948)

21be9ca

fix: divide 0 exception in mamba (#7932)

c637fcd

Signed-off-by: thxCode <[email protected]>

sched : offload_op also requires supports_op (#7977)

99052cd

github-actions bot added python ggml labels Jun 17, 2024

Add Nix and Flox install instructions (#7899)

b473e95

pull bot added ⤵️ pull and removed python ggml labels Jun 17, 2024

github-actions bot added python ggml labels Jun 17, 2024

ggerganov and others added 5 commits June 17, 2024 19:40

llama : disable FA if KV head size do not match (#7982)

7c26775

Make updates to type cast based on compiler instead of OS (#7851)

5b6da18

ggml : sync

5326bcc

github-actions bot added the script label Jun 18, 2024

abgulati and others added 5 commits June 18, 2024 09:57

readme : update UI list (#7943)

1193778

chore: clean useless beam search param (#7985)

b96f9af

Signed-off-by: thxCode <[email protected]>

Fix no gcc pragma on Windows (#7751)

84f6de1

Only use FIM middle token if it exists (#7648)

91c188d

* Only use FIM middle if it exists * Only use FIM middle if it exists

github-actions bot added examples server labels Jun 18, 2024

github-actions bot added the testing label Jun 18, 2024

[SYCL] refactor (#6408)

623494a

* seperate lower precision GEMM from the main files
* fix workgroup size hardcode

github-actions bot added the SYCL label Jun 19, 2024

teleprint-me closed this Jun 19, 2024
