[pull] master from ggerganov:master #8

pull · 2023-12-28T16:42:56Z

See Commits and Changes for more details.

Can you help keep this open source service alive? 💖 Please sponsor : )

The default values for tfs_z and typical_p were being set to zero, which caused the token candidates array to be shrunk down to one element, thus preventing any sampling. Note that this only applies to OpenAI API-compatible HTTP server requests. The solution is to use the default values that OpenAI documents and to use the llama.cpp defaults for everything else. I've tested that this change still produces deterministic output by default. If a "temperature" greater than 0 is explicitly passed, the output is unique each time. If "seed" is specified in addition to "temperature", the output becomes deterministic once more. See Mozilla-Ocho/llamafile#117. See Mozilla-Ocho/llamafile@9e4bf29.
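As a rough illustration of why a zero default is degenerate, here is a minimal C++ sketch of a cumulative-probability truncation filter with a "keep at least `min_keep` candidates" floor. It is a simplified top-p-style cutoff, not llama.cpp's typical or tail-free sampler, and `truncate_by_mass` and `min_keep` are hypothetical names. In this sketch a threshold of 0 always leaves a single candidate, matching the symptom described above, while 1.0 disables the filter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical truncation filter (top-p-style cumulative cutoff).
// It is NOT llama.cpp's typical or tail-free sampler; it only mimics how a
// threshold parameter interacts with a "keep at least min_keep" rule.
std::vector<float> truncate_by_mass(std::vector<float> probs, float p, std::size_t min_keep = 1) {
    assert(min_keep >= 1);
    if (p >= 1.0f || probs.empty()) {
        return probs;  // treated as "disabled": every candidate survives
    }
    // Most probable candidates first.
    std::sort(probs.begin(), probs.end(), [](float a, float b) { return a > b; });
    float cum = 0.0f;
    std::size_t keep = probs.size();
    for (std::size_t i = 0; i < probs.size(); ++i) {
        cum += probs[i];
        // Stop once the kept mass reaches p, but never keep fewer than min_keep.
        if (cum >= p && i + 1 >= min_keep) {
            keep = i + 1;
            break;
        }
    }
    probs.resize(keep);  // with p == 0 this keeps only min_keep candidates
    return probs;
}
```

For example, with probabilities `{0.1f, 0.4f, 0.3f, 0.2f}`, a threshold of `0.0f` keeps only `0.4f`, so sampling always picks the same token; `0.65f` keeps two candidates and `1.0f` keeps all four.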

* fixed mul-mat error for old GPUs
* style fixes
* add mul mat src1 f16 test cases, fix more cases

  ggml-ci

---------

Co-authored-by: bssrdf <[email protected]>
Co-authored-by: slaren <[email protected]>


* Build with CLBlast
* Declare GGML_API

After rebasing, examples/talk-llama failed:

"D:\a\whisper.cpp\whisper.cpp\build\ALL_BUILD.vcxproj" (build target) (1) ->
"D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj" (default target) (14) ->
(Link target) ->
llama.obj : error LNK2019: unresolved external symbol ggml_cl_free_data referenced in function "public: __cdecl llama_model::~llama_model(void)" (??1llama_model@@QEAA@XZ) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
llama.obj : error LNK2019: unresolved external symbol ggml_cl_transform_tensor referenced in function "public: void __cdecl llama_model_loader::load_all_data(struct ggml_context *,void (__cdecl*)(float,void *),void *,struct llama_mlock *)" (?load_all_data@llama_model_loader@@QEAAXPEAUggml_context@@P6AXMPEAX@Z1PEAUllama_mlock@@@Z) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
D:\a\whisper.cpp\whisper.cpp\build\bin\Release\talk-llama.exe : fatal error LNK1120: 2 unresolved externals [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
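The "Declare GGML_API" fix relies on an export macro on public declarations. Below is a minimal sketch of that common pattern, modeled on (but not copied from) ggml's header; `MYLIB_API`, `MYLIB_SHARED`, `MYLIB_BUILD`, and `mylib_sum` are hypothetical names. On Windows DLL builds that rely on `__declspec(dllexport)`, a declaration without the macro is not exported, which is consistent with the LNK2019 errors in the log above.

```cpp
#include <cassert>

// Hypothetical export macro in the style of GGML_API; the MYLIB_* names are illustrative.
// MYLIB_SHARED: building or consuming a shared library.
// MYLIB_BUILD:  compiling the library itself (export rather than import).
#if defined(MYLIB_SHARED)
#    if defined(_WIN32) && !defined(__MINGW32__)
#        if defined(MYLIB_BUILD)
#            define MYLIB_API __declspec(dllexport)
#        else
#            define MYLIB_API __declspec(dllimport)
#        endif
#    else
#        define MYLIB_API __attribute__((visibility("default")))
#    endif
#else
#    define MYLIB_API  // static build: nothing to export
#endif

// Public declaration. In a Windows DLL build without MYLIB_API here, the symbol
// would not be exported and consumers would fail with "unresolved external symbol".
extern "C" MYLIB_API int mylib_sum(const int * data, int n);

// Definition, compiled into the library (with MYLIB_BUILD defined in a shared build).
int mylib_sum(const int * data, int n) {
    assert(data != nullptr || n == 0);
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```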

* fix infinite loop
* slight UI simplification, clearer UX
* clearer UI text, add timings to completion log

* Fix main-cmake-pkg compilation
* Use glob to load common files
* cmake : fix trailing whitespace

---------

Co-authored-by: Georgi Gerganov <[email protected]>

The server currently schedules tasks using a sleep(5ms) busy loop. This adds unnecessary latency, since most sleep implementations round up to the system scheduling quantum (usually 10ms). Other libc sleep implementations spin for shorter intervals, which causes the server's busy loop to consume all available CPU. Having explicit notify() / wait() code also improves the readability of the server code. See Mozilla-Ocho/llamafile@711344b.
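The notify() / wait() approach can be sketched with `std::condition_variable`. This is a minimal illustration of the pattern, not the server's actual code; `TaskQueue`, `post`, and `take` are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical task queue (not the server's actual class): consumers sleep
// inside wait() and are woken by notify_one() as soon as work arrives,
// so there is no fixed-interval polling.
template <typename Task>
class TaskQueue {
public:
    void post(Task task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }  // release the lock before notifying so the woken thread can acquire it
        cv_.notify_one();
    }

    // Blocks until a task is available, then removes and returns it.
    Task take() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate handles spurious wakeups and tasks posted before wait().
        cv_.wait(lock, [this] { return !tasks_.empty(); });
        assert(!tasks_.empty());
        Task task = std::move(tasks_.front());
        tasks_.pop_front();
        return task;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<Task> tasks_;
};
```

A worker thread would loop on `take()` while request handlers call `post()`. Because `wait()` blocks on the condition variable, an idle worker does not spin, and it wakes when work is posted rather than after a fixed sleep.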

This change makes it possible to use flags like `--grammar` with the `llava-cli` program. The rest is code cleanup, deleting a long-standing TODO comment. This change also ensures that logging information is emitted to stderr, which makes the `llava-cli` command friendlier to shell scripts. See Mozilla-Ocho/llamafile@1cd334f.

* fix "ld: warning: ignoring duplicate libraries: '../libllama.a'"
* fix warning in example.

* flake.lock: update to hotfix CUDA::cuda_driver

  Required to support #4606

* flake.nix: rewrite

  1. Split into separate files per output.
  2. Added overlays, so that this flake can be integrated into others. The names in the overlay are `llama-cpp`, `llama-cpp-opencl`, `llama-cpp-cuda`, and `llama-cpp-rocm` so that they fit into the broader set of Nix packages from [nixpkgs](https://github.com/nixos/nixpkgs).
  3. Use [callPackage](https://summer.nixos.org/blog/callpackage-a-tool-for-the-lazy/) rather than `with pkgs;` so that there's dependency injection rather than dependency lookup.
  4. Add a description and meta information for each package. The description includes a bit about what's trying to accelerate each one.
  5. Use specific CUDA packages instead of cudatoolkit on the advice of SomeoneSerge.
  6. Format with `serokell/nixfmt` for a consistent style.
  7. Update `flake.lock` with the latest goods.

* flake.nix: use finalPackage instead of passing it manually
* nix: unclutter darwin support
* nix: pass most darwin frameworks unconditionally

  ...for simplicity

* `*.nix`: nixfmt

  `nix shell github:piegamesde/nixfmt/rfc101-style --command nixfmt flake.nix .devops/nix/*.nix`

* flake.nix: add maintainers
* nix: move meta down to follow Nixpkgs style more closely
* nix: add missing meta attributes

  nix: clarify the interpretation of meta.maintainers

  nix: clarify the meaning of "broken" and "badPlatforms"

  nix: passthru: expose the `use*` flags for inspection

  E.g.:

  ```
  ❯ nix eval .#cuda.useCuda
  true
  ```

* flake.nix: avoid re-evaluating nixpkgs too many times
* flake.nix: use flake-parts
* nix: migrate to pname+version
* flake.nix: overlay: expose both the namespace and the default attribute
* ci: add the (Nix) flakestry workflow
* nix: cmakeFlags: explicit OFF bools
* nix: cuda: reduce runtime closure
* nix: fewer rebuilds
* nix: respect config.cudaCapabilities
* nix: add the impure driver's location to the DT_RUNPATHs
* nix: clean sources more thoroughly

  ...this way outPaths change less frequently, and so there are fewer rebuilds

* nix: explicit mpi support
* nix: explicit jetson support
* flake.nix: darwin: only expose the default

---------

Co-authored-by: Someone Serge <[email protected]>

* python: add check-requirements.sh and GitHub workflow

  This script and workflow force package versions to remain compatible across all `convert*.py` scripts, while allowing secondary convert scripts to import dependencies not wanted in convert.py.

* Move requirements into ./requirements
* Fail on "==" being used for package requirements (but can be suppressed)
* Enforce "compatible release" syntax instead of ==
* Update workflow
* Add upper version bound for transformers and protobuf
* improve check-requirements.sh
* small syntax change
* don't remove venvs if nocleanup is passed
* See if this fixes docker workflow
* Move check-requirements.sh into ./scripts/

---------

Co-authored-by: Jared Van Bortel <[email protected]>
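For readers unfamiliar with the PEP 440 "compatible release" operator (`~=`) that the workflow now enforces instead of `==`, here is a minimal C++ sketch of its meaning for plain numeric versions. The helper names are hypothetical, and this is not how check-requirements.sh works (that is a shell script); it only illustrates why `~=1.24.4` accepts 1.24.9 but rejects 1.25.0, whereas `==1.24.4` would accept neither.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Parse a purely numeric release such as "1.24.4" into {1, 24, 4}.
// Assumption: no pre-, post-, dev- or local segments; real PEP 440 parsing handles far more.
std::vector<int> parse_release(const std::string & version) {
    std::vector<int> parts;
    std::stringstream ss(version);
    std::string item;
    while (std::getline(ss, item, '.')) {
        parts.push_back(std::stoi(item));
    }
    return parts;
}

// "~=X.Y.Z" is equivalent to ">=X.Y.Z, ==X.Y.*": only the last spec component may grow.
bool satisfies_compatible_release(const std::string & spec, const std::string & candidate) {
    const std::vector<int> s = parse_release(spec);
    const std::vector<int> c = parse_release(candidate);
    assert(s.size() >= 2 && "~= requires at least two release components");

    // Prefix match: every spec component except the last must be equal.
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        const int ci = i < c.size() ? c[i] : 0;
        if (ci != s[i]) {
            return false;
        }
    }
    // Lower bound: compare component-wise, padding the shorter version with zeros.
    const std::size_t n = std::max(s.size(), c.size());
    for (std::size_t i = 0; i < n; ++i) {
        const int si = i < s.size() ? s[i] : 0;
        const int ci = i < c.size() ? c[i] : 0;
        if (ci != si) {
            return ci > si;
        }
    }
    return true;  // candidate == spec
}
```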

Signed-off-by: hydai <[email protected]>

* clip: enable CUDA backend
* add missing kernels
* add enough padding for alignment
* remove ggml_repeat of clip.cpp
* add metal backend
* llava : fixes
  - avoid ggml_repeat
  - use GGML_USE_ instead of CLIP_USE_ macros
  - remove unused vars

---------

Co-authored-by: Georgi Gerganov <[email protected]>

* feat: add avx_vnni based on intel documents
* ggml: add avx vnni based on intel document
* llama: add avx vnni information display
* docs: add more details about using oneMKL and oneAPI for intel processors
* docs: add more details about using oneMKL and oneAPI for intel processors
* docs: add more details about using oneMKL and oneAPI for intel processors
* docs: add more details about using oneMKL and oneAPI for intel processors
* docs: add more details about using oneMKL and oneAPI for intel processors
* Update ggml.c

  Fix indentation update

  Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>

* clip : refactor + bug fixes

  ggml-ci

* server : add log message

gpt2 : Add gpt2 architecture integration (#4555)

ea5497d

pull bot added the ⤵️ pull label Dec 28, 2023

jart and others added 22 commits December 28, 2023 15:20

scripts : do not sync commits from this repo

ca38b8d

ggml : fix some mul mat cases + add tests for src1 F16 (ggml/669)

afc8c19

sync : ggml

38b3de4

scripts : print list of sync commits

c8255f8

llama.swiftui : fix infinite loop, output timings, buff UI (#4674)

afd997a

main-cmake-pkg : fix build issue (#4665)

82d6eab

server : allow to generate multimodal embeddings (#4681)

b93edd2

server : fix OpenAI server sampling w.r.t. penalty. (#4675)

60f55e8

cmake : fix ld warning duplicate libraries libllama.a (#4671)

97bbca6

cuda: fix vmm oom issue on NVIDIA AGX Orin (#4687)

91bb39c

clip : use ggml_backend_buffer_is_host (#4205)

0235b9b

CUDA: fix tensor core logic for Pascal and HIP (#4682)

a20f3c7

CUDA: fixed tensor cores not being used on RDNA3 (#4697)

39d8bc7

clip : refactor + bug fixes (#4696)

9fbda71

pull bot merged commit 9fbda71 into teleprint-me:master Dec 30, 2023
41 checks passed

pull bot commented Dec 28, 2023