falcon: metal crashes with GGML_ASSERT: ggml-metal.m:932: n % 4 == 0 #3754
Labels: bug (Something isn't working)
Comments
jmorganca changed the title from "metal crashes with GGML_ASSERT: ggml-metal.m:932: n % 4 == 0" to "falcon: metal crashes with GGML_ASSERT: ggml-metal.m:932: n % 4 == 0" on Oct 24, 2023
Update: it also seems to happen with starcoder 3b models. Same assertion being fired.

ggerganov: Should be fixed now - these models have 71 attention heads, didn't expect odd numbers in Metal.

@ggerganov thanks for the fast response 😊
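The fix that closed this issue ("metal : handle ggml_scale for n%4 != 0", referenced in the commit list below) points at a Metal scale kernel that worked on four floats at a time and therefore asserted n % 4 == 0. A minimal CPU-side sketch of that vectorized-path-plus-scalar-fallback pattern, with illustrative function names (not the actual ggml-metal.m code):

```c
#include <assert.h>
#include <stdio.h>

// Fast path: processes 4 floats per step, so it only works when n % 4 == 0.
// The pre-fix code effectively required this for every scale operation.
static void scale_f32x4(float * x, float s, int n) {
    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        x[i + 0] *= s; x[i + 1] *= s; x[i + 2] *= s; x[i + 3] *= s;
    }
}

// Scalar fallback: handles any n, including sizes derived from odd head
// counts such as falcon-7b's 71 attention heads.
static void scale_f32(float * x, float s, int n) {
    for (int i = 0; i < n; i++) {
        x[i] *= s;
    }
}

// Dispatch: pick the vectorized kernel when possible, and fall back to the
// scalar kernel instead of asserting.
static void scale(float * x, float s, int n) {
    if (n % 4 == 0) {
        scale_f32x4(x, s, n);
    } else {
        scale_f32(x, s, n);
    }
}

int main(void) {
    float v[71];
    for (int i = 0; i < 71; i++) v[i] = 1.0f;
    scale(v, 0.5f, 71); // 71 % 4 != 0: the old behavior would have aborted here
    printf("v[70] = %f\n", v[70]);
    return 0;
}
```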
mattgauf added a commit to mattgauf/llama.cpp that referenced this issue on Oct 27, 2023:

* master: (350 commits)
  speculative : ensure draft and target model vocab matches (ggerganov#3812)
  llama : correctly report GGUFv3 format (ggerganov#3818)
  simple : fix batch handling (ggerganov#3803)
  cuda : improve text-generation and batched decoding performance (ggerganov#3776)
  server : do not release slot on image input (ggerganov#3798)
  batched-bench : print params at start
  log : disable pid in log filenames
  server : add parameter -tb N, --threads-batch N (ggerganov#3584) (ggerganov#3768)
  server : do not block system prompt update (ggerganov#3767)
  sync : ggml (conv ops + cuda MSVC fixes) (ggerganov#3765)
  cmake : add missed dependencies (ggerganov#3763)
  cuda : add batched cuBLAS GEMM for faster attention (ggerganov#3749)
  Add more tokenizer tests (ggerganov#3742)
  metal : handle ggml_scale for n%4 != 0 (close ggerganov#3754)
  Revert "make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)"
  issues : separate bug and enhancement template + no default title (ggerganov#3748)
  Update special token handling in conversion scripts for gpt2 derived tokenizers (ggerganov#3746)
  llama : remove token functions with `context` args in favor of `model` (ggerganov#3720)
  Fix baichuan convert script not detecing model (ggerganov#3739)
  make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)
  ...
brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this issue on Nov 17, 2023
cebtenzzre added a commit to nomic-ai/llama.cpp that referenced this issue on Nov 23, 2023
cebtenzzre added a commit to nomic-ai/llama.cpp that referenced this issue on Nov 23, 2023
brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this issue on Nov 30, 2023
Original issue description:

Running a newly converted + quantized GGUF version of falcon 7b instruct results in an assertion being fired:

GGML_ASSERT: ggml-metal.m:932: n % 4 == 0

Full logs:
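For context on the error format itself: ggml reports failed assertions as `GGML_ASSERT: <file>:<line>: <condition>` and then aborts. A paraphrased sketch of that kind of macro (illustrative, not the exact ggml source):

```c
#include <stdio.h>
#include <stdlib.h>

// Paraphrased assert macro in the style of ggml's GGML_ASSERT (illustrative,
// not copied from ggml). On failure it prints the file, line, and stringified
// condition, then aborts -- which matches the reported output
// "GGML_ASSERT: ggml-metal.m:932: n % 4 == 0".
#define MY_ASSERT(x)                                        \
    do {                                                    \
        if (!(x)) {                                         \
            fprintf(stderr, "GGML_ASSERT: %s:%d: %s\n",     \
                    __FILE__, __LINE__, #x);                \
            abort();                                        \
        }                                                   \
    } while (0)

int main(void) {
    int n = 71; // e.g. a tensor size influenced by falcon-7b's 71 heads
    MY_ASSERT(n % 4 == 0); // aborts, printing this file, line, and "n % 4 == 0"
    return 0;
}
```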