Can't run 3B models #4177
Comments
That's the reason why I removed my akins model that I had already converted. These should work with current llama.cpp versions.
Personally, I think https://huggingface.co/maddes8cht/NousResearch-Nous-Capybara-3B-V1.9-gguf seems to be one of the best StableLMs out there, but some others are rather good too in some fields.
Offloading up to 34 layers should already work. The linked PR solves the issue for all StableLM models.
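As a stopgap until that PR lands, capping offload at 34 layers avoids the assert; a minimal sketch, reusing the reporter's command with one fewer layer (model file assumed from the report below):

main.exe -ngl 34 -m rocket-3b.Q5_K_M.gguf -p Hello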
Actually, I think the akins model didn't work for me either, but I'm currently not sure about the error it caused and can't test it right now.
I found that the models
This error has nothing to do with the 34-layer offloading limit; it occurs with
You can find these non-working models in this collection:
A converted
It may be better to open a different issue about that. I tested TheBloke's akins and rocket GGUFs and they work.
It does indeed work with 34 layers, but why? The CPU is doing some of the work that way, which is not ideal.
Okay.
I converted the Rocket 3B yesterday and still can't offload the last KV cache layer. I know there are some models where the necessary support for offloading all layers (especially the non-repeating ones) just isn't there. For example, it's the same with Persimmon: you can't offload the last non-repeating KV layer. These models are different architectures; Persimmon is its own (as far as I know), and Rocket is Bloom (at least according to the metadata).
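For anyone who wants to double-check what a file declares, the architecture can be read from the GGUF metadata; a hedged sketch using the dump script from llama.cpp's gguf-py package (script path and --no-tensors flag assumed from that package):

python gguf-py/scripts/gguf-dump.py --no-tensors rocket-3b.Q5_K_M.gguf

The general.architecture key in the output shows which compute graph llama.cpp will build for the model.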
With #4156 it should be possible to offload everything, but it still needs to be merged.
I reconverted the mentioned models with a current release, and now it works. Currently uploading.
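For reference, a reconversion along those lines is the usual two-step flow; a minimal sketch with a current llama.cpp checkout (directory and file names here are placeholders, not the exact ones used above):

python convert-hf-to-gguf.py ./rocket-3b --outtype f16 --outfile rocket-3b.f16.gguf
./quantize rocket-3b.f16.gguf rocket-3b.Q5_K_M.gguf Q5_K_M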
I think the only change was to inference, not conversion. So it wasn't an issue with making the models, just with running them, which seems to be fixed now that #4156 has been merged.
I can't seem to run 3B models (no issues with 7B and 13B). I'm getting the following error:
GGML_ASSERT: I:\llama-cpp\llama.cpp\ggml-cuda.cu:6709: ne00 == n_dims && "ne00 != n_dims is not implemented for CUDA yet"
I tried rocket-3b.Q5_K_M and akins-3b.Q6_K from TheBloke.
Tested with this on Win10:
main.exe -ngl 35 -m rocket-3b.Q5_K_M.gguf -p Hello
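Since the assert fires in ggml-cuda.cu, a CPU-only run should sidestep it entirely; a quick check under that assumption:

main.exe -ngl 0 -m rocket-3b.Q5_K_M.gguf -p Hello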