
OLMoE Q4_0 quant does not work #11862

Open
l3utterfly opened this issue Feb 14, 2025 · 1 comment

Comments

l3utterfly (Contributor) commented Feb 14, 2025

Name and Version

commit hash: a4f011e

Operating systems

Other? (Please let us know in description)

GGML backends

CPU

Hardware

Snapdragon 8 Gen 2

Models

Model is here: https://huggingface.co/allenai/OLMoE-1B-7B-0125-Instruct-GGUF/tree/main

Problem description & steps to reproduce

It fails with the following error on aarch64:

llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:4013: GGML_ASSERT(params->wsize >= (GGML_PAD(nbw3, sizeof(int64_t)) + n_as * sizeof(int64_t) + n_as * ne12 * sizeof(mmid_row_mapping))) failed


Do you know why this error happens? Does the model need to be re-quantized?
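
For context on what the assertion is comparing: the aarch64 repack path for mul_mat_id needs a work buffer that holds the converted activations (nbw3) plus per-expert counters and row mappings. Below is a minimal, self-contained sketch of that size arithmetic, not llama.cpp code; the mmid_row_mapping layout and all numeric values are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* Round x up to a multiple of n (same idea as GGML_PAD in ggml). */
#define GGML_PAD(x, n) (((x) + (n) - 1) / (n) * (n))

/* Layout assumed for illustration only. */
struct mmid_row_mapping { int32_t i1; int32_t i2; };

int main(void) {
    size_t  nbw3  = 1 << 20;  /* bytes of repacked activations (illustrative) */
    int64_t n_as  = 64;       /* number of experts (illustrative)             */
    int64_t ne12  = 8;        /* rows/tokens being routed (illustrative)      */
    size_t  wsize = 1 << 20;  /* workspace actually provided (illustrative)   */

    size_t needed = GGML_PAD(nbw3, sizeof(int64_t))
                  + n_as * sizeof(int64_t)
                  + n_as * ne12 * sizeof(struct mmid_row_mapping);

    /* This is the comparison in the GGML_ASSERT from the log: if the work
       buffer was sized without the per-expert bookkeeping terms, then
       wsize < needed and the assert fires. */
    printf("needed = %zu bytes, provided = %zu bytes -> %s\n",
           needed, wsize, wsize >= needed ? "ok" : "assert would fail");
    return 0;
}
```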

First Bad Commit

No response

Relevant log output

llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:4013: GGML_ASSERT(params->wsize >= (GGML_PAD(nbw3, sizeof(int64_t)) + n_as * sizeof(int64_t) + n_as * ne12 * sizeof(mmid_row_mapping))) failed
ggerganov (Member):

Does it work with cmake -DGGML_CPU_AARCH64=OFF ...?
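
For reference, assuming a fresh CMake build from the repository root (the build directory name is just an example), the suggested check would look like:

```sh
cmake -B build -DGGML_CPU_AARCH64=OFF
cmake --build build --config Release
```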
