
OLMoE Q4_0 quant does not work #11862

Open
l3utterfly opened this issue Feb 14, 2025 · 1 comment

Comments

l3utterfly (Contributor) commented Feb 14, 2025

Name and Version

commit hash: a4f011e

Operating systems

Other? (Please let us know in description)

GGML backends

CPU

Hardware

Snapdragon 8 Gen 2

Models

Model is here: https://huggingface.co/allenai/OLMoE-1B-7B-0125-Instruct-GGUF/tree/main

Problem description & steps to reproduce

It fails with the following error on aarch64:

llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:4013: GGML_ASSERT(params->wsize >= (GGML_PAD(nbw3, sizeof(int64_t)) + n_as * sizeof(int64_t) + n_as * ne12 * sizeof(mmid_row_mapping))) failed


Do you know why this error happens? Does the model need to be re-quantized?
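
For context on what the assertion is comparing: the aarch64 repack path for mul_mat_id needs a work buffer that holds the converted activations (nbw3) plus per-expert counters and row mappings. Below is a minimal, self-contained sketch of that size arithmetic, not llama.cpp code; the mmid_row_mapping layout and all numeric values are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* Round x up to a multiple of n (same idea as GGML_PAD in ggml). */
#define GGML_PAD(x, n) (((x) + (n) - 1) / (n) * (n))

/* Layout assumed for illustration only. */
struct mmid_row_mapping { int32_t i1; int32_t i2; };

int main(void) {
    size_t  nbw3  = 1 << 20;  /* bytes of repacked activations (illustrative) */
    int64_t n_as  = 64;       /* number of experts (illustrative)             */
    int64_t ne12  = 8;        /* rows/tokens being routed (illustrative)      */
    size_t  wsize = 1 << 20;  /* workspace actually provided (illustrative)   */

    size_t needed = GGML_PAD(nbw3, sizeof(int64_t))
                  + n_as * sizeof(int64_t)
                  + n_as * ne12 * sizeof(struct mmid_row_mapping);

    /* This is the comparison in the GGML_ASSERT from the log: if the work
       buffer was sized without the per-expert bookkeeping terms, then
       wsize < needed and the assert fires. */
    printf("needed = %zu bytes, provided = %zu bytes -> %s\n",
           needed, wsize, wsize >= needed ? "ok" : "assert would fail");
    return 0;
}
```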

First Bad Commit

No response

Relevant log output

llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:4013: GGML_ASSERT(params->wsize >= (GGML_PAD(nbw3, sizeof(int64_t)) + n_as * sizeof(int64_t) + n_as * ne12 * sizeof(mmid_row_mapping))) failed
ggerganov (Member):

Does it work with cmake -DGGML_CPU_AARCH64=OFF ...?
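
For reference, assuming a fresh CMake build from the repository root (the build directory name is just an example), the suggested check would look like:

```sh
cmake -B build -DGGML_CPU_AARCH64=OFF
cmake --build build --config Release
```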
