Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed #1043

lhwong · 2024-10-13T16:54:09Z

I got the following error when running a model imported from a GGUF file that was generated from a model fine-tuned with LoRA.

Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed

These are the commands I used:

mlx_lm.lora --train --model meta-llama/Llama-3.2-1B --data ~/Projects/AI/data --iters 1000

mlx_lm.generate --model meta-llama/Llama-3.2-1B --adapter-path ./adapters --prompt "What is biomolecule?"

mlx_lm.fuse --model meta-llama/Llama-3.2-1B --adapter-path ./adapters --export-gguf

Create a Modelfile:
FROM ./fused_model/ggml-model-f16.gguf

ollama create example -f Modelfile

ollama run example

Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed
/Users/runner/work/ollama/ollama/llm/llama.cpp/ggml/src/ggml-metal.m:1080: GGML_ASSERT(src1t == GGML_TYPE_F32) failed
/Users/runner/work/ollama/ollama/llm/llama.cpp/ggml/src/ggml-metal.m:1080: GGML_ASSERT(src1t == GGML_TYPE_F32) failed


awni · 2024-10-14T17:55:21Z

I would file an issue with the https://github.com/ollama/ollama folks. It's not clear to me that this is an issue with MLX.

lhwong · 2024-10-15T00:06:31Z

@awni Could it be that the GGUF exported by mlx_lm is F16, and that the command I used to create the model (ollama create example -f Modelfile) is wrong or needs a particular setting?

"Export the fused model to GGUF. Note GGUF support is limited to Mistral, Mixtral, and Llama style models in fp16 precision." Reference: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md

hansvdam · 2024-10-16T16:45:29Z

I have the same problem. Creating the GGUF with llama.cpp after the fuse step does work when running it in ollama:
https://github.com/ggerganov/llama.cpp

python convert_hf_to_gguf.py <path_to>/fused_model --outfile output_file.gguf

Then, in the ollama Modelfile, put (along with the parameters and template):
FROM output_file.gguf
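
The last step above can be sketched as a small script. This is only an illustration: it assumes output_file.gguf sits in the current directory, reuses the model name `example` from the original report, and leaves out the parameters and template mentioned above. It writes the Modelfile and only calls ollama if the tool and the GGUF file are actually present.

```shell
#!/bin/sh
# Sketch: write a minimal Modelfile pointing at the GGUF produced by
# llama.cpp's convert_hf_to_gguf.py (file name taken from the comment above).
GGUF_PATH="output_file.gguf"   # assumption: the file is in the current directory

cat > Modelfile <<EOF
FROM ./${GGUF_PATH}
EOF

# Only try to import if ollama is installed and the GGUF file exists.
if command -v ollama >/dev/null 2>&1 && [ -f "$GGUF_PATH" ]; then
    ollama create example -f Modelfile
else
    echo "Modelfile written; run 'ollama create example -f Modelfile' once $GGUF_PATH exists"
fi
```

Add any PARAMETER or TEMPLATE lines your model needs to the heredoc before importing.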

hschaeufler · 2024-10-28T18:17:56Z

@lhwong @hansvdam
You can fuse the model without the GGUF export and import it into ollama directly. The only current problem is with the format on the ollama side, which is why you have to downgrade the transformers library first.
See also: ollama/ollama#7167 (comment)

pipenv install transformers==4.44.2 or pip install transformers==4.44.2 (depending on your package manager)

Fuse the model without the GGUF export:

mlx_lm.fuse --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
    --adapter-path "results/llama3_1_8B_instruct_lora/tuning_11/adapters" \
    --save-path "results/llama3_1_8B_instruct_lora/tuning_11/lora_fused_model/"

Modelfile:

FROM "/Volumes/Extreme SSD/dartgen/results/llama3_1_8B_instruct_lora/tuning_11/lora_fused_model"

PARAMETER temperature 0.6
PARAMETER top_p 0.9

And import it:
ollama create hschaeufler/dartgen-llama-3.1:8b-instruct-bf16-v11 -f Modelfile
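
Before fusing, it may help to confirm which transformers version is installed. The sketch below is not part of the workaround itself. `version_le` is a hypothetical helper, and the check assumes that versions newer than 4.44.2 are the affected ones, which the thread implies but does not state outright. It also assumes `pip` is the package manager in use and that `sort` supports `-V`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is less than or equal to version $2.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

PINNED="4.44.2"   # version named in the comment above

# Ask pip for the installed version; empty if transformers (or pip) is missing.
installed="$(pip show transformers 2>/dev/null | awk '/^Version:/ {print $2}')"

if [ -z "$installed" ]; then
    echo "transformers not found; install with: pip install transformers==$PINNED"
elif version_le "$installed" "$PINNED"; then
    echo "transformers $installed is at or below $PINNED"
else
    echo "transformers $installed is newer than $PINNED; consider: pip install transformers==$PINNED"
fi
```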

hschaeufler mentioned this issue Oct 29, 2024

Fine-tuned Llama 3.2 1B safe_serialized: Error: json: cannot unmarshal array into Go struct field .model.merges of type string ollama/ollama#7167

Open
