
bug: models start the model imported can not work. #1439

Closed
2 of 6 tasks
cloudherder opened this issue Oct 5, 2024 · 7 comments
Assignees: gabrielle-ong
Labels:
  • category: model running (Inference ux, handling context/parameters, runtime)
  • engine: llama.cpp
  • needs info (Needs more logs, steps to help reproduce)
  • os: Linux
  • os: Windows
  • type: bug (Something isn't working)

Comments

@cloudherder

Cortex version

cortex-1.0.0-rc1-windows-amd64-local-installer

Describe the Bug

Running any imported model returns “Model failed to load with status code: 500”.

Steps to Reproduce

1. cortex-beta models import --model_id gemma-2b-Q8_0.gguf --model_path ./gemma-2b-Q8_0.gguf
The import succeeds, and the models subcommands such as list, get, update, and delete all work.
2. cortex-beta models start gemma-2b-Q8_0.gguf
It returns:
gguf_init_from_file: failed to open '': 'Invalid argument'
{"timestamp":1728130117,"level":"ERROR","function":"LoadModel","line":186,"message":"llama.cpp unable to load model","model":""}
Model failed to load with status code: 500
Error: ?

Screenshots / Logs

(screenshot)
cortex.log
cortex-cli.log

What is your OS?

  • MacOS
  • Windows
  • Linux

What engine are you running?

  • cortex.llamacpp (default)
  • cortex.tensorrt-llm (Nvidia GPUs)
  • cortex.onnx (NPUs, DirectML)
@cloudherder cloudherder added the type: bug Something isn't working label Oct 5, 2024
@vansangpfiev
Contributor

Hi @cloudherder, for models import, the absolute path is required for --model_path for now.
We will improve this soon. Apologies for the inconvenience.
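For example (the absolute path below is only an illustration; substitute the actual location of your GGUF file):

cortex-beta models import --model_id gemma-2b-Q8_0.gguf --model_path C:\models\gemma-2b-Q8_0.gguf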

@cloudherder
Author

> Hi @cloudherder, for models import, the absolute path is required for --model_path for now. We will improve this soon. Apologies for the inconvenience.

Thank you for your reply! You have done great work! I tested it with an absolute path; the results are shown below:
(screenshot)

The following errors are recorded in the cortex.log file:
20241005 13:29:56.458000 UTC 10188 ERROR ggml_backend_cuda_buffer_type_alloc_buffer: allocating 2539.93 MiB on device 0: cudaMalloc failed: out of memory

  • llama_engine.cc:393
    20241005 13:29:56.484000 UTC 10188 ERROR llama_model_load: error loading model: unable to allocate backend buffer
  • llama_engine.cc:393
    20241005 13:29:56.484000 UTC 10188 ERROR llama_load_model_from_file: failed to load model

The sizes of the three models tested are 2.46 GB, 2.48 GB, and 7.06 GB. My laptop has 16 GB of memory, and llama.cpp's server.exe can load and run all three models normally.

@vansangpfiev
Contributor

@cloudherder It seems you don't have enough VRAM. Please try setting the ngl of your model to 0 or 1.
For example, with the model gemma-2b-Q8_0.gguf you can check the model config by running:

cortex-beta models get gemma-2b-Q8_0.gguf

Then set the ngl to 1:

cortex-beta models update --model_id gemma-2b-Q8_0.gguf --ngl 1

Run cortex-beta models get gemma-2b-Q8_0.gguf to check that the config is updated.
Then try to start the model.

Can you also share the output of the nvidia-smi command?
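A quick way to see the total and free VRAM in that output (these nvidia-smi query flags assume a reasonably recent NVIDIA driver):

nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv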

@cloudherder
Author

> @cloudherder It seems you don't have enough VRAM. Please try setting the ngl of your model to 0 or 1. For example, with the model gemma-2b-Q8_0.gguf you can check the model config by running:
>
> cortex-beta models get gemma-2b-Q8_0.gguf
>
> Then set the ngl to 1:
>
> cortex-beta models update --model_id gemma-2b-Q8_0.gguf --ngl 1
>
> Run cortex-beta models get gemma-2b-Q8_0.gguf to check that the config is updated, then try to start the model.
>
> Can you also share the output of the nvidia-smi command?

Thanks for your help! Here are my test results:
(screenshot)

(screenshot)

The output of nvidia-smi.exe:
(screenshot)

@github-project-automation github-project-automation bot moved this to Investigating in Menlo Oct 15, 2024
@gabrielle-ong gabrielle-ong self-assigned this Oct 16, 2024
@vansangpfiev
Contributor

Hi @cloudherder, apologies for the late response. Can you please set ngl = 0 and try again?
Would you also mind sharing the logs from when you ran with ngl = 1?
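For reference, the same update command from earlier in this thread can be used to set ngl to 0 before starting the model again (a sketch, assuming the same model ID as above):

cortex-beta models update --model_id gemma-2b-Q8_0.gguf --ngl 0
cortex-beta models start gemma-2b-Q8_0.gguf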

@gabrielle-ong gabrielle-ong added os: Linux category: model running Inference ux, handling context/parameters, runtime needs info Needs more logs, steps to help reproduce os: Windows engine: llama.cpp labels Oct 17, 2024
@gabrielle-ong
Contributor

Hi @cloudherder, we've released cortex v1.0.1 (release note).
We'd love it if you could give cortex another go with the models you've downloaded.

To update to cortex v1.0.1 (or download it here: https://cortex.so/):

> cortex update
> cortex update --server

@gabrielle-ong gabrielle-ong moved this from Investigating to Icebox in Menlo Oct 25, 2024
@gabrielle-ong
Contributor

@cloudherder - closing this stale issue. We've released Cortex 1.0.3 with bugfixes and a much improved UX.
We're also working on recommending models based on your VRAM, to be released in 2 sprints.
#1108

@github-project-automation github-project-automation bot moved this from Icebox to Review + QA in Menlo Nov 28, 2024
@gabrielle-ong gabrielle-ong moved this from Review + QA to Completed in Menlo Nov 28, 2024
@gabrielle-ong gabrielle-ong changed the title bug: [DESCRIPTION]models start the model imported can not work. bug: models start the model imported can not work. Nov 28, 2024