fix(llama-cpp): consistently select fallback #3789

mudler · 2024-10-11T10:19:38Z

Description

We did not take into consideration the case where the host has the CPU flagset but the optimized binaries are not actually present in the asset dir. The same problem extended to GPU detection, as it relied on the fallback backend being present and compiled with GPU support.

As a result, models that specified the llama-cpp backend directly in their config could fail to pick up the fallback binary when the optimized binaries were not present.

To reproduce: have a model specify the backend "llama-cpp" manually, and have only the fallback binaries in the build assets (neither the AVX-specific nor the GPU ones).
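For reviewers who want the intended behavior in a compact form, here is a minimal sketch of the selection rule described above: a variant is chosen only if the host supports it and the binary actually exists in the asset dir, with the fallback as the last candidate. This is not the code in this PR. The names `hostCaps`, `selectVariant`, `fileExists`, the variant constants, and the `/assets/grpc` directory are all hypothetical and used for illustration only.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Hypothetical variant names; the real asset names may differ.
const (
	variantGPU      = "llama-cpp-cuda"
	variantAVX2     = "llama-cpp-avx2"
	variantAVX      = "llama-cpp-avx"
	variantFallback = "llama-cpp-fallback"
)

// hostCaps is a hypothetical summary of what the host reports.
type hostCaps struct {
	GPU, AVX2, AVX bool
}

// fileExists reports whether path exists and is not a directory.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

// selectVariant returns the path of the first variant that the host
// supports AND that is present in assetDir. The fallback is always
// considered supported, so it is picked whenever nothing better exists.
// The exists function is injected so the rule can be tested without disk access.
func selectVariant(assetDir string, caps hostCaps, exists func(string) bool) (string, error) {
	candidates := []struct {
		name      string
		supported bool
	}{
		{variantGPU, caps.GPU},
		{variantAVX2, caps.AVX2},
		{variantAVX, caps.AVX},
		{variantFallback, true},
	}
	for _, c := range candidates {
		if !c.supported {
			continue
		}
		p := filepath.Join(assetDir, c.name)
		if exists(p) {
			return p, nil
		}
	}
	return "", fmt.Errorf("no llama-cpp binary found in %s", assetDir)
}

func main() {
	// Simulate the reproduction case: the host has AVX flags,
	// but only the fallback binary is shipped in the asset dir.
	dir := "/assets/grpc"
	present := map[string]bool{filepath.Join(dir, variantFallback): true}
	exists := func(p string) bool { return present[p] }

	path, err := selectVariant(dir, hostCaps{AVX2: true, AVX: true}, exists)
	fmt.Println(path, err)
}
```

In real use, `fileExists` would be passed as the `exists` argument. The key point is that the CPU or GPU capability check alone never decides the result; presence on disk is checked for every candidate, including the fallback.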

Notes for Reviewers

This PR refactors how the correct binary is picked and reduces the complexity of the code.

Should fix #3727 and also fix #3673

Signed commits

Yes, I signed my commits.

netlify · 2024-10-11T10:20:02Z

✅ Deploy Preview for localai ready!

Name	Link
🔨 Latest commit	`8055c87`
🔍 Latest deploy log	https://app.netlify.com/sites/localai/deploys/6709256ed990630008369115
😎 Deploy Preview	https://deploy-preview-3789--localai.netlify.app

We did not take into consideration the case where the host has the CPU flagset but the optimized binaries are not actually present in the asset dir. As a result, models that specified the llama-cpp backend directly in their config could fail to pick up the fallback binary when the optimized binaries were not present.

Signed-off-by: Ettore Di Giacinto <[email protected]>

mudler added the bug Something isn't working label Oct 11, 2024

mudler force-pushed the fix/llama-cpp-fallback branch from ead515c to c47a451 Compare October 11, 2024 10:23

mudler added 3 commits October 11, 2024 12:32

chore: adjust and simplify selection

2df7c2a

Signed-off-by: Ettore Di Giacinto <[email protected]>

fix: move failure recovery to BackendLoader()

2487d94

Signed-off-by: Ettore Di Giacinto <[email protected]>

comments

53cff5c

Signed-off-by: Ettore Di Giacinto <[email protected]>

mudler force-pushed the fix/llama-cpp-fallback branch from 7deb85f to 53cff5c Compare October 11, 2024 10:50

minor fixups

8055c87

Signed-off-by: Ettore Di Giacinto <[email protected]>

This was referenced Oct 11, 2024

[bug] docker image show can't find llama backend #3673

Closed

[llama-cpp] Fails: backend not found: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp #3727

Closed

mudler merged commit be6c4e6 into master Oct 11, 2024
31 checks passed

mudler deleted the fix/llama-cpp-fallback branch October 11, 2024 14:55

mudler mentioned this pull request Oct 11, 2024

fix(welcome): do not list model twice if we have a config #3790

Merged

1 task
