[Question] CUDA use in LLamaSharp #545
Replies: 7 comments
-
@vvdb-architecture I've noticed something similar but I could not reproduce it. I would report it to the LLamaSharp project; they will probably ask for logs.
-
If you add a call to …, the console should contain some useful information. For instance, you can run the code here: https://github.com/microsoft/kernel-memory/tree/llamatest
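The exact call referenced above is lost from this thread. As a hypothetical illustration of the kind of diagnostic output meant here, LLamaSharp's `NativeLibraryConfig` can log which native backend it selects; a minimal sketch, assuming the `WithLogs` method present in LLamaSharp versions of that period (names vary across releases):

```csharp
using LLama.Native;

// Sketch only: enable LLamaSharp's native-library loading logs so the console shows
// which backend (CPU or CUDA) is actually picked. Must run before any model is loaded.
// The WithLogs overload shown here is an assumption; check your LLamaSharp version.
NativeLibraryConfig.Instance.WithLogs(true);
```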
-
It seems that in LLamaSharp the CPU and CUDA back-ends can't be installed at the same time. I would suggest that the maintainers of Kernel Memory either add a comment in the …
-
Considering that the service is also packaged as a Docker image, even if we add a comment, the Docker image will still ship all the LLamaSharp packages and the issue will persist. We could opt for Ollama or LM Studio to support Llama models, maybe removing LLamaSharp.
-
It's intended that they should be installable at the same time now. If multiple back-ends are installed, LLamaSharp does runtime feature detection to work out which backend is best to use. There seems to be a bug in that right now though :(
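Until that is fixed, a possible workaround is to request the CUDA backend explicitly before any model is loaded, instead of relying on detection. A sketch assuming LLamaSharp's `NativeLibraryConfig` API (method names differ slightly between versions):

```csharp
using LLama.Native;

// Sketch: prefer the CUDA backend over runtime auto-detection.
// This must execute before any LLamaSharp weights/context are created.
// WithCuda/WithAutoFallback are assumptions based on recent LLamaSharp versions.
NativeLibraryConfig.Instance
    .WithCuda(true)           // try to load the CUDA backend first
    .WithAutoFallback(true);  // fall back to the CPU backend if CUDA cannot be loaded
```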
-
The runtime detection was available last year too, but it never worked in my tests; the runtime always used the CPU. It might be related to the way assemblies are loaded and persist in memory, just guessing.
-
Update: KM v0.72 now includes an Ollama connector, making it much easier to work with local models. Example here: https://github.com/microsoft/kernel-memory/blob/main/examples/212-dotnet-ollama/Program.cs
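For reference, a minimal sketch of what wiring KM to Ollama looks like; the extension method and config property names below are assumptions based on that example, so check the linked Program.cs for the exact API of your KM version:

```csharp
using Microsoft.KernelMemory;

// Sketch: serverless Kernel Memory backed by a local Ollama server.
// Model names, endpoint, and config property names are illustrative assumptions.
var memory = new KernelMemoryBuilder()
    .WithOllamaTextGeneration(new OllamaConfig
    {
        Endpoint = "http://localhost:11434",
        TextModel = new OllamaModelConfig("phi3:medium-128k"),
    })
    .WithOllamaTextEmbeddingGeneration(new OllamaConfig
    {
        Endpoint = "http://localhost:11434",
        EmbeddingModel = new OllamaModelConfig("nomic-embed-text"),
    })
    .Build<MemoryServerless>();

await memory.ImportTextAsync("Kernel Memory can use local models through the Ollama connector.");
var answer = await memory.AskAsync("How can Kernel Memory run local models?");
Console.WriteLine(answer.Result);
```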
-
Context / Scenario
I'm using Kernel Memory with LLamaSharp. Despite having an RTX 3080 and the latest CUDA drivers installed, CUDA is not used.
Question
Not sure if this is a bug or I'm missing something, so here's a question instead:
The LlamaSharp.csproj contains references to both back-end packages (see the sketch below). I found out that if both `Cpu` and `Cuda12` back-ends are referenced, only the CPU is being used even if the CUDA DLL is loaded. If I remove the reference to `LLamaSharp.Backend.Cpu`, then the CUDA back-end will start to be used. It might be a "latest version thing", I don't know. But here you are.