Replies: 2 comments 2 replies
-
I would love something like this that could work with an OpenAI-compatible endpoint. I serve models locally using vLLM or Aphrodite; llama.cpp is too slow and doesn't support concurrency.
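For context, servers like vLLM expose an OpenAI-compatible endpoint that any OpenAI client can talk to. Below is a minimal sketch of what pointing a client at such a locally served endpoint looks like; the URL, port, API key, and model name are assumptions for illustration, not anything LARS currently ships with.

```python
# Minimal sketch: querying a locally served OpenAI-compatible endpoint
# (e.g. one started with `vllm serve`). Values below are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local vLLM endpoint
    api_key="not-needed-for-local",       # placeholder; local servers typically ignore it
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # hypothetical model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```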
-
I'm presently working on adding support for a new, self-developed backend: HF-Waitress. This new backend adds support for HF-Transformers and AWQ-quantized models directly off the Hub, while providing on-the-fly quantization via BitsAndBytes, HQQ and Quanto. It also negates the need to manually download LLMs yourself, working off the model name alone to do the rest. It works out of the box with no setup necessary, and provides concurrency and streaming responses, all within a single platform-agnostic Python script that can be ported anywhere. It will soon be the default LLM loader in LARS!

As Ollama is another implementation of llama.cpp, explicit support for it is not planned at this time, though I recognize the benefits. llama.cpp will be retained in LARS as a user-electable alternative to HF-Waitress for GGUF models, primarily due to its advantage of hybrid inferencing, and you'll be able to bring in your own GGUFs same as today.

OpenAI support is not planned at this time, as LARS remains open-source and local-deployment centric. However, the code to make OpenAI work is already in the LARS codebase, so if an official engagement necessitates it, I will work on enabling it. In the meanwhile, community contributions are absolutely welcome as always for these features!
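To illustrate the kind of on-the-fly quantization described above, here is a minimal sketch using plain transformers with BitsAndBytes; this is not the HF-Waitress API, just the underlying mechanism of loading a model off the Hub by name and quantizing it at load time. The model name is an assumption chosen for the example.

```python
# Sketch of on-the-fly BitsAndBytes quantization, loading directly off the Hub
# by model name with plain transformers (not the HF-Waitress API itself).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # hypothetical example model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4-bit at load time
    bnb_4bit_compute_dtype=torch.float16,  # run compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,  # triggers the on-the-fly quantization
    device_map="auto",               # place layers across available devices
)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```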
-
I already have a local instance of Ollama that I'm using for other AI applications. Can I point LARS at that as opposed to installing new models on the host machine?