[FR] Support for embedding models run through Ollama #559
Hi @wwjCMP and thanks for the feature request. Any relevant documentation you can point me to would help me implement this sooner. Thanks
+1 for this. I'd love to make use of it as well.

I've successfully configured Smart Connections to work with a local Ollama server running Llama 3 for chat. I'm not certain how closely the Ollama API matches the OpenAI API that my Custom Local model selection expects, but it's close enough that the plugin and the LLM API are communicating correctly for chat.

The Ollama API docs are here: https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings

From the looks of it, you could get embeddings from the Ollama server instance using the same request format you're currently making with the chat client, with two changes: point the request at the embeddings endpoint and pass an embedding model (e.g. nomic-embed-text) instead of a chat model.
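For illustration, a minimal sketch of what that embeddings request could look like; this is an assumption based on the linked docs rather than plugin code, and it presumes a default Ollama install on localhost:11434 with nomic-embed-text already pulled:

```typescript
// Hedged sketch: fetch an embedding from a local Ollama server.
// Endpoint and response shape follow the Ollama API docs linked above;
// the base URL and model name are assumptions, not plugin defaults.
async function ollamaEmbed(text: string): Promise<number[]> {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "nomic-embed-text", // an embedding model instead of a chat model
      prompt: text,              // Ollama expects "prompt" rather than OpenAI's "input"
    }),
  });
  const data = await res.json();
  return data.embedding; // a single vector of floats
}
```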
And a very abridged Ollama install for my MacBook:

```bash
# install
brew install ollama

# pull models
ollama pull llama3
ollama pull nomic-embed-text

# serve (OLLAMA_ORIGINS allows requests from Obsidian's app:// origin)
OLLAMA_ORIGINS=app://obsidian.md* ollama serve
```

Of all the Obsidian LLM chat + RAG plugins, I think Smart Connections has the best RAG and chat responses. I just wish local embedding computation didn't interfere with using my vault at the same time. I successfully got Ollama embeddings working in Obsidian Copilot, if that's useful as an example. Thanks!
+1 This feature would be awesome! I have a slow laptop, but a decent GPU server, which is sitting idle most of the time.
@brianpetro jan.ai provides a fully OpenAI-compatible server API running locally on port 1337.
Hi @matttrent, I am not able to get it to work; Smart Connections reports that it is unable to connect. I suspect the path is the cause. Is the /api/chat path set up by default when Ollama is installed? I am not able to find it on my Mac mini.
@atmassrf if you're trying to use chat models (not embeddings), you may already be able to use the "custom local model" configuration to make it work with jan.ai. If that doesn't work, for example because there are some small differences from the OpenAI API format, then we would have to add an adapter in https://github.com/brianpetro/jsbrains/tree/main/smart-chat-model to account for those differences.
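To give a rough idea of what such an adapter involves, here is a minimal sketch; the class and method names are hypothetical, not smart-chat-model's actual interface, and the request/response shape follows Ollama's /api/chat docs:

```typescript
// Hedged sketch of an adapter mapping an OpenAI-style chat request onto
// Ollama's /api/chat endpoint. All names here are illustrative only.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }
interface ChatRequest { model: string; messages: ChatMessage[]; }

class OllamaChatAdapter {
  constructor(private baseUrl: string = "http://localhost:11434") {}

  async complete(req: ChatRequest): Promise<string> {
    const res = await fetch(`${this.baseUrl}/api/chat`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Ollama accepts the same role/content message shape as OpenAI,
      // but streams by default, so stream: false keeps this a single response.
      body: JSON.stringify({ model: req.model, messages: req.messages, stream: false }),
    });
    const data = await res.json();
    return data.message.content; // Ollama nests the reply under "message"
  }
}
```

Usage would be roughly `new OllamaChatAdapter().complete({ model: "llama3", messages: [{ role: "user", content: "hi" }] })`.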
@kennygokh Ollama embeddings aren't currently supported, but the chat models should work. The Ollama command I use to start a model looks like `ollama run phi`, which runs the phi model. Llama 3 would be `ollama run llama3`, assuming it's already downloaded/installed via Ollama. For improving the embedding speed, until the custom local embedding model adapter gets shipped, you might be interested in trying this: https://www.youtube.com/watch?v=tGZ6J63UZmw&t=3s
+1 for this.
Hey there! I'm trying to use Llama 3 for my chats with Smart Connections as well. I've got it running in the Windows Subsystem for Linux, and I can confirm that Ollama is working correctly. The problem I'm having is that when I try to input llama3 as the model, I get an error saying "No Smart Connections", and then it reverts the model to custom_local. Any ideas?
+1, can't wait!
+1, also hoping for it! @brianpetro BTW, may I ask how SC's current local embedding works?
@Moyf by default a local model is used via transformers.js, which caches the model somewhere in a browser cache 🌴
I see, thank you! ☀
Quoting the transformers.js documentation:

> By default, when running in the browser, the model will be run on your CPU (via WASM). If you would like to run the model on your GPU (via WebGPU), you can do this by setting `device: 'webgpu'`, for example:

Is WebGPU turned on by default? Judging from the execution speed, the computation does not seem to be using hardware acceleration.
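For reference, a minimal sketch of running a transformers.js feature-extraction pipeline with that option. This assumes transformers.js v3 (published as @huggingface/transformers) and an example model name; whether the version bundled with Smart Connections exposes the `device` option, and whether WebGPU is actually available inside Obsidian's Electron runtime, are open questions:

```typescript
import { pipeline } from "@huggingface/transformers";

// Hedged sketch: embed a piece of text locally. Omitting `device` falls back
// to the default CPU/WASM backend; "webgpu" requests GPU acceleration.
async function embedLocally(text: string): Promise<number[]> {
  const extractor = await pipeline(
    "feature-extraction",
    "Xenova/all-MiniLM-L6-v2", // example model, not the plugin's default
    { device: "webgpu" }
  );
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data as Float32Array); // one pooled embedding vector
}
```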
Through Ollama, a wide selection of embedding models is available, and they run very efficiently. Supporting Ollama's embedding models would make the plugin significantly more convenient to use.