
[FR] Support for embedding models run through Ollama #559

Open
wwjCMP opened this issue Apr 22, 2024 · 15 comments
Labels
enhancement New feature or request

Comments

@wwjCMP

wwjCMP commented Apr 22, 2024

Ollama makes a wide selection of embedding models available and runs them very efficiently. Supporting Ollama's embedding models would make the plugin significantly more convenient to use.

@brianpetro brianpetro added the enhancement New feature or request label Apr 22, 2024
@brianpetro
Owner

Hi @wwjCMP and thanks for the feature request.

Any relevant documentation you can point me to would help implement this sooner.

Thanks
🌴

@matttrent

matttrent commented Apr 22, 2024

+1 for this. I'd love to make use of it as well.

I've successfully configured Smart Connections to work with a local Ollama server running Llama 3 for chat.


I'm not certain how closely the Ollama API matches the OpenAI API that my choice of the Custom Local model option expects. But it's close enough that the plugin and the LLM API are communicating correctly for chat.

The Ollama API docs are here: https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings

From the looks of it, you could get embeddings from the Ollama server instance using the same request format as you're currently making with the chat client, with two changes: hit the embeddings endpoint (/api/embeddings) instead of the chat endpoint, and pass an embedding model such as nomic-embed-text (sketched below).
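
For reference, a minimal sketch of what that embeddings request could look like (assuming Ollama's default port 11434 and the /api/embeddings endpoint with model/prompt fields as described in the docs linked above; the model name is just the one pulled below):

// hedged sketch, not code from the plugin: fetch one embedding from a local Ollama server
async function ollamaEmbed(text: string): Promise<number[]> {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "nomic-embed-text", // pulled via `ollama pull nomic-embed-text`
      prompt: text,              // /api/embeddings takes `prompt` rather than `messages`
    }),
  });
  const data = await res.json();
  return data.embedding;         // array of floats
}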

And a very abridged Ollama install for my MacBook:

# install
brew install ollama

# pull models
ollama pull llama3
ollama pull nomic-embed-text

# serve
OLLAMA_ORIGINS=app://obsidian.md* ollama serve

Of all the Obsidian LLM chat + RAG plugins, I think Smart Connections has the best RAG and chat responses. I just wish local embedding computation didn't interfere with using my vault at the same time. I successfully got Ollama embeddings working in Obsidian Copilot, if that's useful as an example.

Thanks!

@jkunczik

+1 This feature would be awesome! I have a slow laptop, but a decent GPU server, which is sitting idle most of the time.

@atmassrf

@brianpetro jan.ai provides a fully OpenAI-compatible API server running locally on port 1337.
+1 for this feature 🥇
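
For what it's worth, a quick sanity check against an OpenAI-compatible local server like that could look like the sketch below (assuming jan.ai's default port 1337 and the usual /v1/chat/completions route; the model id is only an example):

// hedged sketch: call a local OpenAI-compatible chat endpoint (e.g. jan.ai on port 1337)
async function testLocalChat(): Promise<void> {
  const res = await fetch("http://localhost:1337/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3-8b-instruct", // example id; use whatever model jan.ai has loaded
      messages: [{ role: "user", content: "Hello" }],
      stream: false,
    }),
  });
  const data = await res.json();
  console.log(data.choices?.[0]?.message?.content);
}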

@kennygokh

(quoting @matttrent's comment above)

Hi @matttrent, I'm not able to get it to work; Smart Connections reports that it's unable to connect. My suspicion is that the path is the cause. Is the /api/chat path something Ollama installs by default? I'm not able to find it on my Mac mini.

@brianpetro
Owner

@atmassrf if you're trying to use chat models (not embeddings), you may already be able to use the "custom local model" configuration to make it work with jan.ai

If that doesn't work, for example, if there are some small differences with the OpenAI API format, then we would have to make an adapter in https://github.com/brianpetro/jsbrains/tree/main/smart-chat-model to account for those differences.
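
Purely as an illustration of the kind of difference such an adapter would smooth over (this is a hypothetical sketch, not the actual smart-chat-model interface):

// hypothetical sketch only: map a generic chat request onto an OpenAI-style payload
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function toOpenAiStyleRequest(baseUrl: string, model: string, messages: ChatMessage[]) {
  return {
    url: `${baseUrl}/v1/chat/completions`,    // path differences are one thing an adapter remaps
    body: { model, messages, stream: false }, // field-name differences are the other
  };
}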

@brianpetro
Owner

@kennygokh Ollama embeddings aren't currently supported, but the chat models should work.

The ollama command I use to start the model looks like:

ollama run phi

That runs the phi model. Llama3 might look like:

ollama run llama3

Assuming it's already downloaded/installed via Ollama.

To improve embedding speed until the custom local embedding model adapter ships, you might be interested in trying this: https://www.youtube.com/watch?v=tGZ6J63UZmw&t=3s

@MiracleXYZ

+1 for this.

@ttodosi

ttodosi commented May 1, 2024

I think this is the feature request I'm looking for. Can I use the embeddings created by Smart Connections in Ollama, so that other plugins I use can know about my notes? For some reason the embeddings are created as .ajson instead of .json?


@Mizuna737

(quoting @matttrent's comment above)

Hey there! I'm trying to use llama3 for my chat with Smart Connections as well. I've got it running in the Windows Subsystem for Linux, and I can confirm that Ollama is working correctly. The problem I'm having is that when I try to input llama3 as the model, I get an error saying "No Smart Connections", and then it reverts the model to custom_local. Any ideas?

@ljacho

ljacho commented Jul 4, 2024

+1 can't wait!

@Moyf

Moyf commented Jul 12, 2024

+1 Also hoping for it!

@brianpetro BTW, may I ask how SC's current local embedding works?
I can't see any model files locally. Did it send my notes over the web to generate the .ajson files, or does it work some other way? Very curious about it!

@brianpetro
Owner

brianpetro commented Jul 13, 2024

@Moyf by default a local model is used via transformers.js, which caches the model somewhere in a browser cache 🌴
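
For the curious, embedding with transformers.js looks roughly like the sketch below (the model name is only an example, not necessarily the one Smart Connections ships with):

// hedged sketch of the transformers.js feature-extraction pipeline
import { pipeline } from "@xenova/transformers"; // "@huggingface/transformers" in newer releases

const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");
const output = await extractor("Some note text", { pooling: "mean", normalize: true });
console.log(output.data.length); // embedding dimension

The weights are downloaded once and then served from the browser cache, which matches the behavior described above.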

@Moyf

Moyf commented Jul 14, 2024

@Moyf by default a local model is used via transformers.js, which caches the model somewhere in a browser cache 🌴

I see, thank you! ☀

@lonelygo

lonelygo commented Dec 9, 2024

@Moyf by default a local model is used via transformers.js, which caches the model somewhere in a browser cache 🌴

Quoted from the transformers.js documentation: "By default, when running in the browser, the model will be run on your CPU (via WASM). If you would like to run the model on your GPU (via WebGPU), you can do this by setting device: 'webgpu', for example:"

Is WebGPU turned on by default? Judging from the execution speed, the computation doesn't seem to be using hardware acceleration.
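
For reference, the option from that doc quote is passed as a pipeline argument, roughly as sketched below (WebGPU support comes with transformers.js v3, i.e. the @huggingface/transformers package; whether and how Smart Connections exposes it is a separate question):

// hedged sketch: opt into WebGPU instead of the default WASM/CPU backend
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", {
  device: "webgpu", // per the docs quoted above
});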
