[FR] Support for embedding models run through Ollama #559
Hi @wwjCMP and thanks for the feature request. Any relevant documentation you can point me to would help me implement this sooner. Thanks
+1 for this. I'd love to make use of it as well.

I've successfully configured Smart Connections to work with a local Ollama server running Llama 3 for chat. I'm not certain how closely the Ollama API matches the OpenAI API that my Custom Local model selection expects, but it's close enough that the plugin and the LLM API are communicating correctly for chat.

The Ollama API docs are here: https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings

From the looks of it, you could get embeddings from the Ollama server instance using the same request format you're currently making with the chat client, with two changes: point the request at the embeddings endpoint and pass an embedding model (e.g. nomic-embed-text) instead of a chat model.
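For illustration, a minimal sketch of what that embeddings request could look like; this is an assumption based on the linked docs rather than plugin code, and it presumes a default Ollama install on localhost:11434 with nomic-embed-text already pulled:

```typescript
// Hedged sketch: fetch an embedding from a local Ollama server.
// Endpoint and response shape follow the Ollama API docs linked above;
// the base URL and model name are assumptions, not plugin defaults.
async function ollamaEmbed(text: string): Promise<number[]> {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "nomic-embed-text", // an embedding model instead of a chat model
      prompt: text,              // Ollama expects "prompt" rather than OpenAI's "input"
    }),
  });
  const data = await res.json();
  return data.embedding; // a single vector of floats
}
```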
And a very abridged Ollama install for my MacBook:

```bash
# install
brew install ollama

# pull models
ollama pull llama3
ollama pull nomic-embed-text

# serve (OLLAMA_ORIGINS allows requests from Obsidian's app:// origin)
OLLAMA_ORIGINS=app://obsidian.md* ollama serve
```

Of all the Obsidian LLM chat + RAG plugins, I think Smart Connections has the best RAG and chat responses. I just wish local embedding computation didn't interfere with using my vault at the same time. I successfully got Ollama embeddings working in Obsidian Copilot, if that's useful as an example. Thanks!
+1 This feature would be awesome! I have a slow laptop, but a decent GPU server, which is sitting idle most of the time.
@brianpetro jan.ai provides a fully OpenAI-compatible server API running locally on port 1337.
Hi @matttrent, I am not able to get it to work; Smart Connections reports that it is unable to connect. I suspect the path is the cause. Is the /api/chat path set up by default when Ollama is installed? I am not able to find it on my Mac mini.
@atmassrf if you're trying to use chat models (not embeddings), you may already be able to use the "custom local model" configuration to make it work with jan.ai. If that doesn't work, for example because there are some small differences from the OpenAI API format, then we would have to add an adapter in https://github.com/brianpetro/jsbrains/tree/main/smart-chat-model to account for those differences.
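To give a rough idea of what such an adapter involves, here is a minimal sketch; the class and method names are hypothetical, not smart-chat-model's actual interface, and the request/response shape follows Ollama's /api/chat docs:

```typescript
// Hedged sketch of an adapter mapping an OpenAI-style chat request onto
// Ollama's /api/chat endpoint. All names here are illustrative only.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }
interface ChatRequest { model: string; messages: ChatMessage[]; }

class OllamaChatAdapter {
  constructor(private baseUrl: string = "http://localhost:11434") {}

  async complete(req: ChatRequest): Promise<string> {
    const res = await fetch(`${this.baseUrl}/api/chat`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Ollama accepts the same role/content message shape as OpenAI,
      // but streams by default, so stream: false keeps this a single response.
      body: JSON.stringify({ model: req.model, messages: req.messages, stream: false }),
    });
    const data = await res.json();
    return data.message.content; // Ollama nests the reply under "message"
  }
}
```

Usage would be roughly `new OllamaChatAdapter().complete({ model: "llama3", messages: [{ role: "user", content: "hi" }] })`.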
@kennygokh Ollama embeddings aren't currently supported, but the chat models should work. The Ollama command I use to start a model looks like `ollama run phi`, which runs the phi model. Llama 3 would be `ollama run llama3`, assuming it's already downloaded/installed via Ollama. For improving the embedding speed, until the custom local embedding model adapter gets shipped, you might be interested in trying this: https://www.youtube.com/watch?v=tGZ6J63UZmw&t=3s
+1 for this.
Hey there! I'm trying to use Llama 3 for my chats with Smart Connections as well. I've got it running in the Windows Subsystem for Linux, and I can confirm that Ollama is working correctly. The problem I'm having is that when I try to input llama3 as the model, I get an error saying "No Smart Connections", and then it reverts the model to custom_local. Any ideas?
+1, can't wait!
+1, also hoping for it! @brianpetro BTW, may I ask how SC's current local embedding works?
@Moyf by default a local model is used via transformers.js, which caches the model somewhere in a browser cache 🌴
I see, thank you! ☀
Quoting the transformers.js documentation:

> By default, when running in the browser, the model will be run on your CPU (via WASM). If you would like to run the model on your GPU (via WebGPU), you can do this by setting `device: 'webgpu'`, for example:

Is WebGPU turned on by default? Judging from the execution speed, the computation does not seem to be using hardware acceleration.
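For reference, a minimal sketch of running a transformers.js feature-extraction pipeline with that option. This assumes transformers.js v3 (published as @huggingface/transformers) and an example model name; whether the version bundled with Smart Connections exposes the `device` option, and whether WebGPU is actually available inside Obsidian's Electron runtime, are open questions:

```typescript
import { pipeline } from "@huggingface/transformers";

// Hedged sketch: embed a piece of text locally. Omitting `device` falls back
// to the default CPU/WASM backend; "webgpu" requests GPU acceleration.
async function embedLocally(text: string): Promise<number[]> {
  const extractor = await pipeline(
    "feature-extraction",
    "Xenova/all-MiniLM-L6-v2", // example model, not the plugin's default
    { device: "webgpu" }
  );
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data as Float32Array); // one pooled embedding vector
}
```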
Through Ollama, a wide selection of embedding models is available, and they run very efficiently. Supporting Ollama's embedding models would make the plugin significantly more convenient to use.