Smart Collections FR: Pinecone Adapter #4
Comments
It's possible. Integrating Pinecone would require:
I would consider doing this mainly for performance, but calculating cosine similarity on my vault of ~1,500 notes runs pretty smoothly at the moment. Is performance why you are asking about this? Or is there another reason? Thanks!
My vault is about 20k notes. Part of it is performance. The other part is being able to reuse the embeddings for other things rather than paying for the process multiple times.
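For readers unfamiliar with the calculation mentioned above, here is a minimal sketch of a brute-force cosine similarity search over note embeddings. It is not the plugin's actual code: the function names, the dictionary layout (note path mapped to an embedding vector), and the `top_k` parameter are all assumptions made for illustration. Every query is compared against every note, which is why cost grows linearly with vault size.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_notes(query, embeddings, top_k=3):
    """Return the paths of the top_k notes most similar to the query.

    `embeddings` is a hypothetical in-memory layout: {note_path: vector}.
    """
    scored = [
        (cosine_similarity(query, vector), path)
        for path, vector in embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [path for _, path in scored[:top_k]]
```

With a few thousand notes this linear scan is cheap; at tens of thousands of notes the scan and the size of the stored embeddings start to matter.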
20K is significantly more notes than I have tested with myself. Your Regarding reusing the embeddings, the main issue with that is synchronization—the
I feel option 2 goes against the Obsidian.md ethos of "owning your data" since all your notes would be hosted in the cloud. Option 1 has its drawbacks, too. "Secondary" applications outside of Obsidian would be more difficult to develop. However, other Obsidian plugins (e.g., Smart Completions) will have no problem reusing the embeddings stored within Obsidian. So it depends on your use case.

What is the average number of notes in an Obsidian vault? If it's much more than what I've anticipated (<1000 notes), then I think option 1 could make sense for performance reasons.

That said, performance has been an afterthought at this point. There is still likely a lot of low-hanging fruit in terms of performance that wouldn't require an additional API service provider.

I'm thinking out loud here, so any feedback would be appreciated. Thanks!
Yes, the file is... unwieldy lol. I don't know too much about the specifics of the different options. I know there's also something like Weaviate? Not sure if that's better. It is open source, right?

Just checked and my largest note is ~4 million characters, and plenty of others are over 10k. As for the average number of notes, I have no idea. I'm probably on the larger end, but not the largest I've heard of. I'm sure there are plenty of vaults over 1,000 notes.
Thanks for suggesting Weaviate. It's pretty comparable to Pinecone. Hosting your own instance looks non-trivial and may not be easy to package into the plugin. I'll have to look into it more before I can say for sure.

It needs further research, but there should be a relatively simple way to manage the vector calculations better. The storage file could be split up using a clustering algorithm based on cosine similarity. Calculations could then be prioritized by starting with the nearest cluster. I'm surprised I haven't seen anything like this, but I haven't looked much.

I'll continue to look into this. Thanks for the feedback.
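The clustering idea described above can be sketched roughly as follows. This is only one possible reading of it, not something the plugin implements: it uses spherical k-means (k-means on unit-length vectors, which groups by cosine similarity), and every name here, including the `probe` parameter and the caller-supplied starting centroids, is an assumption for illustration. A query is compared against the cluster centroids first, and only notes in the nearest cluster(s) are scored.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nearest_centroid(vector, centroids):
    """Index of the centroid with the highest cosine similarity to `vector`."""
    return max(range(len(centroids)), key=lambda i: dot(vector, centroids[i]))


def cluster(embeddings, initial_centroids, iterations=10):
    """Spherical k-means over {note_path: vector}.

    Starting centroids are supplied by the caller (an assumption that keeps
    the sketch deterministic). Returns (centroids, {note_path: cluster_index}).
    """
    vectors = {path: normalize(v) for path, v in embeddings.items()}
    centroids = [normalize(c) for c in initial_centroids]
    for _ in range(iterations):
        assignment = {p: nearest_centroid(v, centroids) for p, v in vectors.items()}
        for i in range(len(centroids)):
            members = [vectors[p] for p, c in assignment.items() if c == i]
            if members:  # leave an empty cluster's centroid where it is
                centroids[i] = normalize([sum(col) for col in zip(*members)])
    # Final pass so the assignment matches the returned centroids.
    assignment = {p: nearest_centroid(v, centroids) for p, v in vectors.items()}
    return centroids, assignment


def search(query, embeddings, centroids, assignment, top_k=3, probe=1):
    """Score only the notes in the `probe` clusters nearest to the query."""
    q = normalize(query)
    order = sorted(range(len(centroids)), key=lambda i: dot(q, centroids[i]), reverse=True)
    chosen = set(order[:probe])
    scored = [
        (dot(q, normalize(embeddings[p])), p)
        for p, c in assignment.items()
        if c in chosen
    ]
    scored.sort(reverse=True)
    return [p for _, p in scored[:top_k]]
```

The trade-off is that results are approximate: a relevant note sitting in a cluster that was not probed is missed, so raising `probe` buys accuracy at the cost of more comparisons. Storing each cluster separately would also mean only part of the embeddings needs to be loaded per query.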
I second this. Being able to pull embeddings from Pinecone would potentially allow leveraging purpose-built embedding tools that can take in a wide variety of files (PowerPoints and PDFs, for example). This could in turn unlock better query responses while also keeping a single base embedding repository shared across all tools that use personal data!
@vguillet I see you already commented on brianpetro/obsidian-smart-connections#27, thanks! It's a similar idea. I still think a Pinecone/Weaviate integration will happen. But I need to learn more about how people are using them.
Recorded response: https://youtu.be/J5ARc_91fzs
Would it be possible to have the option to store the embeddings in Pinecone?