There should be a more "out of the box" transition between the embeddings generated from the EmbeddingsBuilder's build() method and adding these to a vector store.
Motivation
Improve the developer experience of working with embeddings and vector stores. Currently, it looks like this:
let embeddings = EmbeddingsBuilder::new(model.clone())
    .documents(vec![FakeDefinition {
        id: "doc0".to_string(),
        word: "flurbo".to_string(),
        definitions: vec!["A green alien that lives on cold planets.".to_string()],
    }])?
    .build()
    .await?;

let index = InMemoryVectorStore::default()
    .add_documents(
        embeddings
            .into_iter()
            .map(|(fake_definition, embedding_vec)| {
                (fake_definition.id.clone(), fake_definition, embedding_vec)
            })
            .collect(),
    )?
    .index(model);
As you can see, the user has to reshape the embeddings (here, pulling out an id for each document) before adding them to the in-memory vector store.
This applies less to the MongoDB and LanceDB vector stores because we do not have full control over what goes into those stores, but maybe something can be done there too.
Proposal
In-memory store - going from the EmbeddingsBuilder's build() to add_documents should work out of the box.
LanceDB / MongoDB - maybe provide a default mapping between the output of the EmbeddingsBuilder's build() and the type that the store expects (i.e. Document for MongoDB, RecordBatch for LanceDB).
I think this can be done on a case-by-case basis for the different vector stores (I don't think there is a one-size-fits-all solution here). For instance, the InMemoryVectorStore could have a helper method that would allow the developer to easily populate an InMemoryVectorStore with the result of EmbeddingsBuilder::build.
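The helper method from the original comment did not survive in this copy of the thread, so the following is only a minimal sketch of what such a helper might look like, not Rig's actual API. It uses simplified stand-in types: `Embedding`, `InMemoryVectorStore`, and `FakeDefinition` here are local definitions rather than Rig's, the method name `add_documents_with_id` is hypothetical, and error handling (the `?` in the snippet above) is left out.

```rust
use std::collections::HashMap;

/// Stand-in for an embedding vector (assumption: Rig's real type differs).
#[derive(Debug, Clone, PartialEq)]
pub struct Embedding(pub Vec<f64>);

/// Example document type, mirroring the one used in the issue.
#[derive(Debug, Clone, PartialEq)]
pub struct FakeDefinition {
    pub id: String,
    pub word: String,
    pub definitions: Vec<String>,
}

/// Simplified stand-in for an in-memory store keyed by document id.
#[derive(Debug)]
pub struct InMemoryVectorStore<D> {
    embeddings: HashMap<String, (D, Vec<Embedding>)>,
}

// Implemented by hand so that `D` does not need to implement `Default`.
impl<D> Default for InMemoryVectorStore<D> {
    fn default() -> Self {
        Self { embeddings: HashMap::new() }
    }
}

impl<D> InMemoryVectorStore<D> {
    /// Existing-style entry point: the caller supplies (id, document, embeddings) triples.
    pub fn add_documents(mut self, documents: Vec<(String, D, Vec<Embedding>)>) -> Self {
        for (id, doc, embs) in documents {
            self.embeddings.insert(id, (doc, embs));
        }
        self
    }

    /// Hypothetical helper: takes (document, embeddings) pairs, i.e. the shape
    /// the builder produces in the issue, plus a function that derives the id.
    pub fn add_documents_with_id<F>(self, documents: Vec<(D, Vec<Embedding>)>, id_f: F) -> Self
    where
        F: Fn(&D) -> String,
    {
        let triples: Vec<(String, D, Vec<Embedding>)> = documents
            .into_iter()
            .map(|(doc, embs)| (id_f(&doc), doc, embs))
            .collect();
        self.add_documents(triples)
    }

    /// Number of stored documents.
    pub fn len(&self) -> usize {
        self.embeddings.len()
    }

    /// Look up a document and its embeddings by id.
    pub fn get(&self, id: &str) -> Option<&(D, Vec<Embedding>)> {
        self.embeddings.get(id)
    }
}
```

With a helper along these lines, the call site from the issue could shrink to something like `InMemoryVectorStore::default().add_documents_with_id(embeddings, |def| def.id.clone())`, leaving only the choice of id to the caller.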
But for other vector stores, this will highly depend on the complexity of T (e.g.: is it a flat struct that can easily be converted to a single table? Or is it a nested struct?), as well as how the user is integrating Rig in their wider application (e.g.: do they want the embeddings and documents to be in the same collection/table? Or do they want to separate them and link them with a foreign key?).