refactor: Generalize and simplify vector store interface #34

cvauclair · 2024-09-24T21:55:40Z

Feature Request

Generalize and simplify the vector store interface (and the integration of third party vector stores).

Motivation

Rig's current approach to vector stores and vector search is lacking in multiple ways:

The interface forces developers to use the DocumentEmbeddings type, which is somewhat opinionated and a little over-engineered.
The interface doesn't lend itself well to use cases where a developer already has a populated vector store since the interface expects the vector store to be modeled after DocumentEmbeddings.
The process of integrating new vector stores is convoluted for non-document databases (e.g.: Postgres, LanceDB) since DocumentEmbeddings was designed for document vector stores.
The interface assumes that users would use Rig constructs (e.g.: DocumentEmbeddings) to populate their vector store.

Proposal

Remove the VectorStore trait and simplify the VectorStoreIndex trait to the following methods only:

pub trait VectorStoreIndex: Send + Sync {
    /// Get the top n documents based on the distance to the given query.
    /// The documents are deserialized into the given type.
    async fn top_n_from_query<T: for<'a> Deserialize<'a>>(
        &self,
        query: &str,
        n: usize,
    ) -> Result<Vec<(f64, T)>, VectorStoreError>;

    /// Same as `top_n_from_query` but returns the document ids only.
    async fn top_n_ids_from_query(
        &self,
        query: &str,
        n: usize,
    ) -> Result<Vec<(f64, String)>, VectorStoreError>;

    /// Get the top n documents based on the distance to the given embedding.
    /// The documents are deserialized into the given type.
    async fn top_n_from_embedding<T: for<'a> Deserialize<'a>>(
        &self,
        embedding: &Embedding,
        n: usize,
    ) -> Result<Vec<(f64, T)>, VectorStoreError>;

    /// Same as `top_n_from_embedding` but returns the document ids only.
    async fn top_n_ids_from_embedding(
        &self,
        embedding: &Embedding,
        n: usize,
    ) -> Result<Vec<(f64, String)>, VectorStoreError>;
}

Remove the DocumentEmbeddings type entirely.
Update the Agent type accordingly (we could enforce that the type T stored in the vector store also implements ToString, so that we can easily insert the dynamic context into the agent's prompt).
Update the EmbeddingsBuilder type accordingly
Update the existing vector stores integration
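
To make the proposal concrete, here is a minimal sketch of how a store could satisfy the embedding-based half of the trait and how the ToString bound would feed an agent's prompt. It is deliberately simplified and is not Rig's implementation: the methods are synchronous, errors and serde deserialization are omitted, and `InMemoryIndex`, `distance`, `build_context`, `sample_index`, and the `vec` field on `Embedding` are hypothetical names introduced for illustration. Euclidean distance (smaller is closer) is also an assumption; a real store might return similarity scores instead.

```rust
/// Hypothetical stand-in for the proposal's `Embedding` type.
#[derive(Debug, Clone)]
struct Embedding {
    vec: Vec<f64>,
}

/// Hypothetical in-memory index: each entry is (id, embedding, document).
struct InMemoryIndex<T> {
    entries: Vec<(String, Embedding, T)>,
}

/// Euclidean distance; assumes both slices have the same length.
fn distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

impl<T: Clone> InMemoryIndex<T> {
    /// Rank all entries by ascending distance and keep the closest `n`.
    fn ranked(&self, embedding: &Embedding, n: usize) -> Vec<(f64, &(String, Embedding, T))> {
        let mut scored: Vec<_> = self
            .entries
            .iter()
            .map(|entry| (distance(&entry.1.vec, &embedding.vec), entry))
            .collect();
        scored.sort_by(|a, b| a.0.total_cmp(&b.0));
        scored.truncate(n);
        scored
    }

    /// Simplified counterpart of `top_n_from_embedding` (no async, no errors).
    fn top_n_from_embedding(&self, embedding: &Embedding, n: usize) -> Vec<(f64, T)> {
        self.ranked(embedding, n)
            .into_iter()
            .map(|(d, entry)| (d, entry.2.clone()))
            .collect()
    }

    /// Simplified counterpart of `top_n_ids_from_embedding`.
    fn top_n_ids_from_embedding(&self, embedding: &Embedding, n: usize) -> Vec<(f64, String)> {
        self.ranked(embedding, n)
            .into_iter()
            .map(|(d, entry)| (d, entry.0.clone()))
            .collect()
    }
}

/// Join retrieved documents into a prompt context block via `ToString`.
fn build_context<T: ToString>(docs: &[(f64, T)]) -> String {
    docs.iter()
        .map(|(_, doc)| doc.to_string())
        .collect::<Vec<_>>()
        .join("\n")
}

/// Three toy documents with two-dimensional embeddings.
fn sample_index() -> InMemoryIndex<String> {
    let entry = |id: &str, v: [f64; 2], doc: &str| {
        (id.to_string(), Embedding { vec: v.to_vec() }, doc.to_string())
    };
    InMemoryIndex {
        entries: vec![
            entry("a", [1.0, 0.0], "Rust is a systems language."),
            entry("b", [0.0, 1.0], "Paris is in France."),
            entry("c", [0.9, 0.1], "Cargo is Rust's build tool."),
        ],
    }
}
```

Calling `sample_index().top_n_from_embedding(&Embedding { vec: vec![1.0, 0.0] }, 2)` and passing the result to `build_context` yields the two Rust-related documents joined by a newline. Because the index is generic over T, a user with an already-populated store only needs a type that matches their rows, which is the point of dropping DocumentEmbeddings.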

Alternatives

Open to alternatives

Implementation Checklist


cvauclair · 2024-09-24T21:55:58Z

@0xMochan @ThierryBleau what do you guys think of this?

mateobelanger · 2024-11-29T19:06:31Z

@cvauclair This should be closed since #52 was merged, right? Same for sub-issue #40.

cvauclair added the feat label Sep 24, 2024

marieaurore123 self-assigned this Sep 25, 2024

mateobelanger added refactor and removed feat labels Oct 15, 2024
