Implement the Vector of Floats data type and basic vector search scalar functions #812

Merged
merged 3 commits into from
Sep 26, 2023


eolivelli
Contributor

Motivation:

In Generative AI applications, the ability to deal with vectors (of floats) is fundamental to implementing the RAG pattern.

While HerdDB is not meant to be a vector database, one of its goals is to be a small-footprint embeddable database that is very useful for testing and for small applications.

Contents of the patch:

  • This patch introduces end to end support for the "Float array" data type, both in the core (server side) and in the JDBC Driver
  • We are adding 3 basic scalar functions to compare vectors of floats: cosine_similarity, dot_product and euclidean_distance
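
As a reference for what the three functions compute, here is a minimal sketch of the standard mathematical definitions in plain Python. It is not HerdDB's implementation: the Python function names merely mirror the SQL names, and the handling of mismatched lengths and zero vectors is an assumption, since the patch description does not say how those cases behave.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        # Assumption: mismatched lengths are treated as an error.
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the two vector norms.

    Ranges from -1 (opposite direction) to 1 (same direction).
    """
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: the similarity is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Note the ordering direction each function implies: higher cosine similarity and dot product mean "closer", while a lower Euclidean distance means "closer".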

In order to support the RAG pattern you only need 2 features:

  • inserting records with a vector field
  • retrieving the records ordered by some "distance" function
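
The two features above amount to a brute-force nearest-neighbour scan. The following in-memory sketch illustrates the idea only: the `Chunk` class, the `top_k` helper and the sample embeddings are hypothetical and are not part of HerdDB, which would perform the equivalent work through SQL.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Chunk:
    """Hypothetical record: one chunk of a document plus its embedding."""
    filename: str
    chunkid: int
    text: str
    embeddingsvector: List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Standard definition; assumes equal-length, non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(chunks: List[Chunk], query: Sequence[float], k: int = 10) -> List[str]:
    """Score every chunk against the query and return the k most similar texts.

    No index is used: every record is scanned, which is workable for
    thousands of chunks but not intended for very large collections.
    """
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(c.embeddingsvector, query),
        reverse=True,  # most similar first
    )
    return [c.text for c in ranked[:k]]


# Feature 1: "insert" records that carry a vector field (made-up embeddings).
documents = [
    Chunk("a.txt", 0, "about cats", [1.0, 0.0]),
    Chunk("a.txt", 1, "about dogs", [0.0, 1.0]),
    Chunk("b.txt", 0, "about kittens", [0.9, 0.1]),
]

# Feature 2: retrieve records ordered by similarity to a query vector.
print(top_k(documents, [1.0, 0.0], k=2))
```

In a real RAG application the embeddings would come from an embedding model, and the returned chunks would be passed to the language model as context.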

This patch doesn't add any special index, as that is really out of the scope of HerdDB, but with these key features it is pretty easy to build a small vector database that indexes thousands of documents and supports chat bots.

Sample table:

CREATE TABLE DOCUMENTS (
    FILENAME string,
    CHUNKID int,
    TEXT string,
    EMBEDDINGSVECTOR floata,
    PRIMARY KEY(filename, chunkid)
)

Sample vector search query:

SELECT text
FROM DOCUMENTS
ORDER BY cosine_similarity(embeddingsvector,cast(? as FLOAT ARRAY)) DESC
LIMIT 10

Contributor

@aluccaroni aluccaroni left a comment


+1

@eolivelli eolivelli merged commit 5cb6d7c into diennea:master Sep 26, 2023
@eolivelli eolivelli deleted the impl/vector branch September 26, 2023 14:16
2 participants