diff --git a/prj/docs/core/01_overview.adoc b/prj/docs/core/01_overview.adoc index db7a0cb75bce2..805e4f1d27b15 100644 --- a/prj/docs/core/01_overview.adoc +++ b/prj/docs/core/01_overview.adoc @@ -58,6 +58,13 @@ Use Java 21 virtual threads within Coherence. Create sorted client-side views of cached data. -- +[CARD] +.Vector DB +[icon=fa-database,link=docs/core/08_vector_db.adoc] +-- +Use Coherence as a Vector Database. +-- + [CARD] .Queues [icon=line_weight,link=docs/core/09_queues.adoc] diff --git a/prj/docs/core/08_vector_db.adoc b/prj/docs/core/08_vector_db.adoc new file mode 100644 index 0000000000000..7a5c06b28cb4f --- /dev/null +++ b/prj/docs/core/08_vector_db.adoc @@ -0,0 +1,283 @@ +/////////////////////////////////////////////////////////////////////////////// + Copyright (c) 2000, 2024, Oracle and/or its affiliates. + + Licensed under the Universal Permissive License v 1.0 as shown at + https://oss.oracle.com/licenses/upl. +/////////////////////////////////////////////////////////////////////////////// += Vector DB +:description: Coherence Core Improvements +:keywords: coherence, java, documentation + +// DO NOT remove this header - it might look like a duplicate of the header above, but +// both they serve a purpose, and the docs will look wrong if it is removed. +== Vector DB + +With the increased popularity of Gen AI, and Retrieval Augmented Generation (RAG) use cases in particular, the need for a way to store and search a large number of dense vector embeddings efficiently is larger than ever. + +Coherence already provides a number of features that make this possible, such as efficient serialization format, filters and aggregators, which allow users to search across large data sets in parallel by leveraging all the CPU cores in a cluster, and gRPC proxy, which allows remote clients written in any supported language, including Python, to access data in a Coherence cluster efficiently. + +This release adds the missing bits that turn Coherence into a full-fledged Vector Database: + +1. Built-in support for Vector Types + * `float32`, `int8` and `bit` dense vectors of arbitrary dimension +2. Built-in support for Semantic Search, including + * HNSW indexing + * Binary Quantization + * Index-optimized Exact Searches + * Metadata Filtering +3. Built-in support for Document Chunks, addressing a common RAG use case +4. Integration with https://www.langchain.com[LangChain] and https://docs.langchain4j.dev[LangChain4j] +5. Integration with https://docs.spring.io/spring-ai/reference/index.html[Spring AI] + +The following sections provide more details about each of the features above. + +=== Vector Types + +To support arbitrary vector types, Coherence provides `com.oracle.coherence.ai.Vector` interface, with three built-in implementations: + +1. `BitVector`, which internally uses a `java.util.Bitset` to represent each vector element using a single bit, +2. `Int8Vector`, which internally uses a `byte[]`, and +3. `Float32Vector`, which internally uses a `float[]`. + +These types allow users to add a vector property to their own classes the same way they would add any other property: by simply creating a field and accessors for it: + +[source,java] +.Book.java +---- +@PortableType(id = 2001) +public class Book + { + @Portable private String isbn; + @Portable private String title; + @Portable private String author; + @Portable private String summary; + @Portable private Vector summaryEmbedding; + + // constructors, getters and setters omitted + } +---- + +In the example above, the `summaryEmbedding` field is used to store vector representation of the `summary` field, so we can use vector similarity to search book summaries. + +In the subsequent sections, we'll discuss how the `summaryEmbedding` property can be used to define both standard and vector indexes, and to perform similarity search against them. + +=== Performing Similarity Search + +To perform similarity search against vectors stored in Coherence you can use `SimilaritySearch` aggregator. The easiest way to construct one is by using `Aggregators.similaritySearch` factory method. + +You need to specify three arguments when constructing the `SimilaritySearch` aggregator: + +1. A `ValueExtractor` that should be used to retrieve the vector attribute from the map entries +2. The search vector to compare the extracted values with, and +3. The maximum number of the results to return + +For example, to search the map containing `Book` objects, and return up to 10 most similar books, you would create `SimilaritySearch` aggregator instance like this: + +[source,java] +---- +var searchVector = createEmbedding(searchQuery); // outside of Coherence +var search = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 10); +---- + +By default, the aggregator will use _cosine distance_ to calculate distance between vectors, but you can change that by calling fluent `algorithm` method on the created aggregator instance and passing an instance of a different `DistanceAlgorithm` implementation: + +[source,java] +---- +var search = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 10) + .algorithm(new L2SquaredDistance()); +---- + +Out of the box Coherence provides `CosineDistance`, `L2SquaredDistance` and `InnerProductDistance` implementation, but you can easily add support for additional algorithms by implementing `DistanceAlgorithm` interface yourself. + +Once you have an instance of a `SimilaritySearch` aggregator, you can perform similarity search by calling `NamedMap.aggregate` method like you normally would: + +[source,java] +---- +NamedMap books = session.getMap("books"); +List> results = books.aggregate(search); +---- + +The result of the search is a list of up to maximum specified `QueryResult` objects (10, in the example above), which contain entry key, value, and calculated distance between the search vector and a vector extracted from the specified entry. The results are sorted by distance, in ascending order, from closest to farthest. + +==== Brute-force Search + +By default, if no index is defined for the vector attribute, Coherence will perform a brute-force search by deserializing every entry, extracting the vector attribute from it, and performing distance calculation between the extracted vector and the search vector using specified distance algorithm. + +This is fine for small or medium-sized data sets, because Coherence will still perform search in parallel across cluster members and aggregate the results, but can be very inefficient as the data sets get larger and larger, in which case using one of supported index types (described below) is recommended. + +However, even when using indexes, it may be beneficial to execute the same query using brute force, in order to test recall by comparing the results returned by the (approximate) index-based search, and the (exact) brute-force search. + +To accomplish that, you can configure `SimilaritySearch` aggregator to ignore any configured index and to perform brute-force search anyway, by calling `bruteForce` method on the aggregator instance: + +[source,java] +---- +var search = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 10) + .bruteForce(); +---- + +===== Indexed Brute-Force Search + +It is possible to improve performance of a brute-force search by creating a forward-only index on the vector attribute using `DeserializationAccelerator`: + +[source,java] +---- +NamedMap books = session.getMap("books"); +books.addIndex(new DeserializationAccelerator(Book::getSummaryEmbedding)); +---- + +This will avoid repeated deserialization of `Book` values when performing brute-force search, at the cost of additional memory consumed by the indexed vector instances. + +The search will still perform the exact distance calculation, so the results will be exact, just like with the non-indexed brute-force search. + +==== Index-based Search + +While the brute force searches work fine with small data sets, as the data set gets larger it is highly recommended to create a vector index for a vector property. + +Coherence supports two vector index types out of the box: HNSW index and Binary Quantization index. + +===== HNSW Index + +HNSW index performs approximate vector search using https://arxiv.org/abs/1603.09320[Hierarchical Navigable Small World graphs], as described by Malkov and Yashunin. + +Coherence uses embedded native implementation of https://github.com/nmslib/hnswlib[hnswlib] for HNSW index implementation, so in order to use HNSW index you need to add a dependency on `coherence-hnsw` module, which contains all Java code and pre-built native libraries for Linux (ARM and x86), Mac (ARM and x86) and Windows (x86 only) that you need: + +[source,xml] +---- + + ${coherence.groupId} + coherence-hnsw + ${coherence.version} + +---- + +Once you add the dependency above, creating HNSW index is as simple as + +[source,java] +---- +NamedMap books = session.getMap("books"); +books.addIndex(new HnswIndex<>(Book::getSummaryEmbedding, 768)); +---- + +The first argument to `HnswIndex` constructor is the extractor for the vector attribute to index, and the second is the number of dimensions each indexed vector will have (which must be identical), which will allow native index implementation to pre-allocate memory required for index. + +By default, `HnswIndex` will use cosine distance to calculate vector distances, but this can be overridden by specifying `spaceName` argument ina constructor: + +[source,java] +---- +NamedMap books = session.getMap("books"); +books.addIndex(new HnswIndex<>(Book::getSummaryEmbedding, "L2", 768)); +---- + +The valid values for space name are `COSINE`, `L2` and `IP` (inner product). + +`HnswIndex` also provides a number of options that can be used to fine-tune its behavior, which can be specified using fluent API: + +[source,java] +---- +var hnsw = new HnswIndex<>(Book::getSummaryEmbedding, 768) + .setEfConstr(200) + .setEfSearch(50) + .setM(16) + .setRandomSeed(100); +books.addIndex(hnsw); +---- + +The algorithm parameters above are described in more detail in https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md[`hnswlib` documentation]. + +You can also specify maximum index size by calling `setMaxElements` method. By default, the index will be created with a maximum size of 4,096 elements, and will be resized as necessary to accommodate data set growth. However, resize operation is somewhat costly and can be avoided if you know ahead of time how many entries will be stored in a Coherence map you are creating the index on, in which case you should configure the index size accordingly. + +[NOTE] +==== +Remember that Coherence partitions indexes, so there will be as many instances of HNSW index as there are partitions. + +This means that the ideal `maxElements` settings is just a bit over `mapSize / partitionCount`, and not the actual map size, which would be way too big. +==== + +Once you have HNSW index configured and created, you can simply perform searches the same way as we did earlier using brute-force search. Coherence will automatically detect and use HNSW index, if one is available. + +===== Binary Quantization + +Coherence also supports https://huggingface.co/blog/embedding-quantization#binary-quantization[Binary Quantization]-based index, which provides significant space savings (32x) compared to vector indexes that use `float32` vectors, such as HNSW. It does this by converting each 32-bit float in the original vector into either 0 or 1, and representing it using a single bit in a `BitSet`. + +The downside is that the recall may not be as accurate, especially with smaller vectors, but that can be largely addressed by oversampling and re-scoring of the results, which Coherence automatically performs. + +`BinaryQuantIndex` is implemented in pure Java, and is a part of the main Coherence distribution, so it requires no additional dependencies. To create it, simply call `NamedMap.addIndex` method: + +[source,java] +---- +NamedMap books = session.getMap("books"); +books.addIndex(new BinaryQuantIndex<>(Book::getSummaryEmbedding)); +---- + +The only option you can specify is the `oversamplingFactor`, which is the multiplier for the maximum number of the results to return, and is 3 by default, meaning that if your search aggregator is configured to return 10 results, binary quantization search will initially return 30 results based on the Hamming distance between the binary representation of the search vector and index vectors, re-score all 30 results using exact distance calculation and then re-order and return top 10 results based on the calculated exact distance. + +To change `oversamplingFactor`, you can specify it using fluent API when creating an index + +[source,java] +---- +NamedMap books = session.getMap("books"); +books.addIndex(new BinaryQuantIndex<>(Book::getSummaryEmbedding).oversamplingFactor(5)); +---- + +which will cause `SimilaritySearch` aggregator to return and re-score 50 results initially instead of 30, in the example above. + +Just like with HNSW index, once you have Binary Quantization index configured and created, you can simply perform searches the same way as we did earlier using brute-force search. Coherence will automatically detect it and use it. + +==== Metadata Filtering + +In addition to vector-based similarity search, you can use standard Coherence filters to perform metadata-based filtering of the results. For example, if we only wanted to search books by a specific author, we could specify a metadata filter `SimilaritySearch` aggregator should use in conjunction with a vector similarity search: + +[source,java] +---- +var search = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 3) + .filter(Filters.equal(Book::getAuthor, "Jules Verne")); +var results = books.aggregate(search); +---- + +The above should return only top 3 books written by Jules Verne, sorted according to vector similarity. + +Metadata filtering works the same regardless of whether you use brute-force or index-based search, and will use any indexes you may have on the metadata attributes you are filtering on, such as `Book::getAuthor` in this case, to speed up filter evaluation. + +If you are a long-time Coherence user, you may be wondering why we are setting the filter on the aggregator itself and performing filter evaluation inside the aggregator, instead of using `aggregate` method that accepts a filter and allows us to pre-filter the set of entries to aggregate. + +The reason is that both vector index implementations need to evaluate the filter internally, and only include the result if it evaluates to `true`, so the example above will work in all situations. + +However, if you are using brute-force search, you may achieve the same result, and likely improve performance, by pre-filtering the entries: + +[source,java] +---- +var search = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 3); +var results = books.aggregate(Filters.equal(Book::getAuthor, "Jules Verne"), search); +---- + +=== Retrieval Augmented Generation (RAG) Support + +Coherence provides `DocumentChunk` class, which can be used to represent document chunks containing text, embedding and metadata, as typically represented in various RAG frameworks. + +While this class can certainly be used on its own, its main purpose is to support various RAG framework integrations in a consistent manner. + +=== Integrations + +The following sections describe integrations with popular AI frameworks that Coherence supports. + +These integrations are not part of Coherence itself, but the contributions we've made to the frameworks below to allow them to use Coherence as a Vector Store, Chat Memory, etc. + +==== LangChain (Python) + +TBD + +==== LangChain4j + +https://github.com/coherence-community/langchain4j/tree/coherence/langchain4j-coherence[langchain4j-coherence] module provides implementation of `EmbeddingStore` and `ChatMemoryStore` interfaces, allowing Coherence to be easily used as either (or both) in LangChain4j applications. + +It also provides a Spring Boot Starter for LangChain4j applications via https://github.com/coherence-community/langchain4j-spring/tree/coherence/langchain4j-coherence-spring-boot-starter[langchain4j-coherence-spring-boot-starter] module. + +For more information, see documentation provided by the integration modules above. + +==== Spring AI + +https://github.com/coherence-community/spring-ai/tree/coherence-store/vector-stores/spring-ai-coherence-store[spring-ai-coherence-store] module provides implementation of `VectorStore` interface, allowing Coherence to be easily used as such in Spring AI applications. + +It also provides a Spring Boot Starter via https://github.com/coherence-community/spring-ai/tree/coherence-store/spring-ai-spring-boot-starters/spring-ai-starter-coherence-store[spring-ai-starter-coherence-store] module. + +For more information, see documentation provided by the integration modules above.