Commit: [8.x] Adding new bbq index types behind a feature flag (#114439) (#114783)

* Adding new bbq index types behind a feature flag (#114439)

New index types `bbq_hnsw` and `bbq_flat`, which use the better binary quantization format: a 32x reduction in memory, with good recall properties.

(cherry picked from commit 6c752ab)

* spotless
benwtrent authored Oct 15, 2024
1 parent 22deacb commit 64e8f2a
Showing 16 changed files with 767 additions and 59 deletions.
5 changes: 5 additions & 0 deletions docs/changelog/114439.yaml
@@ -0,0 +1,5 @@
pr: 114439
summary: Adding new bbq index types behind a feature flag
area: Vector Search
type: feature
issues: []
41 changes: 36 additions & 5 deletions docs/reference/mapping/types/dense-vector.asciidoc
@@ -115,22 +115,27 @@ that sacrifices result accuracy for improved speed.
==== Automatically quantize vectors for kNN search

The `dense_vector` type supports quantization to reduce the memory footprint required when <<approximate-knn, searching>> `float` vectors.
The following three quantization strategies are supported:

+
--
`int8` - Quantizes each dimension of the vector to 1-byte integers. This reduces the memory footprint by 75% (or 4x) at the cost of some accuracy.
`int4` - Quantizes each dimension of the vector to half-byte integers. This reduces the memory footprint by 87% (or 8x) at the cost of some accuracy.
`bbq` - experimental:[] Better binary quantization, which reduces each dimension to a single bit of precision. This reduces the memory footprint by 96% (or 32x) at a larger cost to accuracy. Generally, oversampling at query time and reranking can help mitigate the accuracy loss.
--

When using a quantized format, you may want to oversample and rescore the results to improve accuracy. See <<dense-vector-knn-search-reranking, oversampling and rescoring>> for more information.

To use a quantized index, you can set your index type to `int8_hnsw`, `int4_hnsw`, or `bbq_hnsw`. When indexing `float` vectors, the current default
index type is `int8_hnsw`.

NOTE: Quantization will continue to keep the raw float vector values on disk for reranking, reindexing, and quantization improvements over the lifetime of the data.
This means disk usage will increase by ~25% for `int8`, ~12.5% for `int4`, and ~3.1% for `bbq` due to the overhead of storing both the quantized and raw vectors.
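
To see roughly where these numbers come from, here is a back-of-the-envelope sketch (assuming a hypothetical 1024-dimension `float` vector, and ignoring the small per-vector correction data the quantized formats also keep):

----
raw float32: 1024 dims * 4 bytes = 4096 bytes per vector
int8:        1024 dims * 1 byte  = 1024 bytes (4x less memory; 1024/4096 = ~25% extra disk)
int4:        1024 dims * 4 bits  =  512 bytes (8x less memory; 512/4096 = ~12.5% extra disk)
bbq:         1024 dims * 1 bit   =  128 bytes (32x less memory; 128/4096 = ~3.1% extra disk)
----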

NOTE: `int4` quantization requires an even number of vector dimensions.

NOTE: experimental:[] `bbq` quantization only supports vector dimensions of at least 64.

Here is an example of how to create a byte-quantized index:

[source,console]
@@ -173,6 +178,27 @@ PUT my-byte-quantized-index
}
--------------------------------------------------

experimental:[] Here is an example of how to create a binary-quantized index:

[source,console]
--------------------------------------------------
PUT my-binary-quantized-index
{
"mappings": {
"properties": {
"my_vector": {
"type": "dense_vector",
"dims": 64,
"index": true,
"index_options": {
"type": "bbq_hnsw"
}
}
}
}
}
--------------------------------------------------
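
experimental:[] The flat variant uses the same mapping shape. As a minimal sketch of a `bbq_flat` index (the index name here is hypothetical, and flat types take no HNSW graph parameters):

[source,console]
--------------------------------------------------
PUT my-binary-quantized-flat-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 64,
        "index": true,
        "index_options": {
          "type": "bbq_flat"
        }
      }
    }
  }
}
--------------------------------------------------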

[role="child_attributes"]
[[dense-vector-params]]
==== Parameters for dense vector fields
@@ -301,11 +327,16 @@ by 4x at the cost of some accuracy. See <<dense-vector-quantization, Automatically quantize vectors for kNN search>>.
* `int4_hnsw` - This utilizes the https://arxiv.org/abs/1603.09320[HNSW algorithm] in addition to automatic scalar
quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint
by 8x at the cost of some accuracy. See <<dense-vector-quantization, Automatically quantize vectors for kNN search>>.
* experimental:[] `bbq_hnsw` - This utilizes the https://arxiv.org/abs/1603.09320[HNSW algorithm] in addition to automatic binary
quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint
by 32x at the cost of accuracy. See <<dense-vector-quantization, Automatically quantize vectors for kNN search>>.
* `flat` - This utilizes a brute-force search algorithm for exact kNN search. This supports all `element_type` values.
* `int8_flat` - This utilizes a brute-force search algorithm in addition to automatic scalar quantization. Only supports
`element_type` of `float`.
* `int4_flat` - This utilizes a brute-force search algorithm in addition to automatic half-byte scalar quantization. Only supports
`element_type` of `float`.
* experimental:[] `bbq_flat` - This utilizes a brute-force search algorithm in addition to automatic binary quantization. Only supports
`element_type` of `float`.
--
`m`:::
(Optional, integer)
92 changes: 92 additions & 0 deletions docs/reference/search/search-your-data/knn-search.asciidoc
@@ -1149,3 +1149,95 @@ POST product-index/_search
----
//TEST[continued]

[discrete]
[[dense-vector-knn-search-reranking]]
==== Oversampling and rescoring for quantized vectors

All forms of quantization result in some accuracy loss, and as the quantization level increases, so does the loss.
Generally, we have found that:

- `int8` requires minimal, if any, rescoring.
- `int4` requires some rescoring for higher-accuracy and larger-recall scenarios. Generally, oversampling by 1.5x-2x recovers most of the accuracy loss.
- `bbq` requires rescoring except on exceptionally large indices or with models specifically designed for quantization. We have found that 3x-5x oversampling is generally sufficient, but for fewer dimensions or vectors that do not quantize well, higher oversampling may be required.
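
For example, to return the top 10 `bbq` results with 3x oversampling, you would gather `k: 30` nearest neighbors according to the quantized scores and then rescore those 30 hits against the original float vectors (the examples below demonstrate the same pattern with 2x oversampling).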

There are two main ways to oversample and rescore. The first is to utilize the <<rescore, rescore section>> in the `_search` request.

Here is an example using the top-level `knn` search with oversampling and using `rescore` to rerank the results:

[source,console]
--------------------------------------------------
POST /my-index/_search
{
"size": 10, <1>
"knn": {
"query_vector": [0.04283529, 0.85670587, -0.51402352, 0],
"field": "my_int4_vector",
"k": 20, <2>
"num_candidates": 50
},
"rescore": {
"window_size": 20, <3>
"query": {
"rescore_query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "(dotProduct(params.queryVector, 'my_int4_vector') + 1.0)", <4>
"params": {
"queryVector": [0.04283529, 0.85670587, -0.51402352, 0]
}
}
}
},
"query_weight": 0, <5>
"rescore_query_weight": 1 <6>
}
}
}
--------------------------------------------------
// TEST[skip: setup not provided]
<1> The number of results to return. Note that it is only 10, and we will oversample by 2x, gathering 20 nearest neighbors.
<2> The number of results to return from the kNN search. This will do an approximate kNN search with 50 candidates
per HNSW graph and use the quantized vectors, returning the 20 most similar vectors
according to the quantized score. Additionally, since this is the top-level `knn` object, the global top 20 results
from all shards will be gathered before rescoring. Combined with `rescore`, this is oversampling by 2x, meaning
we gather 20 nearest neighbors according to quantized scoring and rescore with higher-fidelity float vectors.
<3> The number of results to rescore. If you want to rescore all results, set this to the same value as `k`.
<4> The script to rescore the results. Script score will interact directly with the originally provided float32 vector.
<5> The weight of the original query; here we simply throw away the original score.
<6> The weight of the rescore query; here we only use the rescore query.

The second way is to score per shard with the <<query-dsl-knn-query, knn query>> and <<query-dsl-script-score-query, script_score query>>. Generally, this means that there will be more rescoring per shard, but this
can increase overall recall at the cost of compute.

[source,console]
--------------------------------------------------
POST /my-index/_search
{
"size": 10, <1>
"query": {
"script_score": {
"query": {
"knn": { <2>
"query_vector": [0.04283529, 0.85670587, -0.51402352, 0],
"field": "my_int4_vector",
"num_candidates": 20 <3>
}
},
"script": {
"source": "(dotProduct(params.queryVector, 'my_int4_vector') + 1.0)", <4>
"params": {
"queryVector": [0.04283529, 0.85670587, -0.51402352, 0]
}
}
}
}
}
--------------------------------------------------
// TEST[skip: setup not provided]
<1> The number of results to return.
<2> The `knn` query to perform the initial search; this is executed per shard.
<3> The number of candidates to use for the initial approximate `knn` search. This will search using the quantized vectors
and return the top 20 candidates per shard to then be scored.
<4> The script to score the results. Script score will interact directly with the originally provided float32 vector.
@@ -0,0 +1,160 @@
setup:
- requires:
cluster_features: "mapper.vectors.bbq"
reason: 'kNN float to better-binary quantization is required'
- do:
indices.create:
index: bbq_hnsw
body:
settings:
index:
number_of_shards: 1
mappings:
properties:
name:
type: keyword
vector:
type: dense_vector
dims: 64
index: true
similarity: l2_norm
index_options:
type: bbq_hnsw
another_vector:
type: dense_vector
dims: 64
index: true
similarity: l2_norm
index_options:
type: bbq_hnsw

- do:
index:
index: bbq_hnsw
id: "1"
body:
name: cow.jpg
vector: [300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0]
another_vector: [115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0]
# Flush in order to provoke a merge later
- do:
indices.flush:
index: bbq_hnsw

- do:
index:
index: bbq_hnsw
id: "2"
body:
name: moose.jpg
vector: [100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0]
another_vector: [50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120]
# Flush in order to provoke a merge later
- do:
indices.flush:
index: bbq_hnsw

- do:
index:
index: bbq_hnsw
id: "3"
body:
name: rabbit.jpg
vector: [111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0]
another_vector: [11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0]
# Flush in order to provoke a merge later
- do:
indices.flush:
index: bbq_hnsw

- do:
indices.forcemerge:
index: bbq_hnsw
max_num_segments: 1
---
"Test knn search":
- do:
search:
index: bbq_hnsw
body:
knn:
field: vector
query_vector: [ 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0]
k: 3
num_candidates: 3

# Depending on how things are distributed, docs 2 and 3 might be swapped.
# Here we verify that our last hit is always the worst one.
- match: { hits.hits.2._id: "1" }

---
"Test bad quantization parameters":
- do:
catch: bad_request
indices.create:
index: bad_bbq_hnsw
body:
mappings:
properties:
vector:
type: dense_vector
dims: 64
element_type: byte
index: true
index_options:
type: bbq_hnsw

- do:
catch: bad_request
indices.create:
index: bad_bbq_hnsw
body:
mappings:
properties:
vector:
type: dense_vector
dims: 64
index: false
index_options:
type: bbq_hnsw
---
"Test few dimensions fail indexing":
- do:
catch: bad_request
indices.create:
index: bad_bbq_hnsw
body:
mappings:
properties:
vector:
type: dense_vector
dims: 42
index: true
index_options:
type: bbq_hnsw

- do:
indices.create:
index: dynamic_dim_bbq_hnsw
body:
mappings:
properties:
vector:
type: dense_vector
index: true
similarity: l2_norm
index_options:
type: bbq_hnsw

- do:
catch: bad_request
index:
index: dynamic_dim_bbq_hnsw
body:
vector: [1.0, 2.0, 3.0, 4.0, 5.0]

- do:
index:
index: dynamic_dim_bbq_hnsw
body:
vector: [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
