Commit: [8.x] Adding new bbq index types behind a feature flag (#114439) (#114783)

* Adding new bbq index types behind a feature flag (#114439)

New index types `bbq_hnsw` and `bbq_flat`, which use the better binary quantization format: a 32x reduction in memory, with good recall properties.

(cherry picked from commit 6c752ab)

* spotless
benwtrent authored Oct 15, 2024
1 parent 22deacb commit 64e8f2a
Showing 16 changed files with 767 additions and 59 deletions.
5 changes: 5 additions & 0 deletions docs/changelog/114439.yaml
@@ -0,0 +1,5 @@
pr: 114439
summary: Adding new bbq index types behind a feature flag
area: Vector Search
type: feature
issues: []
41 changes: 36 additions & 5 deletions docs/reference/mapping/types/dense-vector.asciidoc
@@ -115,22 +115,27 @@ that sacrifices result accuracy for improved speed.
==== Automatically quantize vectors for kNN search

The `dense_vector` type supports quantization to reduce the memory footprint required when <<approximate-knn, searching>> `float` vectors.
The following three quantization strategies are supported:

+
--
`int8` - Quantizes each dimension of the vector to 1-byte integers. This reduces the memory footprint by 75% (or 4x) at the cost of some accuracy.
`int4` - Quantizes each dimension of the vector to half-byte integers. This reduces the memory footprint by 87% (or 8x) at the cost of some accuracy.
`bbq` - experimental:[] Better binary quantization, which reduces each dimension to a single bit of precision. This reduces the memory footprint by 96% (or 32x) at a larger cost to accuracy. Generally, oversampling at query time and reranking can help mitigate the accuracy loss.
--

When using a quantized format, you may want to oversample and rescore the results to improve accuracy. See <<dense-vector-knn-search-reranking, oversampling and rescoring>> for more information.

To use a quantized index, you can set your index type to `int8_hnsw`, `int4_hnsw`, or `bbq_hnsw`. When indexing `float` vectors, the current default
index type is `int8_hnsw`.

NOTE: Quantization will continue to keep the raw float vector values on disk for reranking, reindexing, and quantization improvements over the lifetime of the data.
This means disk usage will increase by ~25% for `int8`, ~12.5% for `int4`, and ~3.1% for `bbq` due to the overhead of storing both the quantized and raw vectors.
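
To see roughly where these numbers come from, here is a back-of-the-envelope sketch (assuming a hypothetical 1024-dimension `float` vector, and ignoring the small per-vector correction data the quantized formats also keep):

----
raw float32: 1024 dims * 4 bytes = 4096 bytes per vector
int8:        1024 dims * 1 byte  = 1024 bytes (4x less memory; 1024/4096 = ~25% extra disk)
int4:        1024 dims * 4 bits  =  512 bytes (8x less memory; 512/4096 = ~12.5% extra disk)
bbq:         1024 dims * 1 bit   =  128 bytes (32x less memory; 128/4096 = ~3.1% extra disk)
----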

NOTE: `int4` quantization requires an even number of vector dimensions.

NOTE: experimental:[] `bbq` quantization only supports vector dimensions of at least 64.

Here is an example of how to create a byte-quantized index:

[source,console]
@@ -173,6 +178,27 @@ PUT my-byte-quantized-index
}
--------------------------------------------------

experimental:[] Here is an example of how to create a binary-quantized index:

[source,console]
--------------------------------------------------
PUT my-binary-quantized-index
{
"mappings": {
"properties": {
"my_vector": {
"type": "dense_vector",
"dims": 64,
"index": true,
"index_options": {
"type": "bbq_hnsw"
}
}
}
}
}
--------------------------------------------------
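
experimental:[] The flat variant uses the same mapping shape. As a minimal sketch of a `bbq_flat` index (the index name here is hypothetical, and flat types take no HNSW graph parameters):

[source,console]
--------------------------------------------------
PUT my-binary-quantized-flat-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 64,
        "index": true,
        "index_options": {
          "type": "bbq_flat"
        }
      }
    }
  }
}
--------------------------------------------------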

[role="child_attributes"]
[[dense-vector-params]]
==== Parameters for dense vector fields
@@ -301,11 +327,16 @@ by 4x at the cost of some accuracy. See <<dense-vector-quantization, Automatically quantize vectors for kNN search>>.
* `int4_hnsw` - This utilizes the https://arxiv.org/abs/1603.09320[HNSW algorithm] in addition to automatic scalar
quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint
by 8x at the cost of some accuracy. See <<dense-vector-quantization, Automatically quantize vectors for kNN search>>.
* experimental:[] `bbq_hnsw` - This utilizes the https://arxiv.org/abs/1603.09320[HNSW algorithm] in addition to automatic binary
quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint
by 32x at the cost of accuracy. See <<dense-vector-quantization, Automatically quantize vectors for kNN search>>.
* `flat` - This utilizes a brute-force search algorithm for exact kNN search. This supports all `element_type` values.
* `int8_flat` - This utilizes a brute-force search algorithm in addition to automatic scalar quantization. Only supports
`element_type` of `float`.
* `int4_flat` - This utilizes a brute-force search algorithm in addition to automatic half-byte scalar quantization. Only supports
`element_type` of `float`.
* experimental:[] `bbq_flat` - This utilizes a brute-force search algorithm in addition to automatic binary quantization. Only supports
`element_type` of `float`.
--
`m`:::
(Optional, integer)
92 changes: 92 additions & 0 deletions docs/reference/search/search-your-data/knn-search.asciidoc
@@ -1149,3 +1149,95 @@ POST product-index/_search
----
//TEST[continued]

[discrete]
[[dense-vector-knn-search-reranking]]
==== Oversampling and rescoring for quantized vectors

All forms of quantization result in some accuracy loss, and as the quantization level increases, so does the loss.
Generally, we have found that:

- `int8` requires minimal, if any, rescoring.
- `int4` requires some rescoring for higher-accuracy and larger-recall scenarios. Generally, oversampling by 1.5x-2x recovers most of the accuracy loss.
- `bbq` requires rescoring except on exceptionally large indices or with models specifically designed for quantization. We have found that 3x-5x oversampling is generally sufficient, but for fewer dimensions or vectors that do not quantize well, higher oversampling may be required.
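
For example, to return the top 10 `bbq` results with 3x oversampling, you would gather `k: 30` nearest neighbors according to the quantized scores and then rescore those 30 hits against the original float vectors (the examples below demonstrate the same pattern with 2x oversampling).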

There are two main ways to oversample and rescore. The first is to utilize the <<rescore, rescore section>> in the `_search` request.

Here is an example using the top-level `knn` search with oversampling and using `rescore` to rerank the results:

[source,console]
--------------------------------------------------
POST /my-index/_search
{
"size": 10, <1>
"knn": {
"query_vector": [0.04283529, 0.85670587, -0.51402352, 0],
"field": "my_int4_vector",
"k": 20, <2>
"num_candidates": 50
},
"rescore": {
"window_size": 20, <3>
"query": {
"rescore_query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "(dotProduct(params.queryVector, 'my_int4_vector') + 1.0)", <4>
"params": {
"queryVector": [0.04283529, 0.85670587, -0.51402352, 0]
}
}
}
},
"query_weight": 0, <5>
"rescore_query_weight": 1 <6>
}
}
}
--------------------------------------------------
// TEST[skip: setup not provided]
<1> The number of results to return. Note that it is only 10, and we will oversample by 2x, gathering 20 nearest neighbors.
<2> The number of results to return from the kNN search. This will do an approximate kNN search with 50 candidates
per HNSW graph and use the quantized vectors, returning the 20 most similar vectors
according to the quantized score. Additionally, since this is the top-level `knn` object, the global top 20 results
from all shards will be gathered before rescoring. Combined with `rescore`, this is oversampling by 2x, meaning
we gather 20 nearest neighbors according to quantized scoring and rescore with higher-fidelity float vectors.
<3> The number of results to rescore. If you want to rescore all results, set this to the same value as `k`.
<4> The script to rescore the results. Script score will interact directly with the originally provided float32 vector.
<5> The weight of the original query; here we simply throw away the original score.
<6> The weight of the rescore query; here we only use the rescore query.

The second way is to score per shard with the <<query-dsl-knn-query, knn query>> and <<query-dsl-script-score-query, script_score query>>. Generally, this means that there will be more rescoring per shard, but this
can increase overall recall at the cost of compute.

[source,console]
--------------------------------------------------
POST /my-index/_search
{
"size": 10, <1>
"query": {
"script_score": {
"query": {
"knn": { <2>
"query_vector": [0.04283529, 0.85670587, -0.51402352, 0],
"field": "my_int4_vector",
"num_candidates": 20 <3>
}
},
"script": {
"source": "(dotProduct(params.queryVector, 'my_int4_vector') + 1.0)", <4>
"params": {
"queryVector": [0.04283529, 0.85670587, -0.51402352, 0]
}
}
}
}
}
--------------------------------------------------
// TEST[skip: setup not provided]
<1> The number of results to return.
<2> The `knn` query to perform the initial search; this is executed per shard.
<3> The number of candidates to use for the initial approximate `knn` search. This will search using the quantized vectors
and return the top 20 candidates per shard to then be scored.
<4> The script to score the results. Script score will interact directly with the originally provided float32 vector.
@@ -0,0 +1,160 @@
setup:
- requires:
cluster_features: "mapper.vectors.bbq"
reason: 'kNN float to better-binary quantization is required'
- do:
indices.create:
index: bbq_hnsw
body:
settings:
index:
number_of_shards: 1
mappings:
properties:
name:
type: keyword
vector:
type: dense_vector
dims: 64
index: true
similarity: l2_norm
index_options:
type: bbq_hnsw
another_vector:
type: dense_vector
dims: 64
index: true
similarity: l2_norm
index_options:
type: bbq_hnsw

- do:
index:
index: bbq_hnsw
id: "1"
body:
name: cow.jpg
vector: [300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0, 230.0, 300.33, -34.8988, 15.555, -200.0]
another_vector: [115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0, 130.0, 115.0, -1.02, 15.555, -100.0]
# Flush in order to provoke a merge later
- do:
indices.flush:
index: bbq_hnsw

- do:
index:
index: bbq_hnsw
id: "2"
body:
name: moose.jpg
vector: [100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0, -0.5, 100.0, -13, 14.8, -156.0]
another_vector: [50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120, -0.5, 50.0, -1, 1, 120]
# Flush in order to provoke a merge later
- do:
indices.flush:
index: bbq_hnsw

- do:
index:
index: bbq_hnsw
id: "3"
body:
name: rabbit.jpg
vector: [111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0, 0.5, 111.3, -13.0, 14.8, -156.0]
another_vector: [11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0, -0.5, 11.0, 0, 12, 111.0]
# Flush in order to provoke a merge later
- do:
indices.flush:
index: bbq_hnsw

- do:
indices.forcemerge:
index: bbq_hnsw
max_num_segments: 1
---
"Test knn search":
- do:
search:
index: bbq_hnsw
body:
knn:
field: vector
query_vector: [ 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0, -0.5, 90.0, -10, 14.8, -156.0]
k: 3
num_candidates: 3

# Depending on how things are distributed, docs 2 and 3 might be swapped.
# Here we verify that our last hit is always the worst one.
- match: { hits.hits.2._id: "1" }

---
"Test bad quantization parameters":
- do:
catch: bad_request
indices.create:
index: bad_bbq_hnsw
body:
mappings:
properties:
vector:
type: dense_vector
dims: 64
element_type: byte
index: true
index_options:
type: bbq_hnsw

- do:
catch: bad_request
indices.create:
index: bad_bbq_hnsw
body:
mappings:
properties:
vector:
type: dense_vector
dims: 64
index: false
index_options:
type: bbq_hnsw
---
"Test few dimensions fail indexing":
- do:
catch: bad_request
indices.create:
index: bad_bbq_hnsw
body:
mappings:
properties:
vector:
type: dense_vector
dims: 42
index: true
index_options:
type: bbq_hnsw

- do:
indices.create:
index: dynamic_dim_bbq_hnsw
body:
mappings:
properties:
vector:
type: dense_vector
index: true
similarity: l2_norm
index_options:
type: bbq_hnsw

- do:
catch: bad_request
index:
index: dynamic_dim_bbq_hnsw
body:
vector: [1.0, 2.0, 3.0, 4.0, 5.0]

- do:
index:
index: dynamic_dim_bbq_hnsw
body:
vector: [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
