From 8980923f0f51e69d80c3a388414d283101839231 Mon Sep 17 00:00:00 2001 From: Naveen Tatikonda Date: Tue, 23 Jan 2024 18:07:46 -0600 Subject: [PATCH 01/19] Add Documentation for k-NN Faiss SQFP16 Signed-off-by: Naveen Tatikonda --- _search-plugins/knn/knn-index.md | 70 +++++++++++- .../knn/knn-vector-quantization.md | 103 ++++++++++++++++++ 2 files changed, 168 insertions(+), 5 deletions(-) create mode 100644 _search-plugins/knn/knn-vector-quantization.md diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index 4a527f3bcb..f5f4ef18af 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -15,6 +15,16 @@ The k-NN plugin introduces a custom data type, the `knn_vector`, that allows use Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of storage space needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). +## SIMD optimization for Faiss + +Starting with k-NN plugin version 2.13, [SIMD(Single instruction, multiple data)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) is supported by default on Linux machines only for Faiss engine if the underlying processor on the system supports SIMD instructions (`AVX2` on `x64` architecture and `NEON` on `ARM64` architecture) which helps to boost the overall performance. +For x64 architecture, two different versions of Faiss library(`libopensearchknn_faiss.so` and `libopensearchknn_faiss_avx2.so`) are built and shipped with the artifact where the library with `_avx2` suffix has the AVX2 SIMD instructions. During runtime, detects if the underlying system supports AVX2 or not and loads the corresponding library. + +Users can override and disable AVX2 and load the default Faiss library(`libopensearchknn_faiss.so`) even if system supports avx2 by setting `knn.faiss.avx2.disabled`(Static) to `true` in opensearch.yml (which is by default `false`). +{: .note} + +For arm64 architecture, only one Faiss library(`libopensearchknn_faiss.so`) is built and shipped which contains the NEON SIMD instructions and unlike avx2, it can't be disabled. + ## Method definitions A method definition refers to the underlying configuration of the Approximate k-NN algorithm you want to use. Method definitions are used to either create a `knn_vector` field (when the method does not require training) or [create a model during training]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model) that can then be used to [create a `knn_vector` field]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model). @@ -48,7 +58,7 @@ For nmslib, *ef_search* is set in the [index settings](#index-settings). An index created in OpenSearch version 2.11 or earlier will still use the old `ef_construction` value (`512`). {: .note} -### Supported faiss methods +### Supported Faiss methods Method name | Requires training | Supported spaces | Description :--- | :--- | :--- | :--- @@ -122,10 +132,10 @@ An index created in OpenSearch version 2.11 or earlier will still use the old `e } ``` -### Supported faiss encoders +### Supported Faiss encoders -You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. faiss has -several encoder types, but the plugin currently only supports *flat* and *pq* encoding. 
+You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. Faiss has +several encoder types, but the plugin currently only supports `flat`, `pq`, and `sq` encoding. The following example method definition specifies the `hnsw` method and a `pq` encoder: @@ -153,6 +163,7 @@ Encoder name | Requires training | Description :--- | :--- | :--- `flat` | false | Encode vectors as floating point arrays. This encoding does not reduce memory footprint. `pq` | true | An abbreviation for _product quantization_, it is a lossy compression technique that uses clustering to encode a vector into a fixed size of bytes, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388). +`sq` | false | sq stands for Scalar Quantization. Starting with k-NN plugin version 2.13, you can use the sq encoder(by default [SQFP16]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-sqfp16)) to quantize 32-bit floating-point vectors into 16-bit floats by using the built-in Faiss ScalarQuantizer in order to reduce the memory footprint with a minimal loss of precision. Besides optimizing memory use, sq improves the overall performance with the SIMD optimization (using `AVX2` on `x86` architecture and using `NEON` on `ARM` architecture). #### Examples @@ -204,13 +215,62 @@ The following example uses the `hnsw` method without specifying an encoder (by d } ``` +The following example uses the `hnsw` method with a `sq` encoder of type `fp16` with `clip` enabled: + +```json +"method": { + "name":"hnsw", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "encoder": { + "name": "sq", + "parameters": { + "type": "fp16", + "clip": true + } + }, + "ef_construction": 256, + "m": 8 + } +} +``` + +The following example uses the `ivf` method with a `sq` encoder of type `fp16`: + +```json +"method": { + "name":"ivf", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "encoder": { + "name": "sq", + "parameters": { + "type": "fp16", + "clip": false + } + }, + "nprobes": 2 + } + } +``` + + #### PQ parameters -Paramater Name | Required | Default | Updatable | Description +Parameter name | Required | Default | Updatable | Description :--- | :--- | :--- | :--- | :--- `m` | false | 1 | false | Determines the number of subvectors into which to break the vector. Subvectors are encoded independently of each other. This dimension of the vector must be divisible by `m`. Maximum value is 1,024. `code_size` | false | 8 | false | Determines the number of bits into which to encode a subvector. Maximum value is 8. For IVF, this value must be less than or equal to 8. For HNSW, this value can only be 8. +#### SQ parameters + +Parameter name | Required | Default | Updatable | Description +:--- | :--- | :-- | :--- | :--- +`type` | false | fp16 | false | Determines the type of scalar quantization to be used to encode the 32 bit float vectors into the corresponding type. By default, it is `fp16`. +`clip` | false | false | false | When set to `true`, clips the vectors that are outside of the range to bring them into the range. + ### Choosing the right method There are a lot of options to choose from when building your `knn_vector` field. 
To determine the correct methods and parameters to choose, you should first understand what requirements you have for your workload and what trade-offs you are willing to make. Factors to consider are (1) query latency, (2) query quality, (3) memory limits, (4) indexing latency. diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md new file mode 100644 index 0000000000..2cc2c4c6e8 --- /dev/null +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -0,0 +1,103 @@ +--- +layout: default +title: k-NN vector quantization +nav_order: 50 +parent: k-NN search +grand_parent: Search methods +has_children: false +has_math: true +--- + +# k-NN vector quantization + +The OpenSearch k-NN plugin by default supports the indexing and querying of vectors of type float where each dimension of the vector occupies 4 bytes of memory. This is getting expensive in terms of memory for use cases that requires ingestion on a large scale where we need to construct, load, save and search graphs(for native engines `nmslib` and `faiss`) which is getting even more costlier. To reduce these memory footprints, we can use these vector quantization features supported by k-NN plugin. + +## Lucene byte vector + +Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of memory needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). + +## Faiss sqfp16 + +Starting with k-NN plugin version 2.13, users can ingest `fp16` vectors with `faiss` engine where when user provides the 32 bit float vectors, the Faiss engine quantize the vector into FP16 using scalar quantization (users don’t need to do any quantization on their end), stores it and decodes it back to FP32 for distance computation during search operations. Using this feature, users can +reduce memory footprints by a factor of 2, significant reduction in search latencies (with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-faiss)), with a very minimal loss in recall(depends on distribution of vectors). + +To use this feature, users needs to set `encoder` name as `sq` and to know the type of quantization in SQ, we are introducing a new optional field, `type` in the encoder parameters. The data indexed by users should be within the FP16 range of [-65504.0, 65504.0]. If the data lies out of this range then an exception is thrown and the request is rejected. + +We also introduced another optional encoder parameter `clip` and if this is set to `true`(by default `false`) in the index mapping, then if the data lies out of FP16 range it will be clipped to the MIN(`-65504.0`) and MAX(`65504.0`) of FP16 range and ingested into the index without throwing any exception. But, clipping the values might cause a drop in recall. + +For Example - when `clip` is set to `true`, `65510.82` will be clipped and indexed as `65504.0` and `-65504.1` will be clipped and indexed as `-65504.0`. + +Ideally, `clip` parameter is recommended to be set as `true` only when most of the vector elements are within the fp16 range and very few elements lies outside of the range. +{: .note} + +* `type` - Set this as `fp16` if we want to quantize the indexed vectors into fp16 using Faiss SQFP16; Default value is `fp16`. 
+* `clip` - Set this as `true` if you want to skip the FP16 validation check and clip vector value to bring it into FP16 MIN or MAX range; Default value is `false`. + +This is an example of a method definition using Faiss SQfp16 with `clip` as `true` +```json +"method": { + "name":"hnsw", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "encoder":{ + "name":"sq", + "parameters":{ + "type": "fp16", + "clip": true + } + } + } +} + +``` + +During ingestion, make sure each dimension of the vector is in the supported range [-65504.0, 65504.0] if `clip` is set as `false`: +```json +PUT test-index/_doc/1 +{ + "my_vector1": [-65504.0, 65503.845, 55.82] +} +``` + +During querying, there is no range limitation for query vector: +```json +GET test-index/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector1": { + "vector": [265436.876, -120906.256, 99.84], + "k": 2 + } + } + } +} +``` + +### Memory estimation + +Ideally, Faiss SQfp16 requires 50% of the memory consumed by FP32 vectors. + +#### HNSW memory estimation + +The memory required for HNSW is estimated to be `1.1 * (2 * dimension + 8 * M)` bytes/vector. + +As an example, assume you have a million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: + +``` +1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB +``` + +#### IVF memory estimation + +The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_vectors) + (4 * nlist * d))` bytes. + +As an example, assume you have a million vectors with a dimension of 256 and `nlist` of 128. The memory requirement can be estimated as follows: + +``` +1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 0.525 GB + +``` + From e9db03ffc52b4be424e408e026c20dca37e7c0b7 Mon Sep 17 00:00:00 2001 From: Naveen Tatikonda Date: Mon, 18 Mar 2024 22:34:10 -0500 Subject: [PATCH 02/19] Address Review Comments Signed-off-by: Naveen Tatikonda --- _search-plugins/knn/knn-index.md | 4 ++-- _search-plugins/knn/knn-vector-quantization.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index f5f4ef18af..07ac5521c3 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -163,7 +163,7 @@ Encoder name | Requires training | Description :--- | :--- | :--- `flat` | false | Encode vectors as floating point arrays. This encoding does not reduce memory footprint. `pq` | true | An abbreviation for _product quantization_, it is a lossy compression technique that uses clustering to encode a vector into a fixed size of bytes, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388). -`sq` | false | sq stands for Scalar Quantization. Starting with k-NN plugin version 2.13, you can use the sq encoder(by default [SQFP16]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-sqfp16)) to quantize 32-bit floating-point vectors into 16-bit floats by using the built-in Faiss ScalarQuantizer in order to reduce the memory footprint with a minimal loss of precision. 
Besides optimizing memory use, sq improves the overall performance with the SIMD optimization (using `AVX2` on `x86` architecture and using `NEON` on `ARM` architecture). +`sq` | false | sq stands for Scalar Quantization. Starting with k-NN plugin version 2.13, you can use the sq encoder(by default [SQFP16]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization-fp16)) to quantize 32-bit floating-point vectors into 16-bit floats by using the built-in Faiss ScalarQuantizer in order to reduce the memory footprint with a minimal loss of precision. Besides optimizing memory use, sq improves the overall performance with the SIMD optimization (using `AVX2` on `x86` architecture and using `NEON` on `ARM` architecture). #### Examples @@ -269,7 +269,7 @@ Parameter name | Required | Default | Updatable | Description Parameter name | Required | Default | Updatable | Description :--- | :--- | :-- | :--- | :--- `type` | false | fp16 | false | Determines the type of scalar quantization to be used to encode the 32 bit float vectors into the corresponding type. By default, it is `fp16`. -`clip` | false | false | false | When set to `true`, clips the vectors that are outside of the range to bring them into the range. +`clip` | false | false | false | When set to `true`, clips the vectors that are outside of the range to bring them into the range. If it is `false` and any vector element is out of range, then it rejects the request and throws an exception. ### Choosing the right method diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 2cc2c4c6e8..c2c933623e 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -16,7 +16,7 @@ The OpenSearch k-NN plugin by default supports the indexing and querying of vect Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of memory needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). -## Faiss sqfp16 +## Faiss scalar quantization fp16 Starting with k-NN plugin version 2.13, users can ingest `fp16` vectors with `faiss` engine where when user provides the 32 bit float vectors, the Faiss engine quantize the vector into FP16 using scalar quantization (users don’t need to do any quantization on their end), stores it and decodes it back to FP32 for distance computation during search operations. Using this feature, users can reduce memory footprints by a factor of 2, significant reduction in search latencies (with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-faiss)), with a very minimal loss in recall(depends on distribution of vectors). @@ -31,7 +31,7 @@ Ideally, `clip` parameter is recommended to be set as `true` only when most of t {: .note} * `type` - Set this as `fp16` if we want to quantize the indexed vectors into fp16 using Faiss SQFP16; Default value is `fp16`. -* `clip` - Set this as `true` if you want to skip the FP16 validation check and clip vector value to bring it into FP16 MIN or MAX range; Default value is `false`. +* `clip` - Set this as `true` if you want to skip the FP16 validation check and clip vector value to bring it into FP16 MIN or MAX range. 
If it is `false` and any vector element is out of range, then it rejects the request and throws an exception; Default value is `false`. This is an example of a method definition using Faiss SQfp16 with `clip` as `true` ```json From 9266d6e33def234d37a82725fe25a70738e4e013 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Tue, 19 Mar 2024 16:50:34 -0400 Subject: [PATCH 03/19] Doc review Signed-off-by: Fanit Kolchina --- _search-plugins/knn/knn-index.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index 07ac5521c3..d6b8687a8f 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -17,13 +17,13 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce ## SIMD optimization for Faiss -Starting with k-NN plugin version 2.13, [SIMD(Single instruction, multiple data)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) is supported by default on Linux machines only for Faiss engine if the underlying processor on the system supports SIMD instructions (`AVX2` on `x64` architecture and `NEON` on `ARM64` architecture) which helps to boost the overall performance. -For x64 architecture, two different versions of Faiss library(`libopensearchknn_faiss.so` and `libopensearchknn_faiss_avx2.so`) are built and shipped with the artifact where the library with `_avx2` suffix has the AVX2 SIMD instructions. During runtime, detects if the underlying system supports AVX2 or not and loads the corresponding library. +Starting with k-NN plugin version 2.13, [Single Instruction Multiple Data (SIMD)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) is supported by default on Linux machines only for the Faiss engine if the underlying processor on the system supports SIMD instructions (AVX2 on x64 architecture and Neon on ARM64 architecture). SIMD architecture helps boost the overall performance. +For the x64 architecture, two different versions of the Faiss library (`libopensearchknn_faiss.so` and `libopensearchknn_faiss_avx2.so`) are built and shipped with the artifact. The library with the `_avx2` suffix contains the AVX2 SIMD instructions. At runtime, the k-NN plugin detects whether AVX2 is supported and loads the appropriate library. -Users can override and disable AVX2 and load the default Faiss library(`libopensearchknn_faiss.so`) even if system supports avx2 by setting `knn.faiss.avx2.disabled`(Static) to `true` in opensearch.yml (which is by default `false`). +You can override or disable AVX2 and load the default Faiss library (`libopensearchknn_faiss.so`) even if your hardware supports AVX2 by specifying the `knn.faiss.avx2.disabled` static setting as `true` in opensearch.yml (the default is `false`). {: .note} -For arm64 architecture, only one Faiss library(`libopensearchknn_faiss.so`) is built and shipped which contains the NEON SIMD instructions and unlike avx2, it can't be disabled. +For the ARM64 architecture, only one performance-boosting Faiss library (`libopensearchknn_faiss.so`) is built and shipped. The library is based on Neon SIMD instructions and, unlike AVX2, it cannot be disabled. ## Method definitions @@ -163,7 +163,7 @@ Encoder name | Requires training | Description :--- | :--- | :--- `flat` | false | Encode vectors as floating point arrays. This encoding does not reduce memory footprint. 
`pq` | true | An abbreviation for _product quantization_, it is a lossy compression technique that uses clustering to encode a vector into a fixed size of bytes, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388). -`sq` | false | sq stands for Scalar Quantization. Starting with k-NN plugin version 2.13, you can use the sq encoder(by default [SQFP16]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization-fp16)) to quantize 32-bit floating-point vectors into 16-bit floats by using the built-in Faiss ScalarQuantizer in order to reduce the memory footprint with a minimal loss of precision. Besides optimizing memory use, sq improves the overall performance with the SIMD optimization (using `AVX2` on `x86` architecture and using `NEON` on `ARM` architecture). +`sq` | false | Stands for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. The default `sq` encoder is [SQFP16]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization-fp16). The built-in Faiss scalar quantizer reduces memory footprint with a minimal loss of precision. Besides optimizing memory use, `sq` improves the overall performance by employing SIMD optimization (using AVX2 on x86 architecture or Neon on ARM architecture). #### Examples From 9ecab9d3d290c8ea707b7e6db05557cc1113b95c Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Thu, 21 Mar 2024 14:19:58 -0400 Subject: [PATCH 04/19] Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/knn/knn-index.md | 12 ++++++------ _search-plugins/knn/knn-vector-quantization.md | 4 ++-- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index d6b8687a8f..5c27f68888 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -20,10 +20,10 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce Starting with k-NN plugin version 2.13, [Single Instruction Multiple Data (SIMD)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) is supported by default on Linux machines only for the Faiss engine if the underlying processor on the system supports SIMD instructions (AVX2 on x64 architecture and Neon on ARM64 architecture). SIMD architecture helps boost the overall performance. For the x64 architecture, two different versions of the Faiss library (`libopensearchknn_faiss.so` and `libopensearchknn_faiss_avx2.so`) are built and shipped with the artifact. The library with the `_avx2` suffix contains the AVX2 SIMD instructions. At runtime, the k-NN plugin detects whether AVX2 is supported and loads the appropriate library. -You can override or disable AVX2 and load the default Faiss library (`libopensearchknn_faiss.so`) even if your hardware supports AVX2 by specifying the `knn.faiss.avx2.disabled` static setting as `true` in opensearch.yml (the default is `false`). 
+You can override or disable AVX2 and load the default Faiss library (`libopensearchknn_faiss.so`), even if your hardware supports AVX2, by specifying the `knn.faiss.avx2.disabled` static setting as `true` in `opensearch.yml` (default is `false`). {: .note} -For the ARM64 architecture, only one performance-boosting Faiss library (`libopensearchknn_faiss.so`) is built and shipped. The library is based on Neon SIMD instructions and, unlike AVX2, it cannot be disabled. +For the ARM64 architecture, only one performance-boosting Faiss library (`libopensearchknn_faiss.so`) is built and shipped. The library is based on Neon SIMD instructions, and, unlike AVX2, it cannot be disabled. ## Method definitions @@ -163,7 +163,7 @@ Encoder name | Requires training | Description :--- | :--- | :--- `flat` | false | Encode vectors as floating point arrays. This encoding does not reduce memory footprint. `pq` | true | An abbreviation for _product quantization_, it is a lossy compression technique that uses clustering to encode a vector into a fixed size of bytes, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388). -`sq` | false | Stands for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. The default `sq` encoder is [SQFP16]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization-fp16). The built-in Faiss scalar quantizer reduces memory footprint with a minimal loss of precision. Besides optimizing memory use, `sq` improves the overall performance by employing SIMD optimization (using AVX2 on x86 architecture or Neon on ARM architecture). +`sq` | false | Stands for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. The default `sq` encoder is [SQFP16]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization-fp16). The built-in Faiss scalar quantizer reduces memory footprint with a minimal loss of precision. In addition to optimizing memory use, `sq` improves overall performance by employing SIMD optimization (using AVX2 on x86 architecture or Neon on ARM architecture). 
#### Examples @@ -215,7 +215,7 @@ The following example uses the `hnsw` method without specifying an encoder (by d } ``` -The following example uses the `hnsw` method with a `sq` encoder of type `fp16` with `clip` enabled: +The following example uses the `hnsw` method with an `sq` encoder of type `fp16` with `clip` enabled: ```json "method": { @@ -236,7 +236,7 @@ The following example uses the `hnsw` method with a `sq` encoder of type `fp16` } ``` -The following example uses the `ivf` method with a `sq` encoder of type `fp16`: +The following example uses the `ivf` method with an `sq` encoder of type `fp16`: ```json "method": { @@ -269,7 +269,7 @@ Parameter name | Required | Default | Updatable | Description Parameter name | Required | Default | Updatable | Description :--- | :--- | :-- | :--- | :--- `type` | false | fp16 | false | Determines the type of scalar quantization to be used to encode the 32 bit float vectors into the corresponding type. By default, it is `fp16`. -`clip` | false | false | false | When set to `true`, clips the vectors that are outside of the range to bring them into the range. If it is `false` and any vector element is out of range, then it rejects the request and throws an exception. +`clip` | `false` | `false` | `false` | When set to `true`, clips any out of range vectors to bring them into the range. If `clip` is `false` and any vector element is out of range, the request throws an exception. ### Choosing the right method diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index c2c933623e..ab62a2120d 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -10,11 +10,11 @@ has_math: true # k-NN vector quantization -The OpenSearch k-NN plugin by default supports the indexing and querying of vectors of type float where each dimension of the vector occupies 4 bytes of memory. This is getting expensive in terms of memory for use cases that requires ingestion on a large scale where we need to construct, load, save and search graphs(for native engines `nmslib` and `faiss`) which is getting even more costlier. To reduce these memory footprints, we can use these vector quantization features supported by k-NN plugin. +By default, the k-NN plugin supports indexing and querying vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors is expensive because OpenSearch needs to construct, load, save, and search graphs (for native engines `nmslib` and `faiss`). To reduce memory use, you can use vector quantization. ## Lucene byte vector -Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of memory needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). +Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of required memory. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). 
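As a minimal sketch of such a mapping (the index and field names are placeholders, and the `data_type` parameter and the [-128, 127] value range follow the linked Lucene byte vector documentation rather than this page), a `byte` vector field on the `lucene` engine can be defined as follows:

```json
PUT /test-byte-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_byte_vector": {
        "type": "knn_vector",
        "dimension": 3,
        "data_type": "byte",
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene"
        }
      }
    }
  }
}
```

Because the quantization to `byte` values happens before ingestion, each indexed vector dimension must already be a whole number in the supported range.
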
## Faiss scalar quantization fp16 From e93e1425ba5602496bb321d06a79edfb8c4078bc Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 22 Mar 2024 12:47:36 -0400 Subject: [PATCH 05/19] Doc review Signed-off-by: Fanit Kolchina --- .../styles/Vocab/OpenSearch/Words/accept.txt | 2 + _search-plugins/knn/knn-index.md | 241 +++++++++++------- .../knn/knn-vector-quantization.md | 117 ++++++--- _search-plugins/knn/settings.md | 1 + 4 files changed, 244 insertions(+), 117 deletions(-) diff --git a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt index 0a14e17e7d..3c67d94358 100644 --- a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt +++ b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt @@ -80,6 +80,7 @@ Levenshtein [Oo]nboarding pebibyte [Pp]erformant +[Pp]laintext [Pp]luggable [Pp]reconfigure [Pp]refetch @@ -91,6 +92,7 @@ pebibyte [Pp]reprocess [Pp]retrain [Pp]seudocode +[Quantiz](e|ation|ing|er) [Rr]ebalance [Rr]ebalancing [Rr]edownload diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index 5c27f68888..ab9dd91be0 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -11,19 +11,61 @@ has_children: false The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. For more information, see [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/). +To create a k-NN index, set the `settings.index.knn` parameter to `true`: + +```json +PUT /test-index +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 3, + "method": { + "name": "hnsw", + "space_type": "l2", + "engine": "lucene", + "parameters": { + "ef_construction": 128, + "m": 24 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + ## Lucene byte vector Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of storage space needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). -## SIMD optimization for Faiss +## SIMD optimization for the Faiss engine -Starting with k-NN plugin version 2.13, [Single Instruction Multiple Data (SIMD)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) is supported by default on Linux machines only for the Faiss engine if the underlying processor on the system supports SIMD instructions (AVX2 on x64 architecture and Neon on ARM64 architecture). SIMD architecture helps boost the overall performance. -For the x64 architecture, two different versions of the Faiss library (`libopensearchknn_faiss.so` and `libopensearchknn_faiss_avx2.so`) are built and shipped with the artifact. The library with the `_avx2` suffix contains the AVX2 SIMD instructions. At runtime, the k-NN plugin detects whether AVX2 is supported and loads the appropriate library. +Starting with version 2.13, the k-NN plugin supports [Single Instruction Multiple Data (SIMD)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) processing if the underlying hardware supports SIMD instructions (AVX2 on x64 architecture and Neon on ARM64 architecture). 
SIMD is supported by default on Linux machines only for the Faiss engine. SIMD architecture helps boost the overall performance, improving indexing throughput at indexing time and reducing latency at search time. -You can override or disable AVX2 and load the default Faiss library (`libopensearchknn_faiss.so`), even if your hardware supports AVX2, by specifying the `knn.faiss.avx2.disabled` static setting as `true` in `opensearch.yml` (default is `false`). -{: .note} + +### x64 architecture + + +For the x64 architecture, two different versions of the Faiss library are built and shipped with the artifact: + +- `libopensearchknn_faiss.so`: The default Faiss library. +- `libopensearchknn_faiss_avx2.so`: The Faiss library that contains AVX2 SIMD instructions. + +If your hardware supports AVX2, the k-NN plugin loads the `libopensearchknn_faiss_avx2.so` library at runtime. + +To disable AVX2 and load the default Faiss library (`libopensearchknn_faiss.so`), specify the `knn.faiss.avx2.disabled` static setting as `true` in `opensearch.yml` (default is `false`). Note that to update a static setting, you must stop the cluster, change the setting, and restart the cluster. For more information, see [Static settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings). -For the ARM64 architecture, only one performance-boosting Faiss library (`libopensearchknn_faiss.so`) is built and shipped. The library is based on Neon SIMD instructions, and, unlike AVX2, it cannot be disabled. +### ARM64 architecture + +For the ARM64 architecture, only one performance-boosting Faiss library (`libopensearchknn_faiss.so`) is built and shipped. The library contains Neon SIMD instructions and cannot be disabled. ## Method definitions @@ -140,16 +182,20 @@ several encoder types, but the plugin currently only supports `flat`, `pq`, and The following example method definition specifies the `hnsw` method and a `pq` encoder: ```json -"method": { - "name":"hnsw", - "engine":"faiss", - "space_type": "l2", - "parameters":{ - "encoder":{ - "name":"pq", - "parameters":{ - "code_size": 8, - "m": 8 +{ + "type": "knn_vector", + "dimension": 100, + "method": { + "name":"hnsw", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "encoder":{ + "name":"pq", + "parameters":{ + "code_size": 8, + "m": 8 + } } } } @@ -161,23 +207,42 @@ The `hnsw` method supports the `pq` encoder for OpenSearch versions 2.10 and lat Encoder name | Requires training | Description :--- | :--- | :--- -`flat` | false | Encode vectors as floating point arrays. This encoding does not reduce memory footprint. +`flat` (Default) | false | Encode vectors as floating point arrays. This encoding does not reduce memory footprint. `pq` | true | An abbreviation for _product quantization_, it is a lossy compression technique that uses clustering to encode a vector into a fixed size of bytes, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388). -`sq` | false | Stands for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. 
The default `sq` encoder is [SQFP16]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization-fp16). The built-in Faiss scalar quantizer reduces memory footprint with a minimal loss of precision. In addition to optimizing memory use, `sq` improves overall performance by employing SIMD optimization (using AVX2 on x86 architecture or Neon on ARM architecture). +`sq` | false | Stands for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. In version 2.13, the built-in `sq` encoder is the SQFP16 Faiss encoder. The encoder reduces memory footprint with a minimal loss of precision and improves performance by using SIMD optimization (using AVX2 on x86 architecture or Neon on ARM architecture). For more information, see [Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization). -#### Examples +#### PQ parameters + +Parameter name | Required | Default | Updatable | Description +:--- | :--- | :--- | :--- | :--- +`m` | false | 1 | false | Determines the number of subvectors into which to break the vector. Subvectors are encoded independently of each other. This dimension of the vector must be divisible by `m`. Maximum value is 1,024. +`code_size` | false | 8 | false | Determines the number of bits into which to encode a subvector. Maximum value is 8. For IVF, this value must be less than or equal to 8. For HNSW, this value can only be 8. + +#### SQ parameters + +Parameter name | Required | Default | Updatable | Description +:--- | :--- | :-- | :--- | :--- +`type` | false | `fp16` | false | The type of scalar quantization to be used to encode 32-bit float vectors into the corresponding type. As of OpenSearch 2.13, only the `fp16` encoder type is supported. For the `fp16` encoder, vector values must be in the [-65504.0, 65504.0] range. +`clip` | false | `false` | false | If `true`, any vector values that are out of the supported range for the specified vector type are rounded so they are in the range. If `false`, the request is rejected if any vector values are out of the supported range. Setting `clip` to `true` may decrease recall. +For more information and examples, see [Using Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#using-faiss-scalar-quantization). 
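To illustrate the `clip` behavior described above with a hypothetical index and field (the same sample values used on the vector quantization page), the following ingestion request is rejected under the default `clip` value of `false` because `65510.82` lies outside the supported [-65504.0, 65504.0] range; with `clip` set to `true`, the value is instead rounded to `65504.0` and the document is indexed:

```json
PUT test-index/_doc/1
{
  "my_vector1": [-65504.0, 65510.82, 55.82]
}
```
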
+ +#### Examples The following example uses the `ivf` method without specifying an encoder (by default, OpenSearch uses the `flat` encoder): ```json -"method": { - "name":"ivf", - "engine":"faiss", - "space_type": "l2", - "parameters":{ - "nlist": 4, - "nprobes": 2 +{ + "type": "knn_vector", + "dimension": 100, + "method": { + "name":"ivf", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "nlist": 4, + "nprobes": 2 + } } } ``` @@ -185,16 +250,20 @@ The following example uses the `ivf` method without specifying an encoder (by d The following example uses the `ivf` method with a `pq` encoder: ```json -"method": { - "name":"ivf", - "engine":"faiss", - "space_type": "l2", - "parameters":{ - "encoder":{ - "name":"pq", - "parameters":{ - "code_size": 8, - "m": 8 +{ + "type": "knn_vector", + "dimension": 100, + "method": { + "name":"ivf", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "encoder":{ + "name":"pq", + "parameters":{ + "code_size": 8, + "m": 8 + } } } } @@ -204,13 +273,17 @@ The following example uses the `ivf` method with a `pq` encoder: The following example uses the `hnsw` method without specifying an encoder (by default, OpenSearch uses the `flat` encoder): ```json -"method": { - "name":"hnsw", - "engine":"faiss", - "space_type": "l2", - "parameters":{ - "ef_construction": 256, - "m": 8 +{ + "type": "knn_vector", + "dimension": 100, + "method": { + "name":"hnsw", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "ef_construction": 256, + "m": 8 + } } } ``` @@ -218,20 +291,24 @@ The following example uses the `hnsw` method without specifying an encoder (by d The following example uses the `hnsw` method with an `sq` encoder of type `fp16` with `clip` enabled: ```json -"method": { - "name":"hnsw", - "engine":"faiss", - "space_type": "l2", - "parameters":{ - "encoder": { - "name": "sq", - "parameters": { - "type": "fp16", - "clip": true - } - }, - "ef_construction": 256, - "m": 8 +{ + "type": "knn_vector", + "dimension": 100, + "method": { + "name":"hnsw", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "encoder": { + "name": "sq", + "parameters": { + "type": "fp16", + "clip": true + } + }, + "ef_construction": 256, + "m": 8 + } } } ``` @@ -239,38 +316,27 @@ The following example uses the `hnsw` method with an `sq` encoder of type `fp16` The following example uses the `ivf` method with an `sq` encoder of type `fp16`: ```json -"method": { - "name":"ivf", - "engine":"faiss", - "space_type": "l2", - "parameters":{ - "encoder": { - "name": "sq", - "parameters": { - "type": "fp16", - "clip": false - } - }, - "nprobes": 2 +{ + "type": "knn_vector", + "dimension": 100, + "method": { + "name":"ivf", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "encoder": { + "name": "sq", + "parameters": { + "type": "fp16", + "clip": false + } + }, + "nprobes": 2 + } } - } +} ``` - -#### PQ parameters - -Parameter name | Required | Default | Updatable | Description -:--- | :--- | :--- | :--- | :--- -`m` | false | 1 | false | Determines the number of subvectors into which to break the vector. Subvectors are encoded independently of each other. This dimension of the vector must be divisible by `m`. Maximum value is 1,024. -`code_size` | false | 8 | false | Determines the number of bits into which to encode a subvector. Maximum value is 8. For IVF, this value must be less than or equal to 8. For HNSW, this value can only be 8. 
- -#### SQ parameters - -Parameter name | Required | Default | Updatable | Description -:--- | :--- | :-- | :--- | :--- -`type` | false | fp16 | false | Determines the type of scalar quantization to be used to encode the 32 bit float vectors into the corresponding type. By default, it is `fp16`. -`clip` | `false` | `false` | `false` | When set to `true`, clips any out of range vectors to bring them into the range. If `clip` is `false` and any vector element is out of range, the request throws an exception. - ### Choosing the right method There are a lot of options to choose from when building your `knn_vector` field. To determine the correct methods and parameters to choose, you should first understand what requirements you have for your workload and what trade-offs you are willing to make. Factors to consider are (1) query latency, (2) query quality, (3) memory limits, (4) indexing latency. @@ -290,6 +356,9 @@ the `circuit_breaker_limit` cluster setting. By default, the limit is set at 50% Having a replica doubles the total number of vectors. {: .note } +For memory estimation when using vector quantization, see the [vector quantization documentation]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#memory-estimation). +{: .note } + #### HNSW memory estimation The memory required for HNSW is estimated to be `1.1 * (4 * dimension + 8 * M)` bytes/vector. diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index ab62a2120d..520d087e46 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -1,7 +1,7 @@ --- layout: default title: k-NN vector quantization -nav_order: 50 +nav_order: 27 parent: k-NN search grand_parent: Search methods has_children: false @@ -10,57 +10,112 @@ has_math: true # k-NN vector quantization -By default, the k-NN plugin supports indexing and querying vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors is expensive because OpenSearch needs to construct, load, save, and search graphs (for native engines `nmslib` and `faiss`). To reduce memory use, you can use vector quantization. +By default, the k-NN plugin supports indexing and querying vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors is expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines). To reduce the memory footprint, you can use vector quantization. ## Lucene byte vector -Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of required memory. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). +Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). 
-## Faiss scalar quantization fp16 +## Faiss scalar quantization + +Starting with version 2.13, the k-NN plugin supports vector quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 vector quantization can decrease the memory footprint by a factor of 2. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), vector quantization can also significantly reduce search latencies. -Starting with k-NN plugin version 2.13, users can ingest `fp16` vectors with `faiss` engine where when user provides the 32 bit float vectors, the Faiss engine quantize the vector into FP16 using scalar quantization (users don’t need to do any quantization on their end), stores it and decodes it back to FP32 for distance computation during search operations. Using this feature, users can -reduce memory footprints by a factor of 2, significant reduction in search latencies (with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-faiss)), with a very minimal loss in recall(depends on distribution of vectors). +SQfp16 vector quantization generally introduces minimal recall loss. The extent of this reduction depends on the closeness of the original vector values. When the values are very similar (for example, `[10.000006, 10.000007, 10.000008]`), quantization can round them to the same value or closer values. In such cases, a slight decrease in recall accuracy may occur. -To use this feature, users needs to set `encoder` name as `sq` and to know the type of quantization in SQ, we are introducing a new optional field, `type` in the encoder parameters. The data indexed by users should be within the FP16 range of [-65504.0, 65504.0]. If the data lies out of this range then an exception is thrown and the request is rejected. +### Using Faiss scalar quantization -We also introduced another optional encoder parameter `clip` and if this is set to `true`(by default `false`) in the index mapping, then if the data lies out of FP16 range it will be clipped to the MIN(`-65504.0`) and MAX(`65504.0`) of FP16 range and ingested into the index without throwing any exception. But, clipping the values might cause a drop in recall. +To use Faiss scalar quantization, set the `method.parameters.encoder.name` to `sq` for the [k-NN vector field]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) when creating a k-NN index: -For Example - when `clip` is set to `true`, `65510.82` will be clipped and indexed as `65504.0` and `-65504.1` will be clipped and indexed as `-65504.0`. +```json +PUT /test-index +{ + "settings": { + "index": { + "knn": true, + "knn.algo_param.ef_search": 100 + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 3, + "method": { + "name": "hnsw", + "engine": "faiss", + "space_type": "l2", + "parameters": { + "encoder": { + "name": "sq", + }, + "ef_construction": 256, + "m": 8 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +Optionally, you can specify the parameters in `method.parameters.encoder`. 
For more information about parameters within the `encoder` object, see [SQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#sq-parameters). -Ideally, `clip` parameter is recommended to be set as `true` only when most of the vector elements are within the fp16 range and very few elements lies outside of the range. +The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must be in the [-65504.0, 65504.0] range. To define handling out-of-range values, the preceding request specifies the `clip` parameter. By default, this parameter is `false` and any vectors containing out-of-range values are rejected. When `clip` is set to `true` (as in the preceding request), out-of-range vector values are rounded up or down so that they are in the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector will indexed as a 16-bit vector `[65504.0, -65504.0]`. + +We recommend setting `clip` to `true` only if very few elements lie outside the supported range. Rounding the values might cause a drop in recall. {: .note} -* `type` - Set this as `fp16` if we want to quantize the indexed vectors into fp16 using Faiss SQFP16; Default value is `fp16`. -* `clip` - Set this as `true` if you want to skip the FP16 validation check and clip vector value to bring it into FP16 MIN or MAX range. If it is `false` and any vector element is out of range, then it rejects the request and throws an exception; Default value is `false`. +The following example method definition specifies the Faiss SQfp16 encoder, which rejects any indexing request that contains out-of-range vector values (because the `clip` parameter is `false` by default): -This is an example of a method definition using Faiss SQfp16 with `clip` as `true` ```json -"method": { - "name":"hnsw", - "engine":"faiss", - "space_type": "l2", - "parameters":{ - "encoder":{ - "name":"sq", - "parameters":{ - "type": "fp16", - "clip": true +PUT /test-index +{ + "settings": { + "index": { + "knn": true, + "knn.algo_param.ef_search": 100 + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 3, + "method": { + "name": "hnsw", + "engine": "faiss", + "space_type": "l2", + "parameters": { + "encoder": { + "name": "sq", + "parameters": { + "type": "fp16" + } + }, + "ef_construction": 256, + "m": 8 + } + } } } } } - ``` +{% include copy-curl.html %} + +During ingestion, make sure each dimension of the vector is in the supported range ([-65504.0, 65504.0]): -During ingestion, make sure each dimension of the vector is in the supported range [-65504.0, 65504.0] if `clip` is set as `false`: ```json PUT test-index/_doc/1 { "my_vector1": [-65504.0, 65503.845, 55.82] } ``` +{% include copy-curl.html %} + +During querying, there is no range limitation for the query vector: -During querying, there is no range limitation for query vector: ```json GET test-index/_search { @@ -75,10 +130,11 @@ GET test-index/_search } } ``` +{% include copy-curl.html %} -### Memory estimation +## Memory estimation -Ideally, Faiss SQfp16 requires 50% of the memory consumed by FP32 vectors. +In the best case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer require 50% of the memory that 32-bit vectors require. #### HNSW memory estimation @@ -86,7 +142,7 @@ The memory required for HNSW is estimated to be `1.1 * (2 * dimension + 8 * M)` As an example, assume you have a million vectors with a dimension of 256 and M of 16. 
The memory requirement can be estimated as follows: -``` +```bash 1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB ``` @@ -96,8 +152,7 @@ The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_ve As an example, assume you have a million vectors with a dimension of 256 and `nlist` of 128. The memory requirement can be estimated as follows: -``` +```bash 1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 0.525 GB - ``` diff --git a/_search-plugins/knn/settings.md b/_search-plugins/knn/settings.md index 1f43654fbe..5f3078e490 100644 --- a/_search-plugins/knn/settings.md +++ b/_search-plugins/knn/settings.md @@ -25,3 +25,4 @@ Setting | Default | Description `knn.model.index.number_of_shards`| 1 | The number of shards to use for the model system index, the OpenSearch index that stores the models used for Approximate Nearest Neighbor (ANN) search. `knn.model.index.number_of_replicas`| 1 | The number of replica shards to use for the model system index. Generally, in a multi-node cluster, this should be at least 1 to increase stability. `knn.advanced.filtered_exact_search_threshold`| null | The threshold value for the filtered IDs that is used to switch to exact search during filtered ANN search. If the number of filtered IDs in a segment is less than this setting's value, exact search will be performed on the filtered IDs. +`knn.faiss.avx2.disabled` | False | Whether to disable the SIMD-based `libopensearchknn_faiss_avx2.so` library and load the default `libopensearchknn_faiss.so` library for the Faiss engine on machines with x64 architecture. For more information, see [SIMD optimization for the Faiss engine]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#simd-optimization-for-the-faiss-engine). From 51cafbe2170a2abff45865c0121935e6aacf199d Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 22 Mar 2024 12:54:33 -0400 Subject: [PATCH 06/19] Add sentence to choosing the right method Signed-off-by: Fanit Kolchina --- _search-plugins/knn/knn-index.md | 2 ++ _search-plugins/knn/knn-vector-quantization.md | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index ab9dd91be0..8c45fbbc9f 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -347,6 +347,8 @@ If you want to use less memory and index faster than HNSW, while maintaining sim If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop. +If you want to reduce the memory requirements by a factor of 2 (with very minimal loss of search quality) or by a factor of 4 (with a significant drop in search quality), consider vector quantization. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/). + ### Memory estimation In a typical OpenSearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 520d087e46..8126d10984 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -18,7 +18,7 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce ## Faiss scalar quantization -Starting with version 2.13, the k-NN plugin supports vector quantization for the Faiss engine within OpenSearch. 
Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 vector quantization can decrease the memory footprint by a factor of 2. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), vector quantization can also significantly reduce search latencies. +Starting with version 2.13, the k-NN plugin supports performing vector quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 vector quantization can decrease the memory footprint by a factor of 2. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), vector quantization can also significantly reduce search latencies. SQfp16 vector quantization generally introduces minimal recall loss. The extent of this reduction depends on the closeness of the original vector values. When the values are very similar (for example, `[10.000006, 10.000007, 10.000008]`), quantization can round them to the same value or closer values. In such cases, a slight decrease in recall accuracy may occur. From 90c66bb34aedd0fb747bba4b93da0fccc046165c Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 22 Mar 2024 13:43:26 -0400 Subject: [PATCH 07/19] Update _search-plugins/knn/knn-index.md Co-authored-by: Naveen Tatikonda Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/knn/knn-index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index 8c45fbbc9f..b0970c6f83 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -48,7 +48,7 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce ## SIMD optimization for the Faiss engine -Starting with version 2.13, the k-NN plugin supports [Single Instruction Multiple Data (SIMD)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) processing if the underlying hardware supports SIMD instructions (AVX2 on x64 architecture and Neon on ARM64 architecture). SIMD is supported by default on Linux machines only for the Faiss engine. SIMD architecture helps boost the overall performance, improving indexing throughput at indexing time and reducing latency at search time. +Starting with version 2.13, the k-NN plugin supports [Single Instruction Multiple Data (SIMD)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) processing if the underlying hardware supports SIMD instructions (AVX2 on x64 architecture and Neon on ARM64 architecture). SIMD is supported by default on Linux machines only for the Faiss engine. 
SIMD architecture helps boost the overall performance by improving indexing throughput and reducing search latency. ### x64 architecture From 365a10c318742ba1d07e55c4b6f7d499815de964 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 22 Mar 2024 13:53:59 -0400 Subject: [PATCH 08/19] Tech review comments Signed-off-by: Fanit Kolchina --- _search-plugins/knn/knn-index.md | 168 ++++++++---------- .../knn/knn-vector-quantization.md | 6 +- _search-plugins/knn/settings.md | 2 +- 3 files changed, 73 insertions(+), 103 deletions(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index b0970c6f83..992ae135e0 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -56,12 +56,12 @@ Starting with version 2.13, the k-NN plugin supports [Single Instruction Multipl For the x64 architecture, two different versions of the Faiss library are built and shipped with the artifact: -- `libopensearchknn_faiss.so`: The default Faiss library. +- `libopensearchknn_faiss.so`: The non-optimized Faiss library without SIMD instructions. - `libopensearchknn_faiss_avx2.so`: The Faiss library that contains AVX2 SIMD instructions. If your hardware supports AVX2, the k-NN plugin loads the `libopensearchknn_faiss_avx2.so` library at runtime. -To disable AVX2 and load the default Faiss library (`libopensearchknn_faiss.so`), specify the `knn.faiss.avx2.disabled` static setting as `true` in `opensearch.yml` (default is `false`). Note that to update a static setting, you must stop the cluster, change the setting, and restart the cluster. For more information, see [Static settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings). +To disable AVX2 and load the non-optimized Faiss library (`libopensearchknn_faiss.so`), specify the `knn.faiss.avx2.disabled` static setting as `true` in `opensearch.yml` (default is `false`). Note that to update a static setting, you must stop the cluster, change the setting, and restart the cluster. For more information, see [Static settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings). ### ARM64 architecture @@ -159,17 +159,13 @@ An index created in OpenSearch version 2.11 or earlier will still use the old `e {: .note} ```json -{ - "type": "knn_vector", - "dimension": 100, - "method": { - "name":"hnsw", - "engine":"lucene", - "space_type": "l2", - "parameters":{ - "m":2048, - "ef_construction": 245 - } +"method": { + "name":"hnsw", + "engine":"lucene", + "space_type": "l2", + "parameters":{ + "m":2048, + "ef_construction": 245 } } ``` @@ -182,20 +178,16 @@ several encoder types, but the plugin currently only supports `flat`, `pq`, and The following example method definition specifies the `hnsw` method and a `pq` encoder: ```json -{ - "type": "knn_vector", - "dimension": 100, - "method": { - "name":"hnsw", - "engine":"faiss", - "space_type": "l2", - "parameters":{ - "encoder":{ - "name":"pq", - "parameters":{ - "code_size": 8, - "m": 8 - } +"method": { + "name":"hnsw", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "encoder":{ + "name":"pq", + "parameters":{ + "code_size": 8, + "m": 8 } } } @@ -232,17 +224,13 @@ For more information and examples, see [Using Faiss scalar quantization]({{site. 
The following example uses the `ivf` method without specifying an encoder (by default, OpenSearch uses the `flat` encoder): ```json -{ - "type": "knn_vector", - "dimension": 100, - "method": { - "name":"ivf", - "engine":"faiss", - "space_type": "l2", - "parameters":{ - "nlist": 4, - "nprobes": 2 - } +"method": { + "name":"ivf", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "nlist": 4, + "nprobes": 2 } } ``` @@ -250,20 +238,16 @@ The following example uses the `ivf` method without specifying an encoder (by d The following example uses the `ivf` method with a `pq` encoder: ```json -{ - "type": "knn_vector", - "dimension": 100, - "method": { - "name":"ivf", - "engine":"faiss", - "space_type": "l2", - "parameters":{ - "encoder":{ - "name":"pq", - "parameters":{ - "code_size": 8, - "m": 8 - } +"method": { + "name":"ivf", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "encoder":{ + "name":"pq", + "parameters":{ + "code_size": 8, + "m": 8 } } } @@ -273,17 +257,13 @@ The following example uses the `ivf` method with a `pq` encoder: The following example uses the `hnsw` method without specifying an encoder (by default, OpenSearch uses the `flat` encoder): ```json -{ - "type": "knn_vector", - "dimension": 100, - "method": { - "name":"hnsw", - "engine":"faiss", - "space_type": "l2", - "parameters":{ - "ef_construction": 256, - "m": 8 - } +"method": { + "name":"hnsw", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "ef_construction": 256, + "m": 8 } } ``` @@ -291,24 +271,20 @@ The following example uses the `hnsw` method without specifying an encoder (by d The following example uses the `hnsw` method with an `sq` encoder of type `fp16` with `clip` enabled: ```json -{ - "type": "knn_vector", - "dimension": 100, - "method": { - "name":"hnsw", - "engine":"faiss", - "space_type": "l2", - "parameters":{ - "encoder": { - "name": "sq", - "parameters": { - "type": "fp16", - "clip": true - } - }, - "ef_construction": 256, - "m": 8 - } +"method": { + "name":"hnsw", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "encoder": { + "name": "sq", + "parameters": { + "type": "fp16", + "clip": true + } + }, + "ef_construction": 256, + "m": 8 } } ``` @@ -316,23 +292,19 @@ The following example uses the `hnsw` method with an `sq` encoder of type `fp16` The following example uses the `ivf` method with an `sq` encoder of type `fp16`: ```json -{ - "type": "knn_vector", - "dimension": 100, - "method": { - "name":"ivf", - "engine":"faiss", - "space_type": "l2", - "parameters":{ - "encoder": { - "name": "sq", - "parameters": { - "type": "fp16", - "clip": false - } - }, - "nprobes": 2 - } +"method": { + "name":"ivf", + "engine":"faiss", + "space_type": "l2", + "parameters":{ + "encoder": { + "name": "sq", + "parameters": { + "type": "fp16", + "clip": false + } + }, + "nprobes": 2 } } ``` diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 8126d10984..67d8a22850 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -18,9 +18,7 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce ## Faiss scalar quantization -Starting with version 2.13, the k-NN plugin supports performing vector quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. 
At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 vector quantization can decrease the memory footprint by a factor of 2. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), vector quantization can also significantly reduce search latencies. - -SQfp16 vector quantization generally introduces minimal recall loss. The extent of this reduction depends on the closeness of the original vector values. When the values are very similar (for example, `[10.000006, 10.000007, 10.000008]`), quantization can round them to the same value or closer values. In such cases, a slight decrease in recall accuracy may occur. +Starting with version 2.13, the k-NN plugin supports performing vector quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 vector quantization can decrease the memory footprint by a factor of 2, with minimal loss in recall when vector values are not very similar. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), vector quantization can also significantly reduce search latencies and improve indexing throughput. ### Using Faiss scalar quantization @@ -148,7 +146,7 @@ As an example, assume you have a million vectors with a dimension of 256 and M o #### IVF memory estimation -The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_vectors) + (4 * nlist * d))` bytes. +The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_vectors) + (4 * nlist * d))` bytes/vector. As an example, assume you have a million vectors with a dimension of 256 and `nlist` of 128. The memory requirement can be estimated as follows: diff --git a/_search-plugins/knn/settings.md b/_search-plugins/knn/settings.md index 5f3078e490..f4ef057cfb 100644 --- a/_search-plugins/knn/settings.md +++ b/_search-plugins/knn/settings.md @@ -25,4 +25,4 @@ Setting | Default | Description `knn.model.index.number_of_shards`| 1 | The number of shards to use for the model system index, the OpenSearch index that stores the models used for Approximate Nearest Neighbor (ANN) search. `knn.model.index.number_of_replicas`| 1 | The number of replica shards to use for the model system index. Generally, in a multi-node cluster, this should be at least 1 to increase stability. `knn.advanced.filtered_exact_search_threshold`| null | The threshold value for the filtered IDs that is used to switch to exact search during filtered ANN search. If the number of filtered IDs in a segment is less than this setting's value, exact search will be performed on the filtered IDs. 
-`knn.faiss.avx2.disabled` | False | Whether to disable the SIMD-based `libopensearchknn_faiss_avx2.so` library and load the default `libopensearchknn_faiss.so` library for the Faiss engine on machines with x64 architecture. For more information, see [SIMD optimization for the Faiss engine]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#simd-optimization-for-the-faiss-engine). +`knn.faiss.avx2.disabled` | False | A static setting that specifies whether to disable the SIMD-based `libopensearchknn_faiss_avx2.so` library and load the non-optimized `libopensearchknn_faiss.so` library for the Faiss engine on machines with x64 architecture. For more information, see [SIMD optimization for the Faiss engine]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#simd-optimization-for-the-faiss-engine). From 1e57c91c2f3a1583e63a0845c7fd67f0f82f48e2 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 22 Mar 2024 14:10:10 -0400 Subject: [PATCH 09/19] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 67d8a22850..8b9dc82afd 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -18,7 +18,7 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce ## Faiss scalar quantization -Starting with version 2.13, the k-NN plugin supports performing vector quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 vector quantization can decrease the memory footprint by a factor of 2, with minimal loss in recall when vector values are not very similar. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), vector quantization can also significantly reduce search latencies and improve indexing throughput. +Starting with version 2.13, the k-NN plugin supports performing vector quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 vector quantization can decrease the memory footprint by a factor of 2, with minimal loss in recall when vector values are not too closely distributed. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), vector quantization can also significantly reduce search latencies and improve indexing throughput. 
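The round trip described in this paragraph can be sketched outside of OpenSearch. The following is only an illustration of the 32-bit to 16-bit precision trade-off using NumPy; it is not the plugin's internal Faiss code:

```python
import numpy as np

# A 32-bit floating-point vector (4 bytes per dimension), as sent at ingestion time
v32 = np.array([0.123456789, -1.987654321, 42.42424242], dtype=np.float32)

# Quantize to 16 bits (2 bytes per dimension), then decode back to 32 bits,
# mimicking what happens at ingestion time and at search time
v16 = v32.astype(np.float16)
decoded = v16.astype(np.float32)

print(v32.nbytes, v16.nbytes)       # 12 vs. 6 bytes: the factor-of-2 memory saving
print(np.abs(v32 - decoded).max())  # small absolute error introduced by quantization
```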
### Using Faiss scalar quantization From 533f5941ff7f375a6a0819f83deb424e2e05cbdf Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 22 Mar 2024 16:08:33 -0400 Subject: [PATCH 10/19] Add note about SIMD Signed-off-by: Fanit Kolchina --- _search-plugins/knn/knn-index.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index 992ae135e0..dbf91163a4 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -50,6 +50,9 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce Starting with version 2.13, the k-NN plugin supports [Single Instruction Multiple Data (SIMD)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) processing if the underlying hardware supports SIMD instructions (AVX2 on x64 architecture and Neon on ARM64 architecture). SIMD is supported by default on Linux machines only for the Faiss engine. SIMD architecture helps boost the overall performance by improving indexing throughput and reducing search latency. +SIMD optimization is applicable only if the vector dimension is a multiple of 8. +{: .note} + ### x64 architecture From 318ab5bd0b0118ba1e2370eaf6539851098ddd17 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 22 Mar 2024 16:48:08 -0400 Subject: [PATCH 11/19] Reworded recall loss Signed-off-by: Fanit Kolchina --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 8b9dc82afd..1703cee008 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -18,7 +18,7 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce ## Faiss scalar quantization -Starting with version 2.13, the k-NN plugin supports performing vector quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 vector quantization can decrease the memory footprint by a factor of 2, with minimal loss in recall when vector values are not too closely distributed. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), vector quantization can also significantly reduce search latencies and improve indexing throughput. +Starting with version 2.13, the k-NN plugin supports performing vector quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 vector quantization can decrease the memory footprint by a factor of 2. 
Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), vector quantization can also significantly reduce search latencies and improve indexing throughput. ### Using Faiss scalar quantization From b98837f583f1d6c8ed139d275f729bb96d9a255a Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 22 Mar 2024 16:49:50 -0400 Subject: [PATCH 12/19] Reword according to tech review feedback Signed-off-by: Fanit Kolchina --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 1703cee008..404899a83c 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -18,7 +18,7 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce ## Faiss scalar quantization -Starting with version 2.13, the k-NN plugin supports performing vector quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 vector quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), vector quantization can also significantly reduce search latencies and improve indexing throughput. +Starting with version 2.13, the k-NN plugin supports performing vector quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), vector quantization can also significantly reduce search latencies and improve indexing throughput. 
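On a running cluster, one way to observe this reduced footprint is to compare the k-NN plugin's reported native memory before and after reindexing with the `sq` encoder. A sketch using the k-NN stats API (the exact set of returned fields, such as the per-node `graph_memory_usage` value, can vary by plugin version):

```json
GET /_plugins/_knn/stats
```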
### Using Faiss scalar quantization From b26511a520d9f1e0e3d9b9bbd14cad05056555d1 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 22 Mar 2024 17:43:19 -0400 Subject: [PATCH 13/19] Tech review comment Signed-off-by: Fanit Kolchina --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 404899a83c..2eb068ccd5 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -18,7 +18,7 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce ## Faiss scalar quantization -Starting with version 2.13, the k-NN plugin supports performing vector quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), vector quantization can also significantly reduce search latencies and improve indexing throughput. +Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput. 
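For readers creating the index from Python rather than with the REST examples in the following section, a hypothetical equivalent using the `opensearch-py` client is sketched below; it mirrors the `test-index` mapping used throughout this page and assumes a local, unsecured cluster:

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

body = {
    "settings": {"index": {"knn": True, "knn.algo_param.ef_search": 100}},
    "mappings": {
        "properties": {
            "my_vector1": {
                "type": "knn_vector",
                "dimension": 3,
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "space_type": "l2",
                    "parameters": {
                        # SQfp16 scalar quantization; clip defaults to false
                        "encoder": {"name": "sq", "parameters": {"type": "fp16"}},
                        "ef_construction": 256,
                        "m": 8,
                    },
                },
            }
        }
    },
}

client.indices.create(index="test-index", body=body)
```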
### Using Faiss scalar quantization From e84d905c72d4a4eaf94944df1e466d599b5151ce Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Mon, 25 Mar 2024 16:01:12 -0400 Subject: [PATCH 14/19] Add warning about Windows Signed-off-by: Fanit Kolchina --- _search-plugins/knn/knn-vector-quantization.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 2eb068ccd5..3106742e2f 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -20,6 +20,9 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput. +SIMD optimization is not supported on Windows. Using Faiss scalar quantization on Windows can lead to a significant drop in performance, including decreased indexing throughput and increased search latencies. +{: .warning} + ### Using Faiss scalar quantization To use Faiss scalar quantization, set the `method.parameters.encoder.name` to `sq` for the [k-NN vector field]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) when creating a k-NN index: From 4d62a9bab899b7e49bd72ab49681ca482a5da4fa Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Thu, 28 Mar 2024 17:05:49 -0400 Subject: [PATCH 15/19] Tech review comments Signed-off-by: Fanit Kolchina --- _search-plugins/knn/knn-index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index dbf91163a4..8f697cc69d 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -175,8 +175,8 @@ An index created in OpenSearch version 2.11 or earlier will still use the old `e ### Supported Faiss encoders -You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. Faiss has -several encoder types, but the plugin currently only supports `flat`, `pq`, and `sq` encoding. +You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. The k-NN plugin currently supports the +`flat`, `pq`, and `sq` encoders implemented in the Faiss library. The following example method definition specifies the `hnsw` method and a `pq` encoder: @@ -322,7 +322,7 @@ If you want to use less memory and index faster than HNSW, while maintaining sim If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. 
Because PQ is a lossy encoding, query quality will drop. -If you want to reduce the memory requirements by a factor of 2 (with very minimal loss of search quality) or by a factor of 4 (with a significant drop in search quality), consider vector quantization. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/). +You can reduce the memory footprint by factor of 2 by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-scalar-quantization) with a minimal loss in search quality. If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) in order to reduce memory footprint by factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/). ### Memory estimation From 6a6d38e3a080316dd53bdff4d26ad2c4a89d126a Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 29 Mar 2024 10:54:37 -0400 Subject: [PATCH 16/19] Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/knn/knn-index.md | 16 ++++++++-------- _search-plugins/knn/knn-vector-quantization.md | 18 +++++++++--------- 2 files changed, 17 insertions(+), 17 deletions(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index 8f697cc69d..7c21bdbeb1 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -48,7 +48,7 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce ## SIMD optimization for the Faiss engine -Starting with version 2.13, the k-NN plugin supports [Single Instruction Multiple Data (SIMD)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) processing if the underlying hardware supports SIMD instructions (AVX2 on x64 architecture and Neon on ARM64 architecture). SIMD is supported by default on Linux machines only for the Faiss engine. SIMD architecture helps boost the overall performance by improving indexing throughput and reducing search latency. +Starting with version 2.13, the k-NN plugin supports [Single Instruction Multiple Data (SIMD)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) processing if the underlying hardware supports SIMD instructions (AVX2 on x64 architecture and Neon on ARM64 architecture). SIMD is supported by default on Linux machines only for the Faiss engine. SIMD architecture helps boost overall performance by improving indexing throughput and reducing search latency. SIMD optimization is applicable only if the vector dimension is a multiple of 8. {: .note} @@ -176,7 +176,7 @@ An index created in OpenSearch version 2.11 or earlier will still use the old `e ### Supported Faiss encoders You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. The k-NN plugin currently supports the -`flat`, `pq`, and `sq` encoders implemented in the Faiss library. +`flat`, `pq`, and `sq` encoders in the Faiss library. 
The following example method definition specifies the `hnsw` method and a `pq` encoder: @@ -202,15 +202,15 @@ The `hnsw` method supports the `pq` encoder for OpenSearch versions 2.10 and lat Encoder name | Requires training | Description :--- | :--- | :--- -`flat` (Default) | false | Encode vectors as floating point arrays. This encoding does not reduce memory footprint. +`flat` (Default) | false | Encode vectors as floating-point arrays. This encoding does not reduce memory footprint. `pq` | true | An abbreviation for _product quantization_, it is a lossy compression technique that uses clustering to encode a vector into a fixed size of bytes, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388). -`sq` | false | Stands for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. In version 2.13, the built-in `sq` encoder is the SQFP16 Faiss encoder. The encoder reduces memory footprint with a minimal loss of precision and improves performance by using SIMD optimization (using AVX2 on x86 architecture or Neon on ARM architecture). For more information, see [Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization). +`sq` | false | An abbreviation for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. In version 2.13, the built-in `sq` encoder is the SQFP16 Faiss encoder. The encoder reduces memory footprint with a minimal loss of precision and improves performance by using SIMD optimization (using AVX2 on x86 architecture or Neon on ARM architecture). For more information, see [Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization). #### PQ parameters Parameter name | Required | Default | Updatable | Description :--- | :--- | :--- | :--- | :--- -`m` | false | 1 | false | Determines the number of subvectors into which to break the vector. Subvectors are encoded independently of each other. This dimension of the vector must be divisible by `m`. Maximum value is 1,024. +`m` | false | 1 | false | Determines the number of subvectors into which to break the vector. Subvectors are encoded independently of each other. This vector dimension must be divisible by `m`. Maximum value is 1,024. `code_size` | false | 8 | false | Determines the number of bits into which to encode a subvector. Maximum value is 8. For IVF, this value must be less than or equal to 8. For HNSW, this value can only be 8. #### SQ parameters @@ -218,7 +218,7 @@ Parameter name | Required | Default | Updatable | Description Parameter name | Required | Default | Updatable | Description :--- | :--- | :-- | :--- | :--- `type` | false | `fp16` | false | The type of scalar quantization to be used to encode 32-bit float vectors into the corresponding type. As of OpenSearch 2.13, only the `fp16` encoder type is supported. For the `fp16` encoder, vector values must be in the [-65504.0, 65504.0] range. 
-`clip` | false | `false` | false | If `true`, any vector values that are out of the supported range for the specified vector type are rounded so they are in the range. If `false`, the request is rejected if any vector values are out of the supported range. Setting `clip` to `true` may decrease recall. +`clip` | false | `false` | false | If `true`, then any vector values outside of the supported range for the specified vector type are rounded so that they are inside the range. If `false`, then the request is rejected if any vector values are outside of the supported range. Setting `clip` to `true` may decrease recall. For more information and examples, see [Using Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#using-faiss-scalar-quantization). @@ -322,7 +322,7 @@ If you want to use less memory and index faster than HNSW, while maintaining sim If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop. -You can reduce the memory footprint by factor of 2 by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-scalar-quantization) with a minimal loss in search quality. If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) in order to reduce memory footprint by factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/). +You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) in order to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/). ### Memory estimation @@ -333,7 +333,7 @@ the `circuit_breaker_limit` cluster setting. By default, the limit is set at 50% Having a replica doubles the total number of vectors. {: .note } -For memory estimation when using vector quantization, see the [vector quantization documentation]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#memory-estimation). +For information about using memory estimation with vector quantization, see the [vector quantization documentation]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#memory-estimation). {: .note } #### HNSW memory estimation diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 3106742e2f..3373f104c2 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -10,7 +10,7 @@ has_math: true # k-NN vector quantization -By default, the k-NN plugin supports indexing and querying vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. 
For use cases that require ingestion on a large scale, keeping `float` vectors is expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines). To reduce the memory footprint, you can use vector quantization. +By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines). To reduce the memory footprint, you can use vector quantization. ## Lucene byte vector @@ -18,14 +18,14 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce ## Faiss scalar quantization -Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD Optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput. +Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput. SIMD optimization is not supported on Windows. Using Faiss scalar quantization on Windows can lead to a significant drop in performance, including decreased indexing throughput and increased search latencies. 
{: .warning} ### Using Faiss scalar quantization -To use Faiss scalar quantization, set the `method.parameters.encoder.name` to `sq` for the [k-NN vector field]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) when creating a k-NN index: +To use Faiss scalar quantization, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a k-NN index: ```json PUT /test-index @@ -60,11 +60,11 @@ PUT /test-index ``` {% include copy-curl.html %} -Optionally, you can specify the parameters in `method.parameters.encoder`. For more information about parameters within the `encoder` object, see [SQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#sq-parameters). +Optionally, you can specify the parameters in `method.parameters.encoder`. For more information about `encoder` object parameters, see [SQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#sq-parameters). -The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must be in the [-65504.0, 65504.0] range. To define handling out-of-range values, the preceding request specifies the `clip` parameter. By default, this parameter is `false` and any vectors containing out-of-range values are rejected. When `clip` is set to `true` (as in the preceding request), out-of-range vector values are rounded up or down so that they are in the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector will indexed as a 16-bit vector `[65504.0, -65504.0]`. +The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must be in the [-65504.0, 65504.0] range. To define how to handle out-of-range values, the preceding request specifies the `clip` parameter. By default, this parameter is `false`, and any vectors containing out-of-range values are rejected. When `clip` is set to `true` (as in the preceding request), out-of-range vector values are rounded up or down so that they are in the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector will be indexed as a 16-bit vector `[65504.0, -65504.0]`. -We recommend setting `clip` to `true` only if very few elements lie outside the supported range. Rounding the values might cause a drop in recall. +We recommend setting `clip` to `true` only if very few elements lie outside of the supported range. Rounding the values may cause a drop in recall. {: .note} The following example method definition specifies the Faiss SQfp16 encoder, which rejects any indexing request that contains out-of-range vector values (because the `clip` parameter is `false` by default): @@ -135,13 +135,13 @@ GET test-index/_search ## Memory estimation -In the best case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer require 50% of the memory that 32-bit vectors require. +In the best-case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer require 50% of the memory that 32-bit vectors require. #### HNSW memory estimation The memory required for HNSW is estimated to be `1.1 * (2 * dimension + 8 * M)` bytes/vector. -As an example, assume you have a million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: +As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. 
The memory requirement can be estimated as follows: ```bash 1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB @@ -151,7 +151,7 @@ As an example, assume you have a million vectors with a dimension of 256 and M o The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_vectors) + (4 * nlist * d))` bytes/vector. -As an example, assume you have a million vectors with a dimension of 256 and `nlist` of 128. The memory requirement can be estimated as follows: +As an example, assume that you have 1 million vectors with a dimension of 256 and `nlist` of 128. The memory requirement can be estimated as follows: ```bash 1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 0.525 GB From 341daadce0b6f0cfc2c502dc483b55a669bf7e1e Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 29 Mar 2024 10:55:15 -0400 Subject: [PATCH 17/19] Define IVF Signed-off-by: Fanit Kolchina --- _search-plugins/knn/knn-index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index 7c21bdbeb1..bccbc56d16 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -108,7 +108,7 @@ An index created in OpenSearch version 2.11 or earlier will still use the old `e Method name | Requires training | Supported spaces | Description :--- | :--- | :--- | :--- `hnsw` | false | l2, innerproduct | Hierarchical proximity graph approach to Approximate k-NN search. -`ivf` | true | l2, innerproduct | Bucketing approach where vectors are assigned different buckets based on clustering and, during search, only a subset of the buckets is searched. +`ivf` | true | l2, innerproduct | Stands for _inverted file index_. Bucketing approach where vectors are assigned different buckets based on clustering and, during search, only a subset of the buckets is searched. For hnsw, "innerproduct" is not available when PQ is used. {: .note} From 74c4c75b28672b1343af238cd7fa6ba34e1ccba3 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 29 Mar 2024 11:39:19 -0400 Subject: [PATCH 18/19] Update _search-plugins/knn/knn-index.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/knn/knn-index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index bccbc56d16..f6005d3f96 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -218,7 +218,7 @@ Parameter name | Required | Default | Updatable | Description Parameter name | Required | Default | Updatable | Description :--- | :--- | :-- | :--- | :--- `type` | false | `fp16` | false | The type of scalar quantization to be used to encode 32-bit float vectors into the corresponding type. As of OpenSearch 2.13, only the `fp16` encoder type is supported. For the `fp16` encoder, vector values must be in the [-65504.0, 65504.0] range. -`clip` | false | `false` | false | If `true`, then any vector values outside of the supported range for the specified vector type are rounded so that they are inside the range. If `false`, then the request is rejected if any vector values are outside of the supported range. Setting `clip` to `true` may decrease recall. +`clip` | false | `false` | false | If `true`, then any vector values outside of the supported range for the specified vector type are rounded so that they are in the range. 
If `false`, then the request is rejected if any vector values are outside of the supported range. Setting `clip` to `true` may decrease recall. For more information and examples, see [Using Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#using-faiss-scalar-quantization). From 9a6c4e2d51c98a28a923533e2dead540c7bd8511 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 29 Mar 2024 11:41:05 -0400 Subject: [PATCH 19/19] Update _search-plugins/knn/knn-index.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/knn/knn-index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index f6005d3f96..d63fe831cb 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -204,7 +204,7 @@ Encoder name | Requires training | Description :--- | :--- | :--- `flat` (Default) | false | Encode vectors as floating-point arrays. This encoding does not reduce memory footprint. `pq` | true | An abbreviation for _product quantization_, it is a lossy compression technique that uses clustering to encode a vector into a fixed size of bytes, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388). -`sq` | false | An abbreviation for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. In version 2.13, the built-in `sq` encoder is the SQFP16 Faiss encoder. The encoder reduces memory footprint with a minimal loss of precision and improves performance by using SIMD optimization (using AVX2 on x86 architecture or Neon on ARM architecture). For more information, see [Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization). +`sq` | false | An abbreviation for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. In version 2.13, the built-in `sq` encoder is the SQFP16 Faiss encoder. The encoder reduces memory footprint with a minimal loss of precision and improves performance by using SIMD optimization (using AVX2 on x86 architecture or Neon on ARM64 architecture). For more information, see [Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization). #### PQ parameters
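As a closing illustration of the SIMD optimization referenced in the hunk above: the `knn.faiss.avx2.disabled` static setting documented earlier in this series is set in `opensearch.yml`, not through the cluster settings API. A minimal sketch (x64 Linux only; because the setting is static, the cluster must be restarted for it to take effect):

```yaml
# opensearch.yml
knn.faiss.avx2.disabled: true
```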