Update OpenSearch k-NN Faiss Documentation (#5226) (#5319)

* Update OpenSearch k-NN Faiss Documentation
* Address Review Comments
* Apply suggestions from code review
* Additional rewrites and format
* Update _search-plugins/knn/knn-index.md
* Update _search-plugins/knn/knn-index.md
* Apply suggestions from code review

---------

(cherry picked from commit 25233c2)

Signed-off-by: Naveen Tatikonda <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Fanit Kolchina <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
opensearch-project · Oct 19, 2023 · e2a52ef
1 parent fbb8ace
Showing 1 changed file with 57 additions and 3 deletions.
diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md
@@ -72,7 +72,7 @@ Parameter name | Required | Default | Updatable | Description
 `nprobes` | false | 1 | false | Number of buckets to search during query. Higher values lead to more accurate but slower searches.
 `encoder` | false | flat | false | Encoder definition for encoding vectors. Encoders can reduce the memory footprint of your index, at the expense of search accuracy.
 
-For more information about setting these parameters, please refer to [*faiss*'s documentation](https://github.com/facebookresearch/faiss/wiki/Faiss-indexes).
+For more information about setting these parameters, refer to the [Faiss documentation](https://github.com/facebookresearch/faiss/wiki/Faiss-indexes).
 
 #### IVF training requirements
 
@@ -118,12 +118,13 @@ Lucene HNSW implementation ignores `ef_search`  and dynamically sets it to the v
 You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. faiss has
 several encoder types, but the plugin currently only supports *flat* and *pq* encoding.
 
-An example method definition that specifies an encoder may look something like this:
+The following example method definition specifies the `hnsw` method and a `pq` encoder:
 
 ```json
 "method": {
   "name":"hnsw",
   "engine":"faiss",
+  "space_type": "l2",
   "parameters":{
     "encoder":{
       "name":"pq",
@@ -136,10 +137,63 @@ An example method definition that specifies an encoder may look something like t
 }
 ```
 
+The `hnsw` method supports the `pq` encoder for OpenSearch versions 2.10 and later. The `code_size` parameter of a `pq` encoder with the `hnsw` method must be **8**.
+{: .important}
+
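The constraint in the note above can be checked before a mapping is submitted. The following Python sketch is illustrative only: `validate_pq_encoder` is a hypothetical helper, not part of OpenSearch or any client library, and it checks only the single rule stated above (an `hnsw` method with a `pq` encoder must use a `code_size` of 8).

```python
def validate_pq_encoder(method: dict) -> None:
    """Raise ValueError if an hnsw + pq method definition sets a code_size other than 8.

    Hypothetical helper for illustration; it is not an OpenSearch API.
    """
    encoder = method.get("parameters", {}).get("encoder", {})
    if method.get("name") != "hnsw" or encoder.get("name") != "pq":
        return  # the rule applies only to hnsw combined with pq

    # An omitted code_size is left to the server-side default and is not checked here.
    code_size = encoder.get("parameters", {}).get("code_size")
    if code_size is not None and code_size != 8:
        raise ValueError(
            f"hnsw with a pq encoder requires code_size 8, got {code_size!r}"
        )


method = {
    "name": "hnsw",
    "engine": "faiss",
    "space_type": "l2",
    "parameters": {
        "encoder": {"name": "pq", "parameters": {"code_size": 8, "m": 8}}
    },
}
validate_pq_encoder(method)  # passes silently
```

A client-side check like this only gives earlier feedback; the plugin remains the authority on which combinations are accepted.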
 Encoder name | Requires training | Description
 :--- | :--- | :---
 `flat` | false | Encode vectors as floating point arrays. This encoding does not reduce memory footprint.
-`pq` | true | Short for product quantization, it is a lossy compression technique that encodes a vector into a fixed size of bytes using clustering, with the goal of minimizing the drop in k-NN search accuracy. From a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more details on product quantization, here is a [great blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388)!
+`pq` | true | An abbreviation for _product quantization_, it is a lossy compression technique that uses clustering to encode a vector into a fixed size of bytes, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388).
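To make the memory trade-off in the table concrete, the following Python sketch compares the per-vector storage of the two encoders. It assumes that `flat` stores each dimension as a 32-bit float and that `code_size` is the number of bits used per subvector code; the dimension of 128 is a hypothetical value chosen for illustration. The estimate covers the encoded vectors only and ignores graph, inverted-list, and code book overhead.

```python
def flat_vector_bytes(dimension: int) -> int:
    """Bytes per vector with flat encoding, assuming 32-bit floats."""
    return dimension * 4


def pq_vector_bytes(dimension: int, m: int, code_size: int) -> int:
    """Bytes per vector with pq encoding: m codes of code_size bits each."""
    if dimension % m != 0:
        # Product quantization splits a vector into m equally sized subvectors.
        raise ValueError("dimension must be divisible by m")
    total_bits = m * code_size
    return (total_bits + 7) // 8  # round up to whole bytes


dimension = 128  # hypothetical vector dimension
print(flat_vector_bytes(dimension))                  # 512
print(pq_vector_bytes(dimension, m=8, code_size=8))  # 8
```

Under these assumptions, `m` = 8 and `code_size` = 8 reduce a 128-dimensional vector from 512 bytes to 8 bytes, which is why the encoding is lossy.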
+
+#### Examples
+
+
+The following example uses the `ivf` method without specifying an encoder (by default, OpenSearch uses the `flat` encoder):
+
+```json
+"method": {
+  "name":"ivf",
+  "engine":"faiss",
+  "space_type": "l2",
+  "parameters":{
+    "nlist": 4,
+    "nprobes": 2
+  }
+}
+```
+
+The following example uses the `ivf` method with a `pq` encoder:
+
+```json
+"method": {
+  "name":"ivf",
+  "engine":"faiss",
+  "space_type": "l2",
+  "parameters":{
+    "encoder":{
+      "name":"pq",
+      "parameters":{
+        "code_size": 8,
+        "m": 8
+      }
+    }
+  }
+}
+```
+
+The following example uses the `hnsw` method without specifying an encoder (by default, OpenSearch uses the `flat` encoder):
+
+```json
+"method": {
+  "name":"hnsw",
+  "engine":"faiss",
+  "space_type": "l2",
+  "parameters":{
+    "ef_construction": 256,
+    "m": 8
+  }
+}
+```
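The method definitions above are fragments of a field mapping. As a rough sketch of where such a fragment sits, the following Python code builds a complete index-creation request body around the `hnsw` example. The field name `my_vector` and the dimension of 4 are hypothetical placeholders. The `hnsw` method with the default `flat` encoder is used because it needs no training step.

```python
import json

# Method definition copied from the hnsw example above.
hnsw_method = {
    "name": "hnsw",
    "engine": "faiss",
    "space_type": "l2",
    "parameters": {"ef_construction": 256, "m": 8},
}

# Request body for creating a k-NN index.
# "my_vector" and the dimension are hypothetical placeholders.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "my_vector": {
                "type": "knn_vector",
                "dimension": 4,
                "method": hnsw_method,
            }
        }
    },
}

# Serialize for use as the body of an index-creation request.
print(json.dumps(index_body, indent=2))
```

Methods that require training, such as `ivf` or any method using the `pq` encoder, need a trained model first, so this direct mapping sketch does not apply to them as written.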
 
 #### PQ parameters