Merge branch 'main' of github.com:elasticsearch/elasticsearch into unmute_mljobit_tests
edsavage · Oct 15, 2024 · d5e1a50
2 parents 7d6897c + 6c752ab
Showing 72 changed files with 2,112 additions and 1,136 deletions.
diff --git a/docs/changelog/114439.yaml b/docs/changelog/114439.yaml
@@ -0,0 +1,5 @@
+pr: 114439
+summary: Adding new bbq index types behind a feature flag
+area: Vector Search
+type: feature
+issues: []
diff --git a/docs/changelog/114636.yaml b/docs/changelog/114636.yaml
@@ -0,0 +1,5 @@
+pr: 114636
+summary: Dynamically get the number of allocations
+area: Machine Learning
+type: enhancement
+issues: []
diff --git a/docs/changelog/114683.yaml b/docs/changelog/114683.yaml
@@ -0,0 +1,5 @@
+pr: 114683
+summary: Default inference endpoint for the multilingual-e5-small model
+area: Machine Learning
+type: enhancement
+issues: []
diff --git a/docs/changelog/114732.yaml b/docs/changelog/114732.yaml
@@ -0,0 +1,5 @@
+pr: 114732
+summary: Stream Bedrock Completion
+area: Machine Learning
+type: enhancement
+issues: []
diff --git a/docs/reference/mapping/types/dense-vector.asciidoc b/docs/reference/mapping/types/dense-vector.asciidoc
@@ -115,22 +115,27 @@ that sacrifices result accuracy for improved speed.
 ==== Automatically quantize vectors for kNN search
 
 The `dense_vector` type supports quantization to reduce the memory footprint required when <<approximate-knn, searching>> `float` vectors.
-The two following quantization strategies are supported:
+The three following quantization strategies are supported:
 
 +
 --
-`int8` - Quantizes each dimension of the vector to 1-byte integers. This can reduce the memory footprint by 75% at the cost of some accuracy.
-`int4` - Quantizes each dimension of the vector to half-byte integers. This can reduce the memory footprint by 87% at the cost of some accuracy.
+`int8` - Quantizes each dimension of the vector to 1-byte integers. This reduces the memory footprint by 75% (or 4x) at the cost of some accuracy.
+`int4` - Quantizes each dimension of the vector to half-byte integers. This reduces the memory footprint by 87% (or 8x) at the cost of accuracy.
+`bbq` - experimental:[] Better binary quantization, which reduces each dimension to single-bit precision. This reduces the memory footprint by 96% (or 32x) at a larger cost of accuracy. Generally, oversampling during query time and reranking can help mitigate the accuracy loss.
 --
 
-To use a quantized index, you can set your index type to `int8_hnsw` or `int4_hnsw`. When indexing `float` vectors, the current default
+When using a quantized format, you may want to oversample and rescore the results to improve accuracy. See <<dense-vector-knn-search-reranking, oversampling and rescoring>> for more information.
+
+To use a quantized index, you can set your index type to `int8_hnsw`, `int4_hnsw`, or `bbq_hnsw`. When indexing `float` vectors, the current default
 index type is `int8_hnsw`.
 
 NOTE: Quantization will continue to keep the raw float vector values on disk for reranking, reindexing, and quantization improvements over the lifetime of the data.
-This means disk usage will increase by ~25% for `int8` and ~12.5% for `int4` due to the overhead of storing the quantized and raw vectors.
+This means disk usage will increase by ~25% for `int8`, ~12.5% for `int4`, and ~3.1% for `bbq` due to the overhead of storing the quantized and raw vectors.
 
 NOTE: `int4` quantization requires an even number of vector dimensions.
 
+NOTE: experimental:[] `bbq` quantization only supports vector dimensions that are greater than 64.
+
 Here is an example of how to create a byte-quantized index:
 
 [source,console]
@@ -173,6 +178,27 @@ PUT my-byte-quantized-index
 }
 --------------------------------------------------
 
+experimental:[] Here is an example of how to create a binary quantized index:
+
+[source,console]
+--------------------------------------------------
+PUT my-binary-quantized-index
+{
+  "mappings": {
+    "properties": {
+      "my_vector": {
+        "type": "dense_vector",
+        "dims": 64,
+        "index": true,
+        "index_options": {
+          "type": "bbq_hnsw"
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
+
 [role="child_attributes"]
 [[dense-vector-params]]
 ==== Parameters for dense vector fields
@@ -301,11 +327,16 @@ by 4x at the cost of some accuracy. See <<dense-vector-quantization, Automatical
 * `int4_hnsw` - This utilizes the https://arxiv.org/abs/1603.09320[HNSW algorithm] in addition to automatically scalar
 quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint
 by 8x at the cost of some accuracy. See <<dense-vector-quantization, Automatically quantize vectors for kNN search>>.
+* experimental:[] `bbq_hnsw` - This utilizes the https://arxiv.org/abs/1603.09320[HNSW algorithm] in addition to automatic binary
+quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint
+by 32x at the cost of accuracy. See <<dense-vector-quantization, Automatically quantize vectors for kNN search>>.
 * `flat` - This utilizes a brute-force search algorithm for exact kNN search. This supports all `element_type` values.
 * `int8_flat` - This utilizes a brute-force search algorithm in addition to automatically scalar quantization. Only supports
 `element_type` of `float`.
 * `int4_flat` - This utilizes a brute-force search algorithm in addition to automatically half-byte scalar quantization. Only supports
 `element_type` of `float`.
+* experimental:[] `bbq_flat` - This utilizes a brute-force search algorithm in addition to automatic binary quantization. Only supports
+`element_type` of `float`.
 --
 `m`:::
 (Optional, integer)
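
Note on the quantization figures in the dense-vector hunk above: the percentages and multipliers follow from the number of bits stored per dimension relative to a 32-bit `float`. The sketch below reproduces that arithmetic. The class and method names are hypothetical and not part of Elasticsearch, and the calculation assumes no per-vector metadata, which a real index format may add.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Elasticsearch) that reproduces the documented ratios. */
final class QuantizationFootprint {
    // A raw `float` vector component occupies 32 bits.
    static final int FLOAT_BITS = 32;

    /** Compression factor relative to float32, e.g. 8 bits per dimension gives 4x. */
    static double compressionFactor(int bitsPerDimension) {
        return (double) FLOAT_BITS / bitsPerDimension;
    }

    /** Memory reduction in percent, e.g. 8 bits per dimension gives 75.0. */
    static double memoryReductionPercent(int bitsPerDimension) {
        return 100.0 * (1.0 - (double) bitsPerDimension / FLOAT_BITS);
    }

    /** Extra disk, in percent, from storing the quantized copy next to the raw floats. */
    static double diskOverheadPercent(int bitsPerDimension) {
        return 100.0 * bitsPerDimension / FLOAT_BITS;
    }

    public static void main(String[] args) {
        // Assumption: int8, int4 and bbq use 8, 4 and 1 bit(s) per dimension, as the docs describe.
        Map<String, Integer> bits = new LinkedHashMap<>();
        bits.put("int8", 8);
        bits.put("int4", 4);
        bits.put("bbq", 1);
        bits.forEach((name, b) -> System.out.printf(
            "%s: %.0fx smaller, %.3f%% less memory, ~%.3f%% more disk%n",
            name, compressionFactor(b), memoryReductionPercent(b), diskOverheadPercent(b)));
    }
}
```

Under these assumptions the exact values are 87.5% for `int4` and 96.875% for `bbq`, which the documentation text rounds down to 87% and 96%.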

diff --git a/docs/reference/rest-api/usage.asciidoc b/docs/reference/rest-api/usage.asciidoc
@@ -210,7 +210,12 @@ GET /_xpack/usage
         "service": "elasticsearch",
         "task_type": "SPARSE_EMBEDDING",
         "count": 1
-      }
+      },
+      {
+        "service": "elasticsearch",
+        "task_type": "TEXT_EMBEDDING",
+        "count": 1
+      }
     ]
   },
   "logstash" : {

diff --git a/docs/reference/search/search-your-data/knn-search.asciidoc b/docs/reference/search/search-your-data/knn-search.asciidoc
@@ -1149,3 +1149,95 @@ POST product-index/_search
 ----
 //TEST[continued]
 
+[discrete]
+[[dense-vector-knn-search-reranking]]
+==== Oversampling and rescoring for quantized vectors
+
+All forms of quantization result in some accuracy loss, and the loss grows as the quantization level increases.
+Generally, we have found that:
+- `int8` requires minimal if any rescoring
+- `int4` requires some rescoring for higher accuracy and larger recall scenarios. Generally, oversampling by 1.5x-2x recovers most of the accuracy loss.
+- `bbq` requires rescoring except on exceptionally large indices or models specifically designed for quantization. We have found that 3x-5x oversampling is generally sufficient, but for fewer dimensions or vectors that do not quantize well, higher oversampling may be required.
+
+There are two main ways to oversample and rescore. The first is to utilize the <<rescore, rescore section>> in the `_search` request.
+
+Here is an example using the top level `knn` search with oversampling and using `rescore` to rerank the results:
+
+[source,console]
+--------------------------------------------------
+POST /my-index/_search
+{
+  "size": 10, <1>
+  "knn": {
+    "query_vector": [0.04283529, 0.85670587, -0.51402352, 0],
+    "field": "my_int4_vector",
+    "k": 20, <2>
+    "num_candidates": 50
+  },
+  "rescore": {
+    "window_size": 20, <3>
+    "query": {
+      "rescore_query": {
+        "script_score": {
+          "query": {
+            "match_all": {}
+          },
+          "script": {
+            "source": "(dotProduct(params.queryVector, 'my_int4_vector') + 1.0)", <4>
+            "params": {
+              "queryVector": [0.04283529, 0.85670587, -0.51402352, 0]
+            }
+          }
+        }
+      },
+      "query_weight": 0, <5>
+      "rescore_query_weight": 1 <6>
+    }
+  }
+}
+--------------------------------------------------
+// TEST[skip: setup not provided]
+<1> The number of results to return. Note that it is only 10 and we will oversample by 2x, gathering 20 nearest neighbors.
+<2> The number of results to return from the KNN search. This will do an approximate KNN search with 50 candidates
+per HNSW graph and use the quantized vectors, returning the 20 most similar vectors
+according to the quantized score. Additionally, since this is the top-level `knn` object, the global top 20 results
+from all shards will be gathered before rescoring. Combined with `rescore`, this is oversampling by `2x`, meaning
+gathering 20 nearest neighbors according to quantized scoring and rescoring with higher fidelity float vectors.
+<3> The number of results to rescore. If you want to rescore all results, set this to the same value as `k`.
+<4> The script to rescore the results. Script score will interact directly with the originally provided float32 vector.
+<5> The weight of the original query. Here we simply throw away the original score.
+<6> The weight of the rescore query. Here we only use the rescore query.
+
+The second way is to score per shard with the <<query-dsl-knn-query, knn query>> and <<query-dsl-script-score-query, script_score query>>. Generally, this means that there will be more rescoring per shard, but this
+can increase overall recall at the cost of compute.
+
+[source,console]
+--------------------------------------------------
+POST /my-index/_search
+{
+  "size": 10, <1>
+  "query": {
+    "script_score": {
+      "query": {
+        "knn": { <2>
+          "query_vector": [0.04283529, 0.85670587, -0.51402352, 0],
+          "field": "my_int4_vector",
+          "num_candidates": 20 <3>
+        }
+      },
+      "script": {
+        "source": "(dotProduct(params.queryVector, 'my_int4_vector') + 1.0)", <4>
+        "params": {
+          "queryVector": [0.04283529, 0.85670587, -0.51402352, 0]
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
+// TEST[skip: setup not provided]
+<1> The number of results to return
+<2> The `knn` query to perform the initial search. This is executed per shard.
+<3> The number of candidates to use for the initial approximate `knn` search. This will search using the quantized vectors
+and return the top 20 candidates per shard to then be scored
+<4> The script to score the results. Script score will interact directly with the originally provided float32 vector.
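
Note on the oversampling and rescoring section above: the idea is to gather more neighbors than requested using quantized scores, then rescore that larger set against the raw float vectors and keep the best `size` results. The sketch below is an in-memory stand-in for that second step only. `Candidate`, `Scored`, and `oversampledK` are hypothetical names, and this is not how Elasticsearch executes the request internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical, in-memory illustration of "oversample, then rescore". */
final class OversampleRescore {
    /** A candidate returned by the approximate search: document id plus raw float vector. */
    record Candidate(String id, float[] vector) {}

    /** A document id with its rescored value. */
    record Scored(String id, double score) {}

    /** Number of neighbors to gather for a result size and oversampling factor (10 and 2.0 give 20). */
    static int oversampledK(int size, double factor) {
        return (int) Math.ceil(size * factor);
    }

    static double dotProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Rescores every candidate with dotProduct + 1.0, as in the script above, and keeps the best `size`. */
    static List<Scored> rescore(float[] query, List<Candidate> candidates, int size) {
        List<Scored> scored = new ArrayList<>();
        for (Candidate c : candidates) {
            scored.add(new Scored(c.id(), dotProduct(query, c.vector()) + 1.0));
        }
        // Highest score first.
        scored.sort(Comparator.comparingDouble(Scored::score).reversed());
        return scored.subList(0, Math.min(size, scored.size()));
    }
}
```

With `size` 10 and a factor of 2.0, `oversampledK` yields the `k` of 20 used in the first request; passing those 20 candidates to `rescore` with `size` 10 mirrors a `window_size` equal to `k`.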
diff --git a/modules/ingest-geoip/src/main/java/module-info.java b/modules/ingest-geoip/src/main/java/module-info.java
@@ -18,4 +18,6 @@
 
     exports org.elasticsearch.ingest.geoip.direct to org.elasticsearch.server;
     exports org.elasticsearch.ingest.geoip.stats to org.elasticsearch.server;
+
+    exports org.elasticsearch.ingest.geoip to com.maxmind.db;
 }
diff --git a/modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/ConfigDatabases.java b/modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/ConfigDatabases.java
@@ -73,14 +73,14 @@ void updateDatabase(Path file, boolean update) {
         String databaseFileName = file.getFileName().toString();
         try {
             if (update) {
-                logger.info("database file changed [{}], reload database...", file);
+                logger.info("database file changed [{}], reloading database...", file);
                 DatabaseReaderLazyLoader loader = new DatabaseReaderLazyLoader(cache, file, null);
                 DatabaseReaderLazyLoader existing = configDatabases.put(databaseFileName, loader);
                 if (existing != null) {
                     existing.shutdown();
                 }
             } else {
-                logger.info("database file removed [{}], close database...", file);
+                logger.info("database file removed [{}], closing database...", file);
                 DatabaseReaderLazyLoader existing = configDatabases.remove(databaseFileName);
                 assert existing != null;
                 existing.shutdown();

diff --git a/modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpProcessor.java b/modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpProcessor.java
@@ -196,13 +196,19 @@ public IpDatabase get() throws IOException {
             }
 
             if (Assertions.ENABLED) {
-                // Only check whether the suffix has changed and not the entire database type.
-                // To sanity check whether a city db isn't overwriting with a country or asn db.
-                // For example overwriting a geoip lite city db with geoip city db is a valid change, but the db type is slightly different,
-                // by checking just the suffix this assertion doesn't fail.
-                String expectedSuffix = databaseType.substring(databaseType.lastIndexOf('-'));
-                assert loader.getDatabaseType().endsWith(expectedSuffix)
-                    : "database type [" + loader.getDatabaseType() + "] doesn't match with expected suffix [" + expectedSuffix + "]";
+                // Note that the expected suffix might be null for providers that aren't amenable to using dashes as separator for
+                // determining the database type.
+                int last = databaseType.lastIndexOf('-');
+                final String expectedSuffix = last == -1 ? null : databaseType.substring(last);
+
+                // If the entire database type matches, then that's a match. Otherwise, if there's a suffix to compare on, then
+                // check whether the suffix has changed (not the entire database type).
+                // This is to sanity check, for example, that a city db isn't overwritten with a country or asn db.
+                // But there are permissible overwrites that make sense, for example overwriting a geolite city db with a geoip city db
+                // is a valid change, but the db type is slightly different -- by checking just the suffix this assertion won't fail.
+                final String loaderType = loader.getDatabaseType();
+                assert loaderType.equals(databaseType) || expectedSuffix == null || loaderType.endsWith(expectedSuffix)
+                    : "database type [" + loaderType + "] doesn't match with expected suffix [" + expectedSuffix + "]";
             }
             return loader;
         }
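
Note on the relaxed assertion in `GeoIpProcessor` above: the new condition accepts an exact type match, any type when the expected type has no dash-separated suffix, or a matching suffix. The standalone sketch below isolates that predicate so the three cases can be tried directly. The class and method names are hypothetical, and the type strings are illustrative inputs only.

```java
/** Hypothetical extraction of the assertion's condition into a plain predicate. */
final class DatabaseTypeCheck {
    /**
     * Returns true when a loader reporting `loaderType` would pass the assertion
     * for a processor that expects `databaseType`.
     */
    static boolean isCompatible(String loaderType, String databaseType) {
        int last = databaseType.lastIndexOf('-');
        // No dash in the expected type means there is no suffix to compare on.
        String expectedSuffix = last == -1 ? null : databaseType.substring(last);
        return loaderType.equals(databaseType) || expectedSuffix == null || loaderType.endsWith(expectedSuffix);
    }

    public static void main(String[] args) {
        // Same suffix "-City": a permissible overwrite.
        System.out.println(isCompatible("GeoIP2-City", "GeoLite2-City"));
        // Different suffix: a city db replaced by a country db.
        System.out.println(isCompatible("GeoLite2-Country", "GeoLite2-City"));
        // Expected type without a dash: nothing to compare, so the check passes.
        System.out.println(isCompatible("some other type", "nodash"));
    }
}
```

One consequence of this shape is that the check is skipped entirely for expected types without a dash, so it only guards providers that use dash-separated type names.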

diff --git a/modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/IpDataLookupFactories.java b/modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/IpDataLookupFactories.java
@@ -13,9 +13,16 @@
 import org.elasticsearch.core.Nullable;
 
 import java.util.List;
+import java.util.Locale;
 import java.util.Set;
 import java.util.function.Function;
 
+import static org.elasticsearch.ingest.geoip.IpinfoIpDataLookups.IPINFO_PREFIX;
+import static org.elasticsearch.ingest.geoip.IpinfoIpDataLookups.getIpinfoDatabase;
+import static org.elasticsearch.ingest.geoip.IpinfoIpDataLookups.getIpinfoLookup;
+import static org.elasticsearch.ingest.geoip.MaxmindIpDataLookups.getMaxmindDatabase;
+import static org.elasticsearch.ingest.geoip.MaxmindIpDataLookups.getMaxmindLookup;
+
 final class IpDataLookupFactories {
 
     private IpDataLookupFactories() {
@@ -26,78 +33,44 @@ interface IpDataLookupFactory {
         IpDataLookup create(List<String> properties);
     }
 
-    private static final String CITY_DB_SUFFIX = "-City";
-    private static final String COUNTRY_DB_SUFFIX = "-Country";
-    private static final String ASN_DB_SUFFIX = "-ASN";
-    private static final String ANONYMOUS_IP_DB_SUFFIX = "-Anonymous-IP";
-    private static final String CONNECTION_TYPE_DB_SUFFIX = "-Connection-Type";
-    private static final String DOMAIN_DB_SUFFIX = "-Domain";
-    private static final String ENTERPRISE_DB_SUFFIX = "-Enterprise";
-    private static final String ISP_DB_SUFFIX = "-ISP";
-
-    @Nullable
-    private static Database getMaxmindDatabase(final String databaseType) {
-        if (databaseType.endsWith(CITY_DB_SUFFIX)) {
-            return Database.City;
-        } else if (databaseType.endsWith(COUNTRY_DB_SUFFIX)) {
-            return Database.Country;
-        } else if (databaseType.endsWith(ASN_DB_SUFFIX)) {
-            return Database.Asn;
-        } else if (databaseType.endsWith(ANONYMOUS_IP_DB_SUFFIX)) {
-            return Database.AnonymousIp;
-        } else if (databaseType.endsWith(CONNECTION_TYPE_DB_SUFFIX)) {
-            return Database.ConnectionType;
-        } else if (databaseType.endsWith(DOMAIN_DB_SUFFIX)) {
-            return Database.Domain;
-        } else if (databaseType.endsWith(ENTERPRISE_DB_SUFFIX)) {
-            return Database.Enterprise;
-        } else if (databaseType.endsWith(ISP_DB_SUFFIX)) {
-            return Database.Isp;
-        } else {
-            return null; // no match was found
-        }
-    }
-
     /**
      * Parses the passed-in databaseType and return the Database instance that is
      * associated with that databaseType.
      *
      * @param databaseType the database type String from the metadata of the database file
-     * @return the Database instance that is associated with the databaseType
+     * @return the Database instance that is associated with the databaseType (or null)
      */
     @Nullable
     static Database getDatabase(final String databaseType) {
         Database database = null;
 
         if (Strings.hasText(databaseType)) {
-            database = getMaxmindDatabase(databaseType);
+            final String databaseTypeLowerCase = databaseType.toLowerCase(Locale.ROOT);
+            if (databaseTypeLowerCase.startsWith(IPINFO_PREFIX)) {
+                database = getIpinfoDatabase(databaseTypeLowerCase); // all lower case!
+            } else {
+                // for historical reasons, fall back to assuming maxmind-like type parsing
+                database = getMaxmindDatabase(databaseType);
+            }
         }
 
         return database;
     }
 
-    @Nullable
-    static Function<Set<Database.Property>, IpDataLookup> getMaxmindLookup(final Database database) {
-        return switch (database) {
-            case City -> MaxmindIpDataLookups.City::new;
-            case Country -> MaxmindIpDataLookups.Country::new;
-            case Asn -> MaxmindIpDataLookups.Asn::new;
-            case AnonymousIp -> MaxmindIpDataLookups.AnonymousIp::new;
-            case ConnectionType -> MaxmindIpDataLookups.ConnectionType::new;
-            case Domain -> MaxmindIpDataLookups.Domain::new;
-            case Enterprise -> MaxmindIpDataLookups.Enterprise::new;
-            case Isp -> MaxmindIpDataLookups.Isp::new;
-            default -> null;
-        };
-    }
-
     static IpDataLookupFactory get(final String databaseType, final String databaseFile) {
         final Database database = getDatabase(databaseType);
         if (database == null) {
             throw new IllegalArgumentException("Unsupported database type [" + databaseType + "] for file [" + databaseFile + "]");
         }
 
-        final Function<Set<Database.Property>, IpDataLookup> factoryMethod = getMaxmindLookup(database);
+        final Function<Set<Database.Property>, IpDataLookup> factoryMethod;
+        final String databaseTypeLowerCase = databaseType.toLowerCase(Locale.ROOT);
+        if (databaseTypeLowerCase.startsWith(IPINFO_PREFIX)) {
+            factoryMethod = getIpinfoLookup(database);
+        } else {
+            // for historical reasons, fall back to assuming maxmind-like types
+            factoryMethod = getMaxmindLookup(database);
+        }
 
         if (factoryMethod == null) {
             throw new IllegalArgumentException("Unsupported database type [" + databaseType + "] for file [" + databaseFile + "]");