Commit

Merge branch 'main' of github.com:elasticsearch/elasticsearch into un…
…mute_mljobit_tests
edsavage committed Oct 15, 2024
2 parents 7d6897c + 6c752ab commit d5e1a50
Showing 72 changed files with 2,112 additions and 1,136 deletions.
5 changes: 5 additions & 0 deletions docs/changelog/114439.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
pr: 114439
summary: Adding new bbq index types behind a feature flag
area: Vector Search
type: feature
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/114636.yaml
@@ -0,0 +1,5 @@
pr: 114636
summary: Dynamically get the number of allocations
area: Machine Learning
type: enhancement
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/114683.yaml
@@ -0,0 +1,5 @@
pr: 114683
summary: Default inference endpoint for the multilingual-e5-small model
area: Machine Learning
type: enhancement
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/114732.yaml
@@ -0,0 +1,5 @@
pr: 114732
summary: Stream Bedrock Completion
area: Machine Learning
type: enhancement
issues: []
41 changes: 36 additions & 5 deletions docs/reference/mapping/types/dense-vector.asciidoc
@@ -115,22 +115,27 @@ that sacrifices result accuracy for improved speed.
==== Automatically quantize vectors for kNN search

The `dense_vector` type supports quantization to reduce the memory footprint required when <<approximate-knn, searching>> `float` vectors.
The two following quantization strategies are supported:
The three following quantization strategies are supported:

+
--
`int8` - Quantizes each dimension of the vector to 1-byte integers. This can reduce the memory footprint by 75% at the cost of some accuracy.
`int4` - Quantizes each dimension of the vector to half-byte integers. This can reduce the memory footprint by 87% at the cost of some accuracy.
`int8` - Quantizes each dimension of the vector to 1-byte integers. This reduces the memory footprint by 75% (or 4x) at the cost of some accuracy.
`int4` - Quantizes each dimension of the vector to half-byte integers. This reduces the memory footprint by 87% (or 8x) at the cost of accuracy.
`bbq` - experimental:[] Better binary quantization, which reduces each dimension to single-bit precision. This reduces the memory footprint by about 96% (or 32x) at a larger cost of accuracy. Oversampling at query time and then reranking can generally help mitigate the accuracy loss.
--

To use a quantized index, you can set your index type to `int8_hnsw` or `int4_hnsw`. When indexing `float` vectors, the current default
When using a quantized format, you may want to oversample and rescore the results to improve accuracy. See <<dense-vector-knn-search-reranking, oversampling and rescoring>> for more information.

To use a quantized index, you can set your index type to `int8_hnsw`, `int4_hnsw`, or `bbq_hnsw`. When indexing `float` vectors, the current default
index type is `int8_hnsw`.

NOTE: Quantization will continue to keep the raw float vector values on disk for reranking, reindexing, and quantization improvements over the lifetime of the data.
This means disk usage will increase by ~25% for `int8` and ~12.5% for `int4` due to the overhead of storing the quantized and raw vectors.
This means disk usage will increase by ~25% for `int8`, ~12.5% for `int4`, and ~3.1% for `bbq` due to the overhead of storing the quantized and raw vectors.
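
The percentages quoted above follow from the number of bits kept per dimension relative to a 32-bit float. The sketch below reproduces that arithmetic. The class and method names are hypothetical, and it assumes the only cost is the per-dimension storage (8, 4, or 1 bit), ignoring any per-vector metadata a real index format may add.

```java
/** Illustrative arithmetic only; not part of Elasticsearch. */
final class QuantizationFootprint {

    /** Bits used per dimension by a raw float32 vector. */
    static final int FLOAT_BITS = 32;

    /** Fraction of float32 memory saved when each dimension takes bitsPerDim bits. */
    static double memoryReduction(int bitsPerDim) {
        return 1.0 - (double) bitsPerDim / FLOAT_BITS;
    }

    /** Extra disk used by the quantized copy, relative to the raw vectors that are kept alongside it. */
    static double diskOverhead(int bitsPerDim) {
        return (double) bitsPerDim / FLOAT_BITS;
    }

    public static void main(String[] args) {
        String[] names = {"int8", "int4", "bbq"};
        int[] bits = {8, 4, 1}; // bits per dimension, as described in the text
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i]
                + ": memory saved = " + 100 * memoryReduction(bits[i]) + "%"
                + ", extra disk = " + 100 * diskOverhead(bits[i]) + "%");
        }
    }
}
```

Under these assumptions, `int4` saves 87.5% of memory and `bbq` saves 96.875%, which the text rounds to 87% and 96%.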

NOTE: `int4` quantization requires an even number of vector dimensions.

NOTE: experimental:[] `bbq` quantization only supports vector dimensions of 64 or greater.

Here is an example of how to create a byte-quantized index:

[source,console]
@@ -173,6 +178,27 @@ PUT my-byte-quantized-index
}
--------------------------------------------------

experimental:[] Here is an example of how to create a binary quantized index:

[source,console]
--------------------------------------------------
PUT my-binary-quantized-index
{
"mappings": {
"properties": {
"my_vector": {
"type": "dense_vector",
"dims": 64,
"index": true,
"index_options": {
"type": "bbq_hnsw"
}
}
}
}
}
--------------------------------------------------

[role="child_attributes"]
[[dense-vector-params]]
==== Parameters for dense vector fields
@@ -301,11 +327,16 @@ by 4x at the cost of some accuracy. See <<dense-vector-quantization, Automatical
* `int4_hnsw` - This utilizes the https://arxiv.org/abs/1603.09320[HNSW algorithm] in addition to automatic scalar
quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint
by 8x at the cost of some accuracy. See <<dense-vector-quantization, Automatically quantize vectors for kNN search>>.
* experimental:[] `bbq_hnsw` - This utilizes the https://arxiv.org/abs/1603.09320[HNSW algorithm] in addition to automatic binary
quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint
by 32x at the cost of accuracy. See <<dense-vector-quantization, Automatically quantize vectors for kNN search>>.
* `flat` - This utilizes a brute-force search algorithm for exact kNN search. This supports all `element_type` values.
* `int8_flat` - This utilizes a brute-force search algorithm in addition to automatic scalar quantization. Only supports
`element_type` of `float`.
* `int4_flat` - This utilizes a brute-force search algorithm in addition to automatic half-byte scalar quantization. Only supports
`element_type` of `float`.
* experimental:[] `bbq_flat` - This utilizes a brute-force search algorithm in addition to automatic binary quantization. Only supports
`element_type` of `float`.
--
`m`:::
(Optional, integer)
7 changes: 6 additions & 1 deletion docs/reference/rest-api/usage.asciidoc
@@ -210,7 +210,12 @@ GET /_xpack/usage
"service": "elasticsearch",
"task_type": "SPARSE_EMBEDDING",
"count": 1
}
},
{
"service": "elasticsearch",
"task_type": "TEXT_EMBEDDING",
"count": 1
}
]
},
"logstash" : {
92 changes: 92 additions & 0 deletions docs/reference/search/search-your-data/knn-search.asciidoc
@@ -1149,3 +1149,95 @@ POST product-index/_search
----
//TEST[continued]

[discrete]
[[dense-vector-knn-search-reranking]]
==== Oversampling and rescoring for quantized vectors

All forms of quantization result in some accuracy loss, and the loss grows as the quantization level increases.
Generally, we have found that:
- `int8` requires minimal, if any, rescoring.
- `int4` requires some rescoring for higher accuracy and larger recall scenarios. Generally, oversampling by 1.5x-2x recovers most of the accuracy loss.
- `bbq` requires rescoring except on exceptionally large indices or with models specifically designed for quantization. We have found that 3x-5x oversampling is generally sufficient, but for fewer dimensions or vectors that do not quantize well, higher oversampling may be required.
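
As a rough illustration of the guidance above, the sketch below turns a desired result count and an oversampling factor into the number of neighbors to request. The helper name `oversampledK` is hypothetical and not an Elasticsearch API. The factors are the ranges suggested in the list, and the result would be used for `k` and the rescore `window_size`.

```java
/** Illustrative helper only; not part of Elasticsearch. */
final class Oversampling {

    /**
     * Returns how many nearest neighbors to gather from the quantized index
     * so that `size` final hits can be chosen after rescoring.
     */
    static int oversampledK(int size, double factor) {
        if (size <= 0 || factor < 1.0) {
            throw new IllegalArgumentException("size must be positive and factor must be >= 1.0");
        }
        // Round up so a fractional factor never yields fewer candidates than intended.
        return (int) Math.ceil(size * factor);
    }

    public static void main(String[] args) {
        int size = 10;
        System.out.println("int4 at 2x: k = " + oversampledK(size, 2.0));
        System.out.println("bbq at 3x:  k = " + oversampledK(size, 3.0));
        System.out.println("bbq at 5x:  k = " + oversampledK(size, 5.0));
    }
}
```

With `size` of 10 and a 2x factor this gives 20, the same `k` and `window_size` as a request that returns 10 hits from 20 gathered neighbors.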

There are two main ways to oversample and rescore. The first is to utilize the <<rescore, rescore section>> in the `_search` request.

Here is an example using the top level `knn` search with oversampling and using `rescore` to rerank the results:

[source,console]
--------------------------------------------------
POST /my-index/_search
{
"size": 10, <1>
"knn": {
"query_vector": [0.04283529, 0.85670587, -0.51402352, 0],
"field": "my_int4_vector",
"k": 20, <2>
"num_candidates": 50
},
"rescore": {
"window_size": 20, <3>
"query": {
"rescore_query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "(dotProduct(params.queryVector, 'my_int4_vector') + 1.0)", <4>
"params": {
"queryVector": [0.04283529, 0.85670587, -0.51402352, 0]
}
}
}
},
"query_weight": 0, <5>
"rescore_query_weight": 1 <6>
}
}
}
--------------------------------------------------
// TEST[skip: setup not provided]
<1> The number of results to return. Note that it is only 10; we oversample by 2x, gathering 20 nearest neighbors.
<2> The number of results to return from the kNN search. This performs an approximate kNN search with 50 candidates
per HNSW graph using the quantized vectors, returning the 20 most similar vectors
according to the quantized score. Additionally, since this is the top-level `knn` object, the global top 20 results
from all shards are gathered before rescoring. Combined with `rescore`, this oversamples by `2x`: it
gathers 20 nearest neighbors according to quantized scoring and rescores them with the higher-fidelity float vectors.
<3> The number of results to rescore. To rescore all results, set this to the same value as `k`.
<4> The script used to rescore the results. Script score interacts directly with the originally provided float32 vector.
<5> The weight of the original query. Here we simply discard the original score.
<6> The weight of the rescore query. Here we use only the rescore query.

The second way is to score per shard with the <<query-dsl-knn-query, knn query>> and <<query-dsl-script-score-query, script_score query>>. Generally, this means more rescoring per shard, which
can increase overall recall at the cost of compute.

[source,console]
--------------------------------------------------
POST /my-index/_search
{
"size": 10, <1>
"query": {
"script_score": {
"query": {
"knn": { <2>
"query_vector": [0.04283529, 0.85670587, -0.51402352, 0],
"field": "my_int4_vector",
"num_candidates": 20 <3>
}
},
"script": {
"source": "(dotProduct(params.queryVector, 'my_int4_vector') + 1.0)", <4>
"params": {
"queryVector": [0.04283529, 0.85670587, -0.51402352, 0]
}
}
}
}
}
--------------------------------------------------
// TEST[skip: setup not provided]
<1> The number of results to return.
<2> The `knn` query that performs the initial search. It is executed per shard.
<3> The number of candidates to use for the initial approximate `knn` search. This searches using the quantized vectors
and returns the top 20 candidates per shard, which are then scored.
<4> The script used to score the results. Script score interacts directly with the originally provided float32 vector.
2 changes: 2 additions & 0 deletions modules/ingest-geoip/src/main/java/module-info.java
@@ -18,4 +18,6 @@

exports org.elasticsearch.ingest.geoip.direct to org.elasticsearch.server;
exports org.elasticsearch.ingest.geoip.stats to org.elasticsearch.server;

exports org.elasticsearch.ingest.geoip to com.maxmind.db;
}
@@ -73,14 +73,14 @@ void updateDatabase(Path file, boolean update) {
String databaseFileName = file.getFileName().toString();
try {
if (update) {
logger.info("database file changed [{}], reload database...", file);
logger.info("database file changed [{}], reloading database...", file);
DatabaseReaderLazyLoader loader = new DatabaseReaderLazyLoader(cache, file, null);
DatabaseReaderLazyLoader existing = configDatabases.put(databaseFileName, loader);
if (existing != null) {
existing.shutdown();
}
} else {
logger.info("database file removed [{}], close database...", file);
logger.info("database file removed [{}], closing database...", file);
DatabaseReaderLazyLoader existing = configDatabases.remove(databaseFileName);
assert existing != null;
existing.shutdown();
@@ -196,13 +196,19 @@ public IpDatabase get() throws IOException {
}

if (Assertions.ENABLED) {
// Only check whether the suffix has changed and not the entire database type.
// To sanity check whether a city db isn't overwriting with a country or asn db.
// For example overwriting a geoip lite city db with geoip city db is a valid change, but the db type is slightly different,
// by checking just the suffix this assertion doesn't fail.
String expectedSuffix = databaseType.substring(databaseType.lastIndexOf('-'));
assert loader.getDatabaseType().endsWith(expectedSuffix)
: "database type [" + loader.getDatabaseType() + "] doesn't match with expected suffix [" + expectedSuffix + "]";
// Note that the expected suffix might be null for providers that aren't amenable to using dashes as separator for
// determining the database type.
int last = databaseType.lastIndexOf('-');
final String expectedSuffix = last == -1 ? null : databaseType.substring(last);

// If the entire database type matches, then that's a match. Otherwise, if there's a suffix to compare on, then
// check whether the suffix has changed (not the entire database type).
// This is to sanity check, for example, that a city db isn't overwritten with a country or asn db.
// But there are permissible overwrites that make sense, for example overwriting a geolite city db with a geoip city db
// is a valid change, but the db type is slightly different -- by checking just the suffix this assertion won't fail.
final String loaderType = loader.getDatabaseType();
assert loaderType.equals(databaseType) || expectedSuffix == null || loaderType.endsWith(expectedSuffix)
: "database type [" + loaderType + "] doesn't match with expected suffix [" + expectedSuffix + "]";
}
return loader;
}
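
The relaxed assertion in this hunk can be read as a small predicate. The following is a minimal standalone sketch of that predicate, assuming it captures the intent of the new check. The class and method names are hypothetical, and the database type strings in `main` are sample values chosen for illustration.

```java
/** Standalone sketch of the suffix comparison; names are hypothetical. */
final class DatabaseTypeCheck {

    /** Returns true when a loader of type loaderType may replace a database of type databaseType. */
    static boolean isCompatible(String databaseType, String loaderType) {
        // The suffix is everything from the last dash onward, or null when the type has no dash.
        int last = databaseType.lastIndexOf('-');
        String expectedSuffix = last == -1 ? null : databaseType.substring(last);

        // An exact match always passes; without a suffix there is nothing further to compare.
        return loaderType.equals(databaseType)
            || expectedSuffix == null
            || loaderType.endsWith(expectedSuffix);
    }

    public static void main(String[] args) {
        System.out.println(isCompatible("GeoLite2-City", "GeoIP2-City"));      // same suffix
        System.out.println(isCompatible("GeoLite2-City", "GeoLite2-Country")); // different suffix
        System.out.println(isCompatible("nodash", "anything"));                // no suffix to compare
    }
}
```

The design choice is deliberately permissive: a type without a dash skips the comparison entirely rather than failing on a `substring(-1)` call, which is what the old code would have done.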
@@ -13,9 +13,16 @@
import org.elasticsearch.core.Nullable;

import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

import static org.elasticsearch.ingest.geoip.IpinfoIpDataLookups.IPINFO_PREFIX;
import static org.elasticsearch.ingest.geoip.IpinfoIpDataLookups.getIpinfoDatabase;
import static org.elasticsearch.ingest.geoip.IpinfoIpDataLookups.getIpinfoLookup;
import static org.elasticsearch.ingest.geoip.MaxmindIpDataLookups.getMaxmindDatabase;
import static org.elasticsearch.ingest.geoip.MaxmindIpDataLookups.getMaxmindLookup;

final class IpDataLookupFactories {

private IpDataLookupFactories() {
@@ -26,78 +33,44 @@ interface IpDataLookupFactory {
IpDataLookup create(List<String> properties);
}

private static final String CITY_DB_SUFFIX = "-City";
private static final String COUNTRY_DB_SUFFIX = "-Country";
private static final String ASN_DB_SUFFIX = "-ASN";
private static final String ANONYMOUS_IP_DB_SUFFIX = "-Anonymous-IP";
private static final String CONNECTION_TYPE_DB_SUFFIX = "-Connection-Type";
private static final String DOMAIN_DB_SUFFIX = "-Domain";
private static final String ENTERPRISE_DB_SUFFIX = "-Enterprise";
private static final String ISP_DB_SUFFIX = "-ISP";

@Nullable
private static Database getMaxmindDatabase(final String databaseType) {
if (databaseType.endsWith(CITY_DB_SUFFIX)) {
return Database.City;
} else if (databaseType.endsWith(COUNTRY_DB_SUFFIX)) {
return Database.Country;
} else if (databaseType.endsWith(ASN_DB_SUFFIX)) {
return Database.Asn;
} else if (databaseType.endsWith(ANONYMOUS_IP_DB_SUFFIX)) {
return Database.AnonymousIp;
} else if (databaseType.endsWith(CONNECTION_TYPE_DB_SUFFIX)) {
return Database.ConnectionType;
} else if (databaseType.endsWith(DOMAIN_DB_SUFFIX)) {
return Database.Domain;
} else if (databaseType.endsWith(ENTERPRISE_DB_SUFFIX)) {
return Database.Enterprise;
} else if (databaseType.endsWith(ISP_DB_SUFFIX)) {
return Database.Isp;
} else {
return null; // no match was found
}
}

/**
* Parses the passed-in databaseType and returns the Database instance that is
* associated with that databaseType.
*
* @param databaseType the database type String from the metadata of the database file
* @return the Database instance that is associated with the databaseType
* @return the Database instance that is associated with the databaseType (or null)
*/
@Nullable
static Database getDatabase(final String databaseType) {
Database database = null;

if (Strings.hasText(databaseType)) {
database = getMaxmindDatabase(databaseType);
final String databaseTypeLowerCase = databaseType.toLowerCase(Locale.ROOT);
if (databaseTypeLowerCase.startsWith(IPINFO_PREFIX)) {
database = getIpinfoDatabase(databaseTypeLowerCase); // all lower case!
} else {
// for historical reasons, fall back to assuming maxmind-like type parsing
database = getMaxmindDatabase(databaseType);
}
}

return database;
}
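
The provider selection in `getDatabase` amounts to a case-insensitive prefix test with a MaxMind fallback. The sketch below isolates that decision. It is not the module's code: the class name, the string return values, and the assumed prefix value `"ipinfo"` are illustrative, since the diff does not show how `IPINFO_PREFIX` is defined.

```java
import java.util.Locale;

/** Standalone sketch of the provider dispatch; names and the prefix value are assumptions. */
final class ProviderDispatch {

    // Assumed value; the real constant lives in IpinfoIpDataLookups and is not shown in this diff.
    static final String IPINFO_PREFIX = "ipinfo";

    /** Returns "ipinfo" or "maxmind" for a database type, or null when the type has no text. */
    static String providerFor(String databaseType) {
        if (databaseType == null || databaseType.isBlank()) {
            return null;
        }
        // Lower-case with a fixed locale so the prefix test does not depend on the JVM's default locale.
        String lower = databaseType.toLowerCase(Locale.ROOT);
        // Anything that is not recognizably ipinfo falls back to maxmind-style parsing.
        return lower.startsWith(IPINFO_PREFIX) ? "ipinfo" : "maxmind";
    }

    public static void main(String[] args) {
        System.out.println(providerFor("IPinfo example_type"));
        System.out.println(providerFor("GeoLite2-City"));
        System.out.println(providerFor(""));
    }
}
```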

@Nullable
static Function<Set<Database.Property>, IpDataLookup> getMaxmindLookup(final Database database) {
return switch (database) {
case City -> MaxmindIpDataLookups.City::new;
case Country -> MaxmindIpDataLookups.Country::new;
case Asn -> MaxmindIpDataLookups.Asn::new;
case AnonymousIp -> MaxmindIpDataLookups.AnonymousIp::new;
case ConnectionType -> MaxmindIpDataLookups.ConnectionType::new;
case Domain -> MaxmindIpDataLookups.Domain::new;
case Enterprise -> MaxmindIpDataLookups.Enterprise::new;
case Isp -> MaxmindIpDataLookups.Isp::new;
default -> null;
};
}

static IpDataLookupFactory get(final String databaseType, final String databaseFile) {
final Database database = getDatabase(databaseType);
if (database == null) {
throw new IllegalArgumentException("Unsupported database type [" + databaseType + "] for file [" + databaseFile + "]");
}

final Function<Set<Database.Property>, IpDataLookup> factoryMethod = getMaxmindLookup(database);
final Function<Set<Database.Property>, IpDataLookup> factoryMethod;
final String databaseTypeLowerCase = databaseType.toLowerCase(Locale.ROOT);
if (databaseTypeLowerCase.startsWith(IPINFO_PREFIX)) {
factoryMethod = getIpinfoLookup(database);
} else {
// for historical reasons, fall back to assuming maxmind-like types
factoryMethod = getMaxmindLookup(database);
}

if (factoryMethod == null) {
throw new IllegalArgumentException("Unsupported database type [" + databaseType + "] for file [" + databaseFile + "]");