Merge branch '8.11' into backport/8.11/pr-110050
elasticmachine authored Jan 17, 2025
2 parents 469a8e9 + d186cab commit a9ebbbd
Showing 16 changed files with 294 additions and 107 deletions.
@@ -78,45 +78,45 @@ Additional settings are:
<<indices-reload-analyzers,reloading>> search analyzers to pick up
changes to synonym files. Only to be used for search analyzers.
* `expand` (defaults to `true`).
Expands definitions for equivalent synonym rules.
See <<synonym-graph-tokenizer-expand-equivalent-synonyms,expand equivalent synonyms>>.
* `lenient` (defaults to `false`).
If `true`, ignores errors while parsing the synonym configuration.
Note that only synonym rules that cannot be parsed are ignored.
See <<synonym-graph-tokenizer-stop-token-filter,synonyms and stop token filters>> for an example of `lenient` behaviour with invalid synonym rules.
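
For example, a minimal sketch of a lenient synonym graph filter placed behind a `stop` filter (the index and filter names are illustrative):

[source,console]
--------------------------------------------------
PUT /test_index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "standard",
            "filter": [ "my_stop", "synonym_graph" ]
          }
        },
        "filter": {
          "my_stop": {
            "type": "stop",
            "stopwords": [ "bar" ]
          },
          "synonym_graph": {
            "type": "synonym_graph",
            "lenient": true,
            "synonyms": [ "foo, bar => baz" ]
          }
        }
      }
    }
  }
}
--------------------------------------------------

With this request the word `bar` is skipped by the stop filter, but the mapping `foo => baz` is still added because `lenient` is `true`.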

[discrete]
[[synonym-graph-tokenizer-expand-equivalent-synonyms]]
===== `expand` equivalent synonym rules

The `expand` parameter controls whether to expand equivalent synonym rules.
Consider a synonym defined like:

`foo, bar, baz`

Using `expand: true`, the synonym rule would be expanded into:

```
foo => foo
foo => bar
foo => baz
bar => foo
bar => bar
bar => baz
baz => foo
baz => bar
baz => baz
```

When `expand` is set to `false`, the synonym rule is not expanded and the first synonym is treated as the canonical representation. The rule would then be equivalent to:

```
foo => foo
bar => foo
baz => foo
```

The `expand` parameter does not affect explicit synonym rules, like `foo, bar => baz`.
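
As a sketch, `expand` is set directly on the synonym filter definition (the index and filter names here are illustrative):

[source,console]
--------------------------------------------------
PUT /test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym_graph",
          "expand": false,
          "synonyms": [ "foo, bar, baz" ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_synonyms" ]
        }
      }
    }
  }
}
--------------------------------------------------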

[discrete]
[[synonym-graph-tokenizer-ignore_case-deprecated]]
@@ -153,12 +153,65 @@ Text will be processed first through filters preceding the synonym filter before
In the above example, text will be lowercased by the `lowercase` filter before being processed by the `synonyms_filter`.
This means that all the synonyms defined there need to be in lowercase, or they won't be found by the synonyms filter.

The synonym rules should not contain words that are removed by a filter that appears later in the chain (like a `stop` filter).
Removing a term from a synonym rule means it will not be matched at query time.

Because entries in the synonym map cannot have stacked positions, some token filters may cause issues here.
Token filters that produce multiple versions of a token may choose which version of the token to emit when parsing synonyms.
For example, `asciifolding` will only produce the folded version of the token.
Others, like `multiplexer`, `word_delimiter_graph` or `ngram` will throw an error.

If you need to build analyzers that include both multi-token filters and synonym filters, consider using the <<analysis-multiplexer-tokenfilter,multiplexer>> filter, with the multi-token filters in one branch and the synonym filter in the other.
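
For instance, a hedged sketch of such an analyzer, with a `word_delimiter` branch and a synonym branch (all filter names and rules here are illustrative, and the branch contents will depend on your use case):

[source,console]
--------------------------------------------------
PUT /test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": [ "foo, bar => baz" ]
        },
        "my_multiplexer": {
          "type": "multiplexer",
          "filters": [ "word_delimiter", "my_synonyms" ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_multiplexer" ]
        }
      }
    }
  }
}
--------------------------------------------------

Each entry in the multiplexer's `filters` array runs as an independent branch, so the multi-token filter never feeds the synonym filter directly.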

[discrete]
[[synonym-graph-tokenizer-stop-token-filter]]
===== Synonyms and `stop` token filters

Synonyms and <<analysis-stop-tokenfilter,stop token filters>> interact with each other in the following ways:

[discrete]
====== Stop token filter *before* synonym token filter

Stop words will be removed from the synonym rule definition.
This can cause errors in the synonym rule.

[WARNING]
====
Invalid synonym rules can cause errors when applying analyzer changes.
For reloadable analyzers, this prevents reloading and applying changes.
You must correct errors in the synonym rules and reload the analyzer.
An index with invalid synonym rules cannot be reopened, making it inoperable when:

* A node containing the index starts
* The index is opened from a closed state
* A node restart occurs (which reopens the shards assigned to the node)
====

For *explicit synonym rules* like `foo, bar => baz` with a stop filter that removes `bar`:

- If `lenient` is set to `false`, an error will be raised as `bar` would be removed from the left-hand side of the synonym rule.
- If `lenient` is set to `true`, the rule `foo => baz` will be added and `bar => baz` will be ignored.

If the stop filter removed `baz` instead:

- If `lenient` is set to `false`, an error will be raised as `baz` would be removed from the right-hand side of the synonym rule.
- If `lenient` is set to `true`, the synonym will have no effect as the target word is removed.

For *equivalent synonym rules* like `foo, bar, baz` with `expand: true`, and a stop filter that removes `bar`:

- If `lenient` is set to `false`, an error will be raised as `bar` would be removed from the synonym rule.
- If `lenient` is set to `true`, the synonyms added would be equivalent to the following synonym rules, which do not contain the removed word:

```
foo => foo
foo => baz
baz => foo
baz => baz
```

[discrete]
====== Stop token filter *after* synonym token filter

The stop filter will remove the terms from the resulting synonym expansion.

For example, a synonym rule like `foo, bar => baz` combined with a stop filter that removes `baz` will produce no matches for `foo` or `bar`, as both are expanded to `baz`, which is then removed by the stop filter.

If the stop filter removed `foo` instead, then searching for `foo` would expand to `baz`, which is not removed by the stop filter, thus potentially providing matches for `baz`.
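
A minimal sketch of this ordering, with the stop filter placed after the synonym filter (the index and filter names are illustrative):

[source,console]
--------------------------------------------------
PUT /test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym_graph",
          "synonyms": [ "foo, bar => baz" ]
        },
        "my_stop": {
          "type": "stop",
          "stopwords": [ "baz" ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "my_synonyms", "my_stop" ]
        }
      }
    }
  }
}
--------------------------------------------------

With this analyzer, searching for `foo` or `bar` produces no matches, as described above.
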
141 changes: 96 additions & 45 deletions docs/reference/analysis/tokenfilters/synonym-tokenfilter.asciidoc
@@ -66,47 +66,45 @@ Additional settings are:
<<indices-reload-analyzers,reloading>> search analyzers to pick up
changes to synonym files. Only to be used for search analyzers.
* `expand` (defaults to `true`).
Expands definitions for equivalent synonym rules.
See <<synonym-tokenizer-expand-equivalent-synonyms,expand equivalent synonyms>>.
* `lenient` (defaults to `false`).
If `true`, ignores errors while parsing the synonym configuration.
Note that only synonym rules that cannot be parsed are ignored.
See <<synonym-tokenizer-stop-token-filter,synonyms and stop token filters>> for an example of `lenient` behaviour with invalid synonym rules.

[discrete]
[[synonym-tokenizer-expand-equivalent-synonyms]]
===== `expand` equivalent synonym rules

The `expand` parameter controls whether to expand equivalent synonym rules.
Consider a synonym defined like:

`foo, bar, baz`

Using `expand: true`, the synonym rule would be expanded into:

```
foo => foo
foo => bar
foo => baz
bar => foo
bar => bar
bar => baz
baz => foo
baz => bar
baz => baz
```

When `expand` is set to `false`, the synonym rule is not expanded and the first synonym is treated as the canonical representation. The rule would then be equivalent to:

```
foo => foo
bar => foo
baz => foo
```

The `expand` parameter does not affect explicit synonym rules, like `foo, bar => baz`.

[discrete]
[[synonym-tokenizer-ignore_case-deprecated]]
@@ -128,7 +126,7 @@ To apply synonyms, you will need to include a synonym token filters into an anal
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "synonym"]
"filter": ["stemmer", "synonym"]
}
}
----
@@ -140,15 +138,68 @@ To apply synonyms, you will need to include a synonym token filters into an anal
Order is important for your token filters.
Text will be processed first through filters preceding the synonym filter before being processed by the synonym filter.

{es} will also use the token filters preceding the synonym filter in a tokenizer chain to parse the entries in a synonym file or synonym set.
In the above example, the synonym token filter is placed after a stemmer, so the stemmer will also be applied to the synonym entries.
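
As a hypothetical sketch, a rule such as `running, sprinting` defined behind a `stemmer` filter is parsed into its stemmed form (roughly `run, sprint`), so the rule matches the stemmed tokens produced at index and query time (the index and filter names are illustrative):

[source,console]
--------------------------------------------------
PUT /test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": [ "running, sprinting" ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "stemmer", "my_synonyms" ]
        }
      }
    }
  }
}
--------------------------------------------------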

Because entries in the synonym map cannot have stacked positions, some token filters may cause issues here.
Token filters that produce multiple versions of a token may choose which version of the token to emit when parsing synonyms.
For example, `asciifolding` will only produce the folded version of the token.
Others, like `multiplexer`, `word_delimiter_graph` or `ngram` will throw an error.

If you need to build analyzers that include both multi-token filters and synonym filters, consider using the <<analysis-multiplexer-tokenfilter,multiplexer>> filter, with the multi-token filters in one branch and the synonym filter in the other.

[discrete]
[[synonym-tokenizer-stop-token-filter]]
===== Synonyms and `stop` token filters

Synonyms and <<analysis-stop-tokenfilter,stop token filters>> interact with each other in the following ways:

[discrete]
====== Stop token filter *before* synonym token filter

Stop words will be removed from the synonym rule definition.
This can cause errors in the synonym rule.

[WARNING]
====
Invalid synonym rules can cause errors when applying analyzer changes.
For reloadable analyzers, this prevents reloading and applying changes.
You must correct errors in the synonym rules and reload the analyzer.
An index with invalid synonym rules cannot be reopened, making it inoperable when:

* A node containing the index starts
* The index is opened from a closed state
* A node restart occurs (which reopens the shards assigned to the node)
====

For *explicit synonym rules* like `foo, bar => baz` with a stop filter that removes `bar`:

- If `lenient` is set to `false`, an error will be raised as `bar` would be removed from the left-hand side of the synonym rule.
- If `lenient` is set to `true`, the rule `foo => baz` will be added and `bar => baz` will be ignored.

If the stop filter removed `baz` instead:

- If `lenient` is set to `false`, an error will be raised as `baz` would be removed from the right-hand side of the synonym rule.
- If `lenient` is set to `true`, the synonym will have no effect as the target word is removed.

For *equivalent synonym rules* like `foo, bar, baz` with `expand: true`, and a stop filter that removes `bar`:

- If `lenient` is set to `false`, an error will be raised as `bar` would be removed from the synonym rule.
- If `lenient` is set to `true`, the synonyms added would be equivalent to the following synonym rules, which do not contain the removed word:

```
foo => foo
foo => baz
baz => foo
baz => baz
```

[discrete]
====== Stop token filter *after* synonym token filter

The stop filter will remove the terms from the resulting synonym expansion.

For example, a synonym rule like `foo, bar => baz` combined with a stop filter that removes `baz` will produce no matches for `foo` or `bar`, as both are expanded to `baz`, which is then removed by the stop filter.

If the stop filter removed `foo` instead, then searching for `foo` would expand to `baz`, which is not removed by the stop filter, thus potentially providing matches for `baz`.
@@ -15,7 +15,7 @@ This format uses two different definitions:
ipod, i-pod, i pod
computer, pc, laptop
----
* Explicit synonyms: Matches a group of words to other words. Words on the left hand side of the rule definition are expanded into all the possibilities described on the right hand side. Example:
+
[source,synonyms]
----
2 changes: 1 addition & 1 deletion docs/reference/esql/esql-query-api.asciidoc
@@ -87,6 +87,6 @@ Column headings for the search results. Each object is a column.
(string) Data type for the column.
====

`values`::
(array of arrays)
Values for the search results.
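
A minimal sketch of the response shape (the column names and values are illustrative):

[source,console-result]
----
{
  "columns": [
    { "name": "author", "type": "keyword" },
    { "name": "count", "type": "long" }
  ],
  "values": [
    [ "alice", 42 ],
    [ "bob", 7 ]
  ]
}
----
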
5 changes: 4 additions & 1 deletion docs/reference/index-modules.asciidoc
@@ -79,7 +79,10 @@ breaking change].
compression ratio, at the expense of slower stored fields performance.
If you are updating the compression type, the new one will be applied
after segments are merged. Segment merging can be forced using
<<indices-forcemerge,force merge>>.
<<indices-forcemerge,force merge>>. Experiments with indexing log datasets
have shown that `best_compression` gives up to ~18% lower storage usage in
the most ideal scenario compared to `default` while only minimally affecting
indexing throughput (~2%).
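
As a sketch, the codec is a static setting applied when the index is created (the index name is illustrative):

[source,console]
----
PUT /my-logs-index
{
  "settings": {
    "index": {
      "codec": "best_compression"
    }
  }
}
----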

[[routing-partition-size]] `index.routing_partition_size`::

5 changes: 2 additions & 3 deletions docs/reference/mapping/types/geo-shape.asciidoc
@@ -18,9 +18,8 @@ Documents using this type can be used:
** a <<query-dsl-geo-shape-query,`geo_shape` query>> (for example, intersecting polygons).
* to aggregate documents by geographic grids:
** either <<search-aggregations-bucket-geohashgrid-aggregation,`geo_hash`>>
** or <<search-aggregations-bucket-geotilegrid-aggregation,`geo_tile`>>
** or <<search-aggregations-bucket-geohexgrid-aggregation,`geo_hex`>>.
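
For example, a sketch of a `geo_hex` grid aggregation over a `geo_shape` field (the index and field names are illustrative):

[source,console]
----
GET /places/_search
{
  "size": 0,
  "aggs": {
    "grid": {
      "geohex_grid": {
        "field": "geometry",
        "precision": 4
      }
    }
  }
}
----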

[[geo-shape-mapping-options]]
[discrete]
10 changes: 10 additions & 0 deletions docs/reference/mapping/types/rank-features.asciidoc
@@ -70,13 +70,23 @@ GET my-index-000001/_search
}
}
}
GET my-index-000001/_search
{
"query": { <6>
"term": {
"topics": "economics"
}
}
}
--------------------------------------------------

<1> Rank features fields must use the `rank_features` field type
<2> Rank features that correlate negatively with the score need to declare it
<3> Rank features fields must be a hash with string keys and strictly positive numeric values
<4> This query ranks documents by how much they are about the "politics" topic.
<5> This query ranks documents inversely to the number of "1star" reviews they received.
<6> This query returns documents that store the "economics" feature in the "topics" field.


NOTE: `rank_features` fields only support single-valued features and strictly