search as you type fieldmapper (#35600)
Adds the search_as_you_type field type, which acts like a text field optimized for as-you-type search completion. It creates a couple of subfields that analyze the indexed terms as shingles, against which full terms are queried, and a prefix subfield that analyzes terms as the largest shingle size used plus edge n-grams, against which partial terms are queried.

Adds a match_bool_prefix query type that creates a boolean clause with a term query for each term except the last, for which a boolean clause with a prefix query is created.

The match_bool_prefix query is the recommended way of querying a search-as-you-type field. It boils down to term queries for each shingle of the input text on the appropriate shingle field, and a term query on the prefix field for the final (possibly partial) term. This field type also supports phrase and phrase prefix queries however
1 parent
c0c6d70
commit 23395a9
Showing 27 changed files with 5,198 additions and 100 deletions.
258 changes: 258 additions & 0 deletions
docs/reference/mapping/types/search-as-you-type.asciidoc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,258 @@ | ||
[[search-as-you-type]] | ||
=== Search as you type datatype | ||
|
||
experimental[] | ||
|
||
The `search_as_you_type` field type is a text-like field that is optimized to | ||
provide out-of-the-box support for queries that serve an as-you-type completion | ||
use case. It creates a series of subfields that are analyzed to index terms | ||
that can be efficiently matched by a query that partially matches the entire | ||
indexed text value. Both prefix completion (i.e. matching terms starting at the | ||
beginning of the input) and infix completion (i.e. matching terms at any | ||
position within the input) are supported. | ||
|
||
Consider a mapping that adds a field of this type: | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
PUT my_index | ||
{ | ||
"mappings": { | ||
"properties": { | ||
"my_field": { | ||
"type": "search_as_you_type" | ||
} | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
// CONSOLE | ||
|
||
This creates the following fields: | ||
|
||
[horizontal] | ||
|
||
`my_field`:: | ||
|
||
Analyzed as configured in the mapping. If an analyzer is not configured, | ||
the default analyzer for the index is used. | ||
|
||
`my_field._2gram`:: | ||
|
||
Wraps the analyzer of `my_field` with a shingle token filter of shingle | ||
size 2. | ||
|
||
`my_field._3gram`:: | ||
|
||
Wraps the analyzer of `my_field` with a shingle token filter of shingle | ||
size 3. | ||
|
||
`my_field._index_prefix`:: | ||
|
||
Wraps the analyzer of `my_field._3gram` with an edge ngram token filter. | ||
|
||
|
||
The size of shingles in subfields can be configured with the `max_shingle_size` | ||
mapping parameter. The default is 3, and valid values for this parameter are | ||
integers from 2 to 4 inclusive. Shingle subfields will be created for each | ||
shingle size from 2 up to and including the `max_shingle_size`. The | ||
`my_field._index_prefix` subfield will always use the analyzer from the shingle | ||
subfield with the `max_shingle_size` when constructing its own analyzer. | ||
|
||
Increasing the `max_shingle_size` will improve matches for queries with more | ||
consecutive terms, at the cost of larger index size. The default | ||
`max_shingle_size` should usually be sufficient. | ||
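
The naming rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the mapper's code; the helper name `subfield_names` is hypothetical.

```python
from typing import List


def subfield_names(root: str, max_shingle_size: int = 3) -> List[str]:
    """List the subfields that the text above says are created for a root field.

    Hypothetical helper: it mirrors the documented naming scheme only.
    """
    if not 2 <= max_shingle_size <= 4:
        raise ValueError("max_shingle_size must be between 2 and 4 inclusive")
    # One shingle subfield per size from 2 up to and including max_shingle_size.
    shingle_fields = [f"{root}._{n}gram" for n in range(2, max_shingle_size + 1)]
    # The prefix subfield is always present, whatever the shingle size.
    return shingle_fields + [f"{root}._index_prefix"]


default_fields = subfield_names("my_field")
largest_fields = subfield_names("my_field", max_shingle_size=4)
```

With the default of 3 this yields the `._2gram`, `._3gram` and `._index_prefix` subfields listed earlier; a value of 4 adds `._4gram`.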
|
||
The same input text is indexed into each of these fields automatically, with | ||
their differing analysis chains, when an indexed document has a value for the | ||
root field `my_field`. | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
PUT my_index/_doc/1?refresh | ||
{ | ||
"my_field": "quick brown fox jump lazy dog" | ||
} | ||
-------------------------------------------------- | ||
// CONSOLE | ||
// TEST[continued] | ||
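
To make the effect of the differing analysis chains concrete, here is a rough Python model of the terms each field would hold for the document above. It assumes a whitespace-and-lowercase stand-in for the configured analyzer and assumes each shingle subfield holds only shingles of exactly its own size; both are simplifications for illustration, and `fixed_shingles` is a hypothetical helper.

```python
from typing import Dict, List


def fixed_shingles(tokens: List[str], size: int) -> List[str]:
    """Join every run of `size` consecutive tokens into a single shingle term."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


# Stand-in for the field's analyzer (assumption): lowercase, split on whitespace.
tokens = "quick brown fox jump lazy dog".lower().split()

terms_by_field: Dict[str, List[str]] = {
    "my_field": tokens,
    "my_field._2gram": fixed_shingles(tokens, 2),
    "my_field._3gram": fixed_shingles(tokens, 3),
}

for field, terms in terms_by_field.items():
    print(field, terms)
```

A query for two consecutive words can then be answered by a single term lookup in `my_field._2gram` rather than by a positional phrase match.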
|
||
The most efficient way of querying to serve a search-as-you-type use case is | ||
usually a <<query-dsl-multi-match-query,`multi_match`>> query of type | ||
<<query-dsl-match-bool-prefix-query,`bool_prefix`>> that targets the root | ||
`search_as_you_type` field and its shingle subfields. This can match the query | ||
terms in any order, but will score documents higher if they contain the terms | ||
in order in a shingle subfield. | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
GET my_index/_search | ||
{ | ||
"query": { | ||
"multi_match": { | ||
"query": "brown f", | ||
"type": "bool_prefix", | ||
"fields": [ | ||
"my_field", | ||
"my_field._2gram", | ||
"my_field._3gram" | ||
] | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
// CONSOLE | ||
// TEST[continued] | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
{ | ||
"took" : 44, | ||
"timed_out" : false, | ||
"_shards" : { | ||
"total" : 1, | ||
"successful" : 1, | ||
"skipped" : 0, | ||
"failed" : 0 | ||
}, | ||
"hits" : { | ||
"total" : { | ||
"value" : 1, | ||
"relation" : "eq" | ||
}, | ||
"max_score" : 0.8630463, | ||
"hits" : [ | ||
{ | ||
"_index" : "my_index", | ||
"_type" : "_doc", | ||
"_id" : "1", | ||
"_score" : 0.8630463, | ||
"_source" : { | ||
"my_field" : "quick brown fox jump lazy dog" | ||
} | ||
} | ||
] | ||
} | ||
} | ||
-------------------------------------------------- | ||
// TESTRESPONSE[s/"took" : 44/"took" : $body.took/] | ||
// TESTRESPONSE[s/"max_score" : 0.8630463/"max_score" : $body.hits.max_score/] | ||
// TESTRESPONSE[s/"_score" : 0.8630463/"_score" : $body.hits.hits.0._score/] | ||
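
A client application usually assembles this request body in code. The sketch below builds the same `multi_match` body as a plain Python dictionary; the function name `bool_prefix_search_body` is hypothetical, and sending the body to a cluster is left out.

```python
import json
from typing import Any, Dict


def bool_prefix_search_body(text: str, root: str, max_shingle_size: int = 3) -> Dict[str, Any]:
    """Build a multi_match/bool_prefix body over a root field and its shingle subfields."""
    # Target the root field plus every shingle subfield, but not ._index_prefix.
    fields = [root] + [f"{root}._{n}gram" for n in range(2, max_shingle_size + 1)]
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "bool_prefix",
                "fields": fields,
            }
        }
    }


body = bool_prefix_search_body("brown f", "my_field")
print(json.dumps(body, indent=2))
```

The `max_shingle_size` argument should match the value used in the mapping, so that the field list only names subfields that exist.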
|
||
To search for documents that strictly match the query terms in order, or to | ||
search using other properties of phrase queries, use a | ||
<<query-dsl-match-query-phrase-prefix,`match_phrase_prefix` query>> on the root | ||
field. A <<query-dsl-match-query-phrase,`match_phrase` query>> can also be used | ||
if the last term should be matched exactly, and not as a prefix. Using phrase | ||
queries may be less efficient than using the `match_bool_prefix` query. | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
GET my_index/_search | ||
{ | ||
"query": { | ||
"match_phrase_prefix": { | ||
"my_field": "brown f" | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
// CONSOLE | ||
// TEST[continued] | ||
|
||
[[specific-params]] | ||
==== Parameters specific to the `search_as_you_type` field | ||
|
||
The following parameters are accepted in a mapping for the `search_as_you_type` | ||
field and are specific to this field type: | ||
|
||
[horizontal] | ||
|
||
`max_shingle_size`:: | ||
|
||
The largest shingle size to index the input with and create subfields for, | ||
creating one subfield for each shingle size between 2 and | ||
`max_shingle_size`. Accepts integer values between 2 and 4 inclusive. This | ||
option defaults to 3. | ||
|
||
|
||
[[general-params]] | ||
==== Parameters of the field type as a text field | ||
|
||
The following parameters are accepted in a mapping for the `search_as_you_type` | ||
field due to its nature as a text-like field, and behave similarly to their | ||
behavior when configuring a field of the <<text,`text`>> datatype. Unless | ||
otherwise noted, these options configure the root field's subfields in | ||
the same way. | ||
|
||
<<analyzer,`analyzer`>>:: | ||
|
||
The <<analysis,analyzer>> which should be used for | ||
<<mapping-index,`analyzed`>> string fields, both at index-time and at | ||
search-time (unless overridden by the | ||
<<search-analyzer,`search_analyzer`>>). Defaults to the default index | ||
analyzer, or the <<analysis-standard-analyzer,`standard` analyzer>>. | ||
|
||
<<mapping-index,`index`>>:: | ||
|
||
Should the field be searchable? Accepts `true` (default) or `false`. | ||
|
||
<<index-options,`index_options`>>:: | ||
|
||
What information should be stored in the index, for search and highlighting | ||
purposes. Defaults to `positions`. | ||
|
||
<<norms,`norms`>>:: | ||
|
||
Whether field-length should be taken into account when scoring queries. | ||
Accepts `true` or `false`. This option configures the root field | ||
and shingle subfields, where its default is `true`. It does not configure | ||
the prefix subfield, where it is `false`. | ||
|
||
<<mapping-store,`store`>>:: | ||
|
||
Whether the field value should be stored and retrievable separately from | ||
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false` | ||
(default). This option only configures the root field, and does not | ||
configure any subfields. | ||
|
||
<<search-analyzer,`search_analyzer`>>:: | ||
|
||
The <<analyzer,`analyzer`>> that should be used at search time on | ||
<<mapping-index,`analyzed`>> fields. Defaults to the `analyzer` setting. | ||
|
||
<<search-quote-analyzer,`search_quote_analyzer`>>:: | ||
|
||
The <<analyzer,`analyzer`>> that should be used at search time when a | ||
phrase is encountered. Defaults to the `search_analyzer` setting. | ||
|
||
<<similarity,`similarity`>>:: | ||
|
||
Which scoring algorithm or _similarity_ should be used. Defaults | ||
to `BM25`. | ||
|
||
<<term-vector,`term_vector`>>:: | ||
|
||
Whether term vectors should be stored for an <<mapping-index,`analyzed`>> | ||
field. Defaults to `no`. This option configures the root field and shingle | ||
subfields, but not the prefix subfield. | ||
|
||
|
||
[[prefix-queries]] | ||
==== Optimization of prefix queries | ||
|
||
When making a <<query-dsl-prefix-query,`prefix`>> query to the root field or | ||
any of its subfields, the query will be rewritten to a | ||
<<query-dsl-term-query,`term`>> query on the `._index_prefix` subfield. This | ||
matches more efficiently than is typical of `prefix` queries on text fields, | ||
as prefixes up to a certain length of each shingle are indexed directly as | ||
terms in the `._index_prefix` subfield. | ||
|
||
The analyzer of the `._index_prefix` subfield slightly modifies the | ||
shingle-building behavior to also index prefixes of the terms at the end of the | ||
field's value that normally would not be produced as shingles. For example, if | ||
the value `quick brown fox` is indexed into a `search_as_you_type` field with | ||
`max_shingle_size` of 3, prefixes for `brown fox` and `fox` are also indexed | ||
into the `._index_prefix` subfield even though they do not appear as terms in | ||
the `._3gram` subfield. This allows for completion of all the terms in the | ||
field's input. |
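
The `quick brown fox` example can be modelled roughly as follows. This is a simplified sketch, not the actual analysis chain: the whitespace tokenization, the minimum prefix length of 1 and the `max_chars` limit of 20 are all assumptions chosen for illustration, and `index_prefix_terms` is a hypothetical helper.

```python
from typing import List


def index_prefix_terms(tokens: List[str], max_shingle_size: int = 3,
                       max_chars: int = 20) -> List[str]:
    """Simplified model of the terms held by the ._index_prefix subfield."""
    shingles = []
    for start in range(len(tokens)):
        # Full-size shingles, plus the shorter trailing ones at the end of the
        # value ("brown fox" and "fox" in the example from the text).
        shingles.append(" ".join(tokens[start:start + max_shingle_size]))
    prefixes = []
    for shingle in shingles:
        # Edge n-grams: every leading substring up to the assumed length limit.
        for length in range(1, min(len(shingle), max_chars) + 1):
            prefixes.append(shingle[:length])
    return prefixes


terms = index_prefix_terms("quick brown fox".split())

# A prefix query for "brown f" becomes an exact term lookup in this model.
print("brown f" in terms)
```

Because every prefix is already a term, the lookup does not need to enumerate all terms that start with the typed text, which is where the efficiency gain described above comes from.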
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
[[query-dsl-match-bool-prefix-query]] | ||
=== Match Bool Prefix Query | ||
|
||
A `match_bool_prefix` query analyzes its input and constructs a | ||
<<query-dsl-bool-query,`bool` query>> from the terms. Each term except the last | ||
is used in a `term` query. The last term is used in a `prefix` query. A | ||
`match_bool_prefix` query such as | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
GET /_search | ||
{ | ||
"query": { | ||
"match_bool_prefix" : { | ||
"message" : "quick brown f" | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
// CONSOLE | ||
|
||
where analysis produces the terms `quick`, `brown`, and `f` is similar to the | ||
following `bool` query: | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
GET /_search | ||
{ | ||
"query": { | ||
"bool" : { | ||
"should": [ | ||
{ "term": { "message": "quick" }}, | ||
{ "term": { "message": "brown" }}, | ||
{ "prefix": { "message": "f"}} | ||
] | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
// CONSOLE | ||
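
The rewrite from query text to `bool` query can be sketched in Python. The analyzer here is a lowercase-and-split stand-in (an assumption), and `match_bool_prefix_as_bool` is a hypothetical helper that mirrors the description above rather than the query parser itself.

```python
from typing import Any, Dict


def match_bool_prefix_as_bool(field: str, text: str) -> Dict[str, Any]:
    """Build the roughly equivalent bool query for a match_bool_prefix input."""
    terms = text.lower().split()  # stand-in analyzer (assumption)
    if not terms:
        return {"bool": {"should": []}}
    # Every term except the last becomes a term query ...
    clauses = [{"term": {field: term}} for term in terms[:-1]]
    # ... and the last, possibly partial, term becomes a prefix query.
    clauses.append({"prefix": {field: terms[-1]}})
    return {"bool": {"should": clauses}}


equivalent = match_bool_prefix_as_bool("message", "quick brown f")
print(equivalent)
```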
|
||
An important difference between the `match_bool_prefix` query and | ||
<<query-dsl-match-query-phrase-prefix,`match_phrase_prefix`>> is that the | ||
`match_phrase_prefix` query matches its terms as a phrase, but the | ||
`match_bool_prefix` query can match its terms in any position. The example | ||
`match_bool_prefix` query above could match a field containing | ||
`quick brown fox`, but it could also match `brown fox quick`. It could also | ||
match a field containing the term `quick`, the term `brown` and a term | ||
starting with `f`, appearing in any position. | ||
|
||
==== Parameters | ||
|
||
By default, `match_bool_prefix` queries' input text will be analyzed using the | ||
analyzer from the queried field's mapping. A different search analyzer can be | ||
configured with the `analyzer` parameter: | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
GET /_search | ||
{ | ||
"query": { | ||
"match_bool_prefix" : { | ||
"message": { | ||
"query": "quick brown f", | ||
"analyzer": "keyword" | ||
} | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
// CONSOLE | ||
|
||
`match_bool_prefix` queries support the | ||
<<query-dsl-minimum-should-match,`minimum_should_match`>> and `operator` | ||
parameters as described for the | ||
<<query-dsl-match-query-boolean,`match` query>>, applying the setting to the | ||
constructed `bool` query. The number of clauses in the constructed `bool` | ||
query will in most cases be the number of terms produced by analysis of the | ||
query text. | ||
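
How `operator` and `minimum_should_match` act on the constructed clauses can be illustrated with a small in-memory matcher. This is a toy model under stated assumptions: documents and queries are analyzed by lowercasing and splitting on whitespace, only the plain integer form of `minimum_should_match` is modelled, and both function names are hypothetical.

```python
from typing import List


def clause_hits(doc_terms: List[str], query_terms: List[str]) -> List[bool]:
    """Evaluate one clause per query term: term clauses, then a final prefix clause."""
    *full_terms, last = query_terms
    hits = [term in doc_terms for term in full_terms]
    hits.append(any(doc_term.startswith(last) for doc_term in doc_terms))
    return hits


def matches(doc_text: str, query_text: str, operator: str = "or",
            minimum_should_match: int = 1) -> bool:
    """Toy model of match_bool_prefix matching; term positions are ignored."""
    doc_terms = doc_text.lower().split()
    query_terms = query_text.lower().split()
    if not query_terms:
        return False
    hits = clause_hits(doc_terms, query_terms)
    if operator == "and":
        return all(hits)  # every clause must match
    return sum(hits) >= minimum_should_match


print(matches("brown fox quick", "quick brown f", operator="and"))
print(matches("quick dog", "quick brown f", minimum_should_match=2))
```

Since the model never looks at positions, `brown fox quick` satisfies all three clauses just as `quick brown fox` does, which is the difference from `match_phrase_prefix` noted earlier.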
|
||
The <<query-dsl-match-query-fuzziness,`fuzziness`>>, `prefix_length`, | ||
`max_expansions`, `fuzzy_transpositions`, and `fuzzy_rewrite` parameters can | ||
be applied to the `term` subqueries constructed for all terms but the final | ||
term. They do not have any effect on the prefix query constructed for the | ||
final term. |