search as you type fieldmapper (#35600)
Adds the search_as_you_type field type that acts like a text field optimized
for as-you-type search completion. It creates a couple of subfields that analyze
the indexed terms as shingles, against which full terms are queried, and a
prefix subfield that analyzes terms as the largest shingle size used and
edge-ngrams, against which partial terms are queried.

Adds a match_bool_prefix query type that creates a boolean clause of a term
query for each term except the last, for which a boolean clause with a prefix
query is created.

The match_bool_prefix query is the recommended way of querying a search as you
type field, which will boil down to term queries for each shingle of the input
text on the appropriate shingle field, and the final (possibly partial) term
as a term query on the prefix field. This field type also supports phrase and
phrase prefix queries, however.
andyb-elastic authored Mar 27, 2019
1 parent c0c6d70 commit 23395a9
Showing 27 changed files with 5,198 additions and 100 deletions.
3 changes: 3 additions & 0 deletions docs/reference/mapping/types.asciidoc
@@ -52,6 +52,7 @@ string:: <<text,`text`>> and <<keyword,`keyword`>>

<<sparse-vector>>:: Record sparse vectors of float values.

<<search-as-you-type>>:: A text-like field optimized for queries to implement as-you-type completion.

[float]
=== Multi-fields
Expand Down Expand Up @@ -110,3 +111,5 @@ include::types/rank-features.asciidoc[]
include::types/dense-vector.asciidoc[]

include::types/sparse-vector.asciidoc[]

include::types/search-as-you-type.asciidoc[]
258 changes: 258 additions & 0 deletions docs/reference/mapping/types/search-as-you-type.asciidoc
@@ -0,0 +1,258 @@
[[search-as-you-type]]
=== Search as you type datatype

experimental[]

The `search_as_you_type` field type is a text-like field that is optimized to
provide out-of-the-box support for queries that serve an as-you-type completion
use case. It creates a series of subfields that are analyzed to index terms
that can be efficiently matched by a query that partially matches the entire
indexed text value. Both prefix completion (i.e. matching terms starting at the
beginning of the input) and infix completion (i.e. matching terms at any
position within the input) are supported.

For example, a field of this type can be added to a mapping as follows:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "search_as_you_type"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

This creates the following fields:

[horizontal]

`my_field`::

Analyzed as configured in the mapping. If an analyzer is not configured,
the default analyzer for the index is used.

`my_field._2gram`::

Wraps the analyzer of `my_field` with a shingle token filter of shingle
size 2.

`my_field._3gram`::

Wraps the analyzer of `my_field` with a shingle token filter of shingle
size 3.

`my_field._index_prefix`::

Wraps the analyzer of `my_field._3gram` with an edge ngram token filter.


The size of shingles in subfields can be configured with the `max_shingle_size`
mapping parameter. The default is 3, and valid values for this parameter are
integers from 2 to 4 inclusive. Shingle subfields will be created for each
shingle size from 2 up to and including the `max_shingle_size`. The
`my_field._index_prefix` subfield will always use the analyzer from the shingle
subfield with the `max_shingle_size` when constructing its own analyzer.

Increasing the `max_shingle_size` will improve matches for queries with more
consecutive terms, at the cost of larger index size. The default
`max_shingle_size` should usually be sufficient.
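
As an illustrative sketch only (this snippet is not one of the tested examples
on this page), a mapping that raises `max_shingle_size` to its maximum of 4
might look like the following. The index name `my_other_index` is hypothetical.
Following the subfield naming shown above, a `my_field._4gram` subfield would
be expected in addition to `my_field._2gram` and `my_field._3gram`.

[source,js]
--------------------------------------------------
PUT my_other_index
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "search_as_you_type",
        "max_shingle_size": 4
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE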

The same input text is indexed into each of these fields automatically, with
their differing analysis chains, when an indexed document has a value for the
root field `my_field`.

[source,js]
--------------------------------------------------
PUT my_index/_doc/1?refresh
{
  "my_field": "quick brown fox jump lazy dog"
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

The most efficient way of querying to serve a search-as-you-type use case is
usually a <<query-dsl-multi-match-query,`multi_match`>> query of type
<<query-dsl-match-bool-prefix-query,`bool_prefix`>> that targets the root
`search_as_you_type` field and its shingle subfields. This can match the query
terms in any order, but will score documents higher if they contain the terms
in order in a shingle subfield.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "brown f",
      "type": "bool_prefix",
      "fields": [
        "my_field",
        "my_field._2gram",
        "my_field._3gram"
      ]
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

[source,js]
--------------------------------------------------
{
  "took" : 44,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.8630463,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.8630463,
        "_source" : {
          "my_field" : "quick brown fox jump lazy dog"
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took" : 44/"took" : $body.took/]
// TESTRESPONSE[s/"max_score" : 0.8630463/"max_score" : $body.hits.max_score/]
// TESTRESPONSE[s/"_score" : 0.8630463/"_score" : $body.hits.hits.0._score/]

To search for documents that strictly match the query terms in order, or to
search using other properties of phrase queries, use a
<<query-dsl-match-query-phrase-prefix,`match_phrase_prefix` query>> on the root
field. A <<query-dsl-match-query-phrase,`match_phrase` query>> can also be used
if the last term should be matched exactly, and not as a prefix. Using phrase
queries may be less efficient than using the `match_bool_prefix` query.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "match_phrase_prefix": {
      "my_field": "brown f"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
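
For comparison, a `match_phrase` query on the root field, which treats the last
term as a complete term rather than as a prefix, could be sketched as follows.
The query text `brown fox` is chosen here for illustration only, and this
snippet is not one of the tested examples on this page.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "match_phrase": {
      "my_field": "brown fox"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE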

[[specific-params]]
==== Parameters specific to the `search_as_you_type` field

The following parameters are accepted in a mapping for the `search_as_you_type`
field and are specific to this field type:

[horizontal]

`max_shingle_size`::

The largest shingle size to index the input with and create subfields for,
creating one subfield for each shingle size between 2 and
`max_shingle_size`. Accepts integer values between 2 and 4 inclusive. This
option defaults to 3.


[[general-params]]
==== Parameters of the field type as a text field

The following parameters are accepted in a mapping for the `search_as_you_type`
field due to its nature as a text-like field, and behave similarly to their
behavior when configuring a field of the <<text,`text`>> datatype. Unless
otherwise noted, these options configure the root field's subfields in
the same way.

<<analyzer,`analyzer`>>::

The <<analysis,analyzer>> which should be used for
<<mapping-index,`analyzed`>> string fields, both at index-time and at
search-time (unless overridden by the
<<search-analyzer,`search_analyzer`>>). Defaults to the default index
analyzer, or the <<analysis-standard-analyzer,`standard` analyzer>>.

<<mapping-index,`index`>>::

Should the field be searchable? Accepts `true` (default) or `false`.

<<index-options,`index_options`>>::

What information should be stored in the index, for search and highlighting
purposes. Defaults to `positions`.

<<norms,`norms`>>::

Whether field-length should be taken into account when scoring queries.
Accepts `true` or `false`. This option configures the root field
and shingle subfields, where its default is `true`. It does not configure
the prefix subfield, where it is `false`.

<<mapping-store,`store`>>::

Whether the field value should be stored and retrievable separately from
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
(default). This option only configures the root field, and does not
configure any subfields.

<<search-analyzer,`search_analyzer`>>::

The <<analyzer,`analyzer`>> that should be used at search time on
<<mapping-index,`analyzed`>> fields. Defaults to the `analyzer` setting.

<<search-quote-analyzer,`search_quote_analyzer`>>::

The <<analyzer,`analyzer`>> that should be used at search time when a
phrase is encountered. Defaults to the `search_analyzer` setting.

<<similarity,`similarity`>>::

Which scoring algorithm or _similarity_ should be used. Defaults
to `BM25`.

<<term-vector,`term_vector`>>::

Whether term vectors should be stored for an <<mapping-index,`analyzed`>>
field. Defaults to `no`. This option configures the root field and shingle
subfields, but not the prefix subfield.


[[prefix-queries]]
==== Optimization of prefix queries

When making a <<query-dsl-prefix-query,`prefix`>> query to the root field or
any of its subfields, the query will be rewritten to a
<<query-dsl-term-query,`term`>> query on the `._index_prefix` subfield. This
matches more efficiently than is typical of `prefix` queries on text fields,
as prefixes up to a certain length of each shingle are indexed directly as
terms in the `._index_prefix` subfield.
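
A minimal sketch of such a request, assuming the `my_index` example from
earlier on this page and a prefix value `bro` chosen purely for illustration,
is shown below. The rewrite to the `._index_prefix` subfield happens
internally, so the request itself only names the root field. This snippet is
not one of the tested examples on this page.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "prefix": {
      "my_field": "bro"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE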

The analyzer of the `._index_prefix` subfield slightly modifies the
shingle-building behavior to also index prefixes of the terms at the end of the
field's value that normally would not be produced as shingles. For example, if
the value `quick brown fox` is indexed into a `search_as_you_type` field with
`max_shingle_size` of 3, prefixes for `brown fox` and `fox` are also indexed
into the `._index_prefix` subfield even though they do not appear as terms in
the `._3gram` subfield. This allows for completion of all the terms in the
field's input.
9 changes: 8 additions & 1 deletion docs/reference/query-dsl/full-text-queries.asciidoc
@@ -18,7 +18,12 @@ The queries in this group are:

<<query-dsl-match-query-phrase-prefix,`match_phrase_prefix` query>>::

The poor man's _search-as-you-type_. Like the `match_phrase` query, but does a wildcard search on the final word.
Like the `match_phrase` query, but does a wildcard search on the final word.

<<query-dsl-match-bool-prefix-query,`match_bool_prefix` query>>::

Creates a `bool` query that matches each term as a `term` query, except for
the last term, which is matched as a `prefix` query.

<<query-dsl-multi-match-query,`multi_match` query>>::

Expand Down Expand Up @@ -50,6 +55,8 @@ include::match-phrase-query.asciidoc[]

include::match-phrase-prefix-query.asciidoc[]

include::match-bool-prefix-query.asciidoc[]

include::multi-match-query.asciidoc[]

include::common-terms-query.asciidoc[]
85 changes: 85 additions & 0 deletions docs/reference/query-dsl/match-bool-prefix-query.asciidoc
@@ -0,0 +1,85 @@
[[query-dsl-match-bool-prefix-query]]
=== Match Bool Prefix Query

A `match_bool_prefix` query analyzes its input and constructs a
<<query-dsl-bool-query,`bool` query>> from the terms. Each term except the last
is used in a `term` query. The last term is used in a `prefix` query. A
`match_bool_prefix` query such as

[source,js]
--------------------------------------------------
GET /_search
{
  "query": {
    "match_bool_prefix" : {
      "message" : "quick brown f"
    }
  }
}
--------------------------------------------------
// CONSOLE

where analysis produces the terms `quick`, `brown`, and `f` is similar to the
following `bool` query:

[source,js]
--------------------------------------------------
GET /_search
{
  "query": {
    "bool" : {
      "should": [
        { "term": { "message": "quick" }},
        { "term": { "message": "brown" }},
        { "prefix": { "message": "f"}}
      ]
    }
  }
}
--------------------------------------------------
// CONSOLE

An important difference between the `match_bool_prefix` query and
<<query-dsl-match-query-phrase-prefix,`match_phrase_prefix`>> is that the
`match_phrase_prefix` query matches its terms as a phrase, but the
`match_bool_prefix` query can match its terms in any position. The example
`match_bool_prefix` query above could match a field containing
`quick brown fox`, but it could also match `brown fox quick`. It could also
match a field containing the term `quick`, the term `brown` and a term
starting with `f`, appearing in any position.

==== Parameters

By default, `match_bool_prefix` queries' input text will be analyzed using the
analyzer from the queried field's mapping. A different search analyzer can be
configured with the `analyzer` parameter:

[source,js]
--------------------------------------------------
GET /_search
{
  "query": {
    "match_bool_prefix" : {
      "message": {
        "query": "quick brown f",
        "analyzer": "keyword"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

`match_bool_prefix` queries support the
<<query-dsl-minimum-should-match,`minimum_should_match`>> and `operator`
parameters as described for the
<<query-dsl-match-query-boolean,`match` query>>, applying the setting to the
constructed `bool` query. The number of clauses in the constructed `bool`
query will in most cases be the number of terms produced by analysis of the
query text.
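
As a sketch of how these parameters might be supplied (not one of the tested
examples on this page), the request below asks that at least two of the three
clauses built from `quick brown f` match. The value `2` is an arbitrary choice
for illustration. Setting `"operator": "and"` instead would require every
clause to match.

[source,js]
--------------------------------------------------
GET /_search
{
  "query": {
    "match_bool_prefix" : {
      "message": {
        "query": "quick brown f",
        "minimum_should_match": 2
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE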

The <<query-dsl-match-query-fuzziness,`fuzziness`>>, `prefix_length`,
`max_expansions`, `fuzzy_transpositions`, and `fuzzy_rewrite` parameters can
be applied to the `term` subqueries constructed for all terms but the final
term. They do not have any effect on the prefix query constructed for the
final term.
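
A hedged sketch of a request using the fuzzy parameters follows. The misspelled
query text and the specific values for `fuzziness` and `prefix_length` are
assumptions made for illustration, and this snippet is not one of the tested
examples on this page. Under the behavior described above, fuzzy matching would
apply to `quikc` and `brown`, while `f` would still be matched as a plain
prefix.

[source,js]
--------------------------------------------------
GET /_search
{
  "query": {
    "match_bool_prefix" : {
      "message": {
        "query": "quikc brown f",
        "fuzziness": "AUTO",
        "prefix_length": 1
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE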
@@ -59,6 +59,6 @@ for appears.
For better solutions for _search-as-you-type_ see the
<<search-suggesters-completion,completion suggester>> and
{defguide}/_index_time_search_as_you_type.html[Index-Time Search-as-You-Type].
the <<search-as-you-type,`search_as_you_type` field type>>.
===================================================
3 changes: 1 addition & 2 deletions docs/reference/query-dsl/match-query.asciidoc
@@ -186,7 +186,6 @@ process. It does not support field name prefixes, wildcard characters,
or other "advanced" features. For this reason, chances of it failing are
very small / non existent, and it provides an excellent behavior when it
comes to just analyze and run that text as a query behavior (which is
usually what a text search box does). Also, the <<query-dsl-match-query-phrase-prefix,`match_phrase_prefix`>>
type can provide a great "as you type" behavior to automatically load search results.
usually what a text search box does).
**************************************************