search as you type fieldmapper #35600

Merged
Commits
114 commits
cd8de86
Add SearchAsTypeFieldMapper
jimczi Sep 7, 2018
6935480
wip compiles now
andyb-elastic Nov 2, 2018
6d79b6e
wip default configuration test passes
andyb-elastic Nov 2, 2018
6443ed9
wip passes basic test
andyb-elastic Nov 10, 2018
024e546
wip move to own module, passes
andyb-elastic Nov 10, 2018
9582375
wip passes
andyb-elastic Nov 10, 2018
4bcc6af
wip put all suggestrized field types into searchasyoutypefieldtype
andyb-elastic Nov 10, 2018
cc6fb91
wip change defaults
andyb-elastic Nov 15, 2018
15ef530
wip remove commented code
andyb-elastic Nov 15, 2018
a8897d6
wip limit max shingle size
andyb-elastic Nov 15, 2018
db7bbc1
wip test configuration
andyb-elastic Nov 15, 2018
0d1a506
wip remove multifield support
andyb-elastic Nov 17, 2018
3701310
wip remove original field mapper
andyb-elastic Nov 17, 2018
bb21fa0
wip move root field mapper class down a little tests pass
andyb-elastic Dec 10, 2018
5171d6e
wip structure subfield in a container, tests pass
andyb-elastic Dec 12, 2018
e1c8493
wip put analysis details into field type
andyb-elastic Dec 12, 2018
501f3a0
wip added references to ngram subfields
andyb-elastic Dec 13, 2018
b2360d5
wip method name consistency
andyb-elastic Dec 13, 2018
c4b5e10
wip disable integTest bc we dont have any
andyb-elastic Dec 13, 2018
a3eae0e
wip add test for doc parsing
andyb-elastic Dec 14, 2018
9c334a7
wip IT smoketest
andyb-elastic Dec 14, 2018
0d51d1b
Merge branch 'feature-search-as-you-type' into feature-search-as-you-…
andyb-elastic Dec 14, 2018
8664540
wip lucene updates
andyb-elastic Dec 14, 2018
2a24629
Merge branch 'feature-search-as-you-type' into feature-search-as-you-…
andyb-elastic Jan 8, 2019
7d30c1b
wip edge ngrams only on highest shingle size
andyb-elastic Jan 9, 2019
1e21ea9
wip cleanup
andyb-elastic Jan 9, 2019
e716e1f
wip cleanup
andyb-elastic Jan 9, 2019
55e3f05
wip new design
andyb-elastic Jan 15, 2019
e49d8f8
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Jan 15, 2019
fe8bb6d
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Jan 15, 2019
bf22cad
wip create trailing shingles
andyb-elastic Jan 16, 2019
cb4d3f1
wip move to core
andyb-elastic Jan 16, 2019
6cd91e5
refactor code and move to mapper-extras
jimczi Jan 16, 2019
5c9a196
remove dupe IT
andyb-elastic Jan 16, 2019
91a12f2
wip undo changes no longer needed
andyb-elastic Jan 16, 2019
bdb0433
wip prefix field type test case
andyb-elastic Jan 16, 2019
990e4de
wip tests back to where they were
andyb-elastic Jan 17, 2019
21a5ec5
wip analyzer tests w more terms
andyb-elastic Jan 22, 2019
01bf105
wip make IT a yaml test instead
andyb-elastic Jan 23, 2019
1835429
wip handle prefix query
andyb-elastic Jan 24, 2019
bb56d1a
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Jan 24, 2019
fdf0cfb
wip docs
andyb-elastic Jan 25, 2019
c5adcbe
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Jan 25, 2019
4b7933a
doc review changes
andyb-elastic Jan 26, 2019
f954b73
wip test for index options and correct docs
andyb-elastic Jan 28, 2019
0f1dee7
wip test rest of mapping options and add norms
andyb-elastic Jan 28, 2019
6d6b2f4
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Jan 28, 2019
9532c19
handle phrase and phrase prefix query
jimczi Jan 29, 2019
a62caee
wip add prefix query to root field type
andyb-elastic Jan 29, 2019
fa25c70
wip fix root field clone
andyb-elastic Jan 29, 2019
55a80d7
wip disable stored fields in subfields
andyb-elastic Jan 29, 2019
5413f6e
wip dont set default index options in shingle fields
andyb-elastic Jan 29, 2019
513c346
wip tests for prefix, phrase, and phrase prfefix queries
andyb-elastic Jan 30, 2019
948f44c
wip shingle fields shouldn't delegate phrase prefix queries to prefix…
andyb-elastic Jan 31, 2019
615477b
wip change field type tests to root field
andyb-elastic Jan 31, 2019
0aa7ab6
wip set search quote analyzer
andyb-elastic Feb 1, 2019
d8efc04
wip javadoc and some better names
andyb-elastic Feb 1, 2019
99a0c32
wip example query in docs uses match phrase prefix
andyb-elastic Feb 1, 2019
cffbab0
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Feb 2, 2019
1052fa5
add support for match boolean prefix query
jimczi Feb 5, 2019
f8142f7
Merge branch 'master' into match_prefix
jimczi Feb 5, 2019
8bcf121
Merge branch 'feature-search-as-you-type-fieldmapper' of github.com:a…
andyb-elastic Mar 4, 2019
4c736e0
highlighting test
andyb-elastic Mar 4, 2019
90014a8
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 4, 2019
16229fa
test for boolean prefix in match query builder
andyb-elastic Mar 5, 2019
2da88b3
test with synonyms
andyb-elastic Mar 6, 2019
32a6dc4
remove commented code
andyb-elastic Mar 6, 2019
f26d919
boolean prefix query builder
andyb-elastic Mar 8, 2019
a8ffbf9
use params in yaml tests
andyb-elastic Mar 8, 2019
556dcc6
sayt yaml tests for boolean prefix
andyb-elastic Mar 9, 2019
7acb81b
remove prefix and phrase prefix highlighting tests, because highlight…
andyb-elastic Mar 9, 2019
c6056da
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 9, 2019
f2ccbd5
fix highlighting test to pass
andyb-elastic Mar 9, 2019
a4bcaef
rename to bool_prefix
andyb-elastic Mar 11, 2019
0b4baf1
MultiMatchQueryBuilderTests for boolean prefix
andyb-elastic Mar 12, 2019
a24f788
partially fix some issues with MultiMatchQueryBuilderTests
andyb-elastic Mar 13, 2019
c85aad4
fix searchmoduletests
andyb-elastic Mar 14, 2019
c77d1d1
enforce leniency with prefix queries - MultiMatchQueryBuilderTests pa…
andyb-elastic Mar 14, 2019
acaaf8e
fix checkstyle
andyb-elastic Mar 14, 2019
cfab648
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 14, 2019
0ed9201
respect autoGenerateMultiTermSynonymPhraseQuery
andyb-elastic Mar 15, 2019
1201135
checkstyle
andyb-elastic Mar 15, 2019
bee66d6
fix bwc skips
andyb-elastic Mar 18, 2019
0e887c5
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 18, 2019
d7981df
fix skip reasons
andyb-elastic Mar 18, 2019
d26ce8a
remove test logging
andyb-elastic Mar 18, 2019
2bc13d2
multi match yaml tests
andyb-elastic Mar 18, 2019
d95199d
highlighting tests for multi match
andyb-elastic Mar 18, 2019
5e2ce2b
fix method name
andyb-elastic Mar 18, 2019
e249c12
multimatch bool_prefix test for fieldmapper
andyb-elastic Mar 18, 2019
1bcc7f6
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 19, 2019
30cb8b1
clarify bool prefix query builder tests
andyb-elastic Mar 19, 2019
cf8c1e0
use the correct phrase in tests
andyb-elastic Mar 19, 2019
3c96949
bool_prefix docs
andyb-elastic Mar 20, 2019
78b150e
yaml tests for multi_match bool_prefix
andyb-elastic Mar 21, 2019
96ce9eb
notes about what multi_match params we support with bool_prefix
andyb-elastic Mar 21, 2019
e45935e
notes on supported multi_match parameters
andyb-elastic Mar 21, 2019
89982f2
explicitly disallow unsupported multimatch params
andyb-elastic Mar 21, 2019
9dcd698
examples in sayt doc page
andyb-elastic Mar 22, 2019
b9f49a6
point other doc pages to new sayt doc
andyb-elastic Mar 22, 2019
6cc7935
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 22, 2019
df56e88
docs fixes
andyb-elastic Mar 22, 2019
4fa005c
default max shingle size in docs
andyb-elastic Mar 22, 2019
2c10385
fix reference to scoring mode
andyb-elastic Mar 22, 2019
02383c8
account for disallowed params
andyb-elastic Mar 22, 2019
9691f4a
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 22, 2019
5ff97b1
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 25, 2019
a9abdff
rename server
andyb-elastic Mar 25, 2019
f36d9c1
rename module
andyb-elastic Mar 25, 2019
5003046
rename docs and some fixes
andyb-elastic Mar 25, 2019
52d4442
support operator
andyb-elastic Mar 26, 2019
5d891a0
support fuzziness
andyb-elastic Mar 26, 2019
41a7530
fix searchmoduletests
andyb-elastic Mar 26, 2019
a146fac
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 27, 2019
3 changes: 3 additions & 0 deletions docs/reference/mapping/types.asciidoc
@@ -52,6 +52,7 @@ string:: <<text,`text`>> and <<keyword,`keyword`>>

<<sparse-vector>>:: Record sparse vectors of float values.

<<search-as-you-type>>:: A text-like field optimized for queries to implement as-you-type completion

[float]
=== Multi-fields
@@ -110,3 +111,5 @@ include::types/rank-features.asciidoc[]
include::types/dense-vector.asciidoc[]

include::types/sparse-vector.asciidoc[]

include::types/search-as-you-type.asciidoc[]
258 changes: 258 additions & 0 deletions docs/reference/mapping/types/search-as-you-type.asciidoc
@@ -0,0 +1,258 @@
[[search-as-you-type]]
=== Search as you type datatype

experimental[]

The `search_as_you_type` field type is a text-like field that is optimized to
provide out-of-the-box support for queries that serve an as-you-type completion
use case. It creates a series of subfields that are analyzed to index terms
that can be efficiently matched by a query that partially matches the entire
indexed text value. Both prefix completion (i.e. matching terms starting at the
beginning of the input) and infix completion (i.e. matching terms at any
position within the input) are supported.

For example, consider adding a field of this type to a mapping:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "search_as_you_type"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

This creates the following fields:

[horizontal]

`my_field`::

Analyzed as configured in the mapping. If an analyzer is not configured,
the default analyzer for the index is used.

`my_field._2gram`::

Wraps the analyzer of `my_field` with a shingle token filter of shingle
size 2

`my_field._3gram`::

Wraps the analyzer of `my_field` with a shingle token filter of shingle
size 3

`my_field._index_prefix`::

Wraps the analyzer of `my_field._3gram` with an edge ngram token filter


The size of shingles in subfields can be configured with the `max_shingle_size`
mapping parameter. The default is 3, and valid values for this parameter are
integer values from 2 to 4 inclusive. Shingle subfields will be created for each
shingle size from 2 up to and including the `max_shingle_size`. The
`my_field._index_prefix` subfield will always use the analyzer from the shingle
subfield with the `max_shingle_size` when constructing its own analyzer.

Increasing the `max_shingle_size` will improve matches for queries with more
consecutive terms, at the cost of larger index size. The default
`max_shingle_size` should usually be sufficient.
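
As a minimal sketch, a mapping that raises `max_shingle_size` to its maximum
of 4 might look like the following. The index name `my_other_index` is a
placeholder chosen for this illustration. Following the subfield naming pattern
shown above, such a field would be expected to also get a `my_field._4gram`
subfield.

[source,js]
--------------------------------------------------
PUT my_other_index
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "search_as_you_type",
        "max_shingle_size": 4
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]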

The same input text is indexed into each of these fields automatically, with
their differing analysis chains, when an indexed document has a value for the
root field `my_field`.

[source,js]
--------------------------------------------------
PUT my_index/_doc/1?refresh
{
  "my_field": "quick brown fox jump lazy dog"
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
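
One way to inspect how the differing analysis chains tokenize the same input is
to run the `_analyze` API against a subfield. This is a sketch only: it assumes
the `_analyze` API resolves the subfield name like any other mapped field, and
the sample text is arbitrary.

[source,js]
--------------------------------------------------
GET my_index/_analyze
{
  "field": "my_field._2gram",
  "text": "quick brown fox"
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

With the default analyzer, the returned tokens would be expected to be two-word
shingles such as `quick brown` and `brown fox`.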

The most efficient way of querying to serve a search-as-you-type use case is
usually a <<query-dsl-multi-match-query,`multi_match`>> query of type
<<query-dsl-match-bool-prefix-query,`bool_prefix`>> that targets the root
`search_as_you_type` field and its shingle subfields. This can match the query
terms in any order, but will score documents higher if they contain the terms
in order in a shingle subfield.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "brown f",
      "type": "bool_prefix",
      "fields": [
        "my_field",
        "my_field._2gram",
        "my_field._3gram"
      ]
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

[source,js]
--------------------------------------------------
{
  "took" : 44,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.8630463,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.8630463,
        "_source" : {
          "my_field" : "quick brown fox jump lazy dog"
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took" : 44/"took" : $body.took/]
// TESTRESPONSE[s/"max_score" : 0.8630463/"max_score" : $body.hits.max_score/]
// TESTRESPONSE[s/"_score" : 0.8630463/"_score" : $body.hits.hits.0._score/]

To search for documents that strictly match the query terms in order, or to
search using other properties of phrase queries, use a
<<query-dsl-match-query-phrase-prefix,`match_phrase_prefix` query>> on the root
field. A <<query-dsl-match-query-phrase,`match_phrase` query>> can also be used
if the last term should be matched exactly, and not as a prefix. Using phrase
queries may be less efficient than using the `match_bool_prefix` query.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "match_phrase_prefix": {
      "my_field": "brown f"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
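
If the final term should be matched exactly rather than as a prefix, the same
kind of request can be written with `match_phrase`. The following is a sketch
against the example document indexed earlier; the query text `brown fox` is
chosen only for illustration.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "match_phrase": {
      "my_field": "brown fox"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]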

[[specific-params]]
==== Parameters specific to the `search_as_you_type` field

The following parameters are accepted in a mapping for the `search_as_you_type`
field and are specific to this field type:

[horizontal]

`max_shingle_size`::

The largest shingle size to index the input with and create subfields for,
creating one subfield for each shingle size between 2 and
`max_shingle_size`. Accepts integer values between 2 and 4 inclusive. This
option defaults to 3.


[[general-params]]
==== Parameters of the field type as a text field

The following parameters are accepted in a mapping for the `search_as_you_type`
field due to its nature as a text-like field, and behave similarly to their
behavior when configuring a field of the <<text,`text`>> datatype. Unless
otherwise noted, these options configure the root field's subfields in
the same way.

<<analyzer,`analyzer`>>::

The <<analysis,analyzer>> which should be used for
<<mapping-index,`analyzed`>> string fields, both at index-time and at
search-time (unless overridden by the
<<search-analyzer,`search_analyzer`>>). Defaults to the default index
analyzer, or the <<analysis-standard-analyzer,`standard` analyzer>>.

<<mapping-index,`index`>>::

Should the field be searchable? Accepts `true` (default) or `false`.

<<index-options,`index_options`>>::

What information should be stored in the index, for search and highlighting
purposes. Defaults to `positions`.

<<norms,`norms`>>::

Whether field-length should be taken into account when scoring queries.
Accepts `true` or `false`. This option configures the root field
and shingle subfields, where its default is `true`. It does not configure
the prefix subfield, where it is `false`.

<<mapping-store,`store`>>::

Whether the field value should be stored and retrievable separately from
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
(default). This option only configures the root field, and does not
configure any subfields.

<<search-analyzer,`search_analyzer`>>::

The <<analyzer,`analyzer`>> that should be used at search time on
<<mapping-index,`analyzed`>> fields. Defaults to the `analyzer` setting.

<<search-quote-analyzer,`search_quote_analyzer`>>::

The <<analyzer,`analyzer`>> that should be used at search time when a
phrase is encountered. Defaults to the `search_analyzer` setting.

<<similarity,`similarity`>>::

Which scoring algorithm or _similarity_ should be used. Defaults
to `BM25`.

<<term-vector,`term_vector`>>::

Whether term vectors should be stored for an <<mapping-index,`analyzed`>>
field. Defaults to `no`. This option configures the root field and shingle
subfields, but not the prefix subfield.


[[prefix-queries]]
==== Optimization of prefix queries

When making a <<query-dsl-prefix-query,`prefix`>> query to the root field or
any of its subfields, the query will be rewritten to a
<<query-dsl-term-query,`term`>> query on the `._index_prefix` subfield. This
matches more efficiently than is typical of `prefix` queries on text fields,
as prefixes up to a certain length of each shingle are indexed directly as
terms in the `._index_prefix` subfield.

The analyzer of the `._index_prefix` subfield slightly modifies the
shingle-building behavior to also index prefixes of the terms at the end of the
field's value that normally would not be produced as shingles. For example, if
the value `quick brown fox` is indexed into a `search_as_you_type` field with
`max_shingle_size` of 3, prefixes for `brown fox` and `fox` are also indexed
into the `._index_prefix` subfield even though they do not appear as terms in
the `._3gram` subfield. This allows for completion of all the terms in the
field's input.
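
As an illustration, a plain `prefix` query against the root field of the
example index is the kind of request this rewrite applies to. The prefix `qui`
is an arbitrary choice for this sketch.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "prefix": {
      "my_field": "qui"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]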
9 changes: 8 additions & 1 deletion docs/reference/query-dsl/full-text-queries.asciidoc
@@ -18,7 +18,12 @@ The queries in this group are:

<<query-dsl-match-query-phrase-prefix,`match_phrase_prefix` query>>::

The poor man's _search-as-you-type_. Like the `match_phrase` query, but does a wildcard search on the final word.
Like the `match_phrase` query, but does a wildcard search on the final word.

<<query-dsl-match-bool-prefix-query,`match_bool_prefix` query>>::

Creates a `bool` query that matches each term as a `term` query, except for
the last term, which is matched as a `prefix` query

<<query-dsl-multi-match-query,`multi_match` query>>::

@@ -50,6 +55,8 @@ include::match-phrase-query.asciidoc[]

include::match-phrase-prefix-query.asciidoc[]

include::match-bool-prefix-query.asciidoc[]

include::multi-match-query.asciidoc[]

include::common-terms-query.asciidoc[]
85 changes: 85 additions & 0 deletions docs/reference/query-dsl/match-bool-prefix-query.asciidoc
@@ -0,0 +1,85 @@
[[query-dsl-match-bool-prefix-query]]
=== Match Bool Prefix Query

A `match_bool_prefix` query analyzes its input and constructs a
<<query-dsl-bool-query,`bool` query>> from the terms. Each term except the last
is used in a `term` query. The last term is used in a `prefix` query. A
`match_bool_prefix` query such as

[source,js]
--------------------------------------------------
GET /_search
{
  "query": {
    "match_bool_prefix" : {
      "message" : "quick brown f"
    }
  }
}
--------------------------------------------------
// CONSOLE

where analysis produces the terms `quick`, `brown`, and `f` is similar to the
following `bool` query

[source,js]
--------------------------------------------------
GET /_search
{
  "query": {
    "bool" : {
      "should": [
        { "term": { "message": "quick" }},
        { "term": { "message": "brown" }},
        { "prefix": { "message": "f"}}
      ]
    }
  }
}
--------------------------------------------------
// CONSOLE

An important difference between the `match_bool_prefix` query and
<<query-dsl-match-query-phrase-prefix,`match_phrase_prefix`>> is that the
`match_phrase_prefix` query matches its terms as a phrase, but the
`match_bool_prefix` query can match its terms in any position. The example
`match_bool_prefix` query above could match a field containing
`quick brown fox`, but it could also match `brown fox quick`. It could also
match a field containing the term `quick`, the term `brown` and a term
starting with `f`, appearing in any position.

==== Parameters

By default, `match_bool_prefix` queries' input text will be analyzed using the
analyzer from the queried field's mapping. A different search analyzer can be
configured with the `analyzer` parameter:

[source,js]
--------------------------------------------------
GET /_search
{
  "query": {
    "match_bool_prefix" : {
      "message": {
        "query": "quick brown f",
        "analyzer": "keyword"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

`match_bool_prefix` queries support the
<<query-dsl-minimum-should-match,`minimum_should_match`>> and `operator`
parameters as described for the
<<query-dsl-match-query-boolean,`match` query>>, applying the setting to the
constructed `bool` query. The number of clauses in the constructed `bool`
query will in most cases be the number of terms produced by analysis of the
query text.
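
For instance, requiring every analyzed term to match could be sketched with the
`operator` parameter. The `message` field is the same placeholder used in the
examples above.

[source,js]
--------------------------------------------------
GET /_search
{
  "query": {
    "match_bool_prefix" : {
      "message": {
        "query": "quick brown f",
        "operator": "and"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

The `minimum_should_match` parameter can be supplied in the same position as
`operator`.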

The <<query-dsl-match-query-fuzziness,`fuzziness`>>, `prefix_length`,
`max_expansions`, `fuzzy_transpositions`, and `fuzzy_rewrite` parameters can
be applied to the `term` subqueries constructed for all terms but the final
term. They do not have any effect on the prefix query constructed for the
final term.
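
A sketch of a fuzzy request follows. The misspelled `brwn` is intended to still
match `brown` through the fuzzy `term` subquery, while the final term `f` is
matched as a plain prefix. The misspelling and the `AUTO` setting are choices
made for this illustration.

[source,js]
--------------------------------------------------
GET /_search
{
  "query": {
    "match_bool_prefix" : {
      "message": {
        "query": "quick brwn f",
        "fuzziness": "AUTO"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE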
@@ -59,6 +59,6 @@ for appears.

For better solutions for _search-as-you-type_ see the
<<search-suggesters-completion,completion suggester>> and
{defguide}/_index_time_search_as_you_type.html[Index-Time Search-as-You-Type].
the <<search-as-you-type,`search_as_you_type` field type>>.

===================================================
3 changes: 1 addition & 2 deletions docs/reference/query-dsl/match-query.asciidoc
@@ -202,7 +202,6 @@ process. It does not support field name prefixes, wildcard characters,
or other "advanced" features. For this reason, chances of it failing are
very small / non existent, and it provides an excellent behavior when it
comes to just analyze and run that text as a query behavior (which is
usually what a text search box does). Also, the <<query-dsl-match-query-phrase-prefix,`match_phrase_prefix`>>
type can provide a great "as you type" behavior to automatically load search results.
usually what a text search box does).

**************************************************