search as you type fieldmapper #35600

Merged
Changes from 74 commits
114 commits
cd8de86
Add SearchAsTypeFieldMapper
jimczi Sep 7, 2018
6935480
wip compiles now
andyb-elastic Nov 2, 2018
6d79b6e
wip default configuration test passes
andyb-elastic Nov 2, 2018
6443ed9
wip passes basic test
andyb-elastic Nov 10, 2018
024e546
wip move to own module, passes
andyb-elastic Nov 10, 2018
9582375
wip passes
andyb-elastic Nov 10, 2018
4bcc6af
wip put all suggestrized field types into searchasyoutypefieldtype
andyb-elastic Nov 10, 2018
cc6fb91
wip change defaults
andyb-elastic Nov 15, 2018
15ef530
wip remove commented code
andyb-elastic Nov 15, 2018
a8897d6
wip limit max shingle size
andyb-elastic Nov 15, 2018
db7bbc1
wip test configuration
andyb-elastic Nov 15, 2018
0d1a506
wip remove multifield support
andyb-elastic Nov 17, 2018
3701310
wip remove original field mapper
andyb-elastic Nov 17, 2018
bb21fa0
wip move root field mapper class down a little tests pass
andyb-elastic Dec 10, 2018
5171d6e
wip structure subfield in a container, tests pass
andyb-elastic Dec 12, 2018
e1c8493
wip put analysis details into field type
andyb-elastic Dec 12, 2018
501f3a0
wip added references to ngram subfields
andyb-elastic Dec 13, 2018
b2360d5
wip method name consistency
andyb-elastic Dec 13, 2018
c4b5e10
wip disable integTest bc we dont have any
andyb-elastic Dec 13, 2018
a3eae0e
wip add test for doc parsing
andyb-elastic Dec 14, 2018
9c334a7
wip IT smoketest
andyb-elastic Dec 14, 2018
0d51d1b
Merge branch 'feature-search-as-you-type' into feature-search-as-you-…
andyb-elastic Dec 14, 2018
8664540
wip lucene updates
andyb-elastic Dec 14, 2018
2a24629
Merge branch 'feature-search-as-you-type' into feature-search-as-you-…
andyb-elastic Jan 8, 2019
7d30c1b
wip edge ngrams only on highest shingle size
andyb-elastic Jan 9, 2019
1e21ea9
wip cleanup
andyb-elastic Jan 9, 2019
e716e1f
wip cleanup
andyb-elastic Jan 9, 2019
55e3f05
wip new design
andyb-elastic Jan 15, 2019
e49d8f8
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Jan 15, 2019
fe8bb6d
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Jan 15, 2019
bf22cad
wip create trailing shingles
andyb-elastic Jan 16, 2019
cb4d3f1
wip move to core
andyb-elastic Jan 16, 2019
6cd91e5
refactor code and move to mapper-extras
jimczi Jan 16, 2019
5c9a196
remove dupe IT
andyb-elastic Jan 16, 2019
91a12f2
wip undo changes no longer needed
andyb-elastic Jan 16, 2019
bdb0433
wip prefix field type test case
andyb-elastic Jan 16, 2019
990e4de
wip tests back to where they were
andyb-elastic Jan 17, 2019
21a5ec5
wip analyzer tests w more terms
andyb-elastic Jan 22, 2019
01bf105
wip make IT a yaml test instead
andyb-elastic Jan 23, 2019
1835429
wip handle prefix query
andyb-elastic Jan 24, 2019
bb56d1a
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Jan 24, 2019
fdf0cfb
wip docs
andyb-elastic Jan 25, 2019
c5adcbe
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Jan 25, 2019
4b7933a
doc review changes
andyb-elastic Jan 26, 2019
f954b73
wip test for index options and correct docs
andyb-elastic Jan 28, 2019
0f1dee7
wip test rest of mapping options and add norms
andyb-elastic Jan 28, 2019
6d6b2f4
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Jan 28, 2019
9532c19
handle phrase and phrase prefix query
jimczi Jan 29, 2019
a62caee
wip add prefix query to root field type
andyb-elastic Jan 29, 2019
fa25c70
wip fix root field clone
andyb-elastic Jan 29, 2019
55a80d7
wip disable stored fields in subfields
andyb-elastic Jan 29, 2019
5413f6e
wip dont set default index options in shingle fields
andyb-elastic Jan 29, 2019
513c346
wip tests for prefix, phrase, and phrase prfefix queries
andyb-elastic Jan 30, 2019
948f44c
wip shingle fields shouldn't delegate phrase prefix queries to prefix…
andyb-elastic Jan 31, 2019
615477b
wip change field type tests to root field
andyb-elastic Jan 31, 2019
0aa7ab6
wip set search quote analyzer
andyb-elastic Feb 1, 2019
d8efc04
wip javadoc and some better names
andyb-elastic Feb 1, 2019
99a0c32
wip example query in docs uses match phrase prefix
andyb-elastic Feb 1, 2019
cffbab0
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Feb 2, 2019
1052fa5
add support for match boolean prefix query
jimczi Feb 5, 2019
f8142f7
Merge branch 'master' into match_prefix
jimczi Feb 5, 2019
8bcf121
Merge branch 'feature-search-as-you-type-fieldmapper' of github.com:a…
andyb-elastic Mar 4, 2019
4c736e0
highlighting test
andyb-elastic Mar 4, 2019
90014a8
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 4, 2019
16229fa
test for boolean prefix in match query builder
andyb-elastic Mar 5, 2019
2da88b3
test with synonyms
andyb-elastic Mar 6, 2019
32a6dc4
remove commented code
andyb-elastic Mar 6, 2019
f26d919
boolean prefix query builder
andyb-elastic Mar 8, 2019
a8ffbf9
use params in yaml tests
andyb-elastic Mar 8, 2019
556dcc6
sayt yaml tests for boolean prefix
andyb-elastic Mar 9, 2019
7acb81b
remove prefix and phrase prefix highlighting tests, because highlight…
andyb-elastic Mar 9, 2019
c6056da
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 9, 2019
f2ccbd5
fix highlighting test to pass
andyb-elastic Mar 9, 2019
a4bcaef
rename to bool_prefix
andyb-elastic Mar 11, 2019
0b4baf1
MultiMatchQueryBuilderTests for boolean prefix
andyb-elastic Mar 12, 2019
a24f788
partially fix some issues with MultiMatchQueryBuilderTests
andyb-elastic Mar 13, 2019
c85aad4
fix searchmoduletests
andyb-elastic Mar 14, 2019
c77d1d1
enforce leniency with prefix queries - MultiMatchQueryBuilderTests pa…
andyb-elastic Mar 14, 2019
acaaf8e
fix checkstyle
andyb-elastic Mar 14, 2019
cfab648
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 14, 2019
0ed9201
respect autoGenerateMultiTermSynonymPhraseQuery
andyb-elastic Mar 15, 2019
1201135
checkstyle
andyb-elastic Mar 15, 2019
bee66d6
fix bwc skips
andyb-elastic Mar 18, 2019
0e887c5
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 18, 2019
d7981df
fix skip reasons
andyb-elastic Mar 18, 2019
d26ce8a
remove test logging
andyb-elastic Mar 18, 2019
2bc13d2
multi match yaml tests
andyb-elastic Mar 18, 2019
d95199d
highlighting tests for multi match
andyb-elastic Mar 18, 2019
5e2ce2b
fix method name
andyb-elastic Mar 18, 2019
e249c12
multimatch bool_prefix test for fieldmapper
andyb-elastic Mar 18, 2019
1bcc7f6
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 19, 2019
30cb8b1
clarify bool prefix query builder tests
andyb-elastic Mar 19, 2019
cf8c1e0
use the correct phrase in tests
andyb-elastic Mar 19, 2019
3c96949
bool_prefix docs
andyb-elastic Mar 20, 2019
78b150e
yaml tests for multi_match bool_prefix
andyb-elastic Mar 21, 2019
96ce9eb
notes about what multi_match params we support with bool_prefix
andyb-elastic Mar 21, 2019
e45935e
notes on supported multi_match parameters
andyb-elastic Mar 21, 2019
89982f2
explicitly disallow unsupported multimatch params
andyb-elastic Mar 21, 2019
9dcd698
examples in sayt doc page
andyb-elastic Mar 22, 2019
b9f49a6
point other doc pages to new sayt doc
andyb-elastic Mar 22, 2019
6cc7935
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 22, 2019
df56e88
docs fixes
andyb-elastic Mar 22, 2019
4fa005c
default max shingle size in docs
andyb-elastic Mar 22, 2019
2c10385
fix reference to scoring mode
andyb-elastic Mar 22, 2019
02383c8
account for disallowed params
andyb-elastic Mar 22, 2019
9691f4a
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 22, 2019
5ff97b1
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 25, 2019
a9abdff
rename server
andyb-elastic Mar 25, 2019
f36d9c1
rename module
andyb-elastic Mar 25, 2019
5003046
rename docs and some fixes
andyb-elastic Mar 25, 2019
52d4442
support operator
andyb-elastic Mar 26, 2019
5d891a0
support fuzziness
andyb-elastic Mar 26, 2019
41a7530
fix searchmoduletests
andyb-elastic Mar 26, 2019
a146fac
Merge branch 'master' into feature-search-as-you-type-fieldmapper
andyb-elastic Mar 27, 2019
3 changes: 3 additions & 0 deletions docs/reference/mapping/types.asciidoc
@@ -52,6 +52,7 @@ string:: <<text,`text`>> and <<keyword,`keyword`>>

<<sparse-vector>>:: Record sparse vectors of float values.

<<search-as-you-type>>:: A text-like field optimized for queries to implement as-you-type completion

[float]
=== Multi-fields
@@ -110,3 +111,5 @@ include::types/rank-features.asciidoc[]
include::types/dense-vector.asciidoc[]

include::types/sparse-vector.asciidoc[]

include::types/search-as-you-type.asciidoc[]
267 changes: 267 additions & 0 deletions docs/reference/mapping/types/search-as-you-type.asciidoc
@@ -0,0 +1,267 @@
[[search-as-you-type]]
=== Search as you type datatype

experimental[]

The `search_as_you_type` field type is a text-like field that is optimized to
provide out-of-the-box support for queries that serve an as-you-type completion
use case. It creates a series of subfields that are analyzed to index terms
that can be efficiently matched by a query that partially matches the entire
indexed text value. Both prefix completion (i.e. matching terms starting at the
beginning of the input) and infix completion (i.e. matching terms at any
position within the input) are supported.

Consider a mapping that adds a field of this type:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "search_as_you_type"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

This creates the following fields

[horizontal]

`my_field`::

Analyzed as configured in the mapping. If an analyzer is not configured,
the default analyzer for the index is used

`my_field._2gram`::

Wraps the analyzer of `my_field` with a shingle token filter of shingle
size 2

`my_field._3gram`::

Wraps the analyzer of `my_field` with a shingle token filter of shingle
size 3

`my_field._index_prefix`::

Wraps the analyzer of `my_field._3gram` with an edge ngram token filter


Setting a larger `max_shingle_size` in the mapping will create shingle fields
with larger sized shingles, of 2 up to `max_shingle_size`. The
`my_field._index_prefix` subfield will always use the analyzer from the
subfield with the largest shingle size. For example, the mapping

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "search_as_you_type",
        "max_shingle_size": 4
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

will create the following fields. Note how this changes how the
`my_field._index_prefix` field is analyzed

[horizontal]

`my_field`::

Analyzed as configured in the mapping. If an analyzer is not configured,
the default analyzer for the index is used

`my_field._2gram`::

Wraps the analyzer of `my_field` with a shingle token filter of shingle
size 2

`my_field._3gram`::

Wraps the analyzer of `my_field` with a shingle token filter of shingle
size 3

`my_field._4gram`::

Wraps the analyzer of `my_field` with a shingle token filter of shingle
size 4

`my_field._index_prefix`::

Wraps the analyzer of `my_field._4gram` with an edge ngram token filter


The same input text is indexed into each of these fields automatically, with
their differing analysis chains, when an indexed document has a value for the
root field `my_field`.

[source,js]
--------------------------------------------------
PUT my_index/_doc/1?refresh
{
  "my_field": "quick red fox lazy brown dog"
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
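
As a rough illustration of what ends up in the shingle subfields for the value above, the following sketch builds fixed-size shingles from whitespace-separated tokens. It is a simplified model and not the Lucene token filter used by the mapper: the class and method names are hypothetical, and it assumes plain whitespace tokenization with no lowercasing or other analysis steps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: approximates fixed-size shingling over whitespace tokens. */
final class ShingleSketch {

    /** Returns every run of exactly {@code size} consecutive tokens, joined by a single space. */
    static List<String> shingles(String text, int size) {
        List<String> tokens = Arrays.asList(text.trim().split("\\s+"));
        List<String> result = new ArrayList<>();
        // Slide a window of `size` tokens across the input; shorter trailing runs are dropped.
        for (int start = 0; start + size <= tokens.size(); start++) {
            result.add(String.join(" ", tokens.subList(start, start + size)));
        }
        return result;
    }

    public static void main(String[] args) {
        String value = "quick red fox lazy brown dog";
        System.out.println("_2gram: " + shingles(value, 2));
        System.out.println("_3gram: " + shingles(value, 3));
    }
}
```

Under these assumptions the `._2gram` subfield would hold five terms (`quick red` through `brown dog`) and the `._3gram` subfield four.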

To search for documents with values of `my_field` that can complete the
query text, use a `match_phrase_prefix` query

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "match_phrase_prefix": {
      "my_field": "red f"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

[source,js]
--------------------------------------------------
{
  "took" : 44,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.8630463,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.8630463,
        "_source" : {
          "my_field" : "quick red fox lazy brown dog"
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took" : 44/"took" : $body.took/]
// TESTRESPONSE[s/"max_score" : 0.8630463/"max_score" : $body.hits.max_score/]
// TESTRESPONSE[s/"_score" : 0.8630463/"_score" : $body.hits.hits.0._score/]
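
To make it clearer why the query text `red f` matches this document even though it starts in the middle of the value, here is a small sketch of the matching rule a phrase-prefix query applies: every query term except the last must match a token exactly and in order, and the last term only needs to be a prefix of the next token. This is a simplified in-memory model with hypothetical names. It assumes whitespace tokenization and lowercasing, ignores slop and expansion limits, and says nothing about how the query is actually executed against the subfields.

```java
import java.util.Locale;

/** Illustrative only: in-memory model of phrase-prefix matching on whitespace tokens. */
final class PhrasePrefixSketch {

    /** True if the query terms occur consecutively in the field value, with the last term treated as a prefix. */
    static boolean matchesPhrasePrefix(String fieldValue, String queryText) {
        String[] doc = fieldValue.trim().toLowerCase(Locale.ROOT).split("\\s+");
        String[] query = queryText.trim().toLowerCase(Locale.ROOT).split("\\s+");
        // Try every starting position in the document, which is what allows infix completion.
        for (int start = 0; start + query.length <= doc.length; start++) {
            boolean match = true;
            for (int i = 0; i < query.length && match; i++) {
                boolean last = i == query.length - 1;
                match = last
                    ? doc[start + i].startsWith(query[i]) // final term: prefix match
                    : doc[start + i].equals(query[i]);    // earlier terms: exact match
            }
            if (match) {
                return true;
            }
        }
        return false;
    }
}
```

With this model, `matchesPhrasePrefix("quick red fox lazy brown dog", "red f")` is true, while the query text `quick fox` does not match because the terms are not adjacent.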

[[specific-params]]
==== Parameters specific to the `search_as_you_type` field

The following parameters are accepted in a mapping for the `search_as_you_type`
field and are specific to this field type

[horizontal]

`max_shingle_size`::

The largest shingle size to index the input with and create subfields for,
creating one subfield for each shingle size between 2 and
`max_shingle_size`. Accepts integer values between 2 and 4 inclusive. This
option defaults to 3.
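
The relationship between `max_shingle_size` and the subfields that get created can be sketched as follows. The class, constants and method are hypothetical and exist only for illustration. The bounds (2 to 4), the default (3) and the subfield naming are taken from the description above, and the error message is invented.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: lists the field names implied by a given max_shingle_size. */
final class SubfieldNamesSketch {

    static final int MIN_SHINGLE_SIZE = 2;
    static final int MAX_SHINGLE_SIZE = 4;
    static final int DEFAULT_MAX_SHINGLE_SIZE = 3;

    /** Returns the root field, one shingle subfield per size from 2 to maxShingleSize, and the prefix subfield. */
    static List<String> subfields(String field, int maxShingleSize) {
        if (maxShingleSize < MIN_SHINGLE_SIZE || maxShingleSize > MAX_SHINGLE_SIZE) {
            throw new IllegalArgumentException(
                "max_shingle_size must be between " + MIN_SHINGLE_SIZE + " and " + MAX_SHINGLE_SIZE
                    + ", got " + maxShingleSize);
        }
        List<String> names = new ArrayList<>();
        names.add(field);
        for (int size = MIN_SHINGLE_SIZE; size <= maxShingleSize; size++) {
            names.add(field + "._" + size + "gram");
        }
        names.add(field + "._index_prefix");
        return names;
    }
}
```

Calling `subfields("my_field", SubfieldNamesSketch.DEFAULT_MAX_SHINGLE_SIZE)` yields the four fields from the first example on this page, and a value of 4 adds `my_field._4gram`.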


[[general-params]]
==== Parameters of the field type as a text field

The following parameters are accepted in a mapping for the `search_as_you_type`
field due to its nature as a text-like field, and behave similarly to their
behavior when configuring a field of the <<text,`text`>> datatype. Unless
otherwise noted, these options configure the root field's subfields in
the same way.

<<analyzer,`analyzer`>>::

The <<analysis,analyzer>> which should be used for
<<mapping-index,`analyzed`>> string fields, both at index-time and at
search-time (unless overridden by the
<<search-analyzer,`search_analyzer`>>). Defaults to the default index
analyzer, or the <<analysis-standard-analyzer,`standard` analyzer>>.

<<mapping-index,`index`>>::

Should the field be searchable? Accepts `true` (default) or `false`.

<<index-options,`index_options`>>::

What information should be stored in the index, for search and highlighting
purposes. Defaults to `positions`.

<<norms,`norms`>>::

Whether field-length should be taken into account when scoring queries.
Accepts `true` or `false`. This option configures the root field
and shingle subfields, where its default is `true`. It does not configure
the prefix subfield, where it is `false`.

<<mapping-store,`store`>>::

Whether the field value should be stored and retrievable separately from
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
(default). This option only configures the root field, and does not
configure any subfields.

<<search-analyzer,`search_analyzer`>>::

The <<analyzer,`analyzer`>> that should be used at search time on
<<mapping-index,`analyzed`>> fields. Defaults to the `analyzer` setting.

<<search-quote-analyzer,`search_quote_analyzer`>>::

The <<analyzer,`analyzer`>> that should be used at search time when a
phrase is encountered. Defaults to the `search_analyzer` setting.

<<similarity,`similarity`>>::

Which scoring algorithm or _similarity_ should be used. Defaults
to `BM25`.

<<term-vector,`term_vector`>>::

Whether term vectors should be stored for an <<mapping-index,`analyzed`>>
field. Defaults to `no`. This option configures the root field and shingle
subfields, but not the prefix subfield.


[[prefix-queries]]
==== Optimization of prefix queries

When making a <<query-dsl-prefix-query,`prefix`>> query to the root field or
any of its subfields, the query will be rewritten to a
<<query-dsl-term-query,`term`>> query on the `._index_prefix` subfield. This
matches more efficiently than is typical of `prefix` queries on text fields,
as prefixes up to a certain length of each shingle are indexed directly as
terms in the `._index_prefix` subfield.

The analyzer of the `._index_prefix` subfield slightly modifies the
shingle-building behavior to also index prefixes of the terms at the end of the
field's value that normally would not be produced as shingles. For example, if
the value `quick red fox` is indexed into a `search_as_you_type` field with
`max_shingle_size` of 3, prefixes for `red fox` and `fox` are also indexed into
the `._index_prefix` subfield even though they do not appear as terms in the
`._3gram` subfield. This allows for completion of all the terms in the field's
input.
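
The behavior described in this section can be modeled with a short sketch: shingles are built at every token position, including the shorter ones at the end of the value, and each shingle is then expanded into its leading prefixes. A prefix lookup then becomes an exact membership test against the indexed terms. This is a simplified model with hypothetical names, not the analyzer used by the mapper. It assumes whitespace tokenization, and `maxPrefixLength` is an illustrative parameter standing in for the unspecified "certain length" mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: models the terms held by the ._index_prefix subfield. */
final class IndexPrefixSketch {

    /** Shingles of up to {@code size} tokens starting at every position, so shorter trailing shingles are kept. */
    static List<String> shinglesWithTrailing(String text, int size) {
        List<String> tokens = Arrays.asList(text.trim().split("\\s+"));
        List<String> result = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            int end = Math.min(start + size, tokens.size());
            result.add(String.join(" ", tokens.subList(start, end)));
        }
        return result;
    }

    /** Expands each shingle into its leading prefixes, up to maxPrefixLength characters (hypothetical limit). */
    static Set<String> prefixTerms(String text, int shingleSize, int maxPrefixLength) {
        Set<String> terms = new LinkedHashSet<>();
        for (String shingle : shinglesWithTrailing(text, shingleSize)) {
            int limit = Math.min(shingle.length(), maxPrefixLength);
            for (int length = 1; length <= limit; length++) {
                terms.add(shingle.substring(0, length));
            }
        }
        return terms;
    }

    /** A prefix lookup becomes an exact term lookup against the indexed prefixes. */
    static boolean prefixMatches(Set<String> indexedTerms, String prefix) {
        return indexedTerms.contains(prefix);
    }
}
```

For the value `quick red fox` with a shingle size of 3, `shinglesWithTrailing` returns `quick red fox`, `red fox` and `fox`, so prefixes such as `red f` and `fo` are present in the term set and can be found with a single exact lookup.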
@@ -41,6 +41,7 @@ public Map<String, Mapper.TypeParser> getMappers() {
        mappers.put(RankFeaturesFieldMapper.CONTENT_TYPE, new RankFeaturesFieldMapper.TypeParser());
        mappers.put(DenseVectorFieldMapper.CONTENT_TYPE, new DenseVectorFieldMapper.TypeParser());
        mappers.put(SparseVectorFieldMapper.CONTENT_TYPE, new SparseVectorFieldMapper.TypeParser());
        mappers.put(SearchAsYouTypeFieldMapper.CONTENT_TYPE, new SearchAsYouTypeFieldMapper.TypeParser());
        return Collections.unmodifiableMap(mappers);
    }
