
Optimize phrase_prefix match query #31921

Closed
jimczi opened this issue Jul 10, 2018 · 4 comments · Fixed by #37436
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories

Comments

@jimczi
Contributor

jimczi commented Jul 10, 2018

We can leverage #28290 to optimize phrase_prefix on the match query. If the index_prefixes option is set on the field, we can query the last term using the prefix field. This should speed up the query significantly (a single term query) and increase the recall since all expansions would match.
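As a rough illustration of the setup being discussed, the request bodies might look like the following, written here as Python dictionaries. The field name `text`, the `min_chars`/`max_chars` values, and the typeless mapping layout (as used by later Elasticsearch versions) are assumptions made for this sketch, not details taken from the issue.

```python
import json

# Hypothetical mapping body: a text field with indexed prefixes enabled.
# The min_chars/max_chars values are illustrative assumptions only.
mapping_body = {
    "mappings": {
        "properties": {
            "text": {
                "type": "text",
                "index_prefixes": {"min_chars": 1, "max_chars": 5},
            }
        }
    }
}

# A match_phrase_prefix query whose last term ("f") is treated as a prefix.
query_body = {
    "query": {
        "match_phrase_prefix": {
            "text": {"query": "brown f"}
        }
    }
}

# Serialize both bodies as they would be sent over HTTP.
mapping_json = json.dumps(mapping_body, indent=2)
query_json = json.dumps(query_body, indent=2)
print(mapping_json)
print(query_json)
```

The query itself does not change; the proposal is about how the last term is resolved internally when the field has a prefix subfield.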

@jimczi jimczi added >enhancement :Search/Search Search-related issues that do not fall into other categories labels Jul 10, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@markharwood
Contributor

> this should speed up the query significantly (a single term query)

That makes sense to me. A search for "brown f*" could match all docs whose text._index_prefix field contains the term f, rather than searching the text field for fab, fan, etc.

> and increase the recall since all expansions would match.

This bit I don't get. The phrase_prefix docs already include a warning that the f* expansions tried are not verified to be prefixes that exist as a phrase with the previous term, e.g. the query may suggest the non-existent brown fab and brown fan rather than the fox in the existing phrase brown fox.

To solve that problem, shouldn't this issue be about using the index_phrases-enabled fields rather than the index_prefixes field?

@jimczi
Contributor Author

jimczi commented Jul 11, 2018

> This bit I don't get. The phrase_prefix docs already include a warning that the f* expansions tried are not verified to be prefixes that exist as a phrase with the previous term e.g. may suggest the non-existent brown fab and brown fan etc rather than the fox in the existing phrase brown fox.

When prefixes are not indexed, we extract the first 50 terms that match the prefix by default. With indexed prefixes, we can match all documents that contain the prefix with a single term, which ensures that all documents that should match are taken into account. We could also use a mix of index_phrases and index_prefixes to make the query faster, but we can do that in a follow-up.
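The recall difference can be sketched with a toy in-memory index. This is a simplified model, not Elasticsearch or Lucene code: the function names, the sample documents, and the assumption that expansions are taken in sorted term order are all hypothetical choices for the example.

```python
from collections import defaultdict

MAX_EXPANSIONS = 50  # the default cap described above


def build_indexes(docs, max_prefix_len=5):
    """Return (terms, prefixes): term -> doc ids and prefix -> doc ids."""
    terms = defaultdict(set)
    prefixes = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            terms[token].add(doc_id)
            # Index every leading substring up to max_prefix_len characters.
            for n in range(1, min(len(token), max_prefix_len) + 1):
                prefixes[token[:n]].add(doc_id)
    return terms, prefixes


def docs_via_expansion(terms, prefix, max_expansions=MAX_EXPANSIONS):
    """Expand the prefix to at most max_expansions terms, then union postings."""
    expansions = [t for t in sorted(terms) if t.startswith(prefix)][:max_expansions]
    result = set()
    for term in expansions:
        result |= terms[term]
    return result


def docs_via_prefix_field(prefixes, prefix):
    """Look the prefix up as a single term in the prefix index."""
    return set(prefixes.get(prefix, set()))


# 60 documents with distinct f-terms, plus one document containing "fox".
docs = {i: f"brown f{i:03d}" for i in range(60)}
docs[60] = "brown fox"

terms, prefixes = build_indexes(docs)
expanded = docs_via_expansion(terms, "f")
indexed = docs_via_prefix_field(prefixes, "f")
print(len(expanded), len(indexed))
print(60 in expanded, 60 in indexed)
```

In this toy data the term fox sorts after the first 50 expansions, so the expansion-based lookup never reaches its document, while the single prefix term covers every document.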

@markharwood
Contributor

Ah, OK. So the phrase (/span/interval) query we run has the full term (e.g. brown) and only the single prefix term (e.g. f), and both indexed terms share common position information, which means we can test that they occur next to each other.
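A minimal sketch of that positional check, again as a toy model rather than the actual span query implementation. It assumes that the prefix field records each prefix at the same position as the token it came from; the helper names and sample documents are hypothetical.

```python
from collections import defaultdict


def index_positions(docs, max_prefix_len=5):
    """Build positional postings for full terms and for their prefixes.

    Both structures map key -> doc id -> set of token positions, and both
    use the same position numbering.
    """
    term_pos = defaultdict(lambda: defaultdict(set))
    prefix_pos = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            term_pos[token][doc_id].add(pos)
            for n in range(1, min(len(token), max_prefix_len) + 1):
                prefix_pos[token[:n]][doc_id].add(pos)
    return term_pos, prefix_pos


def phrase_prefix_match(term_pos, prefix_pos, full_terms, last_prefix):
    """Return doc ids where full_terms occur consecutively and are
    immediately followed by a token starting with last_prefix."""
    if not full_terms:
        return set(prefix_pos.get(last_prefix, {}))
    matches = set()
    for doc_id, starts in term_pos.get(full_terms[0], {}).items():
        for start in starts:
            # Every full term must sit at its expected offset from start.
            terms_ok = all(
                start + i in term_pos.get(term, {}).get(doc_id, ())
                for i, term in enumerate(full_terms)
            )
            # The prefix term must sit at the very next position.
            next_pos = start + len(full_terms)
            if terms_ok and next_pos in prefix_pos.get(last_prefix, {}).get(doc_id, ()):
                matches.add(doc_id)
                break
    return matches


docs = {
    1: "the quick brown fox",
    2: "brown bag and a fan",
    3: "a fox is brown",
}
term_pos, prefix_pos = index_positions(docs)
result = phrase_prefix_match(term_pos, prefix_pos, ["brown"], "f")
print(result)
```

Only the document where an f-token directly follows brown matches; documents that contain both terms in other positions do not.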

jimczi added a commit to jimczi/elasticsearch that referenced this issue Jan 14, 2019
This change adds a way to customize how phrase prefix queries should be created
on field types. The match phrase prefix query is exposed in field types in order
to allow optimizations based on the options set on the field.
For instance the text field uses the configured prefix field (if available) to
build a span near that mixes the original field and the prefix field on the last
position.
This change also contains a small refactoring of the match/multi_match query that
simplifies the interactions between the builders.

Closes elastic#31921
jimczi added a commit that referenced this issue Jan 17, 2019
jimczi added a commit that referenced this issue Jan 18, 2019