Full title match is better than split match #739
external_search.py
Outdated
@@ -453,6 +453,13 @@ def make_target_age_query(target_age):
match_phrase = make_phrase_query(query_string, ['title.minimal', 'author', 'series.minimal'])
must_match_options.append(match_phrase)

# An exact title or author match outweighs a match that is split
# across fields.
match_title = make_phrase_query(query_string, ['title.minimal'], 150)
What happened to title.standard?
…bject is created.
…as to having a single setting for the prefix. Index and alias are always calculated from the prefix.
…s the other queries get the same boost it works out.
I think you need a migration for this - for the configuration setting change, if nothing else.
I forgot to add the migration script. All it does is create a brand new index using the default prefix. I don't think there are any existing instances where this won't work, and if there are any, we don't know enough about their setup to automatically migrate them.
I could also add a migration script to remove the old configuration settings, just to avoid any confusion when looking at the database in the future.
tests/test_external_search.py
Outdated
)

# TODO: Uncomment these lines and the 'modern romance'
# test fails for some reason.
you fixed this!
"standard": {
    "type": "string",
    "analyzer": "standard"
}
This is the part that's different.
This branch adds two new ways a book might show up in search results: its title or one of its authors might be a near-exact match against the search term.
This should resolve the 'modern romance' and 'law of the mountain man' problems described in #482, where a book whose title includes a genre loses out to books that are in that genre. It may also resolve some of the 'game of thrones' problems described in that issue.
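As a rough illustration of the boosting idea, here is a minimal sketch of how a full-title phrase match could be combined with a multi-field phrase match in an Elasticsearch query body. The helper names `phrase_query_sketch` and `search_body_sketch` are hypothetical and are not the `make_phrase_query` function from this branch; only the field names and the boost of 150 come from the diff. The author-match boost is not shown in the diff, so it is left out here.

```python
def phrase_query_sketch(query_string, fields, boost=None):
    """Build a phrase query over one or more fields.

    Hypothetical stand-in for the branch's make_phrase_query; the real
    helper may build its query differently.
    """
    if len(fields) == 1:
        # A single field maps directly onto a match_phrase query.
        body = {"query": query_string}
        if boost is not None:
            body["boost"] = boost
        return {"match_phrase": {fields[0]: body}}

    # Several fields: use multi_match in phrase mode.
    query = {
        "multi_match": {
            "query": query_string,
            "fields": list(fields),
            "type": "phrase",
        }
    }
    if boost is not None:
        query["multi_match"]["boost"] = boost
    return query


def search_body_sketch(query_string):
    """Combine the multi-field match with a heavily boosted title match."""
    multi_field = phrase_query_sketch(
        query_string, ["title.minimal", "author", "series.minimal"]
    )
    # Boost value taken from the diff; a full title match should outrank
    # the other options.
    full_title = phrase_query_sketch(query_string, ["title.minimal"], 150)
    return {
        "query": {
            "bool": {
                "should": [multi_field, full_title],
                "minimum_should_match": 1,
            }
        }
    }
```

With a search for 'modern romance', a book whose title is exactly that phrase satisfies both clauses and picks up the boosted score, while a book that merely sits in the romance genre does not.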
This branch changes the Elasticsearch configuration used by core during Travis runs so that everything runs in a single shard; this eliminates the possibility that search result ordering will be unpredictable because of how documents are assigned to shards.
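For context, Elasticsearch computes term statistics per shard by default, so the same document can score slightly differently depending on which shard it lands in. One way to pin a test index to a single shard is through the index settings at creation time. The sketch below shows such a creation body; the function name `single_shard_index_body` is hypothetical, and the branch may instead set this in the Travis-side Elasticsearch configuration.

```python
def single_shard_index_body(mappings=None):
    """Return an index-creation body that keeps all documents in one shard.

    Hypothetical helper for illustration; not taken from the branch.
    """
    body = {"settings": {"index": {"number_of_shards": 1}}}
    if mappings is not None:
        # Mappings are passed through untouched.
        body["mappings"] = mappings
    return body
```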
This branch also changes the way Elasticsearch is configured: instead of separate configuration settings for the alias and the index, a single configuration setting controls the prefix used for both the alias and the index. This means that changing the alias to point to some other index is currently not supported in practice: the code can do it (transfer_current_alias), but nobody ever calls that method.
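A minimal sketch of the single-prefix idea follows, assuming the index name is the prefix plus the search document version and the alias is the prefix plus a fixed suffix. The function names, the `-current` suffix, and the example prefix are assumptions for illustration; the actual naming scheme in the branch may differ.

```python
CURRENT_VERSION = "v3"  # the search document version introduced by this branch


def works_index_name(prefix, version=CURRENT_VERSION):
    """Derive the index name from the configured prefix (assumed scheme)."""
    return "{}-{}".format(prefix, version)


def works_alias_name(prefix):
    """Derive the alias name from the configured prefix (assumed scheme)."""
    return "{}-current".format(prefix)


# Example with a hypothetical prefix value:
prefix = "circulation-works"
index_name = works_index_name(prefix)
alias_name = works_alias_name(prefix)
```

Because both names are computed from the one setting, there is no way to configure an alias that points at an arbitrary index, which matches the limitation described above.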
This is a good time to make that configuration change because this branch introduces a v3 of the search document, which requires that the entire index be rebuilt anyway. (The migration script should take care of this.)