This repository has been archived by the owner on Apr 11, 2022. It is now read-only.

Full title match is better than split match #739

Merged
leonardr merged 21 commits into master from full-title-match-is-better-than-split-match
Dec 4, 2017

leonardr
Contributor

@leonardr leonardr commented Nov 30, 2017

This branch adds two new ways a book might show up in search results: its title or one of its authors might be a near-exact match against the search term.

This should resolve the 'modern romance' and 'law of the mountain man' problems described in #482, where a book whose title includes a genre loses out to books that are in that genre. It may also resolve some of the 'game of thrones' problems described in that issue.
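A minimal sketch of the idea, not the code in this branch: a phrase match restricted to the title field carries a much larger boost than a general phrase match over several fields, so a book actually titled "Modern Romance" can outrank books that only match elsewhere. The helper names `phrase_query` and `should_clauses` are hypothetical; the field names and the boost of 150 mirror the diff, but the exact query shape is an assumption.

```python
def phrase_query(query_string, fields, boost=1.0):
    """Build a multi_match phrase query (hypothetical helper, for illustration)."""
    return {
        "multi_match": {
            "query": query_string,
            "fields": list(fields),
            "type": "phrase",
            "boost": boost,
        }
    }


def should_clauses(query_string):
    # General phrase match over title, author, and series at the default boost.
    general_match = phrase_query(
        query_string, ["title.minimal", "author", "series.minimal"]
    )
    # Phrase match on the title alone, boosted so it outweighs the general match.
    title_match = phrase_query(query_string, ["title.minimal"], boost=150)
    return {"bool": {"should": [general_match, title_match]}}


query = should_clauses("modern romance")
```

A document that satisfies both clauses accumulates both scores, so the boosted title clause dominates the ranking for near-exact title hits.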

This branch changes the Elasticsearch configuration used by core during Travis runs so that everything runs in a single shard; this eliminates the possibility that search result ordering will be unpredictable because of how documents are assigned to shards.
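For illustration, single-shard settings for a test index might look like the sketch below. `number_of_shards` and `number_of_replicas` are standard Elasticsearch index settings; the function name is hypothetical, and this is not necessarily how the Travis configuration in this branch is written.

```python
def single_shard_index_body(mappings=None):
    """Index creation body that keeps every document in one shard."""
    return {
        "settings": {
            "index": {
                # With one shard, every document is scored against the same
                # term statistics, so result ordering is repeatable.
                "number_of_shards": 1,
                # A single-node test cluster has nowhere to put replicas.
                "number_of_replicas": 0,
            }
        },
        "mappings": mappings or {},
    }


body = single_shard_index_body()
```

With an elasticsearch-py client of that era, a body like this could presumably be passed to `indices.create(index=..., body=body)` when the test index is set up.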

This branch also changes the way Elasticsearch is configured: instead of separate configuration settings for the alias and the index, a single configuration setting controls the prefix used for both the alias and the index. This means that pointing the alias at some other index is, in practice, currently unsupported: the code can do it (transfer_current_alias), but nobody ever calls that method.
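One way to picture the single-prefix setup: both names are derived from the same setting, so they cannot drift apart. The function name, the `-current` alias suffix, and the example prefix are hypothetical, chosen only for illustration; the branch may name things differently.

```python
def index_names(prefix, version="v3"):
    """Derive the index name and its alias from one configured prefix.

    The naming scheme here is an assumption made for this sketch.
    """
    index = "%s-%s" % (prefix, version)
    alias = "%s-current" % prefix
    return index, alias


# "circulation-works" is a made-up prefix used only for this example.
index, alias = index_names("circulation-works")
```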

This is a good time to make that configuration change because this branch introduces a v3 of the search document, which requires that the entire index be rebuilt anyway. (The migration script should take care of this.)

@@ -453,6 +453,13 @@ def make_target_age_query(target_age):
match_phrase = make_phrase_query(query_string, ['title.minimal', 'author', 'series.minimal'])
must_match_options.append(match_phrase)

# An exact title or author match outweighs a match that is split
# across fields.
match_title = make_phrase_query(query_string, ['title.minimal'], 150)
Collaborator

What happened to title.standard?

@aslagle
Collaborator

aslagle commented Dec 1, 2017

I think you need a migration for this - for the configuration setting change, if nothing else.

@leonardr
Contributor Author

leonardr commented Dec 1, 2017

I forgot to add the migration script. All it does is create a brand new index using the default prefix. I don't think there are any existing instances where this won't work, and if there are any, we don't know enough about their setup to automatically migrate them.

@leonardr
Contributor Author

leonardr commented Dec 1, 2017

I could also add a migration script to remove the old configuration settings, just to avoid any confusion when looking at the database in the future.

)

# TODO: Uncomment these lines and the 'modern romance'
# test fails for some reason.
Collaborator

you fixed this!

"standard": {
"type": "string",
"analyzer": "standard"
}
Contributor Author

This is the part that's different.

@leonardr leonardr merged commit e5c4c26 into master Dec 4, 2017
@leonardr leonardr removed the in review label Dec 4, 2017
@leonardr leonardr deleted the full-title-match-is-better-than-split-match branch December 4, 2017 14:08

2 participants