Searches that don't work well #482
"python programming": Lots of good results in the first two dozen, then the results hit a bumpy long tail, with title matches such as "Phthor" and "Pythias" sorting above titles like "Think Python: How to think like a Computer," "Python Data Science Handbook," etc.
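To make the "Phthor" match less mysterious, here is a minimal sketch of the edit-distance arithmetic behind fuzzy matching. It assumes the query uses Elasticsearch's documented default `AUTO` fuzziness thresholds (exact match below 3 characters, one edit for 3 to 5 characters, two edits for longer terms); the thread does not show the actual query, so that is an assumption. The helper names are hypothetical, and the sketch uses plain Levenshtein distance and ignores analyzers and transpositions.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute).

    Elasticsearch also counts an adjacent transposition as one edit by
    default, so this can overestimate the distance it would compute.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    """Edits allowed for a term under the assumed default AUTO rule."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


def within_fuzz(query_term: str, indexed_term: str) -> bool:
    """Hypothetical check: would this single term fuzzy-match the other?"""
    q, t = query_term.lower(), indexed_term.lower()
    return levenshtein(q, t) <= auto_fuzziness(q)


if __name__ == "__main__":
    for title in ("Phthor", "Pythias"):
        print(title, levenshtein("python", title.lower()),
              within_fuzz("python", title))
```

Under these assumptions "Phthor" is two substitutions away from "python" and so falls inside the allowed fuzz, while "Pythias" is three edits away and does not, which suggests something other than term-level fuzziness may be responsible for that match.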
Looking more closely at "Steve Berry" and other search examples, I think the following changes, tried together or separately, might help filter out some of the louder noise in the Elasticsearch results.
Reviewing some of the options for the tokenizers being used when indexing might help weed out other tangential query results. For example, from https://www.elastic.co/blog/found-fuzzy-search:
Searches for book titles that include genre names don't work well. For example, searching for "modern romance" returns several pages of romance books before it returns the book called Modern Romance. The romance books generally have "modern" in their subtitle and/or series names.
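One option that could be tried for this case is boosting exact phrase matches on the title above looser matches. The following is only a sketch of such a query body, built as a Python dict. The field names (`title`, `subtitle`, `series`, `summary`), the boost value, and the function name are all assumptions for illustration, not the project's actual mapping or query.

```python
def title_phrase_boost_query(text: str, boost: float = 10.0) -> dict:
    """Build a bool query that favors an exact phrase match on the title.

    Both clauses are optional "should" clauses; with no "must" or "filter"
    clause present, a document has to match at least one of them.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # The whole query appearing as a phrase in the title
                    # scores much higher than scattered word matches.
                    {"match_phrase": {"title": {"query": text, "boost": boost}}},
                    # Fallback: ordinary full-text match across several
                    # (hypothetical) fields.
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["title", "subtitle", "series", "summary"],
                        }
                    },
                ]
            }
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(title_phrase_boost_query("modern romance"), indent=2))
```

With a query like this, a book whose title is the phrase "modern romance" would get the boosted phrase score, while books that only have "modern" in a subtitle or series name would match the fallback clause alone. Whether the boost is large enough to outrank genre matches would need testing against the real index.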
Searching for
The first result for |
Amy suggests rearchitecting search to use a suggester rather than fixing these problems by changing the search query.
Searching for an age range like "age 3-5" favors works that fit that age range, but it also pulls in titles that match "3" or "5", e.g. in a subtitle that says "X Series Volume 3". This can surface titles that are not appropriate for the age range. It's worth investigating whether we should short-circuit our usual technique of treating a search query like "age 3-5" or "romance" as either an advanced search directive or a normal full-text search term.
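As a rough illustration of what short-circuiting could look like for the age-range case, the sketch below pulls an "age 3-5" directive out of the query before any full-text matching happens, so the digits never reach the text search. The regular expression, the function name, and the return shape are assumptions for illustration and are not taken from the existing search code.

```python
import re
from typing import Optional, Tuple

# Matches "age 3-5", "ages 3-5", "age 3 to 5" (case-insensitive).
AGE_RANGE = re.compile(
    r"\bages?\s+(\d{1,2})\s*(?:-|to)\s*(\d{1,2})\b", re.IGNORECASE
)


def split_age_range(query: str) -> Tuple[Optional[Tuple[int, int]], str]:
    """Separate an age-range directive from the rest of the query.

    Returns ((low, high), remaining_text) when a directive is found,
    or (None, query) when the query contains no age range.
    """
    match = AGE_RANGE.search(query)
    if match is None:
        return None, query.strip()
    low, high = sorted((int(match.group(1)), int(match.group(2))))
    # Drop the directive so "3" and "5" are not searched as text.
    remainder = query[:match.start()] + " " + query[match.end():]
    return (low, high), " ".join(remainder.split())


if __name__ == "__main__":
    print(split_age_range("age 3-5"))
    print(split_age_range("dinosaurs ages 3 to 5"))
    print(split_age_range("X Series Volume 3"))
```

The extracted range could then be applied as a filter on the target age, with only the remaining text (if any) sent through the normal full-text path. A query that merely contains a number, such as "X Series Volume 3", is left untouched.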
I'm in a position where I can directly compare the new search algorithm to the old one, so I'm giving an update on how we're doing so far and how much of the improvement is due to the new algorithm.
In the new algorithm, In the new algorithm, Both algorithms treat The old algorithm handles Both algorithms perform well on The new algorithm performs significantly worse on |
Rebuilding my test search index fixes |
In general, if you search for a specific title/author/series, you now get a bunch of really good results and then suddenly the results become awful. There is an abrupt dropoff in result quality. If you search for a topic like "python programming" then you start off with good results and eventually poor-quality results start being merged into the good results. There's not a gradual dropoff in result quality. If you search for a single word you're likely to get fuzz errors. So overall, an improvement, but obviously not the last word in search.
I figured out why I had the missing items in the index. Suffice it to say the items were missing from the index, but the cause was user error, not a problem in the code, and rebuilding the index was the right thing to do.
cf. #443
cf. #159
I'm making this issue to keep track of specific searches that don't work well.
Here's a report from an app review:
"I looked up books by M.J. Rose and it gave me one, but when I looked up specific titles by the author they had them! I looked up Steve Berry, it gave me a few of his books and then others by authors whose names were not even close to it."