
[Feature][function_score] Limit individual function's score in functions array #17348

Closed
sean-cherbone opened this issue Mar 25, 2016 · 6 comments
Labels
discuss, >enhancement, :Search/Search (Search-related issues that do not fall into other categories)

Comments

@sean-cherbone

Currently, there does not appear to be a way to place an upper or lower bound on an individual function within a function_score functions array. It would be nice to be able to place either a max or min limit on the individual function to prevent something like a field_value_factor from overshadowing other more relevant signals.

Example function_score:

"function_score": {
"query": {},
"boost": "boost for the whole query",
"functions": [
{
"filter": {},
"FUNCTION": {},
"weight": number,
"min_score": number // New Feature
},
{
"FUNCTION": {},
"max_score": number // New Feature
},
{
"filter": {},
"weight": number
}
],
"max_boost": number,
"score_mode": "(multiply|max|...)",
"boost_mode": "(multiply|replace|...)",
"min_score" : number
}

@clintongormley
Contributor

My initial thought was that this could be done easily with a script. My next thought was that this could actually be generally useful without having to resort to scripting, e.g. a Gaussian decay can end up returning zero which, when multiplied by the other factors, zeroes out the whole score.

@sean-cherbone
Author

I too had considered a script (and may still use one if needed), but I feel that limiting the strength of low-priority signals is a sufficiently straightforward need that functions could benefit from built-in support.

For example, let's say I have the following factors that could indicate relevance:

  • long_view_count
  • short_view_count
  • share_count
  • tweet_count
  • facebook_count
  • up_vote_count
  • down_vote_count
  • etc...

Here are a couple of conditions that could cause problems with this scheme:

  • People discover and exploit this ranking scheme to boost their irrelevant documents to the top.
  • A marginally relevant document that has been around for a while and is well advertised overtakes a much more relevant but new or less advertised document.

@clintongormley
Contributor

Well, normally you'd use a log function so that each value counts for less the higher it goes (i.e. the first 5 votes count a lot, but votes beyond 100 add little more).
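The diminishing returns of a log taper can be seen directly; this small sketch uses `log1p` (the shape behind `field_value_factor`'s `log1p` modifier) on a few sample vote counts:

```python
import math

# Log tapering: early votes move the score far more than later ones.
for votes in (1, 5, 100, 1000):
    print(votes, round(math.log1p(votes), 2))
# 1 -> 0.69, 5 -> 1.79, 100 -> 4.62, 1000 -> 6.91
```

Even so, as the rest of the thread argues, the tapered value keeps growing without bound, which is what the proposed cap would address.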

@sean-cherbone
Author

Agreed, and I am using such tapering modifiers as well, but here is another example that may help.

Let's say I also want to factor in cost. For most documents, $10 to $100 is typical, and I would expect that range to follow a linear curve, representing how the average person feels about spending money on non-essentials. Now let's say something costs $1,000 or $10,000: those prices are so far beyond the typical reach of most people that they are essentially the same, yet taking the log of each still returns a substantial difference. Placing a max limit here would let me truncate these outliers, effectively saying "they're high cost but potentially still relevant" and leaving it at that, rather than (in my case) driving them way down the relevance scale even when the cost is completely reasonable for this type of document.
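The cost example can be shown numerically. This sketch assumes a log10 transform and a hypothetical cap at the top of the typical $10-$100 range; the cap value is an illustrative choice, not anything the thread specifies:

```python
import math

# Hypothetical max_score placed at the top of the typical price range:
CAP = math.log10(100)  # = 2.0

for cost in (10, 100, 1000, 10_000):
    raw = math.log10(cost)
    print(cost, raw, min(raw, CAP))
# Uncapped, $1,000 and $10,000 score 3.0 and 4.0 (a "substantial
# difference"); capped, both clamp to 2.0 and read simply as "expensive".
```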

I should also point out that in the case of votes, those too can be a problem, even when scaled down with log. I want to give new documents a fighting chance of being seen even though they start out with 0 votes. If some other document that is years old has many thousands of votes, even taking the log of that will create a major boost over the new document. Here again, it may be appropriate to say that documents with 10 to 100 votes are proportionately more relevant, but all documents with 100 or more votes may be considered simply "popular" without overwhelming the other signals.
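The new-document vs. old-document gap can likewise be sketched. The vote counts and the 100-vote saturation point are illustrative assumptions:

```python
import math

# Without a cap, an old document's vote count swamps a new one even under log:
new_votes, old_votes = 0, 50_000
print(math.log1p(new_votes), math.log1p(old_votes))  # 0.0 vs ~10.8

# With a hypothetical per-function max_score, everything past ~100 votes
# counts the same, i.e. simply "popular":
cap = math.log1p(100)                    # ~4.62
print(min(math.log1p(old_votes), cap))   # clamps to the cap
```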

@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
@javanna
Member

javanna commented Mar 16, 2018

@elastic/es-search-aggs

@mayya-sharipova
Contributor

I am closing this issue, as we are currently working on redesigning the FunctionScore query. One of the features we are considering is score normalization, which, when/if implemented, would address this issue as well.
#30303


4 participants