
[Feature][function_score] Limit individual function's score in functions array #17348

Closed
sean-cherbone opened this issue Mar 25, 2016 · 6 comments
Labels
discuss, >enhancement, :Search/Search (Search-related issues that do not fall into other categories)

Comments

@sean-cherbone

Currently, there does not appear to be a way to place an upper or lower bound on an individual function within a function_score functions array. It would be nice to be able to place either a max or min limit on the individual function to prevent something like a field_value_factor from overshadowing other more relevant signals.

Example function_score:

"function_score": {
"query": {},
"boost": "boost for the whole query",
"functions": [
{
"filter": {},
"FUNCTION": {},
"weight": number,
"min_score": number // New Feature
},
{
"FUNCTION": {},
"max_score": number // New Feature
},
{
"filter": {},
"weight": number
}
],
"max_boost": number,
"score_mode": "(multiply|max|...)",
"boost_mode": "(multiply|replace|...)",
"min_score" : number
}

@clintongormley
Contributor

My initial thought was that this could be done easily with a script. My next thought was that this could actually be generally useful without having to resort to scripting, e.g. a Gaussian decay can end up returning zero which, when multiplied by the other factors, zeroes out the whole score.

@sean-cherbone
Author

I too had considered a script (and may still use one if needed), but I feel that limiting the strength of low-priority signals is a sufficiently straightforward need that functions could benefit from built-in support.

For example, let's say I have the following factors that could indicate relevance:

  • long_view_count
  • short_view_count
  • share_count
  • tweet_count
  • facebook_count
  • up_vote_count
  • down_vote_count
  • etc...

Here are a couple of conditions that could cause problems with this scheme:

  • People discover and exploit this ranking scheme to boost their irrelevant documents to the top.
  • A marginally relevant document that has been around for a while and is well advertised overtakes a much more relevant but new or less advertised document.

@clintongormley
Contributor

Well, normally you'd use a log function so that each value counts for less the higher it goes (i.e. the first 5 votes count a lot, but votes beyond 100 add little more).
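The diminishing returns of a log taper can be seen directly; this small sketch uses `log1p` (the shape behind `field_value_factor`'s `log1p` modifier) on a few sample vote counts:

```python
import math

# Log tapering: early votes move the score far more than later ones.
for votes in (1, 5, 100, 1000):
    print(votes, round(math.log1p(votes), 2))
# 1 -> 0.69, 5 -> 1.79, 100 -> 4.62, 1000 -> 6.91
```

Even so, as the rest of the thread argues, the tapered value keeps growing without bound, which is what the proposed cap would address.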

@sean-cherbone
Author

Agreed, and I am using such tapering modifiers as well, but here is another example that may help.

Let's say I also want to factor in cost. For most documents, $10 to $100 is typical, and I would expect that range to follow a linear curve, representing how the average person feels about spending money on non-essentials. Now let's say something costs $1,000 or $10,000: those prices are so far beyond the typical reach of most people that they are essentially the same, yet taking the log of each still returns a substantial difference. Placing a max limit here would let me truncate these outliers, effectively saying "they're high cost but potentially still relevant" and leaving it at that, rather than (in my case) driving them way down the relevance scale even when the cost is completely reasonable for this type of document.
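The cost example can be shown numerically. This sketch assumes a log10 transform and a hypothetical cap at the top of the typical $10-$100 range; the cap value is an illustrative choice, not anything the thread specifies:

```python
import math

# Hypothetical max_score placed at the top of the typical price range:
CAP = math.log10(100)  # = 2.0

for cost in (10, 100, 1000, 10_000):
    raw = math.log10(cost)
    print(cost, raw, min(raw, CAP))
# Uncapped, $1,000 and $10,000 score 3.0 and 4.0 (a "substantial
# difference"); capped, both clamp to 2.0 and read simply as "expensive".
```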

I should also point out that in the case of votes, those too can be a problem, even when scaled down with log. I want to give new documents a fighting chance of being seen even though they start out with 0 votes. If some other document that is years old has many thousands of votes, even taking the log of that will create a major boost over the new document. Here again, it may be appropriate to say that documents with 10 to 100 votes are proportionately more relevant, but all documents with 100 or more votes may be considered simply "popular" without overwhelming the other signals.
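The new-document vs. old-document gap can likewise be sketched. The vote counts and the 100-vote saturation point are illustrative assumptions:

```python
import math

# Without a cap, an old document's vote count swamps a new one even under log:
new_votes, old_votes = 0, 50_000
print(math.log1p(new_votes), math.log1p(old_votes))  # 0.0 vs ~10.8

# With a hypothetical per-function max_score, everything past ~100 votes
# counts the same, i.e. simply "popular":
cap = math.log1p(100)                    # ~4.62
print(min(math.log1p(old_votes), cap))   # clamps to the cap
```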

@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
@javanna
Member

javanna commented Mar 16, 2018

@elastic/es-search-aggs

@mayya-sharipova
Contributor

I am closing this issue, as we are currently working on redesigning the FunctionScore query. One of the features we are considering is score normalization, which, when/if implemented, would address this issue as well.
#30303


4 participants