[Feature][function_score] Limit individual function's score in functions array #17348
Comments
My initial thought was that this could be done easily with a script. My next thought was that, actually, this could be generally useful without having to resort to scripting. E.g. a Gaussian decay can end up returning zero which, when multiplied by other factors, wipes out the whole score.
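To illustrate the decay problem mentioned above, here is a sketch of a `function_score` query (the `price` field, origin, and scale values are made-up for illustration). A document far outside the decay curve gets a near-zero function value, and with `"score_mode": "multiply"` that near-zero value drowns out every other signal:

```json
{
  "function_score": {
    "query": { "match_all": {} },
    "functions": [
      { "gauss": { "price": { "origin": 50, "scale": 20 } } },
      { "field_value_factor": { "field": "votes" } }
    ],
    "score_mode": "multiply"
  }
}
```

A per-function minimum, as proposed, would put a floor under the decay value before it is combined with the other functions.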
I too had considered a script (and may still use one if needed), but I feel that limiting the strength of a low-priority signal is a sufficiently straightforward need that functions can benefit from it. For example, let's say I have the following factors that could indicate relevance:
Here are a couple of conditions that could cause problems with this scheme:
Well, normally you'd use a log function so that each value counts for less the higher it goes (i.e. the first 5 votes count a lot, but votes beyond 100 count for little more).
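The log tapering described here is available out of the box via the `modifier` parameter of `field_value_factor` (the `votes` field name is an illustrative assumption):

```json
{
  "field_value_factor": {
    "field": "votes",
    "modifier": "log1p",
    "missing": 0
  }
}
```

`log1p` computes `log(1 + value)`, which avoids the undefined result for documents with zero votes.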
Agreed, and I am using such tapering modifiers as well, but here is another example that may help. Let's say I also want to factor in cost. For most documents, $10 to $100 is typical. I would expect that range to have a linear curve, representing how the average person feels about spending money on non-essentials. Now let's say that there is something that costs $1,000 or $10,000. Those are so far beyond the typical reach of most people that they are essentially the same, but taking the log of each returns a substantial difference. Placing a max limit here would allow me to truncate these outliers as a way of saying, "they're high cost but potentially still relevant" and leaving it at that, rather than (in my case) driving them way down the relevance scale even if the cost is completely reasonable for this type of document.

I should also point out that in the case of votes, those too can be a problem, even when scaled down with log. I want to give new documents a fighting chance of being seen even though they start out with 0 votes. If some other document that is years old has many thousands of votes, even taking the log of that will create a major boost over the new document. Here again, it may be appropriate to say that documents having 10 to 100 votes may be proportionately more relevant, but all documents having a vote count of 100 or more may be considered simply "popular" without overwhelming the other signals.
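Until a per-function cap exists, the clamping described above can be approximated with a `script_score` function. A minimal sketch in Painless, where the `votes` field name and the cap of `2` (i.e. treating everything at or above 100 votes the same) are illustrative assumptions:

```json
{
  "script_score": {
    "script": {
      "source": "Math.min(Math.log10(doc['votes'].value + 1), 2)"
    }
  }
}
```

This works, but it forfeits the readability of the declarative functions array, which is what the proposed `min_score`/`max_score` per function would preserve.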
@elastic/es-search-aggs
I am closing this issue. We are currently working on redesigning the FunctionScore query, and one of the features we are considering is normalization of scores, which, when/if implemented, would address this issue as well.
Currently, there does not appear to be a way to place an upper or lower bound on an individual function within a function_score functions array. It would be nice to be able to place either a max or min limit on the individual function to prevent something like a field_value_factor from overshadowing other more relevant signals.
Example function_score:
```json
"function_score": {
  "query": {},
  "boost": "boost for the whole query",
  "functions": [
    {
      "filter": {},
      "FUNCTION": {},
      "weight": number,
      "min_score": number // New Feature
    },
    {
      "FUNCTION": {},
      "max_score": number // New Feature
    },
    {
      "filter": {},
      "weight": number
    }
  ],
  "max_boost": number,
  "score_mode": "(multiply|max|...)",
  "boost_mode": "(multiply|replace|...)",
  "min_score": number
}
```