Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function score queries with score_mode sum and a total weight of 0 always return a score of 1 #24910

Closed
csben opened this issue May 26, 2017 · 9 comments
Labels
>bug feedback_needed :Search Relevance/Ranking Scoring, rescoring, rank evaluation. Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@csben
Copy link

csben commented May 26, 2017

Elasticsearch version: 5.1.2 and 5.3.2

JVM version: 1.8.0_111 and 1.8.0_131

OS version: 4.4.0-75-generic and 2.6.32-642.13.1.el6.x86_64

Description of the problem including expected versus actual behavior:
Consider the following query:

curl -XPOST http://localhost:9200/test/_search -d '{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
        "functions": [
             {
                "filter": { "match": { "test_value": 1 } },
                "weight": 1
             },
             {
                "filter": { "match": { "test_value":1 } },
                "script_score": {
                  "script": "-1"
                }, 
                "weight": -1
             }
        ],
        "score_mode": "sum",
        "boost_mode": "replace"
      }
  }
}' | jq 

In principle, this should produce a score of 2 (with a contribution of 1 from each filter) for any event with test_value = 1. However, this query will always produce a score of 1. Any pair of functions with a weight of 1 and weight of -1 will produce a score of 1. Any set of functions where the sum of the weights is 0 will produce a score of 1.

I think the bug is in FiltersFunctionScoreQuery.java around line 311.

                default: // Avg / Total
                    double totalFactor = 0.0f;
                    double weightSum = 0;
                    for (int i = 0; i < filterFunctions.length; i++) {
                        if (docSets[i].get(docId)) {
                            totalFactor += functions[i].score(docId, subQueryScore);
                            if (filterFunctions[i].function instanceof WeightFactorFunction) {
                                weightSum += ((WeightFactorFunction) filterFunctions[i].function).getWeight();
                            } else {
                                weightSum += 1.0;
                            }
                        }
                    }
                    if (weightSum != 0) {
                        factor = totalFactor;
                        if (scoreMode == ScoreMode.AVG) {
                            factor /= weightSum;
                        }
                    }
                    break;
            }
            return factor;

There is an if-statement checking if the weightSum is zero, presumably to avoid a divide by zero when computing the weighted average. However, since the same block of code is used to compute a straightforward weighted sum (not just a weighted average), this if-statement results in the method returning the default value for factor which is 1. I believe if we are computing a weighted sum, we should just set factor = totalFactor with no dependence on the value of weightSum.

Steps to reproduce:

curl -XPUT http://localhost:9200/test -d '{
  "mappings": {
    "test": {
      "properties": {
        "test_value": {
          "type": "integer"
        }
      }
    }
  }
}'

curl -XPOST http://localhost:9200/test/test -d '{
  "test_value": 1
}'

# Returns a score of 2.1 as expected
curl -XPOST http://localhost:9200/test/_search -d '{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
        "functions": [
             {
                "filter": { "match": { "test_value": 1 } },
                "weight": 1
             },
             {
                "filter": { "match": { "test_value":1 } },
                "script_score": {
                  "script": "-1"
                }, 
                "weight": -1.1
             }
        ],
        "score_mode": "sum",
        "boost_mode": "replace"
      }
  }
}' | jq 

# Returns a score of 1 instead of the expected score of 2
curl -XPOST http://localhost:9200/test/_search -d '{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
        "functions": [
             {
                "filter": { "match": { "test_value": 1 } },
                "weight": 1
             },
             {
                "filter": { "match": { "test_value":1 } },
                "script_score": {
                  "script": "-1"
                }, 
                "weight": -1
             }
        ],
        "score_mode": "sum",
        "boost_mode": "replace"
      }
  }
}' | jq 
@markharwood markharwood added >bug :Search/Search Search-related issues that do not fall into other categories labels May 31, 2017
@markharwood
Copy link
Contributor

The only thing I'd question here is whether it makes sense to support negative weights in the query requests in the first place.

@csben
Copy link
Author

csben commented Jun 1, 2017

I feel like there's a mathematical argument against supporting negative weights, but that in practice, they're kind of important. In particular, we ran into this while computing sums of log probability scores using field_value_factor functions with a modifier of LOG1P. For values of P in (0,1), each LOG1P is going to be negative, so a negative weight is required when summing. This happens to both be a fairly standard scoring method and one that appears to be intentionally supported by the modifiers available for field_value_factor.

@david206
Copy link
Contributor

actually, all the score_mode returns default value of 1 which doesn't make sense for boost_mode other then multiply

@lmenezes
Copy link
Contributor

@markharwood Are there plans to work/fix this? Would you welcome a PR and guide a bit how this should actually work? :)

@markharwood
Copy link
Contributor

I know there was some discussion recently on changing/improving FunctionScore - perhaps by beefing up the context passed to scripts such that common functions like gauss would be accessible to scripts. @colings86 - do you have any input on these plans?

@andyb-elastic
Copy link
Contributor

@elastic/es-search-aggs @mayya-sharipova @polyfractal

@mayya-sharipova
Copy link
Contributor

@csben Thanks for your input, and you are right about the code! We still need to discuss the idea of negative weights though.

@lmenezes The only reason we would to hold off with PRs for function score is we may redesign Function score. We have a new issue for this: #27588

@jpountz jpountz added :Search Relevance/Ranking Scoring, rescoring, rank evaluation. and removed :Search/Search Search-related issues that do not fall into other categories labels Apr 25, 2018
@mayya-sharipova
Copy link
Contributor

We have discussed in our meeting, and decided NOT to allow negative values for weights in Function Score Query: #31927. Lucene is going towards not allowing negative scores, and in Lucene 8 negative scores will be completely forbidden. We would like to go along this road as well.

@csben would it be possible for you to redesign your query to use painless script instead so that you can control not to output negative scores at all? In this case you wouldn't need to multiply scores by negative weight.

@mayya-sharipova
Copy link
Contributor

Closing this ticket, as there is no further user feedback.

@javanna javanna added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug feedback_needed :Search Relevance/Ranking Scoring, rescoring, rank evaluation. Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch
Projects
None yet
Development

No branches or pull requests

9 participants