[BUG] Incorrect score when vector field is inside nested type #751

Closed
martin-gaievski opened this issue Feb 7, 2023 · 0 comments
Labels
bug Something isn't working

What is the bug?
k-NN returns an incorrect score for some docs in the search query response when the vector field is inside a nested type. The same setup and data work fine on 2.3 and above.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create a k-NN index with an nmslib vector field inside a nested type, using the cosine similarity space and default parameters, similar to the one below:
{
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "knn" : "true",
        "number_of_replicas" : "0"
      }
    },
    "mappings" : {
      "properties" : {
        "qid" : {
          "type" : "keyword"
        },
        "questions" : {
          "type" : "nested",
          "properties" : {
            "q" : {
              "type" : "text"
            },
            "q_vector" : {
              "type" : "knn_vector",
              "dimension" : 1024,
              "method" : {
                "engine" : "nmslib",
                "space_type" : "cosinesimil",
                "name" : "hnsw",
                "parameters" : { }
              }
            }
          }
        }
      }
    }
}
  2. Upload 11 vectors using the _bulk API, with 3 of the docs sharing identical vector data:
{"index":{"_index":"test","_id":"0.lastitem"}}
{"qid":"0.lastitem","questions":[{"q_vector":[<vector_data>]}]}
....
  3. Run a search query with the vector data set to exactly the same values as in the 3 docs mentioned in the previous step:
{
    "_source": {
        "exclude": [
            "a_vector",
            "questions.q_vector"
        ]
    },
    "from": "0",
    "size": "10",
    "query": {
        "bool": {
            "should": [
                {
                    "function_score": {
                        "query": {
                            "nested": {
                                "score_mode": "max",
                                "path": "questions",
                                "query": {
                                    "knn": {
                                        "questions.q_vector": {
                                            "k": 5,
                                            "vector": [
                                           <vector_data>    
                                            ]
                                        }
                                    }
                                }
                            }
                        },
                        "weight": 1
                    }
                }
            ]
        }
    }
}
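
The bulk payload from step 2 can be generated with a short script. The following is a minimal sketch, not the data used in this report: the doc ids (`doc-0`, `doc-1`, ...), the helper name `build_bulk_body`, and the random vector values are all hypothetical placeholders. It only shows the shape of the payload, with the first 3 docs sharing one vector that can then be reused as the query vector in step 3.

```python
import json
import random


def build_bulk_body(index="test", dim=1024, total=11, identical=3, seed=0):
    """Build an NDJSON body for the _bulk API plus the vector shared by the first docs.

    All ids and vector values are placeholders (assumption), not the original data.
    """
    rng = random.Random(seed)
    shared = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    lines = []
    for i in range(total):
        # The first `identical` docs reuse the same vector; the rest get random ones.
        if i < identical:
            vector = shared
        else:
            vector = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        doc_id = f"doc-{i}"  # hypothetical id
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"qid": doc_id, "questions": [{"q_vector": vector}]}))
    # The _bulk API expects newline-delimited JSON terminated by a newline.
    return "\n".join(lines) + "\n", shared


body, query_vector = build_bulk_body()
```

The returned `body` can be sent as the request body of a `_bulk` call, and `query_vector` can be pasted in place of `<vector_data>` in the search query.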

The response is similar to the one below:

{
    "took": 177,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 7,
            "relation": "eq"
        },
        "max_score": 0.99028254,
        "hits": [
            {
                "_index": "test",
                "_id": "0.CostNoTopic",
                "_score": 0.99028254,
                "_source": {
                    "questions": [],
                    "qid": "0.CostNoTopic"
                }
            },
            {
                "_index": "test",
                "_id": "Alexa.Cost",
                "_score": 0.99028254,
                "_source": {
                    "questions": [],
                    "qid": "Alexa.Cost"
                }
            },
            {
                "_index": "test",
                "_id": "Bot.Cost",
                "_score": 0.99028254,
                "_source": {
                    "questions": [],
                    "qid": "Bot.Cost"
                }
            },
            {
                "_index": "test",
                "_id": "0.gettysburg",
                "_score": 0.99028254,
                "_source": {
                    "questions": [],
                    "qid": "0.gettysburg"
                }
            },
...
        ]
    }
}

What is the expected behavior?
Only three docs are expected to have a score very close to 1.0 (0.99028254 in my example). The actual response contains 4 docs with the same score, 0.99028254. The score of the last of those 4 docs, with id "0.gettysburg", is calculated incorrectly; it should be about 0.7301.
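
One way to detect this symptom automatically is to list the hits that share the top score and compare them against the ids known to hold the query vector. This is a rough sketch: the helper name `ids_at_max_score` and the tolerance are hypothetical, the response is trimmed to the fields used, and it assumes the first three hits above are the docs uploaded with the identical vector.

```python
def ids_at_max_score(response, tol=1e-6):
    """Return the ids of all hits whose score equals max_score within `tol`."""
    top = response["hits"]["max_score"]
    return [h["_id"] for h in response["hits"]["hits"] if abs(h["_score"] - top) <= tol]


# Response trimmed to the fields used here; values copied from the report.
response = {
    "hits": {
        "max_score": 0.99028254,
        "hits": [
            {"_id": "0.CostNoTopic", "_score": 0.99028254},
            {"_id": "Alexa.Cost", "_score": 0.99028254},
            {"_id": "Bot.Cost", "_score": 0.99028254},
            {"_id": "0.gettysburg", "_score": 0.99028254},
        ],
    }
}

# Assumption: these are the three docs uploaded with the identical vector.
expected_ids = {"0.CostNoTopic", "Alexa.Cost", "Bot.Cost"}

unexpected = [i for i in ids_at_max_score(response) if i not in expected_ids]
print(unexpected)  # ['0.gettysburg']
```

On an unaffected version, `unexpected` should be empty for the same data.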

What is your host/environment?

  • k-NN 1.3.7, 1 data node, 2 master nodes, 1 shard, 1 replica. The same incorrect behavior occurs on 2.0 and 2.2; 2.3 works as expected.

Do you have any additional context?

  • The issue is reproducible on 1.3 and on some 2.x versions (2.0 and 2.2); the same setup works fine on 2.3 and above.
  • If the doc in question is deleted and re-uploaded using a single POST, the system works as expected.
  • I tried this in a dev environment using a test cluster; after running the search query, the cluster crashes with the following error:
»  java.lang.AssertionError: Sub-iterators of ConjunctionDISI are not on the same document!
»       at org.apache.lucene.search.ConjunctionDISI$BitSetConjunctionDISI.nextDoc(ConjunctionDISI.java:303)
»       at org.apache.lucene.search.join.ToParentBlockJoinQuery$BlockJoinScorer.setScoreAndFreq(ToParentBlockJoinQuery.java:356)
»       at org.apache.lucene.search.join.ToParentBlockJoinQuery$BlockJoinScorer.score(ToParentBlockJoinQuery.java:331)
»       at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:76)
»       at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:258)
»       at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:245)
»       at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:69)
»       at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
»       at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:253)
»       at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:226)
»       at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
»       at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:312)
»       at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:269)
»       at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:146)
»       at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:441)
»       at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:505)
»       at org.opensearch.search.SearchService.access$500(SearchService.java:154)
»       at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:474)
»       at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:71)
»       at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:86)
»       at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
»       at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:57)
»       at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792)
»       at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
»       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
»       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
»       at java.base/java.lang.Thread.run(Thread.java:829)