Rescore collapsed documents #28521
…84/elasticsearch into 27243_collapse_with_rescore
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?
@elasticmachine ok to test
It looks good to me.
I'll merge if the build passes with the changes, thanks @fred84 !
https://github.com/elastic/elasticsearch/blob/master/rest-api-spec/src/main/resources/rest-api-spec/test/search/110_field_collapsing.yml#L241 is failing. I think we could remove it; I've already added an integration test for this behaviour.
Thanks @fred84, you can remove the failing test; it is no longer needed. However, I left some comments on the integration test: I think it needs to be changed to ensure that scoring and ordering are consistent.
SearchResponse searchResponse = client().prepareSearch("test")
    .setTypes("type1")
    .setQuery(new MatchQueryBuilder("name", "one"))
The score of this query depends on the number of shards, the default similarity, ... To make sure that we have consistent scoring, you can use a function_score query like the following:

QueryBuilder query = functionScoreQuery(
    termQuery("name", "one"),
    ScoreFunctionBuilders.fieldValueFactorFunction("my_static_doc_score")
).boostMode(CombineFunction.REPLACE);

... and add the my_static_doc_score field at indexing time.
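To make the point about shard-dependent scoring concrete, here is a toy sketch in plain Java (not Elasticsearch code). A BM25-style idf computed from shard-local statistics gives the same term different weights on different shards, whereas a function_score with field_value_factor and boost_mode replace simply returns the stored field value. The class and method names are hypothetical, and the field value is assumed to be used with a factor of 1 and no modifier.

```java
/**
 * Toy illustration, not Elasticsearch code: why match-query scores vary by shard
 * while a static, field-based score does not.
 */
final class StaticScoreDemo {

    /** BM25 idf in the form Lucene uses, evaluated with statistics local to one shard. */
    static double bm25Idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * Mimics function_score + field_value_factor (factor 1, no modifier) with
     * boost_mode "replace": the query score is ignored and the field value is returned.
     */
    static double replaceWithFieldValue(double queryScore, double fieldValue) {
        return fieldValue;
    }
}
```

With 10 documents per shard, `bm25Idf(10, 1)` is about 1.99 while `bm25Idf(10, 5)` is ln 2 (about 0.69), so the same match scores differently depending on where it lives; `replaceWithFieldValue` returns the same value on every shard.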
fixed
SearchResponse searchResponse = client().prepareSearch("test")
    .setTypes("type1")
    .setQuery(new MatchQueryBuilder("name", "one"))
    .addRescorer(new QueryRescorerBuilder(new MatchQueryBuilder("name", "two")))
You can use the same for the rescore with another field for instance
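A similar sketch for the rescore side, assuming the default query_rescorer settings (score_mode total, query_weight and rescore_query_weight both 1) and a rescore query that, like the first pass, just returns a static field value. With both scores static, the expected order in the test can be computed by hand. `RescoreDemo` and its fields are hypothetical names, and the sketch assumes the rescore query matches every hit inside the window.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of query rescoring with static first-pass and rescore scores. */
final class RescoreDemo {

    static final class Hit {
        final String id;
        final double firstPassScore; // e.g. the value of my_static_doc_score
        final double rescoreScore;   // e.g. the value of a second static field
        double score;                // score after (optional) rescoring

        Hit(String id, double firstPassScore, double rescoreScore) {
            this.id = id;
            this.firstPassScore = firstPassScore;
            this.rescoreScore = rescoreScore;
            this.score = firstPassScore;
        }
    }

    /**
     * Rescores the top `window` hits with score_mode "total"
     * (queryWeight * firstPass + rescoreQueryWeight * rescore), assuming the rescore
     * query matches every hit in the window. Hits beyond the window keep their
     * first-pass score and order, after the rescored ones.
     */
    static List<Hit> rescore(List<Hit> hitsByFirstPass, int window,
                             double queryWeight, double rescoreQueryWeight) {
        int n = Math.min(window, hitsByFirstPass.size());
        List<Hit> result = new ArrayList<>(hitsByFirstPass.subList(0, n));
        for (Hit h : result) {
            h.score = queryWeight * h.firstPassScore + rescoreQueryWeight * h.rescoreScore;
        }
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        result.addAll(hitsByFirstPass.subList(n, hitsByFirstPass.size()));
        return result;
    }
}
```

For hits a (3, 0), b (2, 5) and c (1, 10) with a window of 2, only a and b are rescored (to 3 and 7), so the order becomes b, a, c; c keeps its first-pass score of 1 because it lies outside the window.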
fixed
@jimczi Thanks for reviewing. I'll update the PR next week.
@jimczi PR updated; the integration test now uses static scoring.
@elasticmachine ok to test
Thanks @fred84
This change adds the ability to rescore collapsed documents.
* master:
  - [TEST] AwaitsFix QueryRescorerIT.testRescoreAfterCollapse
  - Decouple XContentType from StreamInput/Output (elastic#28927)
  - Remove BytesRef usage from XContentParser and its subclasses (elastic#28792)
  - [DOCS] Correct typo in configuration (elastic#28903)
  - Fix incorrect datemath example (elastic#28904)
  - Add a usage example of the JLH score (elastic#28905)
  - Wrap stream passed to createParser in try-with-resources (elastic#28897)
  - Rescore collapsed documents (elastic#28521)
  - Fix (simple)_query_string to ignore removed terms (elastic#28871)
  - [Docs] Fix typo in composite aggregation (elastic#28891)
  - Try if tombstone is eligable for pruning before locking on it's key (elastic#28767)
I had to revert this change since it doesn't work as expected. I forgot that the collapsed values would also need to be re-sorted by the rescorer. We use these values on the coordinating node to collapse the results of each shard, but the rescorer in Lucene cannot access them: elasticsearch/server/src/main/java/org/apache/lucene/search/grouping/CollapseTopFieldDocs.java, line 41 in 99f88f1.
I am really sorry I missed that, but since it would require rewriting the rescorer in Lucene, and the collapsing code lives only in es, I don't think it is worth the effort.
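The revert reason can be illustrated with a small, hypothetical sketch. Assume, as the comment describes, that the collapse values sit in an array parallel to the score docs. A rescorer that only re-sorts the docs leaves the i-th collapse value attached to the wrong hit; sorting an index permutation and applying it to both arrays keeps them aligned. This is plain Java with made-up names, not Lucene's rescorer API.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: keeping a parallel collapse-value array aligned with re-sorted hits. */
final class CollapseAlignmentDemo {

    /** Returns the index permutation that sorts the rescored scores in descending order. */
    static Integer[] descendingOrder(float[] scores) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer idx) -> scores[idx]).reversed());
        return order;
    }

    /** Applies the permutation to an array, returning a reordered copy. */
    static <T> T[] permute(T[] values, Integer[] order) {
        T[] out = Arrays.copyOf(values, values.length);
        for (int i = 0; i < order.length; i++) {
            out[i] = values[order[i]];
        }
        return out;
    }
}
```

With rescored scores 1, 3, 2 the permutation is [1, 2, 0], so hits d0, d1, d2 become d1, d2, d0 and, with the same permutation applied, collapse values red, blue, green become blue, green, red. Re-sorting only the hits would leave red next to d1.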
Doesn't Solr support collapse + rescore (rerank)? The claim that Lucene's rescorer needs a rewrite seems dubious.
I agree that we should be able to rescore collapsed documents, but this is higher-hanging fruit than I thought, which is why I reverted the change and closed the issue for now (sorry @fred84).
@jimczi let me know when I can start on this issue again :)
* es/master: (48 commits)
  - Update bucket-sort-aggregation.asciidoc (#28937)
  - [Docs] REST high-level client: Fix code for most basic search request (#28916)
  - Improved percolator's random candidate query duel test and fixed bugs that were exposed by this:
  - Revert "Rescore collapsed documents (#28521)"
  - Build: Fix test logger NPE when no tests are run (#28929)
  - [TEST] AwaitsFix QueryRescorerIT.testRescoreAfterCollapse
  - Decouple XContentType from StreamInput/Output (#28927)
  - Remove BytesRef usage from XContentParser and its subclasses (#28792)
  - [DOCS] Correct typo in configuration (#28903)
  - Fix incorrect datemath example (#28904)
  - Add a usage example of the JLH score (#28905)
  - Wrap stream passed to createParser in try-with-resources (#28897)
  - Rescore collapsed documents (#28521)
  - Fix (simple)_query_string to ignore removed terms (#28871)
  - [Docs] Fix typo in composite aggregation (#28891)
  - Try if tombstone is eligable for pruning before locking on it's key (#28767)
  - Limit analyzed text for highlighting (improvements) (#28808)
  - Missing `timeout` parameter from the REST API spec JSON files (#28328)
  - Clarifies how query_string splits textual part (#28798)
  - Update outdated java version reference (#28870)
  - ...
* es/6.x: (48 commits)
  - Update bucket-sort-aggregation.asciidoc (#28937)
  - [Docs] REST high-level client: Fix code for most basic search request (#28916)
  - Improved percolator's random candidate query duel test and fixed bugs that were exposed by this:
  - Revert "Rescore collapsed documents (#28521)"
  - Build: Fix test logger NPE when no tests are run (#28929)
  - [TEST] AwaitsFix QueryRescorerIT.testRescoreAfterCollapse
  - Decouple XContentType from StreamInput/Output (#28927)
  - Remove BytesRef usage from XContentParser and its subclasses (#28792)
  - Add doc note for -server flag on Windows service
  - [DOCS] Correct typo in configuration (#28903)
  - Fix incorrect datemath example (#28904)
  - Add a usage example of the JLH score (#28905)
  - Limit analyzed text for highlighting (improvements) (#28907)
  - Wrap stream passed to createParser in try-with-resources (#28897)
  - [Docs] Fix typo in composite aggregation (#28891)
  - Rescore collapsed documents (#28521)
  - Fix (simple)_query_string to ignore removed terms (#28871)
  - Missing `timeout` parameter from the REST API spec JSON files (#28328)
  - Clarifies how query_string splits textual part (#28798)
  - Update outdated java version reference (#28870)
  - ...
This reverts commit f057fc2. The rescorer does not resort the collapsed values inside the top docs during rescoring. For this reason the Lucene rescorer is not compatible with collapsing. Relates elastic#27243
Sorry it took me some time to come back to this. I checked why Solr was able to rescore the collapsed documents seamlessly and found out that it forces the routing of each group to a single shard. This means that all the documents belonging to a group are on the same shard, so the rescoring is always done on the final head of the group.

In es we don't enforce the routing, so each group can be spread over multiple shards. This complicates the rescoring, since it is always applied at the shard level, and in this case on the temporary head of each group (we don't know the final head in the shard, since another shard can contain a better document for that group).

For this reason I am reluctant to add this functionality, because it might be surprising to see a head in a group that is not the best document of that group in the final response. This can happen if the rescoring gives a document in one shard a score that is better than the score of the best document in the group, which is in another shard. I don't see how we could avoid this unless we force the routing of the groups.
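A toy model of this concern, with hypothetical names and not Elasticsearch code: each shard picks a group head by first-pass score, only that shard-local head is rescored (score_mode total, weights of 1), and the coordinating node keeps the head with the highest rescored score. Running the same documents with and without routing shows that the returned head depends on how the group is spread over the shards.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model, not Elasticsearch code: collapse + rescore when a group's documents
 * are spread over several shards.
 */
final class ShardCollapseDemo {

    static final class Doc {
        final String id;
        final String group;
        final double firstPass;    // score used to select the group head
        final double rescoreBoost; // score contributed by a hypothetical rescore query

        Doc(String id, String group, double firstPass, double rescoreBoost) {
            this.id = id;
            this.group = group;
            this.firstPass = firstPass;
            this.rescoreBoost = rescoreBoost;
        }

        /** score_mode "total" with query_weight = rescore_query_weight = 1. */
        double rescored() {
            return firstPass + rescoreBoost;
        }
    }

    /** Shard level: keep the best first-pass document per group (the shard-local head). */
    static Map<String, Doc> shardHeads(List<Doc> shard) {
        Map<String, Doc> heads = new HashMap<>();
        for (Doc d : shard) {
            heads.merge(d.group, d, (a, b) -> a.firstPass >= b.firstPass ? a : b);
        }
        return heads;
    }

    /**
     * Coordinating node: compare the rescored scores of the shard-local heads
     * (the rescoring itself would run on each shard) and keep the best one per group.
     * Returns group -> id of the final head.
     */
    static Map<String, String> finalHeads(List<List<Doc>> shards) {
        Map<String, Doc> best = new HashMap<>();
        for (List<Doc> shard : shards) {
            for (Doc head : shardHeads(shard).values()) {
                best.merge(head.group, head, (a, b) -> a.rescored() >= b.rescored() ? a : b);
            }
        }
        Map<String, String> ids = new HashMap<>();
        best.forEach((group, doc) -> ids.put(group, doc.id));
        return ids;
    }
}
```

Docs A (first pass 10, rescore boost 0) and B (first pass 8, boost 4) belong to group g. On a single shard B never becomes a head, so A is returned. Split over two shards, B is a shard-local head, is rescored to 12, and wins over A (10) on the coordinating node, even though A is the best document of the group by the collapse sort.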
Is there a way to revisit this functionality? This limitation curtails the use of any LTR algorithm.
Excuse me, is there a way to apply rescoring before collapsing? |
I am looking for this option for LTR rescoring.
Hello @jimczi, I am looking for a possible (even limited) solution to make collapse work with rescore (especially LTR):

    "ext": {
      "post-rescore" : { /* ... */ }
    }

The present solution rescores only the results present on the current page. The code of the solution is here: I would be thankful for any feedback and suggestions.
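The post-rescore idea above (collapse first, then rescore only the heads on the current page) can be sketched as follows. `PostRescoreDemo` and the rescore function are hypothetical stand-ins (the function could wrap an LTR model); in this sketch the rescored value replaces the first-pass score, and only the requested page is re-sorted, so heads on other pages are never rescored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of "collapse first, then rescore the current page". */
final class PostRescoreDemo {

    static final class Hit {
        final String id;
        final String group;
        final double score; // first-pass score

        Hit(String id, String group, double score) {
            this.id = id;
            this.group = group;
            this.score = score;
        }
    }

    /** Collapse: keep the first hit of each group from hits sorted by score, descending. */
    static List<Hit> collapse(List<Hit> sortedHits) {
        Map<String, Hit> heads = new LinkedHashMap<>();
        for (Hit h : sortedHits) {
            heads.putIfAbsent(h.group, h);
        }
        return new ArrayList<>(heads.values());
    }

    /**
     * Collapses the full result, cuts the page [from, from + size), then re-sorts
     * only that page by the rescore function (which replaces the first-pass score).
     */
    static List<Hit> postRescorePage(List<Hit> sortedHits, int from, int size,
                                     ToDoubleFunction<Hit> rescore) {
        List<Hit> heads = collapse(sortedHits);
        int start = Math.min(from, heads.size());
        int end = Math.min(from + size, heads.size());
        List<Hit> page = new ArrayList<>(heads.subList(start, end));
        page.sort(Comparator.comparingDouble(rescore).reversed());
        return page;
    }
}
```

With hits h1 (g1, 10), h2 (g1, 9), h3 (g2, 8) and h4 (g3, 7), collapsing keeps h1, h3 and h4; the first page of size 2 holds h1 and h3, and a rescore function that favours h3 reorders that page to h3, h1, while h4 (on the next page) is untouched.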
Add support for rescoring collapsed docs (#27243). Documents are first collapsed and then rescored.
@jimczi please take a look