Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rescore collapsed documents #28521

Merged
merged 14 commits into from
Mar 4, 2018
Merged

Conversation

fred84
Copy link
Contributor

@fred84 fred84 commented Feb 5, 2018

Add support for rescoring collapsed docs (#27243). Documents at first get collapsed and then rescored.

@jimczi please take a look

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

1 similar comment
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@jimczi
Copy link
Contributor

jimczi commented Feb 14, 2018

@elasticmachine ok to test

@jimczi jimczi added >enhancement :Search/Search Search-related issues that do not fall into other categories v7.0.0 v6.3.0 labels Feb 14, 2018
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me.
I'll merge if the build passes with the changes, thanks @fred84 !

@fred84
Copy link
Contributor Author

fred84 commented Feb 14, 2018

@jimczi

https://github.com/elastic/elasticsearch/blob/master/rest-api-spec/src/main/resources/rest-api-spec/test/search/110_field_collapsing.yml#L241 is failing. I think we could remove it, I've already added integration test for this behaviour

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fred84, you can remove the failing tests, it is no longer needed. Though I left some comments regarding the IT test. I think it needs to be changed to ensure that scoring and ordering are consistent.


SearchResponse searchResponse = client().prepareSearch("test")
.setTypes("type1")
.setQuery(new MatchQueryBuilder("name", "one"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The score of this query depends on the number of shards, the default similarity, ... To make sure that we have consistent scoring you can use a function_score query like the following:

 QueryBuilder query = functionScoreQuery(
            termQuery("name", "one"),
            ScoreFunctionBuilders.fieldValueFactorFunction("my_static_doc_score") 
        ).boostMode(CombineFunction.REPLACE);

... and add the my_static_doc_score at indexing time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

SearchResponse searchResponse = client().prepareSearch("test")
.setTypes("type1")
.setQuery(new MatchQueryBuilder("name", "one"))
.addRescorer(new QueryRescorerBuilder(new MatchQueryBuilder("name", "two")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the same for the rescore with another field for instance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@fred84
Copy link
Contributor Author

fred84 commented Feb 15, 2018

@jimczi Thanks for reviewing. I'll update PR next week.

@fred84
Copy link
Contributor Author

fred84 commented Feb 27, 2018

@jimczi PR updated, now integration test use static scoring.

@jimczi
Copy link
Contributor

jimczi commented Feb 28, 2018

@elasticmachine ok to test

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fred84

@jimczi jimczi merged commit f057fc2 into elastic:master Mar 4, 2018
jimczi pushed a commit that referenced this pull request Mar 4, 2018
This change adds the ability to rescore collapsed documents.
@fred84 fred84 deleted the 27243_collapse_with_rescore branch March 5, 2018 06:30
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Mar 7, 2018
* master:
  [TEST] AwaitsFix QueryRescorerIT.testRescoreAfterCollapse
  Decouple XContentType from StreamInput/Output (elastic#28927)
  Remove BytesRef usage from XContentParser and its subclasses (elastic#28792)
  [DOCS] Correct typo in configuration (elastic#28903)
  Fix incorrect datemath example (elastic#28904)
  Add a usage example of the JLH score (elastic#28905)
  Wrap stream passed to createParser in try-with-resources (elastic#28897)
  Rescore collapsed documents (elastic#28521)
  Fix (simple)_query_string to ignore removed terms (elastic#28871)
  [Docs] Fix typo in composite aggregation (elastic#28891)
  Try if tombstone is eligable for pruning before locking on it's key (elastic#28767)
jimczi added a commit that referenced this pull request Mar 8, 2018
This reverts commit f057fc2.
The rescorer does not resort the collapsed values inside the top docs
during rescoring. For this reason the Lucene rescorer is not compatible
with collapsing.
Relates #27243
jimczi added a commit that referenced this pull request Mar 8, 2018
This reverts commit f057fc2.
The rescorer does not resort the collapsed values inside the top docs
during rescoring. For this reason the Lucene rescorer is not compatible
with collapsing.
Relates #27243
@jimczi
Copy link
Contributor

jimczi commented Mar 8, 2018

I had to revert this change since it doesn't work as expected. I forgot that the collapsed values would also need to be resorted by the rescorer. We use these values in the coordinating node to collapse the results of each shard but the rescorer in Lucene cannot access them:


I am really sorry I missed that but since it would require a rewriting of the rescorer in Lucene and that the collapsing code is only in es I don't think it is worth the effort.

@rpedela
Copy link

rpedela commented Mar 8, 2018

Doesn't Solr support collapse + rescore (rerank)? The claim that Lucene's rescorer needs a rewrite seems dubious.

@jimczi jimczi removed the won't fix label Mar 8, 2018
@jimczi
Copy link
Contributor

jimczi commented Mar 8, 2018

I agree that we should be able to rescore collapsed documents but this is more high hanging fruit than I thought which is why I reverted and closed the issue for now (sorry @fred84 ).
The current design of the collapsing in es is not compatible with the rescorer and will require some internal refactoring. I've started to work on this refactoring and when it's ready we'll reevaluate this pr if @fred84 still wants to work on it ;).

@fred84
Copy link
Contributor Author

fred84 commented Mar 8, 2018

@jimczi let me now when I can start this issue again :)

martijnvg added a commit that referenced this pull request Mar 8, 2018
* es/master: (48 commits)
  Update bucket-sort-aggregation.asciidoc (#28937)
  [Docs] REST high-level client: Fix code for most basic search request (#28916)
  Improved percolator's random candidate query duel test and fixed bugs that were exposed by this:
  Revert "Rescore collapsed documents (#28521)"
  Build: Fix test logger NPE when no tests are run (#28929)
  [TEST] AwaitsFix QueryRescorerIT.testRescoreAfterCollapse
  Decouple XContentType from StreamInput/Output (#28927)
  Remove BytesRef usage from XContentParser and its subclasses (#28792)
  [DOCS] Correct typo in configuration (#28903)
  Fix incorrect datemath example (#28904)
  Add a usage example of the JLH score (#28905)
  Wrap stream passed to createParser in try-with-resources (#28897)
  Rescore collapsed documents (#28521)
  Fix (simple)_query_string to ignore removed terms (#28871)
  [Docs] Fix typo in composite aggregation (#28891)
  Try if tombstone is eligable for pruning before locking on it's key (#28767)
  Limit analyzed text for highlighting (improvements) (#28808)
  Missing `timeout` parameter from the REST API spec JSON files (#28328)
  Clarifies how query_string splits textual part (#28798)
  Update outdated java version reference (#28870)
  ...
martijnvg added a commit that referenced this pull request Mar 8, 2018
* es/6.x: (48 commits)
  Update bucket-sort-aggregation.asciidoc (#28937)
  [Docs] REST high-level client: Fix code for most basic search request (#28916)
  Improved percolator's random candidate query duel test and fixed bugs that were exposed by this:
  Revert "Rescore collapsed documents (#28521)"
  Build: Fix test logger NPE when no tests are run (#28929)
  [TEST] AwaitsFix QueryRescorerIT.testRescoreAfterCollapse
  Decouple XContentType from StreamInput/Output (#28927)
  Remove BytesRef usage from XContentParser and its subclasses (#28792)
  Add doc note for -server flag on Windows service
  [DOCS] Correct typo in configuration (#28903)
  Fix incorrect datemath example (#28904)
  Add a usage example of the JLH score (#28905)
  Limit analyzed text for highlighting (improvements) (#28907)
  Wrap stream passed to createParser in try-with-resources (#28897)
  [Docs] Fix typo in composite aggregation (#28891)
  Rescore collapsed documents (#28521)
  Fix (simple)_query_string to ignore removed terms (#28871)
  Missing `timeout` parameter from the REST API spec JSON files (#28328)
  Clarifies how query_string splits textual part (#28798)
  Update outdated java version reference (#28870)
  ...
sebasjm pushed a commit to sebasjm/elasticsearch that referenced this pull request Mar 10, 2018
This change adds the ability to rescore collapsed documents.
sebasjm pushed a commit to sebasjm/elasticsearch that referenced this pull request Mar 10, 2018
This reverts commit f057fc2.
The rescorer does not resort the collapsed values inside the top docs
during rescoring. For this reason the Lucene rescorer is not compatible
with collapsing.
Relates elastic#27243
@jimczi jimczi added the stalled label Mar 14, 2018
@jimczi
Copy link
Contributor

jimczi commented Mar 30, 2018

Sorry it took me some time to come back at this. I checked why Solr was able to rescore the collapsed documents seamlessly and found out that they force the routing of each group in a single shard. This means that all the documents belonging to a single group are on the same shard so the rescoring is always done on the final head of the group. In es we don't enforce the routing so each group can be spread over multiple shards. This complicates the rescoring since it is always applied at the shard level and in this case on the temporary head of the groups (we don't know the final head in the shard since another shard can contain a better document for that group). For this reason I am reluctant to add this functionality because it might be surprising to see a head in a group that is not the best document of that group in the final response. This can happen if the rescoring gives a score to a document in a shard that is better that the score of the best document in the group which is in another shard. I don't see how we could avoid this unless we force the routing of the groups.

@damitkwr
Copy link

Is there a way to revisit this functionality? This limitation curtails the use of any LTR algorithms

@DajunZhou
Copy link

Excuse me, is there a way to apply rescoring before collapsing?

@lasagar
Copy link

lasagar commented May 8, 2020

Is there a way to revisit this functionality? This limitation curtails the use of any LTR algorithms

I am looking for this option for LTR rescoring

@hosuaby
Copy link

hosuaby commented Jul 20, 2020

Hello, @jimczi

I am looking for possible (even limited) solution to make collapse working with rescore (especially LTR).
For the moment, I created a small plugin, that support search extension:

"ext": {
    "post-rescore" : { /* ... */ }
}

This post-rescore has the same syntax as original rescore, but it is executed as FetchSubPhase (so I assume, that it's executed on coordinating node).

The present solution rescore only results present on current page.

Code of solution is here:
https://github.com/hosuaby/bug-elasticsearch-collapse-rescore/tree/master/src/main/java/io/hosuaby/elasticsearch

I will be thankful for any feedback and suggestions.

@Morikko
Copy link

Morikko commented Aug 13, 2020

@hosuaby you might have a look to this discussion: #60946 it looks close to your proposition

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories stalled
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants