support field collapsing + rescore #27243

Closed
rpedela opened this issue Nov 3, 2017 · 15 comments · Fixed by #107779
Labels
:Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@rpedela

rpedela commented Nov 3, 2017

I have some searches that require field collapsing. I need the feature, and I also need the performance improvement over using parent/child or nested. I would like to start using the learning to rank plugin for these searches but rescore support is needed to have good performance, at least for my use case. Any chance of adding this?

@colings86 colings86 added the :Search/Search Search-related issues that do not fall into other categories label Nov 3, 2017
@jimczi
Contributor

jimczi commented Nov 3, 2017

That's possible, but it would mean rescoring the collapsed hits per shard and then doing the final collapsing of the rescored hits on the coordinating node. It would not be possible to select the top N uncollapsed hits and do the collapsing on the rescored docs only. Is that acceptable for your use case, @rpedela?

@rpedela
Author

rpedela commented Nov 3, 2017

The ideal for me would be to collapse first and then rescore the top N collapsed. I don't know if that is equivalent to rescoring the collapsed hits per shard, but it sounds close enough.

@jimczi
Contributor

jimczi commented Nov 6, 2017

It's not exactly equivalent to a global rescore of the top N collapsed hits, because the rescoring would be done per shard first, but it's close enough. I'll mark this issue as adoptme because I don't have time to work on it right now. @rpedela, would you like to contribute a patch for this?
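
Editor's sketch of the per-shard flow described above, as a toy Python simulation rather than Elasticsearch code. The helper names (`collapse`, `rescore`, `search`), the shard contents, and the first-pass scores are all assumptions made up for illustration. Each shard collapses its own hits and rescores only the surviving group heads; the coordinating node then collapses the rescored hits once more.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[float, Dict[str, object]]  # (score, document)


def collapse(hits: List[Hit], field: str) -> List[Hit]:
    """Keep the highest-scoring hit per distinct value of `field`."""
    heads: Dict[object, Hit] = {}
    for hit in sorted(hits, key=lambda h: h[0], reverse=True):
        heads.setdefault(hit[1][field], hit)
    return sorted(heads.values(), key=lambda h: h[0], reverse=True)


def rescore(hits: List[Hit], fn: Callable[[Dict[str, object]], float]) -> List[Hit]:
    """Add fn(doc) to each hit's score and re-sort by the new score."""
    rescored = [(score + fn(doc), doc) for score, doc in hits]
    return sorted(rescored, key=lambda h: h[0], reverse=True)


def search(shards: List[List[Hit]], field: str,
           fn: Callable[[Dict[str, object]], float]) -> List[Hit]:
    """Collapse and rescore on every shard, then collapse again on the coordinator."""
    merged: List[Hit] = []
    for shard_hits in shards:
        merged.extend(rescore(collapse(shard_hits, field), fn))
    return collapse(merged, field)


# Hypothetical data: two shards, first-pass scores chosen by hand.
shards = [
    [(2.0, {"name": "a", "access": "public", "maintainers": 1}),
     (1.0, {"name": "b", "access": "public", "maintainers": 30})],
    [(1.5, {"name": "c", "access": "public", "maintainers": 10}),
     (1.0, {"name": "d", "access": "private", "maintainers": 5})],
]

result = search(shards, "access", lambda doc: float(doc["maintainers"]))
for score, doc in result:
    print(doc["name"], score)
```

In this toy run, document `b` would have the highest rescored value in the `public` group, but it loses the first-pass collapse on its shard and is never rescored. That is the sense in which the per-shard approach is only close to a global rescore.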

@fred84
Contributor

fred84 commented Jan 22, 2018

I'd like to take this issue. @jimczi, could you please look at the following example to verify that I correctly understand the expected behaviour?

Example

Given the following documents:

[
  {"name": "elasticsearch", "access": "public", "maintainers": 30 },
  {"name": "logstash"     , "access": "public", "maintainers": 20 },
  {"name": "kibana"       , "access": "public", "maintainers": 10 },
  {"name": "xpack"        , "access": "private", "maintainers": 20 },
  {"name": "beats"        , "access": "private", "maintainers": 5 },
  {"name": "security"     , "access": "private", "maintainers": 2 }
]

Query:

{
    "query": { "range": {"maintainers": {"gt": 3}}},

    "collapse" : {
        "field" : "access",
        "inner_hits": {
            "name": "most_maintainers", 
            "size": 2, 
            "sort": [{ "maintainers": "desc" }] 
        }
    },
    "rescore" : {
        "query" : {
            "rescore_query" : {
                "function_score" : {
                    "script_score": {
                        "script": {
                            "source": "doc.maintainers.value"
                        }
                    }
                }
            }
        }
    }
} 

Expected result:

{
  "hits" : {
    "total" : 5,
    "max_score" : 1.0, // default score
    "hits" : [
      {
        "_index" : "test",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 1.0, // default score
        "_source" : {
          "name" : "kibana",
          "access" : "public",
          "maintainers" : 10
        },
        "fields" : {
          "access" : [
            "public"
          ]
        },
        "inner_hits" : {
          "most_maintainers" : {
            "hits" : {
              "total" : 3,
              "max_score" : 31, 
              "hits" : [
                {
                  "_index" : "test",
                  "_type" : "doc",
                  "_id" : "1",
                  "_score" : 31, // rescoring applied, 1 + 30
                  "_source" : {
                    "name" : "elasticsearch",
                    "access" : "public",
                    "maintainers" : 30
                  },
                  "sort" : [
                    30
                  ]
                },
                {
                  "_index" : "test",
                  "_type" : "doc",
                  "_id" : "2",
                  "_score" : 21, // rescoring applied
                  "_source" : {
                    "name" : "logstash",
                    "access" : "public",
                    "maintainers" : 20
                  },
                  "sort" : [
                    20
                  ]
                }
              ]
            }
          }
        }
      },
      {
        "_index" : "test",
        "_type" : "doc",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "name" : "beats",
          "access" : "private",
          "maintainers" : 5
        },
        "fields" : {
          "access" : [
            "private"
          ]
        },
        "inner_hits" : {
          "most_maintainers" : {
            "hits" : {
              "total" : 2,
              "max_score" : 21,
              "hits" : [
                {
                  "_index" : "test",
                  "_type" : "doc",
                  "_id" : "4",
                  "_score" : 21,
                  "_source" : {
                    "name" : "xpack",
                    "access" : "private",
                    "maintainers" : 20
                  },
                  "sort" : [
                    20
                  ]
                },
                {
                  "_index" : "test",
                  "_type" : "doc",
                  "_id" : "5",
                  "_score" : 6,
                  "_source" : {
                    "name" : "beats",
                    "access" : "private",
                    "maintainers" : 5
                  },
                  "sort" : [
                    5
                  ]
                }
              ]
            }
          }
        }
      }
    ]
  }
}

@jimczi
Contributor

jimczi commented Jan 22, 2018

The rescoring should be applied to the collapsed hits at the top level, not to the inner_hits. Rescoring inner_hits should not be necessary, or should at least be considered a separate issue. The feature here would be to apply rescoring after collapsing, but still on the top-level documents. In your example, it would first collapse the top documents and then re-sort them using the rescore function.
This also means that the top document of a group after rescoring may not be the document that would score best in that group, since the collapsing is done before the rescoring.
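
Editor's sketch of that ordering effect, as a toy Python simulation rather than Elasticsearch code. The first-pass scores below are hypothetical (the range query in the thread gives every hit the same score), and the `collapse` and `rescore` helpers are made up for illustration. It compares collapsing before rescoring with the reverse order on the five documents that match the query.

```python
# Hypothetical first-pass scores, chosen by hand to make the effect visible.
docs = [
    {"name": "elasticsearch", "access": "public", "maintainers": 30, "score": 1.0},
    {"name": "logstash", "access": "public", "maintainers": 20, "score": 2.0},
    {"name": "kibana", "access": "public", "maintainers": 10, "score": 3.0},
    {"name": "xpack", "access": "private", "maintainers": 20, "score": 2.0},
    {"name": "beats", "access": "private", "maintainers": 5, "score": 3.0},
]


def by_score(hits):
    """Sort hits by descending score (stable for ties)."""
    return sorted(hits, key=lambda h: h["score"], reverse=True)


def collapse(hits, field):
    """Keep the highest-scoring hit per distinct value of `field`."""
    heads = {}
    for hit in by_score(hits):
        heads.setdefault(hit[field], hit)
    return by_score(heads.values())


def rescore(hits):
    """Add the rescore value (here: maintainers) to each hit's score."""
    return [{**h, "score": h["score"] + h["maintainers"]} for h in hits]


# Behaviour described in the comment above: collapse first, then rescore the group heads.
collapse_then_rescore = by_score(rescore(collapse(docs, "access")))

# Alternative order, shown only for comparison: rescore everything, then collapse.
rescore_then_collapse = collapse(rescore(docs), "access")

print([(h["name"], h["score"]) for h in collapse_then_rescore])
print([(h["name"], h["score"]) for h in rescore_then_collapse])
```

With these assumed scores, collapsing first keeps kibana and beats as group heads, while rescoring first would have surfaced elasticsearch and xpack.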

@fred84
Contributor

fred84 commented Jan 23, 2018

@jimczi, below is the updated example. Is it correct now?

Query:

{
    "query": { "range": {"maintainers": {"gt": 3}}},
    "collapse" : {
        "field" : "access"
    },
    "rescore" : {
        "query" : {
            "rescore_query" : {
                "function_score" : {
                    "script_score": {
                        "script": {
                            "source": "doc.maintainers.value"
                        }
                    }
                }
            }
        }
    }
} 

Expected result:

{
  "hits" : {
    "total" : 5,
    "max_score" : 11.0, // smaller than the best possible document score (31.0) because rescoring is done after collapsing
    "hits" : [
      {
        "_index" : "test",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 11.0, // rescoring applied (10+1)
        "_source" : {
          "name" : "kibana",
          "access" : "public",
          "maintainers" : 10
        },
        "fields" : {
          "access" : [
            "public"
          ]
        }
      },
      {
        "_index" : "test",
        "_type" : "doc",
        "_id" : "5",
        "_score" : 6.0, // rescoring applied (5+1)
        "_source" : {
          "name" : "beats",
          "access" : "private",
          "maintainers" : 5
        },
        "fields" : {
          "access" : [
            "private"
          ]
        }
      }
    ]
  }
}

@rpedela
Author

rpedela commented Jan 23, 2018

@fred84 Thanks!

@jimczi Are inner_hits rescored for the nested and parent/child queries?

@jimczi
Contributor

jimczi commented Jan 24, 2018

Are inner_hits rescored for the nested and parent/child queries?

No

@fred84, yes, your example is correct, though the best document per group in your case will depend on the order of the documents in the index, since all documents have the same score for the query (the range query uses a constant score).
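
Editor's sketch of the tie-breaking point, as a toy Python simulation rather than Elasticsearch code. It assumes every hit scores a constant 1.0 and that ties are resolved in favour of the document that comes first in the index; the `make_doc` and `collapse_then_rescore` helpers and both index orders are hypothetical.

```python
def make_doc(name, access, maintainers):
    """Build a toy document with the fields used in the thread's example."""
    return {"name": name, "access": access, "maintainers": maintainers}


def collapse_then_rescore(docs_in_index_order):
    """Collapse on `access` with constant scores, then add maintainers to the score."""
    heads = {}
    for doc in docs_in_index_order:
        # Every hit ties at 1.0, so the first document seen per group wins.
        heads.setdefault(doc["access"], doc)
    rescored = [(1.0 + d["maintainers"], d["name"]) for d in heads.values()]
    return sorted(rescored, reverse=True)


# Two hypothetical index orders over the same five matching documents.
order_a = [
    make_doc("elasticsearch", "public", 30),
    make_doc("logstash", "public", 20),
    make_doc("kibana", "public", 10),
    make_doc("xpack", "private", 20),
    make_doc("beats", "private", 5),
]
order_b = [
    make_doc("kibana", "public", 10),
    make_doc("logstash", "public", 20),
    make_doc("elasticsearch", "public", 30),
    make_doc("beats", "private", 5),
    make_doc("xpack", "private", 20),
]

result_a = collapse_then_rescore(order_a)
result_b = collapse_then_rescore(order_b)
print(result_a)
print(result_b)
```

Under these assumptions, the second order reproduces the scores in the expected result above (11.0 for kibana, 6.0 for beats), while the first order yields elasticsearch and xpack instead.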

@jasontedor
Member

Closed by #28521

@jimczi jimczi removed the help wanted adoptme label Mar 8, 2018
jimczi added a commit that referenced this issue Mar 8, 2018
This reverts commit f057fc2.
The rescorer does not resort the collapsed values inside the top docs
during rescoring. For this reason the Lucene rescorer is not compatible
with collapsing.
Relates #27243
sebasjm pushed a commit to sebasjm/elasticsearch that referenced this issue Mar 10, 2018
This reverts commit f057fc2.
The rescorer does not resort the collapsed values inside the top docs
during rescoring. For this reason the Lucene rescorer is not compatible
with collapsing.
Relates elastic#27243
@damitkwr

The fix was reverted. Can we revisit this functionality? This limitation curtails the use of any LTR algorithms.

@Hronom

Hronom commented Mar 3, 2022

+1 Very needed

@qi20099

qi20099 commented Jul 25, 2022

+1 Very needed

@jimczi
Contributor

jimczi commented Mar 14, 2024

Reopening after discussing internally. We're actively working on this issue and plan to support this feature in the near future.

@serenachou serenachou reopened this Mar 14, 2024
@Hronom

Hronom commented Mar 16, 2024

After so many years, I can't believe it.

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Apr 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

jimczi added a commit to jimczi/elasticsearch that referenced this issue Apr 23, 2024
This change adds the support for rescoring collapsed documents.
The rescoring is applied on the top document per group on each shard.

Closes elastic#27243
jimczi added a commit that referenced this issue Apr 29, 2024
This change adds the support for rescoring collapsed documents.
The rescoring is applied on the top document per group on each shard.

Closes #27243