support field collapsing + rescore #27243
Comments
That's possible but that would mean rescoring the collapsed hits per shard and then doing the final collapsing of the rescored hits in the coordinating node. It would not be possible to select the top N uncollapsed and do the collapsing on the rescored docs only. Is it acceptable for your use case @rpedela ? |
The ideal for me would be to collapse first and then rescore the top N collapsed. I don't know if that is equivalent to rescoring the collapsed hits per shard, but it sounds close enough. |
It's not exactly equivalent to a global rescore of the top N collapsed because the rescoring would be per shard first but that's close enough. I'll mark this issue as adoptme because I don't have time to work on it right now. @rpedela would you like to contribute a patch for this ? |
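To make the difference discussed above concrete, here is a minimal Python sketch (not Elasticsearch code) that compares "collapse and rescore on each shard, then collapse on the coordinating node" with "collapse globally, then rescore". The hit layout, the `ltr` field, and all function names are assumptions made up for illustration.

```python
def collapse(hits, field):
    """Keep the best-scoring hit per value of `field`, ordered by score descending."""
    best = {}
    for hit in hits:
        key = hit[field]
        if key not in best or hit["score"] > best[key]["score"]:
            best[key] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)


def rescore(hits, boost):
    """Add a second-pass score to each hit; `boost` is a hypothetical scoring function."""
    return [{**h, "score": h["score"] + boost(h)} for h in hits]


def per_shard_then_merge(shards, field, boost):
    """Collapse and rescore on every shard, then collapse the merged hits once more."""
    shard_results = [rescore(collapse(shard, field), boost) for shard in shards]
    merged = [hit for result in shard_results for hit in result]
    return collapse(merged, field)


def global_collapse_then_rescore(shards, field, boost):
    """Collapse across all shards first, then rescore only the collapsed hits."""
    all_hits = [hit for shard in shards for hit in shard]
    rescored = rescore(collapse(all_hits, field), boost)
    return sorted(rescored, key=lambda h: h["score"], reverse=True)


# Two shards holding documents of the same group "g" (made-up data).
shard_a = [
    {"id": "a1", "group": "g", "score": 2.0, "ltr": 0.0},
    {"id": "a2", "group": "g", "score": 1.0, "ltr": 10.0},
]
shard_b = [{"id": "b1", "group": "g", "score": 1.5, "ltr": 1.0}]


def ltr_boost(hit):
    return hit["ltr"]


per_shard = per_shard_then_merge([shard_a, shard_b], "group", ltr_boost)
global_first = global_collapse_then_rescore([shard_a, shard_b], "group", ltr_boost)

print(per_shard[0]["id"], per_shard[0]["score"])
print(global_first[0]["id"], global_first[0]["score"])
```

In this toy data the per-shard variant returns `b1` for the group, while the global variant returns `a1`: each shard rescores its own winner before the coordinating node picks between them. Neither variant ever rescores `a2`, because it loses the first-pass collapse on its shard.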
I'd like to take this issue. @jimczi, could you please look at the following example to verify that I correctly understand the expected behaviour? Example: given the following documents: [
{"name": "elasticsearch", "access": "public", "maintainers": 30 },
{"name": "logstash" , "access": "public", "maintainers": 20 },
{"name": "kibana" , "access": "public", "maintainers": 10 },
{"name": "xpack" , "access": "private", "maintainers": 20 },
{"name": "beats" , "access": "private", "maintainers": 5 },
{"name": "security" , "access": "private", "maintainers": 2 }
] Query: {
"query": { "range": {"maintainers": {"gt": 3}}},
"collapse" : {
"field" : "access",
"inner_hits": {
"name": "most_maintainers",
"size": 2,
"sort": [{ "maintainers": "desc" }]
}
},
"rescore" : {
"query" : {
"rescore_query" : {
"function_score" : {
"script_score": {
"script": {
"source": "doc.maintainers.value"
}
}
}
}
}
}
} Expected result: {
"hits" : {
"total" : 5,
"max_score" : 1.0, // default score
"hits" : [
{
"_index" : "test",
"_type" : "doc",
"_id" : "3",
"_score" : 1.0, // default score
"_source" : {
"name" : "kibana",
"access" : "public",
"maintainers" : 10
},
"fields" : {
"access" : [
"public"
]
},
"inner_hits" : {
"most_maintainers" : {
"hits" : {
"total" : 3,
"max_score" : 31,
"hits" : [
{
"_index" : "test",
"_type" : "doc",
"_id" : "1",
"_score" : 31, // rescoring applied, 1 + 30
"_source" : {
"name" : "elasticsearch",
"access" : "public",
"maintainers" : 30
},
"sort" : [
30
]
},
{
"_index" : "test",
"_type" : "doc",
"_id" : "2",
"_score" : 21, // rescoring applied
"_source" : {
"name" : "logstash",
"access" : "public",
"maintainers" : 20
},
"sort" : [
20
]
}
]
}
}
}
},
{
"_index" : "test",
"_type" : "doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"name" : "beats",
"access" : "private",
"maintainers" : 5
},
"fields" : {
"access" : [
"private"
]
},
"inner_hits" : {
"most_maintainers" : {
"hits" : {
"total" : 2,
"max_score" : 21,
"hits" : [
{
"_index" : "test",
"_type" : "doc",
"_id" : "4",
"_score" : 21,
"_source" : {
"name" : "xpack",
"access" : "private",
"maintainers" : 20
},
"sort" : [
20
]
},
{
"_index" : "test",
"_type" : "doc",
"_id" : "5",
"_score" : 6,
"_source" : {
"name" : "beats",
"access" : "private",
"maintainers" : 5
},
"sort" : [
5
]
}
]
}
}
}
}
]
}
} |
the rescoring should be applied to collapsed hits at the top level, not the |
@jimczi, below is updated example. Is it correct now? Query: {
"query": { "range": {"maintainers": {"gt": 3}}},
"collapse" : {
"field" : "access"
},
"rescore" : {
"query" : {
"rescore_query" : {
"function_score" : {
"script_score": {
"script": {
"source": "doc.maintainers.value"
}
}
}
}
}
}
} Expected result: {
"hits" : {
"total" : 5,
"max_score" : 11.0, // smaller than the best possible document score (31.0) because rescoring is done after collapsing
"hits" : [
{
"_index" : "test",
"_type" : "doc",
"_id" : "3",
"_score" : 11.0, // rescoring applied (10+1)
"_source" : {
"name" : "kibana",
"access" : "public",
"maintainers" : 10
},
"fields" : {
"access" : [
"public"
]
}
},
{
"_index" : "test",
"_type" : "doc",
"_id" : "5",
"_score" : 6.0, // rescoring applied (5+1)
"_source" : {
"name" : "beats",
"access" : "private",
"maintainers" : 5
},
"fields" : {
"access" : [
"private"
]
}
}
]
}
} |
No @fred84 , yes your example is correct, though the best document per group in your case will depend on the order of the documents in the index since all documents have the same score for the query (the range query uses a constant score). |
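The dependence on index order can be sketched in a few lines of Python. This is a simulation of the example from the thread, not Elasticsearch code: it assumes the range query gives every match a constant score of 1.0, that ties within a group are broken by whichever document is seen first, and that the rescore adds `maintainers` to the original score (the arithmetic used in the thread). The `_id` of `security` is assumed to be `6`.

```python
DOCS = [
    {"id": "1", "name": "elasticsearch", "access": "public", "maintainers": 30},
    {"id": "2", "name": "logstash", "access": "public", "maintainers": 20},
    {"id": "3", "name": "kibana", "access": "public", "maintainers": 10},
    {"id": "4", "name": "xpack", "access": "private", "maintainers": 20},
    {"id": "5", "name": "beats", "access": "private", "maintainers": 5},
    {"id": "6", "name": "security", "access": "private", "maintainers": 2},
]


def search(docs_in_index_order):
    """Range filter, collapse on `access`, then rescore the collapsed hits."""
    # Query phase: maintainers > 3, constant score for every match.
    matched = [dict(d, score=1.0) for d in docs_in_index_order if d["maintainers"] > 3]

    # Collapse phase: all scores tie, so the first document seen wins (assumption).
    heads = {}
    for doc in matched:
        heads.setdefault(doc["access"], doc)

    # Rescore phase: original score plus the script score (maintainers).
    rescored = [dict(d, score=d["score"] + d["maintainers"]) for d in heads.values()]
    return sorted(rescored, key=lambda d: d["score"], reverse=True)


as_listed = [(d["name"], d["score"]) for d in search(DOCS)]
reversed_order = [(d["name"], d["score"]) for d in search(list(reversed(DOCS)))]

print(as_listed)
print(reversed_order)
```

With the documents in the listed order the group heads are `elasticsearch` and `xpack`; with the order reversed they are `kibana` and `beats`, which reproduces the scores 11.0 and 6.0 from the expected result above.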
Closed by #28521 |
This reverts commit f057fc2. The rescorer does not re-sort the collapsed values inside the top docs during rescoring. For this reason the Lucene rescorer is not compatible with collapsing. Relates elastic#27243
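The incompatibility described in the revert can be illustrated with a small Python sketch. It assumes, purely for illustration, that the collapsed top docs keep the hits and their collapse values in two parallel lists; the function names and data are made up and this is not the Lucene or Elasticsearch implementation.

```python
def rescore_hits_only(hits, collapse_values, boost):
    """Re-sort the hits by their new score but leave collapse_values untouched."""
    rescored = sorted(
        ((doc, score + boost[doc]) for doc, score in hits),
        key=lambda hit: hit[1],
        reverse=True,
    )
    return rescored, collapse_values  # the two lists can now disagree


def rescore_paired(hits, collapse_values, boost):
    """Re-sort hits and collapse values together so they stay aligned."""
    paired = sorted(
        ((doc, score + boost[doc], value) for (doc, score), value in zip(hits, collapse_values)),
        key=lambda item: item[1],
        reverse=True,
    )
    return [(doc, score) for doc, score, _ in paired], [value for _, _, value in paired]


hits = [("kibana", 1.0), ("beats", 1.0)]
collapse_values = ["public", "private"]  # collapse_values[i] belongs to hits[i]
boost = {"kibana": 5.0, "beats": 20.0}   # hypothetical second-pass scores

broken_hits, broken_values = rescore_hits_only(hits, collapse_values, boost)
fixed_hits, fixed_values = rescore_paired(hits, collapse_values, boost)

print(broken_hits, broken_values)
print(fixed_hits, fixed_values)
```

In the first variant `beats` moves to the top but is still paired with the value `public`, so any later merge keyed on the collapse value would attribute the hit to the wrong group. Re-sorting both lists together keeps the pairing intact.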
The fix was reverted. Can we revisit this functionality? This limitation curtails the use of any LTR algorithms |
+1 Very needed |
Reopening after discussing internally. We're actively working on this issue and plan to support this feature in the near future. |
After so many years, I can't believe this. |
Pinging @elastic/es-search (Team:Search) |
This change adds the support for rescoring collapsed documents. The rescoring is applied on the top document per group on each shard. Closes elastic#27243
I have some searches that require field collapsing. I need the feature, and I also need the performance improvement over using parent/child or nested. I would like to start using the learning to rank plugin for these searches but rescore support is needed to have good performance, at least for my use case. Any chance of adding this?