Filter aggregation fails with index_out_of_bounds_exception when nested within rare_terms #51020
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
This will also happen with the following query, where field
|
I haven't looked at code yet or reproduced this particular error, but there's definitely something funky going on with sub-aggregators and
GET /test/_search
{
"size": 0,
"aggs": {
"terms": {
"terms": {
"field": "host",
"size": 10
}
},
"rare": {
"rare_terms": {
"field": "host",
"max_doc_count": 10,
"precision": 0.00001
}
}
}
}
{
"aggregations" : {
"terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "a",
"doc_count" : 3
},
{
"key" : "c",
"doc_count" : 2
},
{
"key" : "b",
"doc_count" : 1
}
]
},
"rare" : {
"buckets" : [
{
"key" : "b",
"doc_count" : 1
},
{
"key" : "c",
"doc_count" : 2
},
{
"key" : "a",
"doc_count" : 3
}
]
}
}
}
But adding that filter sub-agg makes the rare_terms results incorrect, no matter how you play with
GET /test/_search
{
"size": 0,
"aggs": {
"terms": {
"terms": {
"field": "host",
"size": 10
}
},
"rare": {
"rare_terms": {
"field": "host",
"max_doc_count": 10,
"precision": 0.00001
},
"aggs": {
"filter": {
"filter": {
"range": {
"@date": {
"gte": "now-15m"
}
}
}
}
}
}
}
}
{
"aggregations" : {
"terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "a",
"doc_count" : 3
},
{
"key" : "c",
"doc_count" : 2
},
{
"key" : "b",
"doc_count" : 1
}
]
},
"rare" : {
"meta" : { },
"buckets" : [
{
"key" : "b",
"doc_count" : 1,
"filter" : {
"meta" : { },
"doc_count" : 0
}
}
]
}
}
}
(Note: none of the docs in my test actually match Not sure what's going on, but |
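The mismatch between the two responses above can be checked mechanically. The following is a minimal sketch, assuming the `terms` buckets are complete (here `sum_other_doc_count` is 0) and that every term with `doc_count <= max_doc_count` should show up in the `rare_terms` buckets. The helper names `expected_rare_buckets` and `missing_rare_keys` are hypothetical, and the response dictionaries are trimmed copies of the JSON above with the sub-aggregation bodies left out. Note that `rare_terms` is itself approximate, so this is a sanity check rather than a strict invariant.

```python
def expected_rare_buckets(terms_buckets, max_doc_count):
    """Return {key: doc_count} for every terms bucket that qualifies as rare."""
    return {
        b["key"]: b["doc_count"]
        for b in terms_buckets
        if b["doc_count"] <= max_doc_count
    }


def missing_rare_keys(response, max_doc_count, terms_name="terms", rare_name="rare"):
    """List the keys that look rare in the terms agg but are absent from rare_terms."""
    aggs = response["aggregations"]
    expected = expected_rare_buckets(aggs[terms_name]["buckets"], max_doc_count)
    actual = {b["key"] for b in aggs[rare_name]["buckets"]}
    return sorted(set(expected) - actual)


# Buckets copied from the two responses in this comment.
terms_buckets = [
    {"key": "a", "doc_count": 3},
    {"key": "c", "doc_count": 2},
    {"key": "b", "doc_count": 1},
]
without_filter = {
    "aggregations": {
        "terms": {"buckets": terms_buckets},
        "rare": {
            "buckets": [
                {"key": "b", "doc_count": 1},
                {"key": "c", "doc_count": 2},
                {"key": "a", "doc_count": 3},
            ]
        },
    }
}
with_filter = {
    "aggregations": {
        "terms": {"buckets": terms_buckets},
        "rare": {"buckets": [{"key": "b", "doc_count": 1}]},
    }
}

print(missing_rare_keys(without_filter, max_doc_count=10))  # []
print(missing_rare_keys(with_filter, max_doc_count=10))     # ['a', 'c']
```

With `max_doc_count` set to 10 as in the requests, the second response is missing the `a` and `c` buckets, which is the incorrect result described above.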
I can't reproduce the failure locally. Is there any chance you could either build a complete reproduction against an empty cluster or attach a stack trace? |
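For anyone trying to build such a reproduction, a small script can generate the test data. This is only a sketch under several assumptions: the index name `test` and the fields `host` and `@date` come from the queries earlier in the thread, the per-host document counts mirror the `terms` buckets shown there, and the timestamp is a made-up placeholder. It also assumes the index already maps `host` as `keyword` and `@date` as `date`. The function `build_bulk_body` is hypothetical. It only produces the newline-delimited body for the `_bulk` endpoint and does not talk to a cluster. The thread does not confirm that this data alone triggers the failure.

```python
import json


def build_bulk_body(index, host_counts, timestamp):
    """Build an NDJSON body for POST /_bulk with `count` documents per host."""
    lines = []
    for host, count in host_counts.items():
        for _ in range(count):
            # Action line, followed by the document source line.
            lines.append(json.dumps({"index": {"_index": index}}))
            lines.append(json.dumps({"host": host, "@date": timestamp}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical data: counts mirror the terms buckets in the thread,
# the timestamp is a placeholder chosen to fall outside "now-15m".
body = build_bulk_body("test", {"a": 3, "c": 2, "b": 1}, "2020-01-01T00:00:00")
print(body)
```

The resulting string can be sent as the request body of `POST /_bulk` with the `application/x-ndjson` content type, after which the queries from the earlier comment can be run against the `test` index.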
Hi @nik9000. I hope this stack trace helps:
|
I've reproduced this locally - something is indeed wrong with the merging, I believe. |
When the `rare_terms` aggregation contained another aggregation, the results would break most of the time. This happened because the process it uses to remove buckets that turn out not to be rare was incorrectly merging results from multiple leaves, which caused array index out of bounds errors. We didn't catch it in the tests because the issue doesn't happen on the very first bucket, and the tests generated data in such a way that the first bucket always contained the rare terms. Randomizing the order of the generated data fixed the test so that it catches the issue. Closes elastic#51020
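The point about test data ordering can be illustrated with a toy model. The sketch below is not Elasticsearch's code and does not model leaves or the real merge logic. It is a deliberately simplified, hypothetical pruning step (`prune_buggy`) that compacts per-bucket sub-aggregation values but then looks them up with the old bucket ordinal. That mistake is invisible when the rare bucket happens to sit at ordinal 0 and raises an out-of-bounds error otherwise, which is the same shape of blind spot the commit message describes.

```python
from itertools import permutations


def prune_buggy(doc_counts, sub_counts, max_doc_count):
    """Toy pruning step with a deliberate bug: stale ordinals after compaction."""
    keep = [o for o, c in enumerate(doc_counts) if c <= max_doc_count]
    compacted = [sub_counts[o] for o in keep]
    # BUG (deliberate): indexes the compacted list with the OLD ordinal.
    return {o: compacted[o] for o in keep}


def prune_fixed(doc_counts, sub_counts, max_doc_count):
    """Same pruning step, but old ordinals are remapped to compacted positions."""
    keep = [o for o, c in enumerate(doc_counts) if c <= max_doc_count]
    compacted = [sub_counts[o] for o in keep]
    return {old: compacted[new] for new, old in enumerate(keep)}


def count_failures(prune, doc_counts, max_doc_count):
    """Run `prune` on every ordering of the buckets and count IndexErrors."""
    failures = 0
    for order in permutations(range(len(doc_counts))):
        counts = [doc_counts[i] for i in order]
        subs = [c * 10 for c in counts]  # arbitrary per-bucket sub-agg values
        try:
            prune(counts, subs, max_doc_count)
        except IndexError:
            failures += 1
    return failures


# Rare bucket first: the buggy version happens to work.
print(prune_buggy([1, 5, 7], [10, 50, 70], max_doc_count=1))  # {0: 10}
# Across all 6 orderings, the bug shows up whenever the rare bucket is not first.
print(count_failures(prune_buggy, [1, 5, 7], max_doc_count=1))  # 4
print(count_failures(prune_fixed, [1, 5, 7], max_doc_count=1))  # 0
```

A test that always generates the rare term first only ever exercises the one ordering family that passes, so varying the order of the generated data is what exposes this kind of indexing mistake.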
Elasticsearch version (`bin/elasticsearch --version`): 7.5 on Elastic Cloud
Description of the problem including expected versus actual behavior:
I am nesting a `filter` aggregation within a `rare_terms` aggregation:
Some (or all) shards fail, returning an `index_out_of_bounds_exception`.
Playing with the `range` query, it turns out that this works for some values of `gte`, and not for others.
In my case, the query above (currently) works for, e.g., `"gte": "2020-01-14T00:00:00"`.
I am on Elastic Cloud and was unable to find additional logs. Please let me know which other information you need and how I can obtain it.