Bucket Aggregation size setting should never throw too_many_buckets_exception if size is less than search.max_buckets #51559
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)
Hey @niemyjski :) I just recently replied to a different issue explaining a little how the setting works: #34209 (comment). At a minimum, it seems we need to document how this setting works internally so users have a better understanding. I think the main difficulty is that the setting serves two purposes today, which confuses the semantics a little. First, it's a soft limit for response size. Easy enough to reason about. But the other purpose (and arguably the more important one to us) is that it acts as a "breaker" to help kill execution of expensive aggregations. There are several ways aggs can go off the rails and cause issues, but one of the predominant mechanisms is "abusive" aggs that generate too many buckets.

The coordinator can't return results until all shards have reported in, so it has to buffer the intermediate reductions in memory. In an extreme example, a request with a modest final size fanned out over many shards can still force the coordinator to hold a very large number of buckets before the final reduce. So the limit is enforced against those intermediate, partially-reduced results as well, not just the final response. I agree that does make it confusing/complicated for users, particularly since some aggs can implicitly generate far more buckets than the request appears to ask for.
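As a reference point, `search.max_buckets` is a dynamic cluster setting in 7.x, so the soft limit can be raised at runtime; a minimal sketch (20000 is just an example value):

```
PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": 20000
  }
}
```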
Small update: there is ongoing discussion about deprecating or changing the behavior of `search.max_buckets`.
This should now be resolved by #57042. The limit has been drastically increased, and bucket counts are now only tallied after all reductions are complete. So the various confusing edge cases that I described above no longer apply, and the setting really does what it says on the tin. :) 🎉
Thanks!
Elasticsearch version (`bin/elasticsearch --version`): 7.5.2

Plugins installed: []

JVM version (`java -version`): 7.5.2

OS version (`uname -a` if on a Unix-like system): docker

Description of the problem including expected versus actual behavior:
Bucket aggregation `size` should never throw too_many_buckets_exception if the requested size is less than `search.max_buckets`. If I have a simple terms aggregation (no nesting), then I'd think it would always return at most the number of buckets set by the `size` property. I get that for accuracy more records might be returned from the various shards, which may be over the 10k limit, but the end result returned to me should be <= 10k as defined by the `size` property.

TLDR: I don't care what queries happen behind the scenes to get me my 10k buckets. All I care about is that I get my 10k buckets, which is a valid size as it's <= `search.max_buckets` :-)

Steps to reproduce:
Assuming I have more than 10k unique document ids..
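A minimal sketch of the kind of query being described, assuming a hypothetical index `my-index` with a keyword field `id`:

```
POST /my-index/_search
{
  "size": 0,
  "aggs": {
    "unique_ids": {
      "terms": {
        "field": "id",
        "size": 10000
      }
    }
  }
}
```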
Should return 10k unique buckets with an id in each bucket..
What happens is:
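On 7.x the shard failure reason for this case looks roughly like the following (the counts shown are illustrative):

```
{
  "type": "too_many_buckets_exception",
  "reason": "Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
  "max_buckets": 10000
}
```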
Reasoning:
I'd love to learn more about why this happens; a detailed response on this design choice would be greatly appreciated. I know I wasn't the only one, as it was discussed here too: https://discuss.elastic.co/t/large-aggregate-too-many-buckets-exception/189091/15

If I'm understanding this issue correctly, wouldn't the following scenario also throw this error? Let's say I have two shards, shard 1 contains 10k+ unique ids, and shard 2 contains 10k+ different unique ids. The combination of both of them being queried would return 20k buckets that need to be merged down into the requested bucket size of 10k. But creating even 1 bucket over the max behind the scenes would throw this error.