Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix doc_count on HistoBackedHistogramAggregator #74650

Merged
merged 1 commit into from
Jun 28, 2021

Conversation

csoulios
Copy link
Contributor

@csoulios csoulios commented Jun 28, 2021

histogram aggregation on histogram field computes wrong doc_count values when _doc_count field is present.

The root cause of the problem is correctly described here

Closes #74617

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 28, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@csoulios csoulios requested a review from benwtrent June 28, 2021 19:10
// We have added the document already and we have incremented bucket doc_count
// by _doc_count times. To compensate for this, we should increment doc_count by
// (count - _doc_count) so that we have added it count times.
incrementBucketDocCount(bucketOrd, count - docCountProvider.getDocCount(doc));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No matter what _doc_count is, it was previously added to the bucket_count via the collectBucket methods. It could be ANY number.

Consequently, this incrementBucketDocCount may actually be decrementing the bucket_count to adjust for the difference.

I think this is fine.

The other potential solution is to override collectBucket... so the bucket count is not incremented. But that may prove too complicated.

I think this is good solution for now 👍

Somebody else from the aggs team should take a look as well.

@csoulios csoulios merged commit 49baa06 into elastic:master Jun 28, 2021
@csoulios csoulios deleted the fix-histo-doc_count branch June 28, 2021 19:55
csoulios added a commit that referenced this pull request Jun 29, 2021
Backports #74650 to v7.x

    histogram aggregation on histogram field computes wrong doc_count values when _doc_count field is present.

    The root cause of the problem is correctly described here
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.14.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Discrepancy between doc_count agg and histogram agg on histogram fields
5 participants