Discrepancy between doc_count agg and histogram agg on histogram fields #74617

Closed
sorenlouv opened this issue Jun 28, 2021 · 2 comments · Fixed by #74650
Labels: :Analytics/Aggregations, >bug, Team:Analytics

Comments

@sorenlouv
Member

sorenlouv commented Jun 28, 2021

There appears to be a bug when running a histogram aggregation on a histogram field if the document contains a _doc_count field.

Expected
The doc_count agg (a value_count aggregation on the histogram field) and the histogram agg should return identical doc counts.

Actual
The doc count returned by the histogram agg is (almost) double the one returned by the doc_count agg.

Affected version

{
    "number" : "8.0.0-SNAPSHOT",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "307b82c474fe27722863fa2dc6b05def2f608ad9",
    "build_date" : "2021-06-27T01:39:13.126072914Z",
    "build_snapshot" : true,
    "lucene_version" : "8.9.0",
    "minimum_wire_compatibility_version" : "7.14.0",
    "minimum_index_compatibility_version" : "7.0.0"
}

Reproduce problem

PUT histogram-double-bug
{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      }
    }
  }
}

POST histogram-double-bug/_doc
{
  "_doc_count": 10,
  "my_histogram": {
    "counts": [
      10
    ],
    "values": [
      20
    ]
  }
}

GET histogram-double-bug/_search?track_total_hits=true
{
  "size": 0,
  "aggs": {
    "doc_count": {
      "value_count": {
        "field": "my_histogram"
      }
    },
    "distribution": {
      "histogram": {
        "field": "my_histogram",
        "interval": 1000
      }
    }
  }
}

Response

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "doc_count" : {
      "value" : 10
    },
    "distribution" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 19
        }
      ]
    }
  }
}

In the above, the correct doc count is 10 (the single entry in counts), but the histogram aggregation reports a doc_count of 19.
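For reference, the expected bucket can be worked out by hand. The sketch below is not Elasticsearch code; it only applies the documented bucket key formula, floor(value / interval) * interval, to the histogram entries from the reproduction, assuming the default offset of 0. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: computes the buckets one would expect for a single histogram field value. */
final class ExpectedHistogramBuckets {

    /** Sums each histogram count into the bucket keyed by floor(value / interval) * interval. */
    static Map<Double, Long> bucketize(double[] values, long[] counts, double interval) {
        Map<Double, Long> buckets = new TreeMap<>();
        for (int i = 0; i < values.length; i++) {
            // Assumes an offset of 0, which is the default for the histogram aggregation.
            double key = Math.floor(values[i] / interval) * interval;
            buckets.merge(key, counts[i], Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Same data as the reproduction: values [20], counts [10], interval 1000.
        System.out.println(bucketize(new double[] {20}, new long[] {10}, 1000));
    }
}
```

With the reproduction data this yields a single bucket with key 0.0 and a count of 10, which is the value the histogram aggregation was expected to return.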

sorenlouv added the >bug and needs:triage labels Jun 28, 2021
benwtrent added the :Analytics/Aggregations label and removed the needs:triage label Jun 28, 2021
elasticmachine added the Team:Analytics label Jun 28, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@benwtrent
Member

I think the problem is here:

I am thinking it should probably be something like:

incrementBucketDocCount(bucketOrd, count - docCountProvider.getDocCount(doc));

no?

This also relates to: #74146
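The arithmetic behind this suggestion can be sketched with a toy model. The snippet referenced above is not reproduced in this thread, so the following is an assumption, not the actual aggregator code: it supposes that collecting a document adds its _doc_count to the bucket, and that the original compensation was count - 1 (chosen here only because it reproduces the reported 19). All names are hypothetical.

```java
/** Toy model of the bucket doc count arithmetic; not Elasticsearch code. */
final class DocCountSketch {

    /** Assumed original behaviour: compensate as if the document had been counted exactly once. */
    static long bucketCountAssumingOne(long histogramCount, long docCount) {
        long bucket = 0;
        bucket += docCount;            // assumption: collecting the document adds its _doc_count
        bucket += histogramCount - 1;  // assumption: compensation that ignores _doc_count
        return bucket;
    }

    /** Suggested behaviour: subtract what collecting the document actually added. */
    static long bucketCountProposed(long histogramCount, long docCount) {
        long bucket = 0;
        bucket += docCount;                   // same assumption as above
        bucket += histogramCount - docCount;  // mirrors count - docCountProvider.getDocCount(doc)
        return bucket;
    }

    public static void main(String[] args) {
        // Reproduction data: the histogram entry has count 10 and the document has _doc_count 10.
        System.out.println(bucketCountAssumingOne(10, 10));
        System.out.println(bucketCountProposed(10, 10));
    }
}
```

Under these assumptions the first method gives 19 and the second gives 10 for the reproduction data. When a document has no _doc_count field (an implicit doc count of 1), both methods agree, which would explain why the discrepancy only shows up when _doc_count is present.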

@csoulios csoulios self-assigned this Jun 28, 2021
csoulios added a commit to csoulios/elasticsearch that referenced this issue Jun 28, 2021
csoulios added a commit that referenced this issue Jun 28, 2021
A histogram aggregation on a histogram field computes wrong doc_count values when the _doc_count field is present.

The root cause of the problem is correctly described here

Closes #74617