Significant terms ignores background filter #76729

Closed
benwtrent opened this issue Aug 19, 2021 · 1 comment · Fixed by #76730
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.15.0 v8.0.0-alpha2

Comments

@benwtrent
Member

Elasticsearch version (bin/elasticsearch --version):
7.15 (and probably earlier)
Description of the problem including expected versus actual behavior:
Significant terms is ignoring the background filter when one index has NO documents containing the term value.

Steps to reproduce:
Here is a long set of steps to reproduce:

PUT repeat-test-1
{
  "mappings": {
    "properties": {
      "field1": {
        "type": "keyword"
      },
      "field2": {
        "type": "keyword"
      },
      "field3": {
        "type": "keyword"
      }
    }
  }
}

PUT repeat-test-2
{
  "mappings": {
    "properties": {
      "field1": {
        "type": "keyword"
      },
      "field2": {
        "type": "keyword"
      },
      "field3": {
        "type": "keyword"
      }
    }
  }
}

POST repeat-test-1/_bulk?refresh=true
{ "index" : {} }
{ "field1" : "failed", "field2": "frontend-node", "field3": "foo" }
{ "index" : {} }
{ "field1" : "failed", "field2": "frontend-node", "field3": "bar" }
{ "index" : {} }
{ "field1" : "failed", "field2": "frontend-node", "field3": "bar" }
{ "index" : {} }
{ "field1" : "success", "field2": "frontend-node", "field3": "bar" }
{ "index" : {} }
{ "field1" : "success", "field2": "frontend-node", "field3": "bar" }

POST repeat-test-2/_bulk?refresh=true
{ "index" : {} }
{ "field1" : "success", "field2": "java-node" }
{ "index" : {} }
{ "field1" : "success", "field2": "java-node" }
{ "index" : {} }
{ "field1" : "success", "field2": "java-node" }


GET repeat-test*/_search?track_total_hits=true&size=0
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "field2": "frontend-node"
          }
        }
      ]
    }
  },
  "aggs": {
    "sig_term": {
      "significant_terms": {
        "field": "field3",
        "background_filter": {
          "bool": {
            "filter": [
              {
                "term": {
                  "field2": "frontend-node"
                }
              }
            ]
          }
        }
      }
    }
  }
}

With the background filter in place, one expects the background count to be only 5 docs, since only the 5 documents in repeat-test-1 match it.
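The expected count can be checked by hand. The following is a minimal sketch in plain Python, not Elasticsearch code: it models the indexed documents as dicts and applies the same `field2` term filter. The names `matches_filter`, `expected_bg_count`, and `unfiltered_count` are made up for this illustration.

```python
# The five complete documents indexed into repeat-test-1.
repeat_test_1 = [
    {"field1": "failed", "field2": "frontend-node", "field3": "foo"},
    {"field1": "failed", "field2": "frontend-node", "field3": "bar"},
    {"field1": "failed", "field2": "frontend-node", "field3": "bar"},
    {"field1": "success", "field2": "frontend-node", "field3": "bar"},
    {"field1": "success", "field2": "frontend-node", "field3": "bar"},
]

# The three documents indexed into repeat-test-2 (no field3 at all).
repeat_test_2 = [{"field1": "success", "field2": "java-node"} for _ in range(3)]


def matches_filter(doc):
    """Hypothetical stand-in for the term filter on field2."""
    return doc.get("field2") == "frontend-node"


all_docs = repeat_test_1 + repeat_test_2

# Background set when the background_filter is respected.
expected_bg_count = sum(1 for doc in all_docs if matches_filter(doc))

# Background set when the filter is ignored and every document is counted.
unfiltered_count = len(all_docs)

print(expected_bg_count, unfiltered_count)
```

The filtered count is 5, while counting every document across both indices gives 8.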

But it responds with:

"aggregations" : {
    "sig_term" : {
      "doc_count" : 5,
      "bg_count" : 8,
      "buckets" : [
        {
          "key" : "bar",
          "doc_count" : 4,
          "score" : 0.4800000000000001,
          "bg_count" : 4
        }
      ]
    }
  }

"bg_count" : 8,

This means the 3 docs that don't match the filter were still counted in the background set.
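The inflated background count also changes the score. The sketch below assumes the aggregation uses the JLH heuristic (the request does not set one, and JLH is the documented default), computed as absolute change times relative change between foreground and background percentages. The function `jlh_score` is an illustrative reimplementation, not the actual Elasticsearch code, and returning 0 for a non-positive change is an assumption of this sketch.

```python
def jlh_score(subset_freq, subset_size, superset_freq, superset_size):
    """Illustrative JLH score: (fg% - bg%) * (fg% / bg%)."""
    if subset_size == 0 or superset_size == 0 or superset_freq == 0:
        return 0.0
    fg = subset_freq / subset_size      # share of foreground docs with the term
    bg = superset_freq / superset_size  # share of background docs with the term
    absolute_change = fg - bg
    if absolute_change <= 0:
        # Assumption: terms that are not more common in the foreground score 0.
        return 0.0
    relative_change = fg / bg
    return absolute_change * relative_change


# "bar" appears in 4 of 5 foreground docs and 4 background docs.
reported = jlh_score(4, 5, 4, 8)  # background size 8, as in the response
expected = jlh_score(4, 5, 4, 5)  # background size 5, with the filter respected

print(reported, expected)
```

With a background size of 8 the score comes out at roughly 0.48, in line with the response above. With the filtered background size of 5, the foreground and background percentages are equal, so under these assumptions "bar" would score 0.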

This seems like a bad optimization, since those docs also don't contain the term field field3.

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 19, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

benwtrent added a commit that referenced this issue Aug 20, 2021
When building empty responses for shards that don't have the term field in question,
significant terms ignored the background filter.

This commit fixes this bug by respecting the background filter count, even
when building empty results.

closes #76729
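
The behavior described in the commit message can be modeled roughly as follows. This is a toy Python sketch, not the Elasticsearch implementation: it treats each index as a single shard, and `shard_superset_size` and its `respect_filter_when_empty` flag are hypothetical names used only to contrast the buggy and fixed behavior.

```python
def matches_background_filter(doc):
    """Hypothetical stand-in for the background_filter on field2."""
    return doc.get("field2") == "frontend-node"


def shard_superset_size(shard_docs, field, respect_filter_when_empty):
    """Background size one shard contributes to the final bg_count."""
    has_field = any(field in doc for doc in shard_docs)
    if not has_field and not respect_filter_when_empty:
        # Buggy path: the empty result reports every doc on the shard.
        return len(shard_docs)
    # Fixed path: the background filter is applied, even for empty results.
    return sum(1 for doc in shard_docs if matches_background_filter(doc))


shard_1 = [
    {"field2": "frontend-node", "field3": value}
    for value in ["foo", "bar", "bar", "bar", "bar"]
]
shard_2 = [{"field2": "java-node"} for _ in range(3)]  # field3 is absent

buggy_bg_count = sum(
    shard_superset_size(shard, "field3", False) for shard in (shard_1, shard_2)
)
fixed_bg_count = sum(
    shard_superset_size(shard, "field3", True) for shard in (shard_1, shard_2)
)

print(buggy_bg_count, fixed_bg_count)
```

In this model the shard without field3 contributes 3 to the background size on the buggy path and 0 on the fixed path, giving totals of 8 and 5.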
benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Aug 20, 2021
When building empty responses for shards that don't have the term field in question,
significant terms ignored the background filter.

This commit fixes this bug by respecting the background filter count, even
when building empty results.

closes elastic#76729
benwtrent added a commit that referenced this issue Aug 20, 2021
* Fix bug with significant terms background count (#76730)

When building empty responses for shards that don't have the term field in question,
significant terms ignored the background filter.

This commit fixes this bug by respecting the background filter count, even
when building empty results.

closes #76729
elasticsearchmachine pushed a commit that referenced this issue Aug 20, 2021
* Fix bug with significant terms background count (#76730)

When building empty responses for shards that don't have the term field in question,
significant terms ignored the background filter.

This commit fixes this bug by respecting the background filter count, even
when building empty results.

closes #76729

* fixing bwc yaml test version