
No date histogram buckets returned for matching documents #3408

Closed
fmassot opened this issue May 25, 2023 · 2 comments · Fixed by quickwit-oss/tantivy#2057
Labels: bug (Something isn't working)

Comments

fmassot (Contributor) commented May 25, 2023

I'm struggling to narrow down this problem. @PSeitz, you may have better insight into this than I do.

I indexed the HDFS logs dataset from our tutorial.

Here is the query:

{
   "aggs":{
      "2":{
         "aggs":{
            "3":{
               "date_histogram":{
                  "field":"timestamp",
                  "fixed_interval":"30d"
               }
            }
         },
         "terms":{
            "field":"severity_text",
            "size":100,
            "order":{
               "_count":"desc"
            },
            "min_doc_count":0
         }
      }
   },
   "query":"timestamp:>=2016-10-01T21:59:59Z AND timestamp:<2016-10-06T22:09:42.455Z",
   "max_hits":0
}

And here is the result: no date histogram bucket is returned for the WARN and ERROR terms, even though they match 12 and 1 documents respectively.

{
  "num_hits": 5840046,
  "hits": [],
  "elapsed_time_micros": 1577,
  "errors": [],
  "aggregations": {
    "2": {
      "buckets": [
        {
          "3": {
            "buckets": [
              {
                "doc_count": 5840033,
                "key": 1474848000000.0,
                "key_as_string": "2016-09-26T00:00:00Z"
              }
            ]
          },
          "doc_count": 5840033,
          "key": "INFO"
        },
        {
          "3": {
            "buckets": []
          },
          "doc_count": 12,
          "key": "WARN"
        },
        {
          "3": {
            "buckets": []
          },
          "doc_count": 1,
          "key": "ERROR"
        }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    }
  }
}
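Because the query filters on timestamp, every matching document has a timestamp, so the date_histogram sub-buckets of each term should add up to that term's doc_count. The terms buckets themselves do add up to num_hits (5840033 + 12 + 1 = 5840046), so the documents are counted; only their histogram buckets are missing. A small Python check over a trimmed copy of the response makes the gap explicit; the function name `missing_histogram_docs` is illustrative only.

```python
import json

# Trimmed copy of the response above (key_as_string and error fields omitted).
response = json.loads("""
{
  "num_hits": 5840046,
  "aggregations": {"2": {"buckets": [
    {"key": "INFO", "doc_count": 5840033,
     "3": {"buckets": [{"key": 1474848000000.0, "doc_count": 5840033}]}},
    {"key": "WARN", "doc_count": 12, "3": {"buckets": []}},
    {"key": "ERROR", "doc_count": 1, "3": {"buckets": []}}
  ]}}
}
""")


def missing_histogram_docs(resp: dict) -> dict:
    """Return {term: number of its documents absent from its date_histogram sub-buckets}."""
    gaps = {}
    for term_bucket in resp["aggregations"]["2"]["buckets"]:
        in_histogram = sum(b["doc_count"] for b in term_bucket["3"]["buckets"])
        gap = term_bucket["doc_count"] - in_histogram
        if gap:
            gaps[term_bucket["key"]] = gap
    return gaps


# The terms buckets cover every hit: 5840033 + 12 + 1 == 5840046.
term_total = sum(b["doc_count"] for b in response["aggregations"]["2"]["buckets"])
print(term_total == response["num_hits"])  # True
print(missing_histogram_docs(response))    # {'WARN': 12, 'ERROR': 1}
```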

There is one document with severity_text:ERROR matching the query:

{
  "body": "mesos-slave-19:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.10.34.29:55706 dst: /10.10.34.29:50010",
  "resource": {
    "service": "datanode/19"
  },
  "severity_text": "ERROR",
  "tenant_id": 41,
  "timestamp": 1475654227
}

This timestamp is in Unix seconds and corresponds to 2016-10-05T07:57:07Z, which falls within the query range.
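Converting the Unix-seconds timestamp with Python's standard library confirms that the document lies inside the query range. The sketch also computes its 30d bucket key, assuming fixed-interval buckets are aligned to the Unix epoch with no offset (this assumption is consistent with the 2016-09-26 key returned for INFO in the response above). The helper names `to_utc` and `bucket_key_ms` are illustrative.

```python
from datetime import datetime, timezone

DAY_MS = 24 * 60 * 60 * 1000
INTERVAL_MS = 30 * DAY_MS  # "fixed_interval": "30d"


def to_utc(ts_seconds: int) -> datetime:
    """Interpret a Unix timestamp in seconds as a UTC datetime."""
    return datetime.fromtimestamp(ts_seconds, tz=timezone.utc)


def bucket_key_ms(ts_seconds: int, interval_ms: int = INTERVAL_MS) -> int:
    """Start of the fixed-interval bucket, assuming epoch-aligned buckets with no offset."""
    ts_ms = ts_seconds * 1000
    return ts_ms - ts_ms % interval_ms


ts = 1475654227
doc_time = to_utc(ts)
# Query range: timestamp:>=2016-10-01T21:59:59Z AND timestamp:<2016-10-06T22:09:42.455Z
start = datetime(2016, 10, 1, 21, 59, 59, tzinfo=timezone.utc)
end = datetime(2016, 10, 6, 22, 9, 42, 455000, tzinfo=timezone.utc)

print(doc_time.isoformat())                           # 2016-10-05T07:57:07+00:00
print(start <= doc_time < end)                        # True
print(bucket_key_ms(ts))                              # 1474848000000
print(to_utc(bucket_key_ms(ts) // 1000).isoformat())  # 2016-09-26T00:00:00+00:00
```

Under that assumption the ERROR document should have landed in the same 2016-09-26 bucket as the INFO documents.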

Removing min_doc_count from the terms aggregation solves the problem.

fmassot added the bug label May 25, 2023
PSeitz self-assigned this May 25, 2023

PSeitz (Contributor) commented May 25, 2023

Thanks for the detailed bug report.

The issue occurred with min_doc_count == 0: a term is loaded from the term dictionary of one segment, where it has no matching documents, and is then merged with the same term, carrying a sub-aggregation, from another segment. Previously, the empty bucket was not correctly initialized to contain the sub-aggregation, so the merge was incorrect.

This will be fixed by quickwit-oss/tantivy#2057.
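The merge problem described here can be modeled in a few lines of Python. This is a hypothetical sketch, not tantivy's code: a bucket for a term with no matching documents in a segment is created either with an initialized, empty date-histogram sub-aggregation or with none at all, and the merge rule (combine sub-aggregations only when both sides have one, otherwise keep the left side's) is an assumption chosen to reproduce the symptom from the report.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical model, not tantivy's code. A date-histogram sub-aggregation is {bucket_key_ms: doc_count}.
Histogram = Dict[int, int]


@dataclass
class TermBucket:
    doc_count: int
    sub_agg: Optional[Histogram]  # None = sub-aggregation never initialized


def empty_bucket(initialize_sub_agg: bool) -> TermBucket:
    """Bucket for a term loaded from the dictionary with no matching docs (min_doc_count == 0)."""
    return TermBucket(doc_count=0, sub_agg={} if initialize_sub_agg else None)


def merge_histograms(left: Histogram, right: Histogram) -> Histogram:
    merged = dict(left)
    for key, count in right.items():
        merged[key] = merged.get(key, 0) + count
    return merged


def merge_buckets(left: TermBucket, right: TermBucket) -> TermBucket:
    # Assumed merge rule: combine sub-aggregations only when both sides have one,
    # otherwise keep the left side's as-is (so a missing one on the left wins).
    if left.sub_agg is not None and right.sub_agg is not None:
        sub_agg = merge_histograms(left.sub_agg, right.sub_agg)
    else:
        sub_agg = left.sub_agg
    return TermBucket(left.doc_count + right.doc_count, sub_agg)


def merge_segments(segments: List[Dict[str, TermBucket]]) -> Dict[str, TermBucket]:
    """Fold per-segment term buckets into one result, term by term."""
    merged: Dict[str, TermBucket] = {}
    for segment in segments:
        for term, bucket in segment.items():
            merged[term] = merge_buckets(merged[term], bucket) if term in merged else bucket
    return merged


def render(bucket: TermBucket) -> dict:
    """Shape the merged bucket like the "3" sub-aggregation in the response."""
    histogram = bucket.sub_agg or {}
    return {"doc_count": bucket.doc_count, "3": {"buckets": sorted(histogram.items())}}


BUCKET_KEY = 1474848000000  # 2016-09-26T00:00:00Z, the 30d bucket from the report


def run(initialize_sub_agg: bool) -> dict:
    # Segment A: "ERROR" is in the term dictionary but has no matching documents.
    segment_a = {"ERROR": empty_bucket(initialize_sub_agg)}
    # Segment B: the single matching ERROR document, with its histogram bucket.
    segment_b = {"ERROR": TermBucket(doc_count=1, sub_agg={BUCKET_KEY: 1})}
    return render(merge_segments([segment_a, segment_b])["ERROR"])


print(run(initialize_sub_agg=False))  # {'doc_count': 1, '3': {'buckets': []}}
print(run(initialize_sub_agg=True))   # {'doc_count': 1, '3': {'buckets': [(1474848000000, 1)]}}
```

In this model the uninitialized case reproduces the reported output for ERROR (a doc_count of 1 with no histogram buckets), while initializing the empty bucket's sub-aggregation restores the missing bucket.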

fmassot (Contributor, Author) commented May 25, 2023

Fast fix! Thanks!
