Date Histogram Aggregations w/ extended_bounds and time_zone #12278

Closed
feltnerm opened this issue Jul 15, 2015 · 5 comments
@feltnerm

Hey all,

I'm performing a date histogram aggregation over the current day ('now/d' -> 'now/d') and would like to get the results in hourly buckets. I am using the aggregation's extended_bounds because I still want to get empty buckets back (as long as they are within my time range).

Everything is working almost as expected... except it seems like extended_bounds is not respecting the time zone.

My query returns the UTC "midnight" bucket, which becomes 7pm on the previous day when I adjust for my timezone (US Central, currently UTC-5). I would expect extended_bounds combined with time_zone to return that timezone's midnight.

So if I were performing a query from 'now/d' to 'now/d' in US Central time (currently UTC-5), I would expect the first bucket to be '2015-07-15T05:00:00.000Z' and not '2015-07-15T00:00:00.000Z'.
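
To make the arithmetic concrete, here is a minimal sketch using only Python's standard library. It is not Elasticsearch code; the fixed UTC-5 offset stands in for the "-5:00" time_zone value, and the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset assumed from the report: UTC-5.
LOCAL = timezone(timedelta(hours=-5))

# Midnight at the start of 2015-07-15 in the local zone, expressed in UTC.
# This is the first bucket the reporter expects.
local_midnight = datetime(2015, 7, 15, 0, 0, tzinfo=LOCAL)
expected_first_bucket = local_midnight.astimezone(timezone.utc)

# The first bucket actually returned (UTC midnight), viewed in the local zone.
actual_first_bucket = datetime(2015, 7, 15, 0, 0, tzinfo=timezone.utc)
actual_in_local = actual_first_bucket.astimezone(LOCAL)

print(expected_first_bucket.isoformat())  # 2015-07-15T05:00:00+00:00
print(actual_in_local.isoformat())        # 2015-07-14T19:00:00-05:00
```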

My question is: Is extended_bounds not respecting time_zone?

I'm having trouble succinctly describing this in English; hopefully my test case below helps to explain:

Test Case:

Index a new document with timestamp of 00:00 GMT:

curl -XPOST "http://localhost:9200/analytics/event" -d'
{
    "name": "Prince George",
    "event-date": {
        "timestamp": "2015-07-15T00:00:00.000Z"
    }
}'

Index a new document with timestamp of 08:00 GMT:

curl -XPOST "http://localhost:9200/analytics/event" -d'
{
    "name": "James Madison",
    "event-date": {
        "timestamp": "2015-07-15T08:00:00.000Z"
    }
}'

Search!

curl -XPOST "http://localhost:9200/analytics/event/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "range": {
                        "timestamp": {
                           "from": "now/d",
                           "to": "now/d",
                           "time_zone": "-5:00",
                           "include_lower": true,
                           "include_upper": true
                        }
                     }
                  }
               ]
            }
         }
      }
   },
   "aggs": {
      "dateagg": {
         "date_histogram": {
            "field": "event-date.timestamp",
            "interval": "1h",
            "time_zone": "-5:00",
            "min_doc_count": 0,
            "extended_bounds": {
               "min": "now/d",
               "max": "now/d"
            }
         }
      }
   }
}'

Result:

{
   "took": 92,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "analytics",
            "_type": "event",
            "_id": "AU6S1QrhLJ3RT4VnGs-M",
            "_score": 1,
            "_source": {
               "name": "James Madison",
               "event-date": {
                  "timestamp": "2015-07-15T08:00:00.000Z"
               }
            }
         }
      ]
   },
   "aggregations": {
      "dateagg": {
         "buckets": [
            {
               "key_as_string": "2015-07-15T00:00:00.000Z",
               "key": 1436918400000,
               "doc_count": 0
            },
            {
               "key_as_string": "2015-07-15T01:00:00.000Z",
               "key": 1436922000000,
               "doc_count": 0
            },
            {
               "key_as_string": "2015-07-15T02:00:00.000Z",
               "key": 1436925600000,
               "doc_count": 0
            },
            {
               "key_as_string": "2015-07-15T03:00:00.000Z",
               "key": 1436929200000,
               "doc_count": 0
            },
            {
               "key_as_string": "2015-07-15T04:00:00.000Z",
               "key": 1436932800000,
               "doc_count": 0
            },
            {
               "key_as_string": "2015-07-15T05:00:00.000Z",
               "key": 1436936400000,
               "doc_count": 0
            },
            {
               "key_as_string": "2015-07-15T06:00:00.000Z",
               "key": 1436940000000,
               "doc_count": 0
            },
            {
               "key_as_string": "2015-07-15T07:00:00.000Z",
               "key": 1436943600000,
               "doc_count": 0
            },
            {
               "key_as_string": "2015-07-15T08:00:00.000Z",
               "key": 1436947200000,
               "doc_count": 1
            }
         ]
      }
   }
}

As you can see, we are getting the expected hits ("James Madison"), and the document is in the correct bucket. I would prefer the first bucket to start at the beginning of the day in the timezone I specified in the query ("-05:00"), and not at the beginning of the day in UTC.
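
One possible client-side workaround, not suggested or confirmed anywhere in this thread, is to skip the "now/d" date math and send explicit, hour-aligned bounds computed for the local day. The sketch below builds such an aggregation body in Python; `local_day_bounds` is a hypothetical helper, the fixed "now" is an assumption chosen to match the test data, and whether a given Elasticsearch version honours these bounds unchanged would need to be checked.

```python
from datetime import datetime, timedelta, timezone

def local_day_bounds(now_utc, offset_hours):
    """Return (min, max) ISO-8601 UTC strings for hourly buckets covering
    the local calendar day that contains now_utc (hypothetical helper)."""
    tz = timezone(timedelta(hours=offset_hours))
    local_now = now_utc.astimezone(tz)
    # Start of the local day, and the start of its last hourly bucket.
    day_start = local_now.replace(hour=0, minute=0, second=0, microsecond=0)
    last_hour = day_start + timedelta(hours=23)

    def fmt(dt):
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")

    return fmt(day_start), fmt(last_hour)

# Assumed "now" for a reproducible example: 13:00 UTC on the day of the report.
now = datetime(2015, 7, 15, 13, 0, tzinfo=timezone.utc)
bounds_min, bounds_max = local_day_bounds(now, -5)

# Same aggregation as in the test case, with explicit bounds instead of "now/d".
aggs = {
    "dateagg": {
        "date_histogram": {
            "field": "event-date.timestamp",
            "interval": "1h",
            "time_zone": "-5:00",
            "min_doc_count": 0,
            "extended_bounds": {"min": bounds_min, "max": bounds_max},
        }
    }
}

print(bounds_min)  # 2015-07-15T05:00:00.000Z
print(bounds_max)  # 2015-07-16T04:00:00.000Z
```

With these bounds the histogram would span the 24 hourly buckets of the local day, starting at the '2015-07-15T05:00:00.000Z' bucket requested above.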

Thanks for your review, and thanks for your hard work!

@clintongormley
Contributor

@cbuescher could you take a look at this one? I think the docs need some updating - I'm confused as to the best way to do something like this.

thanks

@cbuescher
Member

I just verified this on 1.7, and the result still looks the way @feltnerm describes it above. It might be a bug, but at first glance the code looks like it should already apply the time zone to extended_bounds. I will have to look into this more deeply.

@cbuescher
Member

@feltnerm which version of ES were you using for the output you described above?

@feltnerm
Author

@cbuescher the above is from 1.5.2, but I have tried with 1.6.x as well. Can try on 1.6.x or 1.7.x if needed.

@cbuescher
Member

Some preliminary findings: I was able to dig a bit deeper into this and could reproduce the behaviour on master, but I haven't found a way to fix it yet.
The root problem seems to be that the extended bounds date math expression is first evaluated without considering the time zone, so the "/d" day rounding in the expression happens at UTC day boundaries. Later on, the time zone rounding of the aggregation is applied, but since we have specified one-hour intervals here, the min/max of the extended bounds don't change. Before jumping to a quick fix, we need to rethink what the time_zone parameter in this type of aggregation should apply to: whether it should also affect date math expressions like "now/d", or whether that would have undesired side effects.
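
The two-stage evaluation described above can be imitated with a small Python sketch. This is a simplified model under stated assumptions, not the Elasticsearch implementation: `round_down` is a hypothetical helper, the zone is a fixed UTC-5 offset, and "now" is pinned to an arbitrary instant on the day of the report.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

def round_down(dt, unit, tz):
    """Floor dt to the start of the day ("d") or hour ("h") as seen in tz,
    and return the result in UTC (hypothetical helper)."""
    local = dt.astimezone(tz)
    if unit == "d":
        local = local.replace(hour=0, minute=0, second=0, microsecond=0)
    elif unit == "h":
        local = local.replace(minute=0, second=0, microsecond=0)
    else:
        raise ValueError(f"unsupported unit: {unit}")
    return local.astimezone(UTC)

agg_tz = timezone(timedelta(hours=-5))           # stands in for "time_zone": "-5:00"
now = datetime(2015, 7, 15, 13, 30, tzinfo=UTC)  # assumed "now"

# Reported behaviour: "now/d" is rounded in UTC first, then the aggregation's
# hourly rounding is applied in the aggregation time zone. The second step
# changes nothing because UTC midnight is already on an hour boundary.
reported_min = round_down(round_down(now, "d", UTC), "h", agg_tz)

# Behaviour the reporter expected: "now/d" is rounded in the aggregation time zone.
expected_min = round_down(round_down(now, "d", agg_tz), "h", agg_tz)

print(reported_min.isoformat())  # 2015-07-15T00:00:00+00:00
print(expected_min.isoformat())  # 2015-07-15T05:00:00+00:00
```

With a one-day interval or a non-hour-aligned offset the second rounding step could move the bound, which is part of why the choice of what time_zone applies to needs care.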

cbuescher added a commit to cbuescher/elasticsearch that referenced this issue Aug 14, 2015
This PR adds a timezone field to ValueParser.DateMath that is
set to UTC by default but can be set using the existing constructors.
This makes it possible for extended bounds setting in DateHistogram
to also use date math expressions that e.g. round by day and apply
this rounding in the time zone specified in the date histogram
aggregation request.

Closes elastic#12278
cbuescher added a commit that referenced this issue Aug 21, 2015
This PR adds a timezone field to ValueParser.DateMath that is
set to UTC by default but can be set using the existing constructors.
This makes it possible for extended bounds setting in DateHistogram
to also use date math expressions that e.g. round by day and apply
this rounding in the time zone specified in the date histogram
aggregation request.

Closes #12278