Aggregations: Ability to calculate the derivative of a histogram #9293

colings86 · 2015-01-14T09:44:32Z

Purpose

Computes the derivative of all the metrics in the sub-aggregation tree. If no metric aggregations are present in the sub-aggregation tree, it computes the derivative of the doc count. The original sub-aggregation tree is destroyed in the computation of this aggregation and is not included in the output.

Validation of sub-aggregation tree

Accepts exactly one sub-aggregation, which must be one of:

date histogram
histogram
derivative

and which may contain only single-value numeric metric aggregations (one or more).

Missing Buckets

Data is not always complete and gaps may exist at any point. For example, a derivative may be calculated on a daily date_histogram spanning 01/01/2015 to 31/01/2015, but there may not be any data points for 05/01/2015 and 10/01/2015-15/01/2015. We need to be able to deal with this situation in a manner that is intuitive for the user. This means that the derivative transformer needs to be aware of the keys in the histogram buckets and the expected interval between each bucket. There are three policies we can adopt for dealing with gaps in the data. These are outlined in the sub-sections below. We should probably support all three policies and allow the user to specify which policy to use on the request.

Fill gaps with zero values

Probably the simplest solution to implement. When a gap is identified, buckets are artificially inserted for each missing date with a value of '0'. The derivative is then calculated taking into account these artificial buckets. We should probably add a flag to the output to indicate that these are artificial values and not derived from real values at that date.
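A minimal Python sketch of this policy, not the proposed implementation. It assumes daily buckets keyed by date; the helper names (`fill_gaps_with_zero`, `derivative`) and the boolean `artificial` flag are hypothetical and only illustrate how the output flag could work.

```python
from datetime import date, timedelta

def fill_gaps_with_zero(buckets, start, end, step=timedelta(days=1)):
    """Return one (key, value, artificial) tuple per expected key in [start, end]."""
    filled = []
    key = start
    while key <= end:
        if key in buckets:
            filled.append((key, buckets[key], False))
        else:
            filled.append((key, 0.0, True))  # artificial bucket inserted for the gap
        key += step
    return filled

def derivative(filled):
    """First difference of consecutive buckets; flagged if either input was artificial."""
    out = []
    for (_, prev_value, prev_art), (key, value, art) in zip(filled, filled[1:]):
        out.append((key, value - prev_value, prev_art or art))
    return out

# Hypothetical data: no bucket exists for 2015-01-03.
buckets = {date(2015, 1, 1): 10.0, date(2015, 1, 2): 12.0, date(2015, 1, 4): 15.0}
result = derivative(fill_gaps_with_zero(buckets, date(2015, 1, 1), date(2015, 1, 4)))
```

Here the derivative values on either side of the inserted bucket (-12.0 and 15.0) are flagged, which shows how strongly a zero fill can distort the result.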

Skip calculation for gaps

Filling gaps with zeros is not always a good idea. For example, imagine a situation where you are calculating the derivative of the water pressure for a power plant. The pressure is supposed to be a non-zero value. Here, recording a value of zero is very different from recording no value: a value of zero (which would be reflected as a plummeting water pressure in the derivative) means a serious problem that probably warrants an evacuation, whereas no value would indicate that something is wrong with the pressure sensor and warrants further investigation by an engineer. For this situation we would use the following policy:

Gaps in the input histogram are also present in the output of the derivative. This means that, for the example in the section above, the derivative would be calculated only for the ranges:

01/01/2015-04/01/2015
06/01/2015-09/01/2015
16/01/2015-31/01/2015

I think this should be the default policy for missing values, since it assumes nothing about the way a user expects gaps to be dealt with and accurately reflects the source data instead of inserting artificial values.
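The skip policy could be sketched in Python as follows. This is an illustration under assumptions, not the actual design: buckets are a dict keyed by date, the expected interval is passed in as `step`, and the function name `derivative_skipping_gaps` is hypothetical.

```python
from datetime import date, timedelta

def derivative_skipping_gaps(buckets, step=timedelta(days=1)):
    """Compute differences only between buckets exactly one interval apart."""
    keys = sorted(buckets)
    out = {}
    for prev, cur in zip(keys, keys[1:]):
        if cur - prev == step:  # adjacent buckets: derivative is defined
            out[cur] = buckets[cur] - buckets[prev]
        # otherwise there is a gap, so no value is emitted for `cur`
    return out

# Hypothetical pressure readings with no bucket for 2015-01-05.
pressure = {
    date(2015, 1, 3): 5.0,
    date(2015, 1, 4): 5.5,
    date(2015, 1, 6): 6.0,
    date(2015, 1, 7): 5.0,
}
result = derivative_skipping_gaps(pressure)
```

No value is produced for 2015-01-05 or 2015-01-06, so the gap in the input stays visible in the output.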

Interpolate values for gaps

Sometimes a user will not want gaps on the graph and also not want zero values. This would be required if you had a system that tries to post a value every 10s but sometimes the value is dropped for some reason (a ping? or values from a UDP connection?). In this instance you want the result to look like a continuous stream even if there are gaps.

This policy would interpolate values for any missing buckets. We should probably add a flag to the output to indicate that these are estimated values and not derived from real values at that date.
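The proposal does not say which interpolation method to use. As one possible reading, the sketch below assumes linear interpolation over evenly spaced buckets, with `None` marking a missing bucket. The function name `interpolate_gaps` and the `estimated` flag are hypothetical.

```python
def interpolate_gaps(values):
    """Linearly interpolate None entries that lie between two known values.

    Returns a list of (value, estimated) tuples. Leading or trailing gaps
    have no neighbour on one side and are left as (None, False).
    """
    out = [(v, False) for v in values]
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        delta = values[right] - values[left]
        for i in range(left + 1, right):
            estimate = values[left] + delta * (i - left) / span
            out[i] = (estimate, True)  # flagged as an estimated value
    return out

# Hypothetical series sampled every 10s with two dropped values.
series = [10.0, None, None, 16.0, 18.0]
filled = interpolate_gaps(series)
```

The derivative can then be computed over `filled` as usual, carrying the `estimated` flag through to the output.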

Example 1: First Derivative

Goal

Calculate the first derivative of the daily maximum price and the daily average price

Request

{
  "aggs": {
    "first_derivative": {
      "derivative": {},
      "aggs": {
        "my_date_histo": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "day"
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "price"
              }
            },
            "max_price": {
              "max": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}

Response

{
    ...
    "aggregations": {
        "first_derivative": {
            "buckets": [
                {
                    "key": "2014-12-01T00:00:00.000Z",
                    "from": "2014-12-01T00:00:00.000Z",
                    "to": "2014-12-02T00:00:00.000Z",
                    "avg_price": {
                        "value": 23.00
                    },
                    "max_price": {
                        "value": 56.00
                    }
                },
                {
                    "key": "2014-12-02T00:00:00.000Z",
                    "from": "2014-12-02T00:00:00.000Z",
                    "to": "2014-12-03T00:00:00.000Z",
                    "avg_price": {
                        "value": 50.00
                    },
                    "max_price": {
                        "value": 28.00
                    }
                },
                {
                    "key": "2014-12-03T00:00:00.000Z",
                    "from": "2014-12-03T00:00:00.000Z",
                    "to": "2014-12-04T00:00:00.000Z",
                    "avg_price": {
                        "value": -6.00
                    },
                    "max_price": {
                        "value": -1.00
                    }
                },
                ...
            ]
        }
    }
    ...
}

Example 2: Second Derivative

Goal

Calculate the second derivative of the daily maximum price and the daily average price

Request

{
  "aggs": {
    "second_derivative": {
      "derivative": {},
      "aggs": {
        "first_derivative": {
          "derivative": {},
          "aggs": {
            "my_date_histo": {
              "date_histogram": {
                "field": "@timestamp",
                "interval": "day"
              },
              "aggs": {
                "avg_price": {
                  "avg": {
                    "field": "price"
                  }
                },
                "max_price": {
                  "max": {
                    "field": "price"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Response

{
    ...
    "aggregations": {
        "second_derivative": {
            "buckets": [
                {
                    "key": "2014-12-01T00:00:00.000Z",
                    "from": "2014-12-01T00:00:00.000Z",
                    "to": "2014-12-02T00:00:00.000Z",
                    "avg_price": {
                        "value": 23.00
                    },
                    "max_price": {
                        "value": 56.00
                    }
                },
                {
                    "key": "2014-12-02T00:00:00.000Z",
                    "from": "2014-12-02T00:00:00.000Z",
                    "to": "2014-12-03T00:00:00.000Z",
                    "avg_price": {
                        "value": 50.00
                    },
                    "max_price": {
                        "value": 28.00
                    }
                },
                {
                    "key": "2014-12-03T00:00:00.000Z",
                    "from": "2014-12-03T00:00:00.000Z",
                    "to": "2014-12-04T00:00:00.000Z",
                    "avg_price": {
                        "value": -6.00
                    },
                    "max_price": {
                        "value": -1.00
                    }
                },
                ...
            ]
        }
    }
    ...
}
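Nesting one derivative inside another amounts to applying the same first-difference step twice. The Python sketch below illustrates that idea on hypothetical daily maximum prices. The numbers are made up and are not taken from the response above, and `derivative` is an illustrative helper, not part of the proposed API.

```python
def derivative(values):
    """First difference between consecutive bucket values."""
    return [cur - prev for prev, cur in zip(values, values[1:])]

# Hypothetical daily max_price values, one per bucket.
daily_max_price = [10.0, 14.0, 21.0, 25.0]

first_derivative = derivative(daily_max_price)
second_derivative = derivative(first_derivative)
```

Each application yields one fewer value than its input, so a second derivative over four daily buckets has two values.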

j0hnsmith · 2015-01-14T09:55:18Z

Sounds great.

Another useful gap strategy would be to simply repeat the last known/calculated value (also with a flag to indicate it's not a real value).
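A rough Python sketch of that carry-forward idea, assuming evenly spaced buckets with `None` for a missing one. The function name `carry_forward` and the `repeated` flag are hypothetical.

```python
def carry_forward(values):
    """Replace None with the last known value; returns (value, repeated) tuples.

    A gap at the very start has nothing to repeat and stays (None, False).
    """
    out = []
    last = None
    for v in values:
        if v is None:
            out.append((last, last is not None))  # flagged when a value was repeated
        else:
            last = v
            out.append((v, False))
    return out

filled = carry_forward([3.0, None, None, 7.0])
```

With this strategy the derivative across a gap is zero until the next real value arrives, at which point the whole change shows up in a single bucket.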

marcboscher · 2015-01-29T20:05:55Z

+1

So now metric aggregators (like derivative) would be allowed to have child aggregations?

If so, it would be great if the scripted metric aggregator could actually have child metric aggregators too. The script could then access the child metric values and perform operations on them.

For example, getting the number of visitors per region (doc count), divided by the number of sites in each region (unique count on site field). This would indicate the popularity of a region, independently of the number of sites in that region.

Or is there another way to do this (apart from client side)?

shokurov · 2015-03-07T14:09:03Z

+1

Would it be also possible to run filters, for example, on top_hits? It would be really beneficial if the whole bucket was available as a single document for filtering.

An example application case would be to get sequence chains grouped by a field (say latest transactions by customer) and then filter them for specific subsequences (customer purchased item Y in 3 months after item X).

nrstott · 2015-03-21T21:11:04Z

+1 for reducers on top_hits

skearns64 · 2015-04-07T20:20:37Z

I think as part of derivatives, we should also consider optional time normalization when date histograms are in use, so I could report the derivative "per minute," regardless of whether my buckets are per-minute or one every 5 seconds. This will be a big help when trying to show consistent derivative values when zooming in or out of a graph.
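One way to read this suggestion is to scale each difference by the ratio of the reporting unit to the bucket interval. The Python sketch below assumes that interpretation. The function name `normalized_derivative` and its `bucket_interval` and `unit` parameters are hypothetical.

```python
from datetime import timedelta

def normalized_derivative(values, bucket_interval, unit):
    """First difference rescaled to a rate per `unit` of time."""
    scale = unit / bucket_interval  # timedelta / timedelta gives a float
    return [(cur - prev) * scale for prev, cur in zip(values, values[1:])]

# Hypothetical counter sampled in 5-second buckets, reported per minute.
per_5s = [100.0, 105.0, 115.0]
per_minute = normalized_derivative(
    per_5s, timedelta(seconds=5), timedelta(minutes=1)
)
```

Because the result is expressed per minute whatever the bucket size, values stay comparable when the histogram interval changes on zoom.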

shokurov · 2015-06-29T14:39:16Z

👍

bstsnail · 2016-01-24T10:49:44Z

How to show this in Kibana?

clamping · 2016-10-07T12:42:14Z

+1 @bstsnail
How to show in Kibana?

hellwen · 2017-01-20T02:55:59Z

How to show in Kibana?

colings86 added >feature v2.0.0-beta1 :Analytics/Aggregations Aggregations labels Jan 14, 2015

colings86 self-assigned this Jan 14, 2015

colings86 mentioned this issue Jan 14, 2015

Reducers - Post processing of aggregation results #8110

Closed

colings86 mentioned this issue Feb 25, 2015

Add ability to perform computations on aggregations #9876

Closed

24 tasks

rashidkpc mentioned this issue Mar 31, 2015

Series transformations (eg derivative) elastic/kibana#1743

Closed

colings86 mentioned this issue Apr 13, 2015

Pipeline aggregations: Ability to perform computations on aggregations #10568

Merged

colings86 closed this as completed in 02c0cdf Apr 29, 2015
