Aggregations: Ability to calculate the derivative of a histogram #9293

Closed
colings86 opened this issue Jan 14, 2015 · 9 comments

@colings86
Contributor

Purpose

Computes the derivative of every metric in the sub-aggregation tree. If the sub-aggregation tree contains no metric aggregations, it computes the derivative of the doc count instead. The original sub-aggregation tree is destroyed during the computation and is not included in the output.
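As a rough illustration of the idea (not the proposed implementation), the sketch below treats the derivative as the difference between adjacent buckets, computed separately for each metric. The function name `derivative`, the dict-per-bucket layout, and the sample values are assumptions made for this example.

```python
from typing import Dict, List


def derivative(buckets: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Return one output bucket per adjacent pair of input buckets.

    Each input bucket maps a metric name to its single numeric value.
    """
    out = []
    for prev, curr in zip(buckets, buckets[1:]):
        # Difference every metric that is present in both buckets.
        out.append({name: curr[name] - prev[name] for name in curr if name in prev})
    return out


# Made-up daily values for two metrics.
daily = [
    {"avg_price": 20.0, "max_price": 40.0},
    {"avg_price": 24.0, "max_price": 55.0},
    {"avg_price": 21.0, "max_price": 50.0},
]
first = derivative(daily)
```

With no metrics present, the same differencing could be applied to the doc count of each bucket.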

Validation of sub-aggregation tree

Accepts a single:

  • date histogram
  • histogram
  • derivative

which must contain only single-value numeric metric aggregations (one or more).

Missing Buckets

Data is not always complete, and gaps may exist at any point. For example, a derivative may be calculated on a daily date_histogram spanning 01/01/2015 to 31/01/2015, but there may be no data points for 05/01/2015 or for 10/01/2015-15/01/2015. We need to deal with this situation in a way that is intuitive for the user. This means the derivative transformer needs to be aware of the keys in the histogram buckets and the expected interval between buckets. There are three policies we can adopt for dealing with gaps in the data, outlined in the sub-sections below. We should probably support all three and let the user specify which policy to use on the request.

Fill gaps with zero values

Probably the simplest solution to implement. When a gap is identified, buckets are artificially inserted for each missing date with a value of '0'. The derivative is then calculated taking into account these artificial buckets. We should probably add a flag to the output to indicate that these are artificial values and not derived from real values at that date.
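A minimal sketch of this policy, assuming integer bucket keys and a fixed interval. The names `fill_gaps_with_zero` and `derivative`, the tuple layout, and the choice to label each derivative with the later bucket's key are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

# (bucket key, value, artificial flag); this tuple layout is an assumption.
Bucket = Tuple[int, float, bool]


def fill_gaps_with_zero(values: Dict[int, float], interval: int) -> List[Bucket]:
    """Insert a flagged zero-valued bucket for every missing key."""
    if not values:
        return []
    keys = sorted(values)
    filled = []
    for key in range(keys[0], keys[-1] + interval, interval):
        if key in values:
            filled.append((key, values[key], False))
        else:
            filled.append((key, 0.0, True))
    return filled


def derivative(buckets: List[Bucket]) -> List[Bucket]:
    """Difference adjacent buckets; flag results that touch an artificial bucket."""
    return [
        (curr[0], curr[1] - prev[1], prev[2] or curr[2])
        for prev, curr in zip(buckets, buckets[1:])
    ]


# Day 3 is missing from the input.
filled = fill_gaps_with_zero({1: 5.0, 2: 7.0, 4: 6.0}, interval=1)
result = derivative(filled)
```

Here the derivative values on either side of day 3 carry the flag, since both depend on the inserted zero.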

Skip calculation for gaps

Filling gaps with zeros is not always a good idea. For example, imagine you are calculating the derivative of the water pressure for a power plant. The pressure is supposed to be a non-zero value. Here, recording a value of zero is very different from recording no value: a value of zero (which would show up in the derivative as plummeting water pressure) means a serious problem that probably warrants an evacuation, whereas no value would indicate that something is wrong with the pressure sensor and warrants further investigation by an engineer. For this situation we would use the following policy:

Gaps in the input histogram are also present in the output of the derivative. For the example in the Missing Buckets section above, this means the derivative would be calculated only for the following ranges:

  • 01/01/2015-04/01/2015
  • 06/01/2015-09/01/2015
  • 16/01/2015-31/01/2015

I think this should be the default policy for missing values, since it assumes nothing about how a user expects gaps to be dealt with and accurately reflects the source data instead of inserting artificial values.
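The skip policy could be sketched as follows, assuming integer day-of-month keys and a fixed interval of one day. The helper names `contiguous_runs` and `derivative_skipping_gaps` and the sample values are made up for this example.

```python
from typing import Dict, List


def contiguous_runs(values: Dict[int, float], interval: int) -> List[List[int]]:
    """Group bucket keys into runs with no missing bucket in between."""
    runs: List[List[int]] = []
    for key in sorted(values):
        if runs and key - runs[-1][-1] == interval:
            runs[-1].append(key)
        else:
            runs.append([key])
    return runs


def derivative_skipping_gaps(values: Dict[int, float], interval: int) -> Dict[int, float]:
    """Difference adjacent buckets within each run; never across a gap."""
    out = {}
    for run in contiguous_runs(values, interval):
        for prev, curr in zip(run, run[1:]):
            out[curr] = values[curr] - values[prev]
    return out


# January 2015 by day of month, with no data for the 5th or the 10th-15th.
january = {d: float(d) for d in range(1, 32) if d != 5 and not 10 <= d <= 15}
ranges = [(run[0], run[-1]) for run in contiguous_runs(january, 1)]
deriv = derivative_skipping_gaps(january, 1)
```

The `ranges` list comes out as days 1-4, 6-9 and 16-31, matching the three ranges listed above, and `deriv` has no entry for any day whose predecessor is missing.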

Interpolate values for gaps

Sometimes a user will not want gaps on the graph and also not want zero values. This would be required if you had a system that tries to post a value every 10s but sometimes the value is dropped for some reason (a ping? or values from a UDP connection?). In this instance you want the result to look like a continuous stream even if there are gaps.

This policy would interpolate values for any missing buckets. We should probably add a flag to the output to indicate that these are estimated values and not derived from real values at that date.
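The proposal does not say which interpolation method to use. The sketch below assumes simple linear interpolation between the nearest real buckets on either side of a gap; the function name `interpolate_gaps` and the tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

# (bucket key, value, estimated flag); this tuple layout is an assumption.
Bucket = Tuple[int, float, bool]


def interpolate_gaps(values: Dict[int, float], interval: int) -> List[Bucket]:
    """Fill missing buckets by linear interpolation and flag them as estimated."""
    keys = sorted(values)
    filled: List[Bucket] = []
    for left, right in zip(keys, keys[1:]):
        filled.append((left, values[left], False))
        steps = (right - left) // interval
        for i in range(1, steps):
            # Move a fraction i/steps of the way from the left value to the right one.
            estimate = values[left] + (values[right] - values[left]) * i / steps
            filled.append((left + i * interval, estimate, True))
    if keys:
        filled.append((keys[-1], values[keys[-1]], False))
    return filled


# Readings every 10 seconds; the reading at t=20 was dropped.
filled = interpolate_gaps({0: 10.0, 10: 20.0, 30: 40.0}, interval=10)
```

The derivative can then be taken over `filled` as if the stream were continuous, with the flag marking which inputs were estimated.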

Example 1: First Derivative

Goal

Calculate the first derivative of the daily maximum price and the daily average price

Request

{
  "aggs": {
    "first_derivative": {
      "derivative": {},
      "aggs": {
        "my_date_histo": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "day"
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "price"
              }
            },
            "max_price": {
              "max": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}

Response

{
    ...
    "aggregations": {
        "first_derivative": {
            "buckets": [
                {
                    "key": "2014-12-01T00:00:00.000Z",
                    "from": "2014-12-01T00:00:00.000Z",
                    "to": "2014-12-02T00:00:00.000Z",
                    "avg_price": {
                        "value": 23.00
                    },
                    "max_price": {
                        "value": 56.00
                    }
                },
                {
                    "key": "2014-12-02T00:00:00.000Z",
                    "from": "2014-12-02T00:00:00.000Z",
                    "to": "2014-12-03T00:00:00.000Z",
                    "avg_price": {
                        "value": 50.00
                    },
                    "max_price": {
                        "value": 28.00
                    }
                },
                {
                    "key": "2014-12-03T00:00:00.000Z",
                    "from": "2014-12-03T00:00:00.000Z",
                    "to": "2014-12-04T00:00:00.000Z",
                    "avg_price": {
                        "value": -6.00
                    },
                    "max_price": {
                        "value": -1.00
                    }
                },
                ...
            ]
        }
    }
    ...
}

Example 2: Second Derivative

Goal

Calculate the second derivative of the daily maximum price and the daily average price

Request

{
  "aggs": {
    "second_derivative": {
      "derivative": {},
      "aggs": {
        "first_derivative": {
          "derivative": {},
          "aggs": {
            "my_date_histo": {
              "date_histogram": {
                "field": "@timestamp",
                "interval": "day"
              },
              "aggs": {
                "avg_price": {
                  "avg": {
                    "field": "price"
                  }
                },
                "max_price": {
                  "max": {
                    "field": "price"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Response

{
    ...
    "aggregations": {
        "second_derivative": {
            "buckets": [
                {
                    "key": "2014-12-01T00:00:00.000Z",
                    "from": "2014-12-01T00:00:00.000Z",
                    "to": "2014-12-02T00:00:00.000Z",
                    "avg_price": {
                        "value": 23.00
                    },
                    "max_price": {
                        "value": 56.00
                    }
                },
                {
                    "key": "2014-12-02T00:00:00.000Z",
                    "from": "2014-12-02T00:00:00.000Z",
                    "to": "2014-12-03T00:00:00.000Z",
                    "avg_price": {
                        "value": 50.00
                    },
                    "max_price": {
                        "value": 28.00
                    }
                },
                {
                    "key": "2014-12-03T00:00:00.000Z",
                    "from": "2014-12-03T00:00:00.000Z",
                    "to": "2014-12-04T00:00:00.000Z",
                    "avg_price": {
                        "value": -6.00
                    },
                    "max_price": {
                        "value": -1.00
                    }
                },
                ...
            ]
        }
    }
    ...
}
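
Nesting one derivative inside another, as in the request above, amounts to applying the same operation twice. A small sketch, assuming the derivative is the difference between adjacent buckets; the function name `difference` and the price values are made up.

```python
from typing import List


def difference(values: List[float]) -> List[float]:
    """Return the difference between each pair of adjacent values."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


daily_max_price = [10.0, 14.0, 21.0, 25.0]  # made-up values
first = difference(daily_max_price)   # one value fewer than the input
second = difference(first)            # two values fewer than the input
```

Each level of nesting yields one bucket fewer than its input, so a second derivative needs at least three source buckets to produce any output.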
@j0hnsmith

Sounds great.

Another useful gap strategy would be to simply repeat the last known/calculated value (also with a flag to indicate it's not a real value).
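
A possible sketch of this carry-forward strategy, assuming integer bucket keys and a fixed interval; the function name `fill_gaps_with_last_value` and the tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

# (bucket key, value, carried-forward flag); this tuple layout is an assumption.
Bucket = Tuple[int, float, bool]


def fill_gaps_with_last_value(values: Dict[int, float], interval: int) -> List[Bucket]:
    """Fill each missing bucket with the most recent real value and flag it."""
    if not values:
        return []
    keys = sorted(values)
    filled: List[Bucket] = []
    last = values[keys[0]]
    for key in range(keys[0], keys[-1] + interval, interval):
        carried = key not in values
        if not carried:
            last = values[key]
        filled.append((key, last, carried))
    return filled


# Buckets 3 and 4 are missing and repeat the value from bucket 2.
filled = fill_gaps_with_last_value({1: 5.0, 2: 7.0, 5: 6.0}, interval=1)
```

A derivative taken over this output would be zero across the repeated buckets, then jump at the first real value after the gap.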

@marcboscher

+1

So now metric aggregators (like derivative) would be allowed to have child aggregations?

If so, it would be great if the scripted metric aggregator could have child metric aggregators too. The script could then access the child metric values and perform operations on them.

For example, getting the number of visitors per region (doc count), divided by the number of sites in each region (unique count on site field). This would indicate the popularity of a region, independently of the number of sites in that region.

Or is there another way to do this (apart from client side)?

@shokurov

shokurov commented Mar 7, 2015

+1

Would it also be possible to run filters on, for example, top_hits? It would be really beneficial if the whole bucket were available as a single document for filtering.

An example application case would be to get sequence chains grouped by a field (say latest transactions by customer) and then filter them for specific subsequences (customer purchased item Y in 3 months after item X).

@nrstott

nrstott commented Mar 21, 2015

+1 for reducers on top_hits

@skearns64
Contributor

I think as part of derivatives, we should also consider optional time normalization when date histograms are in use, so I could report the derivative "per minute," regardless of whether my buckets are per-minute or one every 5 seconds. This will be a big help when trying to show consistent derivative values when zooming in or out of a graph.
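
One way to picture this normalization, assuming the derivative is the difference between adjacent buckets: scale each difference by the ratio of the reporting unit to the bucket width. The function name `normalized_derivative` and its parameters are hypothetical.

```python
from typing import List


def normalized_derivative(
    values: List[float], bucket_seconds: float, unit_seconds: float
) -> List[float]:
    """Return the change between adjacent buckets, expressed per `unit_seconds`."""
    scale = unit_seconds / bucket_seconds
    return [(curr - prev) * scale for prev, curr in zip(values, values[1:])]


# The same rate of change seen at two zoom levels, reported "per minute".
five_second_buckets = normalized_derivative([100.0, 103.0, 106.0], 5, 60)
one_minute_buckets = normalized_derivative([100.0, 136.0], 60, 60)
```

Both calls report 36.0 per minute, so the value stays consistent when the bucket size changes.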

@shokurov

👍

@bstsnail

How can this be shown in Kibana?

@clamping

clamping commented Oct 7, 2016

+1 @bstsnail
How to show in Kibana?

@hellwen

hellwen commented Jan 20, 2017

How to show in Kibana?
