Aggregations: Ability to calculate the derivative of a histogram #9293
Comments
Sounds great. Another useful gap strategy would be to simply repeat the last known/calculated value (also with a flag to indicate it's not a real value).
+1 So now metric aggregators (like derivative) would be allowed to have child aggregations? If so, it would be great if the scripted metric aggregator could have child metric aggregators too. The script could then access the child metric values and perform operations on them. For example, getting the number of visitors per region (doc count), divided by the number of sites in each region (unique count on the site field). This would indicate the popularity of a region, independently of the number of sites in that region. Or is there another way to do this (apart from client side)?
+1 Would it also be possible to run filters, for example, on top_hits? It would be really beneficial if the whole bucket was available as a single document for filtering. An example application would be to get sequence chains grouped by a field (say, latest transactions by customer) and then filter them for specific subsequences (customer purchased item Y within 3 months after item X).
+1 for reducers on top_hits
I think as part of derivatives, we should also consider optional time normalization when date histograms are in use, so I could report the derivative "per minute," regardless of whether my buckets are per-minute or one every 5 seconds. This will be a big help when trying to show consistent derivative values when zooming in or out of a graph.
👍
How can this be shown in Kibana?
+1 @bstsnail
How to show in Kibana?
Purpose
Computes the derivative of all the metrics in the sub-aggregation tree. If no metric aggregations are present in the sub-aggregation tree, it computes the derivative of the doc count. The original sub-aggregation tree is destroyed in the computation of this aggregation and is not included in the output.
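As a rough illustration of the computation described above, here is a minimal Python sketch. It assumes the derivative is approximated as the difference between each bucket's value and the previous bucket's value; the function name `first_derivative` and the sample doc counts are hypothetical and not part of the proposal.

```python
def first_derivative(values):
    """Difference between each bucket value and its predecessor.

    Assumption: the derivative is the simple difference between
    consecutive buckets. The first bucket has no predecessor, so
    its derivative is reported as None.
    """
    if not values:
        return []
    derivs = [None]
    for prev, curr in zip(values, values[1:]):
        derivs.append(curr - prev)
    return derivs


# Hypothetical doc counts for four consecutive histogram buckets.
doc_counts = [10, 15, 12, 20]
print(first_derivative(doc_counts))
```

The same function could be applied to any single-value numeric metric per bucket instead of the doc count.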
Validation of sub-aggregation tree
Accepts a single:
which contain one or more single-value numeric metric aggregation only.
Missing Buckets
Data is not always complete and gaps may exist at any point. For example, a derivative may be calculated on a daily date_histogram spanning the dates 01/01/2015 to 31/01/2015, but there may not be any data points for 05/01/2015 and 10/01/2015-15/01/2015. We need to be able to deal with this situation in a manner which is intuitive for the user. This means that the derivative transformer needs to be aware of the keys in the histogram buckets and the expected interval between each bucket. There are three policies we can adopt for dealing with gaps in the data. These are outlined in the sub-sections below. We should probably support all three policies and allow the user to specify which policy to use on the request.
Fill gaps with zero values
Probably the simplest solution to implement. When a gap is identified, buckets are artificially inserted for each missing date with a value of '0'. The derivative is then calculated taking into account these artificial buckets. We should probably add a flag to the output to indicate that these are artificial values and not derived from real values at that date.
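The zero-fill policy could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: buckets are modelled as a dict from date key to value, the interval is one day, and the names `fill_gaps_with_zero`, `derivative`, and the `artificial` flag are hypothetical.

```python
from datetime import date, timedelta


def fill_gaps_with_zero(buckets, interval=timedelta(days=1)):
    """Return (key, value, artificial) tuples covering every interval step.

    Assumption: keys are aligned on `interval`. Missing keys get a value
    of 0 and artificial=True.
    """
    if not buckets:
        return []
    keys = sorted(buckets)
    filled = []
    key = keys[0]
    while key <= keys[-1]:
        if key in buckets:
            filled.append((key, buckets[key], False))
        else:
            filled.append((key, 0, True))
        key += interval
    return filled


def derivative(filled):
    """Difference between consecutive buckets.

    A result is flagged as artificial if either input bucket was inserted.
    """
    out = []
    for (_, v0, a0), (k1, v1, a1) in zip(filled, filled[1:]):
        out.append((k1, v1 - v0, a0 or a1))
    return out


# Hypothetical data with no bucket for 05/01/2015.
buckets = {date(2015, 1, 4): 10, date(2015, 1, 6): 14}
for row in derivative(fill_gaps_with_zero(buckets)):
    print(row)
```

Note how the inserted zero produces a sharp drop followed by a sharp rise in the derivative, which is why flagging the artificial values matters.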
Skip calculation for gaps
Filling gaps with zeros is not always a good idea. For example, imagine a situation where you are calculating the derivative of the water pressure for a power plant. The pressure is supposed to be a non-zero value. Here, recording a value of zero is very different from recording no value: a value of zero (which would be reflected as a plummeting water pressure in the derivative) means a serious problem that probably warrants an evacuation, whereas no value would indicate something is wrong with the pressure sensor and warrants further investigation by an engineer. For this situation we would use the following policy:
Gaps in the input histogram are also present in the output of the derivative. This means that for the example in the above section the derivative would be calculated for the ranges only:
I think this should be the default policy for missing values, since it assumes nothing about the way a user expects gaps to be dealt with and accurately reflects the source data instead of inserting artificial values.
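A minimal Python sketch of the skip policy, under the same illustrative assumptions (a dict from date key to value, a one-day interval). The name `derivative_skip_gaps` is hypothetical, and reporting `None` for a bucket that directly follows a gap is one possible reading of the policy rather than something the proposal specifies.

```python
from datetime import date, timedelta


def derivative_skip_gaps(buckets, interval=timedelta(days=1)):
    """Compute the derivative only where the previous bucket exists.

    Missing buckets stay missing in the output. A bucket whose immediate
    predecessor is missing gets None (assumption: no derivative can be
    computed across a gap).
    """
    result = {}
    for key in sorted(buckets):
        prev = key - interval
        if prev in buckets:
            result[key] = buckets[key] - buckets[prev]
        else:
            result[key] = None
    return result


# Hypothetical data with no bucket for 05/01/2015.
buckets = {
    date(2015, 1, 3): 5,
    date(2015, 1, 4): 8,
    date(2015, 1, 6): 7,
    date(2015, 1, 7): 11,
}
for key, value in derivative_skip_gaps(buckets).items():
    print(key, value)
```

No key for 05/01/2015 appears in the output, so the gap in the source data is preserved.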
Interpolate values for gaps
Sometimes a user will not want gaps on the graph and also not want zero values. This would be required if you had a system that tries to post a value every 10s but sometimes the value is dropped for some reason (a ping? or values from a UDP connection?). In this instance you want the result to look like a continuous stream even if there are gaps.
This policy would interpolate the values for any missing values. We should probably add a flag to the output to indicate that these are estimated values and not derived from real values at that date.
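The interpolation policy might look something like the following Python sketch. It assumes linear interpolation between the nearest real buckets on either side of a gap, which is only one possible interpolation scheme; the proposal does not name one. The function name `interpolate_gaps` and the `estimated` flag are hypothetical.

```python
from datetime import date, timedelta


def interpolate_gaps(buckets, interval=timedelta(days=1)):
    """Return (key, value, estimated) tuples with gaps filled linearly.

    Assumptions: keys are aligned on `interval`, and missing values are
    linearly interpolated between the neighbouring real buckets.
    """
    if not buckets:
        return []
    keys = sorted(buckets)
    filled = []
    for left, right in zip(keys, keys[1:]):
        filled.append((left, buckets[left], False))
        steps = (right - left) // interval  # number of intervals between real buckets
        for i in range(1, steps):
            fraction = i / steps
            value = buckets[left] + fraction * (buckets[right] - buckets[left])
            filled.append((left + i * interval, value, True))
    filled.append((keys[-1], buckets[keys[-1]], False))
    return filled


# Hypothetical data with no bucket for 05/01/2015.
buckets = {date(2015, 1, 4): 10, date(2015, 1, 6): 14}
for row in interpolate_gaps(buckets):
    print(row)
```

The derivative can then be taken over the filled series, with any result that touches an estimated bucket carrying the flag forward.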
Example 1: First Derivative
Goal
Calculate the first derivative of the daily maximum price and the daily average price
Request
Response
Example 2: Second Derivative
Goal
Calculate the second derivative of the daily maximum price and the daily average price
Request
Response