Pipeline aggregations: Ability to perform computations on aggregations #10568
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -116,6 +116,38 @@ aggregated for the buckets created by their "parent" bucket aggregation. | |
There are different bucket aggregators, each with a different "bucketing" strategy. Some define a single bucket, some | ||
define fixed number of multiple buckets, and others dynamically create the buckets during the aggregation process. | ||
|
||
[float] | ||
=== Reducer Aggregations | ||
|
||
coming[2.0.0] | ||
|
||
experimental[] | ||
|
||
Reducer aggregations work on the outputs produced from other aggregations rather than from document sets, adding | ||
information to the output tree. There are many different types of reducer, each computing different information from | ||
other aggregations, but these types can be broken down into two families: | |
|
||
_Parent_:: | ||
A family of reducer aggregations that is provided with the output of its parent aggregation and is able | ||
to compute new buckets or new aggregations to add to existing buckets. | ||
|
||
_Sibling_:: | ||
Reducer aggregations that are provided with the output of a sibling aggregation and are able to compute a | ||
new aggregation which will be at the same level as the sibling aggregation. | ||
|
||
Reducer aggregations can reference the aggregations they need to perform their computation by using the `buckets_paths` | ||
parameter to indicate the paths to the required metrics. The syntax for defining these paths can be found in the | ||
<<search-aggregations-bucket-terms-aggregation-order, terms aggregation order>> section. | ||
|
||
?????? SHOULD THE SECTION ABOUT DEFINING AGGREGATION PATHS | ||
BE IN THIS PAGE AND REFERENCED FROM THE TERMS AGGREGATION DOCUMENTATION ??????? | ||
Review comment: Or maybe be its own page and referenced from both? |
||
|
||
Reducer aggregations cannot have sub-aggregations, but depending on its type a reducer can reference another reducer in its `buckets_paths`, | |
allowing reducers to be chained. | |
|
||
NOTE: Because reducer aggregations only add to the output, when chaining reducer aggregations the output of each reducer will be | ||
included in the final output. | ||
|
||
[float] | ||
=== Caching heavy aggregations | ||
|
||
|
@@ -195,3 +227,6 @@ Then that piece of metadata will be returned in place for our `titles` terms agg | |
include::aggregations/metrics.asciidoc[] | ||
|
||
include::aggregations/bucket.asciidoc[] | ||
|
||
include::aggregations/reducer.asciidoc[] | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
[[search-aggregations-reducer]] | ||
|
||
include::reducer/derivative.asciidoc[] | ||
include::reducer/movavg-reducer.asciidoc[] | ||
Review comment: Should it include documentation for the max reducer? |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,192 @@ | ||
[[search-aggregations-reducer-derivative-aggregation]] | ||
=== Derivative Aggregation | ||
|
||
A parent reducer aggregation which calculates the derivative of a specified metric in a parent histogram (or date_histogram) | ||
aggregation. The specified metric must be numeric and the enclosing histogram must have `min_doc_count` set to `0`. | ||
Review comment: I think it's confusing that the below example does not set the min doc count? |
||
|
||
The following snippet calculates the derivative of the total monthly `sales`: | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
{ | ||
"aggs" : { | ||
"sales_per_month" : { | ||
"date_histogram" : { | ||
"field" : "date", | ||
"interval" : "month" | ||
}, | ||
"aggs": { | ||
"sales": { | ||
"sum": { | ||
"field": "price" | ||
} | ||
}, | ||
"sales_deriv": { | ||
"derivative": { | ||
"buckets_paths": "sales" <1> | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
|
||
<1> `buckets_paths` instructs this derivative aggregation to use the output of the `sales` aggregation for the derivative | |
|
||
And the following may be the response: | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
{ | ||
"aggregations": { | ||
"sales_per_month": { | ||
"buckets": [ | ||
{ | ||
"key_as_string": "2015/01/01 00:00:00", | ||
"key": 1420070400000, | ||
"doc_count": 3, | ||
"sales": { | ||
"value": 550 | ||
} <1> | ||
}, | ||
{ | ||
"key_as_string": "2015/02/01 00:00:00", | ||
"key": 1422748800000, | ||
"doc_count": 2, | ||
"sales": { | ||
"value": 60 | ||
}, | ||
"sales_deriv": { | ||
"value": -490 <2> | ||
} | ||
}, | ||
{ | ||
"key_as_string": "2015/03/01 00:00:00", | ||
"key": 1425168000000, | ||
"doc_count": 2, | ||
"sales": { | ||
"value": 375 | ||
}, | ||
"sales_deriv": { | ||
"value": 315 | ||
} | ||
} | ||
] | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
|
||
<1> No derivative for the first bucket since we need at least 2 data points to calculate the derivative | ||
<2> Derivative value units are implicitly defined by the `sales` aggregation and the parent histogram, so in this case the units | |
would be $/month assuming the `price` field has units of $. | |
Review comment: It might be worth having a 3rd note to explain what the doc_count measures |
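As a rough illustration of the arithmetic behind the response above, the derivative of each bucket is the difference between its metric value and the value in the previous bucket. The sketch below is not the Elasticsearch implementation; the function name `derivative` and the use of `None` for "no value" are assumptions made for the example.

```python
from typing import List, Optional


def derivative(values: List[Optional[float]]) -> List[Optional[float]]:
    """Return the difference between each bucket value and the previous one.

    The first bucket has no predecessor, so its derivative is None.
    A None input value (no data) also yields None for the affected buckets.
    """
    if not values:
        return []
    out: List[Optional[float]] = [None]  # first bucket: nothing to compare with
    for prev, cur in zip(values, values[1:]):
        out.append(None if prev is None or cur is None else cur - prev)
    return out


# Monthly `sales` values taken from the example response above.
monthly_sales = [550, 60, 375]
print(derivative(monthly_sales))  # [None, -490, 315]
```

The two non-empty results match the `sales_deriv` values of the February and March buckets.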
|
||
==== Second Order Derivative | ||
|
||
A second order derivative can be calculated by chaining the derivative reducer aggregation onto the result of another derivative | ||
reducer aggregation as in the following example which will calculate both the first and the second order derivative of the total | ||
monthly sales: | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
{ | ||
"aggs" : { | ||
"sales_per_month" : { | ||
"date_histogram" : { | ||
"field" : "date", | ||
"interval" : "month" | ||
}, | ||
"aggs": { | ||
"sales": { | ||
"sum": { | ||
"field": "price" | ||
} | ||
}, | ||
"sales_deriv": { | ||
"derivative": { | ||
"buckets_paths": "sales" | ||
} | ||
}, | ||
"sales_2nd_deriv": { | ||
"derivative": { | ||
"buckets_paths": "sales_deriv" <1> | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
|
||
<1> `buckets_paths` for the second derivative points to the name of the first derivative | |
|
||
And the following may be the response: | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
{ | ||
"aggregations": { | ||
"sales_per_month": { | ||
"buckets": [ | ||
{ | ||
"key_as_string": "2015/01/01 00:00:00", | ||
"key": 1420070400000, | ||
"doc_count": 3, | ||
"sales": { | ||
"value": 550 | ||
} <1> | ||
}, | ||
{ | ||
"key_as_string": "2015/02/01 00:00:00", | ||
"key": 1422748800000, | ||
"doc_count": 2, | ||
"sales": { | ||
"value": 60 | ||
}, | ||
"sales_deriv": { | ||
"value": -490 | ||
} <1> | ||
}, | ||
{ | ||
"key_as_string": "2015/03/01 00:00:00", | ||
"key": 1425168000000, | ||
"doc_count": 2, | ||
"sales": { | ||
"value": 375 | ||
}, | ||
"sales_deriv": { | ||
"value": 315 | ||
}, | ||
"sales_2nd_deriv": { | ||
"value": 805 | ||
} | ||
} | ||
] | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
<1> No second derivative for the first two buckets since we need at least 2 data points from the first derivative to calculate the | ||
second derivative | ||
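The chaining described in this section can be pictured as applying the same differencing step twice. This is only a sketch of the arithmetic, under the assumption that a missing input (`None`) produces a missing output; `derivative` and `nth_derivative` are hypothetical helper names, not part of any Elasticsearch API.

```python
from typing import List, Optional


def derivative(values: List[Optional[float]]) -> List[Optional[float]]:
    """Difference between consecutive values; None where it cannot be computed."""
    if not values:
        return []
    out: List[Optional[float]] = [None]
    for prev, cur in zip(values, values[1:]):
        out.append(None if prev is None or cur is None else cur - prev)
    return out


def nth_derivative(values: List[Optional[float]], order: int) -> List[Optional[float]]:
    """Chain `derivative` onto its own output `order` times."""
    for _ in range(order):
        values = derivative(values)
    return values


monthly_sales = [550, 60, 375]
print(nth_derivative(monthly_sales, 1))  # [None, -490, 315]
print(nth_derivative(monthly_sales, 2))  # [None, None, 805]
```

The second line of output shows why the first two buckets carry no `sales_2nd_deriv`: the first derivative only has two data points, so only the third bucket can be differenced again.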
|
||
==== Dealing with gaps in the data | ||
|
||
There are a couple of reasons why the data output by the enclosing histogram may have gaps: | ||
|
||
* There are no documents matching the query for some buckets | ||
* The data for a metric is missing in all of the documents falling into a bucket (this is most likely with either a small interval | ||
on the enclosing histogram or with a query matching only a small number of documents) | ||
|
||
Where no data is available in a bucket for a given metric, the derivative cannot be calculated directly for either | |
the current bucket or the next bucket. The derivative reducer aggregation has a `gap_policy` parameter to define what the behavior | |
should be when a gap in the data is found. There are currently two options for controlling the gap policy: | |
Review comment: s/gap policy/gap_policy/ |
|
||
_ignore_:: | ||
This option will not produce a derivative value for any buckets where the value in the current or previous bucket is | ||
missing | ||
|
||
_insert_zeros_:: | ||
This option will assume the missing value is `0` and calculate the derivative with the value `0`. | ||
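The difference between the two options can be sketched as follows. This follows the wording of the two definitions above and is an assumption about the intended behavior, not the actual implementation; `derivative_with_gaps` is a hypothetical name and `None` stands for a bucket with no value.

```python
from typing import List, Optional


def derivative_with_gaps(
    values: List[Optional[float]], gap_policy: str = "ignore"
) -> List[Optional[float]]:
    """Derivative of consecutive bucket values under a gap policy.

    "ignore":       no derivative where the current or previous value is missing.
    "insert_zeros": missing values are treated as 0 before differencing.
    """
    if gap_policy not in ("ignore", "insert_zeros"):
        raise ValueError(f"unknown gap_policy: {gap_policy!r}")
    if gap_policy == "insert_zeros":
        values = [0 if v is None else v for v in values]

    out: List[Optional[float]] = []
    prev: Optional[float] = None
    for i, cur in enumerate(values):
        if i == 0 or prev is None or cur is None:
            out.append(None)  # first bucket, or a gap on either side
        else:
            out.append(cur - prev)
        prev = cur
    return out


# Hypothetical data: the middle bucket has no value for the metric.
sales_with_gap = [550, None, 375]
print(derivative_with_gaps(sales_with_gap, "ignore"))        # [None, None, None]
print(derivative_with_gaps(sales_with_gap, "insert_zeros"))  # [None, -550, 375]
```

With `ignore`, a single gap removes the derivative from both the empty bucket and the one after it; with `insert_zeros`, both buckets get a value, at the cost of a sharp artificial drop and rise.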
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
[[search-aggregations-reducer-max-bucket-aggregation]] | ||
=== Max Bucket Aggregation | ||
|
||
A sibling reducer aggregation which identifies the bucket(s) with the maximum value of a specified metric in a sibling aggregation | |
and outputs both the value and the key(s) of the bucket(s). The specified metric must be numeric and the sibling aggregation must | ||
be a multi-bucket aggregation. | ||
|
||
The following snippet calculates the maximum of the total monthly `sales`: | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
{ | ||
"aggs" : { | ||
"sales_per_month" : { | ||
"date_histogram" : { | ||
"field" : "date", | ||
"interval" : "month" | ||
}, | ||
"aggs": { | ||
"sales": { | ||
"sum": { | ||
"field": "price" | ||
} | ||
} | ||
} | ||
}, | ||
"max_monthly_sales": { | ||
"max_bucket": { | ||
"buckets_paths": "sales_per_month>sales" <1> | ||
} | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
|
||
<1> `buckets_paths` instructs this max_bucket aggregation that we want the maximum value of the `sales` aggregation in the | |
`sales_per_month` date histogram. | |
Review comment: s/"/`/ |
||
|
||
And the following may be the response: | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
{ | ||
"aggregations": { | ||
"sales_per_month": { | ||
"buckets": [ | ||
{ | ||
"key_as_string": "2015/01/01 00:00:00", | ||
"key": 1420070400000, | ||
"doc_count": 3, | ||
"sales": { | ||
"value": 550 | ||
} | ||
}, | ||
{ | ||
"key_as_string": "2015/02/01 00:00:00", | ||
"key": 1422748800000, | ||
"doc_count": 2, | ||
"sales": { | ||
"value": 60 | ||
} | ||
}, | ||
{ | ||
"key_as_string": "2015/03/01 00:00:00", | ||
"key": 1425168000000, | ||
"doc_count": 2, | ||
"sales": { | ||
"value": 375 | ||
} | ||
} | ||
] | ||
}, | ||
"max_monthly_sales": { | ||
"keys": ["2015/01/01 00:00:00"], <1> | ||
"value": 550 | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
|
||
<1> `keys` is an array of strings since the maximum value may be present in multiple buckets | ||
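To make the shape of the `max_monthly_sales` result concrete, the sketch below resolves a two-part path such as `sales_per_month>sales` against an already computed response and collects every bucket that holds the maximum. It is an illustration only: the function name `max_bucket`, the restriction to exactly one `>` separator, and the use of `key_as_string` for the reported keys are assumptions based on the example above.

```python
from typing import Any, Dict, List


def max_bucket(aggregations: Dict[str, Any], buckets_path: str) -> Dict[str, Any]:
    """Find the maximum metric value across the buckets of a sibling aggregation.

    `buckets_path` is assumed to have the form "<multi_bucket_agg>><metric_agg>".
    Returns the maximum value and the keys of all buckets that reach it.
    """
    agg_name, metric_name = buckets_path.split(">")
    buckets = aggregations[agg_name]["buckets"]
    if not buckets:
        return {"keys": [], "value": None}

    pairs = [(b["key_as_string"], b[metric_name]["value"]) for b in buckets]
    best = max(value for _, value in pairs)
    keys: List[str] = [key for key, value in pairs if value == best]
    return {"keys": keys, "value": best}


# Buckets copied from the example response above (only the fields used here).
response_aggs = {
    "sales_per_month": {
        "buckets": [
            {"key_as_string": "2015/01/01 00:00:00", "sales": {"value": 550}},
            {"key_as_string": "2015/02/01 00:00:00", "sales": {"value": 60}},
            {"key_as_string": "2015/03/01 00:00:00", "sales": {"value": 375}},
        ]
    }
}
print(max_bucket(response_aggs, "sales_per_month>sales"))
# {'keys': ['2015/01/01 00:00:00'], 'value': 550}
```

If two months had the same top value, both keys would appear in `keys`, which is why the field is an array.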
|
Review comment: types