Pipeline aggregations: Ability to perform computations on aggregations #10568

Merged
merged 78 commits into from
Apr 29, 2015
Changes from 67 commits
Commits
e2949d7
make InternalAggregation.reduce(ReduceContext) use template pattern
colings86 Feb 11, 2015
c60bb4d
Adds reducers list to InternalAggregation.reduce()
colings86 Feb 11, 2015
ae76239
AggregatorFactories now stores reducers as well as aggregators
colings86 Feb 12, 2015
1e947c8
Reducers are now parsed in AggregatorParsers
colings86 Feb 12, 2015
55b82db
Reducers are now wired end-to-end into the agg framework
colings86 Feb 12, 2015
9cfa6c6
Basic derivative reducer
colings86 Feb 12, 2015
d65e9a4
Fixing compile issues after rebase with master
colings86 Feb 12, 2015
ef4a910
fix to the name of the injected aggregation for derivatives
colings86 Feb 12, 2015
f00a9b8
Minor indentation/validation fix in AggregatorParsers.
jpountz Feb 12, 2015
3a77754
derivative reducer now works with both date_histogram and histogram
colings86 Feb 12, 2015
9805b83
can now reference single value metrics directly instead of having to …
colings86 Feb 12, 2015
0f22d7e
Can now specify a format for the returned derivative values
colings86 Feb 12, 2015
18c2cb6
Validation of the reducer factories is now called from within the Agg…
colings86 Feb 13, 2015
9357fc4
bucketsPath is now in the Reducer class since every Reducer implement…
colings86 Feb 13, 2015
63f3281
Merge branch 'master' into feature/aggs_2_0
colings86 Feb 16, 2015
3ab3ffa
First (rough) pass at dependancy resolution for reducers
colings86 Feb 16, 2015
f20dae8
getProperty method in the aggregations framework now throws a specifi…
colings86 Feb 16, 2015
58f2cec
Derivative Reducer now supported nth order derivatives
colings86 Feb 16, 2015
247b6a7
removed obselete NOCOMMIT and left over sysout call
colings86 Feb 16, 2015
e994044
Added Builder classes for Reducers
colings86 Feb 16, 2015
c97dd84
Added Builder for Derivatives Reducer
colings86 Feb 16, 2015
511e275
More update to support Reducer Builders
colings86 Feb 16, 2015
f68bce5
Tests for derivative reducer
colings86 Feb 17, 2015
269d4bc
InternalHistogram.Factory.create() can now work from prototype
colings86 Feb 17, 2015
19cdfe2
DerivativeReducer now copies histogram options from old histogram ins…
colings86 Feb 17, 2015
3375c02
Added support for _count and _key as bucketsPaths
colings86 Feb 17, 2015
6c12cfd
updated derivative tests to test _count
colings86 Feb 17, 2015
f03fe5b
Cleaning up NOCOMMITs which are resolved
colings86 Feb 17, 2015
7f84466
Cleaning up NOCOMMITs
colings86 Feb 17, 2015
5a2c4ab
Added test for second_derivative
colings86 Feb 17, 2015
bcaf22e
Merge branch 'master' into feature/aggs_2_0
colings86 Feb 18, 2015
7c046d2
Implementation of GapPolicy for derivative
colings86 Mar 2, 2015
dc03912
Merge branch 'master' into feature/aggs_2_0
colings86 Mar 4, 2015
3131e01
Move GapPolicy and resolveBucketValues() to static helper methods
polyfractal Mar 2, 2015
8e02a85
Add header to BucketHelpers class
polyfractal Mar 5, 2015
3063f06
Add randomiziation to test for derivative aggregation
Mar 3, 2015
02679e7
[BUILD] fix snapshot URL
s1monw Mar 17, 2015
2ac9309
Merge branch 'master' into feature/aggs_2_0
colings86 Mar 18, 2015
cb4ab06
missed file in merge
colings86 Mar 18, 2015
b751f0e
added validation of reducers
colings86 Mar 23, 2015
53de93a
Aggregations: Added Factory for all MultiBucketAggregations to implement
colings86 Mar 5, 2015
caeb85e
Merge branch 'master' into feature/aggs_2_0
colings86 Apr 8, 2015
a824184
Aggregations: Add MovAvg Reducer
polyfractal Apr 8, 2015
e19d20b
max bucket reducer and sibling reducer framework
colings86 Mar 17, 2015
56a3762
Merge branch 'master' into feature/aggs_2_0
colings86 Apr 9, 2015
7d38989
Merge branch 'master' into feature/aggs_2_0
colings86 Apr 13, 2015
48a94a4
Added normalisation to Derivative Reducer
colings86 Apr 13, 2015
306d94a
Revert "Added normalisation to Derivative Reducer"
colings86 Apr 13, 2015
392f9ce
clean up
colings86 Apr 13, 2015
7fdf32f
changed `bucketsPaths` to `buckets_paths`
colings86 Apr 13, 2015
ea1470a
More tests for max bucket reducer
colings86 Apr 20, 2015
0f4b7f3
Added section for reducer aggregations in the main aggregation docs page
colings86 Apr 15, 2015
be647a8
Documentation for the derivative reducer
colings86 Apr 16, 2015
bd28c9c
Documentation for the max_bucket reducer
colings86 Apr 17, 2015
89d424e
Derivative can now access multi-value metric aggregations
colings86 Apr 21, 2015
f6934e0
unit test for derivative of metric agg with gaps
colings86 Apr 22, 2015
77e2f64
Derivative tests for gaps in metrics
colings86 Apr 22, 2015
dcf91ff
Temporarily disabled gap policy randomisation in MovAvgTests
colings86 Apr 22, 2015
3017788
Add prediction capability to MovAvgReducer
polyfractal Apr 9, 2015
a03cefc
[DOCS] Add documentation for moving average
polyfractal Apr 15, 2015
2a74f2c
[TESTS] randomize metric type, better naming, fix gap handling
polyfractal Apr 22, 2015
e08e45c
[DOCS] Add link to movavg page
polyfractal Apr 22, 2015
1a1ddce
Muted failing MovAvgTests
colings86 Apr 23, 2015
0ff4827
Fix MaxBucketReducer to use gapPolicy
colings86 Apr 23, 2015
114d10e
[TEST] Restructure MovAvgTests to be more generic, robust
polyfractal Apr 23, 2015
a218d59
Fix bug where MovAvgReducer would allow NaN's to "corrupt" the moving…
polyfractal Apr 23, 2015
8435d92
Fix bug in GiantGapWithPrediction, due to "slow start" of double exp
polyfractal Apr 23, 2015
26189ee
Rename helpers to follow naming conventions
polyfractal Apr 25, 2015
31f26ec
review comment fixes
colings86 Apr 27, 2015
935144a
review comment fixes
polyfractal Apr 27, 2015
bf9739d
[DOCS] review comment fixes
polyfractal Apr 27, 2015
891dfee
Fix some indentation issues.
jpountz Apr 29, 2015
ccca038
Other indentation fixes
jpountz Apr 29, 2015
3bb8ff2
fixed issue with eggs in percolation request for 1 shard
colings86 Apr 29, 2015
57a8885
Merge branch 'master' into feature/aggs_2_0
colings86 Apr 29, 2015
a33e77f
Muted intermittently failing tests
colings86 Apr 29, 2015
88aa893
Merge branch 'master' into feature/aggs_2_0
colings86 Apr 29, 2015
0589adb
Merge branch 'master' into feature/aggs_2_0
colings86 Apr 29, 2015
35 changes: 35 additions & 0 deletions docs/reference/search/aggregations.asciidoc
@@ -116,6 +116,38 @@ aggregated for the buckets created by their "parent" bucket aggregation.
There are different bucket aggregators, each with a different "bucketing" strategy. Some define a single bucket, some
define a fixed number of multiple buckets, and others dynamically create the buckets during the aggregation process.

[float]
=== Reducer Aggregations

coming[2.0.0]

experimental[]

Reducer aggregations work on the outputs produced from other aggregations rather than from document sets, adding
information to the output tree. There are many different types of reducer, each computing different information from
other aggregations, but these types can be broken down into two families:
Contributor

types


_Parent_::
A family of reducer aggregations that is provided with the output of its parent aggregation and is able
to compute new buckets or new aggregations to add to existing buckets.

_Sibling_::
Reducer aggregations that are provided with the output of a sibling aggregation and are able to compute a
new aggregation which will be at the same level as the sibling aggregation.

Reducer aggregations can reference the aggregations they need to perform their computation by using the `buckets_paths`
parameter to indicate the paths to the required metrics. The syntax for defining these paths can be found in the
<<search-aggregations-bucket-terms-aggregation-order, terms aggregation order>> section.

?????? SHOULD THE SECTION ABOUT DEFINING AGGREGATION PATHS
BE IN THIS PAGE AND REFERENCED FROM THE TERMS AGGREGATION DOCUMENTATION ???????
Contributor

Or maybe be its own page and referenced from both?


Reducer aggregations cannot have sub-aggregations but, depending on the type, they can reference another reducer in the `buckets_paths`
parameter, allowing reducers to be chained.

NOTE: Because reducer aggregations only add to the output, when chaining reducer aggregations the output of each reducer will be
included in the final output.

[float]
=== Caching heavy aggregations

@@ -195,3 +227,6 @@ Then that piece of metadata will be returned in place for our `titles` terms agg
include::aggregations/metrics.asciidoc[]

include::aggregations/bucket.asciidoc[]

include::aggregations/reducer.asciidoc[]

4 changes: 4 additions & 0 deletions docs/reference/search/aggregations/reducer.asciidoc
@@ -0,0 +1,4 @@
[[search-aggregations-reducer]]

include::reducer/derivative.asciidoc[]
include::reducer/movavg-reducer.asciidoc[]
Contributor

Should it include documentation for the max reducer?

@@ -0,0 +1,192 @@
[[search-aggregations-reducer-derivative-aggregation]]
=== Derivative Aggregation

A parent reducer aggregation which calculates the derivative of a specified metric in a parent histogram (or date_histogram)
aggregation. The specified metric must be numeric and the enclosing histogram must have `min_doc_count` set to `0`.
Contributor

I think it's confusing that the below example does not set the min doc count?


The following snippet calculates the derivative of the total monthly `sales`:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                },
                "sales_deriv": {
                    "derivative": {
                        "buckets_paths": "sales" <1>
                    }
                }
            }
        }
    }
}
--------------------------------------------------

<1> `buckets_paths` instructs this derivative aggregation to use the output of the `sales` aggregation for the derivative

And the following may be the response:

[source,js]
--------------------------------------------------
{
    "aggregations": {
        "sales_per_month": {
            "buckets": [
                {
                    "key_as_string": "2015/01/01 00:00:00",
                    "key": 1420070400000,
                    "doc_count": 3,
                    "sales": {
                        "value": 550
                    } <1>
                },
                {
                    "key_as_string": "2015/02/01 00:00:00",
                    "key": 1422748800000,
                    "doc_count": 2,
                    "sales": {
                        "value": 60
                    },
                    "sales_deriv": {
                        "value": -490 <2>
                    }
                },
                {
                    "key_as_string": "2015/03/01 00:00:00",
                    "key": 1425168000000,
                    "doc_count": 2,
                    "sales": {
                        "value": 375
                    },
                    "sales_deriv": {
                        "value": 315
                    }
                }
            ]
        }
    }
}
--------------------------------------------------

<1> No derivative for the first bucket since we need at least 2 data points to calculate the derivative
<2> Derivative value units are implicitly defined by the `sales` aggregation and the parent histogram so in this case the units
Contributor

It might be worth having a 3rd note to explain what the doc_count measures

would be $/month assuming the `price` field has units of $.
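
As a rough illustration of the arithmetic behind the example response, the following Python sketch computes a first-order derivative as the difference between consecutive bucket values. The function name `derivative` and the plain-list input are assumptions made for this example; it is not the Java implementation added in this PR.

```python
from typing import List, Optional


def derivative(values: List[Optional[float]]) -> List[Optional[float]]:
    """Return the bucket-to-bucket difference for a list of metric values.

    The first bucket has no predecessor, so its derivative is None.
    A None input value (a gap) also yields None for the affected buckets.
    """
    result: List[Optional[float]] = []
    previous: Optional[float] = None
    for current in values:
        if previous is None or current is None:
            result.append(None)
        else:
            result.append(current - previous)
        previous = current
    return result


# Monthly `sales` sums taken from the example response.
sales = [550, 60, 375]
print(derivative(sales))  # [None, -490, 315]
```

The `None` in the first position corresponds to the first bucket in the response, which carries no `sales_deriv` entry.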

==== Second Order Derivative

A second order derivative can be calculated by chaining the derivative reducer aggregation onto the result of another derivative
reducer aggregation as in the following example which will calculate both the first and the second order derivative of the total
monthly sales:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                },
                "sales_deriv": {
                    "derivative": {
                        "buckets_paths": "sales"
                    }
                },
                "sales_2nd_deriv": {
                    "derivative": {
                        "buckets_paths": "sales_deriv" <1>
                    }
                }
            }
        }
    }
}
--------------------------------------------------

<1> `buckets_paths` for the second derivative points to the name of the first derivative

And the following may be the response:

[source,js]
--------------------------------------------------
{
    "aggregations": {
        "sales_per_month": {
            "buckets": [
                {
                    "key_as_string": "2015/01/01 00:00:00",
                    "key": 1420070400000,
                    "doc_count": 3,
                    "sales": {
                        "value": 550
                    } <1>
                },
                {
                    "key_as_string": "2015/02/01 00:00:00",
                    "key": 1422748800000,
                    "doc_count": 2,
                    "sales": {
                        "value": 60
                    },
                    "sales_deriv": {
                        "value": -490
                    } <1>
                },
                {
                    "key_as_string": "2015/03/01 00:00:00",
                    "key": 1425168000000,
                    "doc_count": 2,
                    "sales": {
                        "value": 375
                    },
                    "sales_deriv": {
                        "value": 315
                    },
                    "sales_2nd_deriv": {
                        "value": 805
                    }
                }
            ]
        }
    }
}
--------------------------------------------------
<1> No second derivative for the first two buckets since we need at least 2 data points from the first derivative to calculate the
second derivative
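
The chaining described in this section can be sketched in Python by applying the same differencing step twice. The helper name `diff` is hypothetical and only mirrors the arithmetic of the example response; the actual reducer wiring in the PR is done in Java and is not shown here.

```python
from typing import List, Optional


def diff(values: List[Optional[float]]) -> List[Optional[float]]:
    """Difference between each value and its predecessor; None where either is missing."""
    out: List[Optional[float]] = [None] * len(values)
    for i in range(1, len(values)):
        prev, cur = values[i - 1], values[i]
        if prev is not None and cur is not None:
            out[i] = cur - prev
    return out


sales = [550, 60, 375]               # values of the `sales` sum per month
sales_deriv = diff(sales)            # first derivative, reads from `sales`
sales_2nd_deriv = diff(sales_deriv)  # second derivative, reads from `sales_deriv`

print(sales_deriv)      # [None, -490, 315]
print(sales_2nd_deriv)  # [None, None, 805]
```

The second derivative is only defined from the third bucket onwards, because the first derivative itself has no value in the first bucket.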

==== Dealing with gaps in the data

There are a couple of reasons why the data output by the enclosing histogram may have gaps:

* There are no documents matching the query for some buckets
* The data for a metric is missing in all of the documents falling into a bucket (this is most likely with either a small interval
on the enclosing histogram or with a query matching only a small number of documents)

Where there is no data available in a bucket for a given metric, it presents a problem for calculating the derivative value for both
the current bucket and the next bucket. The derivative reducer aggregation has a `gap_policy` parameter to define what the behavior
Contributor

s/gap policy/gap_policy/

should be when a gap in the data is found. There are currently two options for controlling the gap policy:

_ignore_::
This option will not produce a derivative value for any buckets where the value in the current or previous bucket is
missing

_insert_zeros_::
This option will assume the missing value is `0` and calculate the derivative with the value `0`.
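
To make the difference between the two options concrete, here is a minimal Python sketch that follows the descriptions above literally. The function name `derivative_with_gaps` and the sample values are invented for illustration, and the real reducer may handle edge cases differently.

```python
from typing import List, Optional


def derivative_with_gaps(values: List[Optional[float]], gap_policy: str) -> List[Optional[float]]:
    """Compute bucket-to-bucket differences, treating None as a gap.

    gap_policy "ignore":       no derivative where the current or previous value is missing.
    gap_policy "insert_zeros": missing values are replaced with 0 before differencing.
    """
    if gap_policy not in ("ignore", "insert_zeros"):
        raise ValueError("unknown gap_policy: " + gap_policy)
    if gap_policy == "insert_zeros":
        values = [0 if v is None else v for v in values]
    out: List[Optional[float]] = [None] * len(values)
    for i in range(1, len(values)):
        prev, cur = values[i - 1], values[i]
        if prev is None or cur is None:
            continue  # only reachable with "ignore": leave the derivative empty
        out[i] = cur - prev
    return out


# Hypothetical monthly sums where the second month has no data.
sales = [550, None, 375]
print(derivative_with_gaps(sales, "ignore"))        # [None, None, None]
print(derivative_with_gaps(sales, "insert_zeros"))  # [None, -550, 375]
```

With `ignore`, a single gap removes the derivative for both the empty bucket and the one after it; with `insert_zeros`, every bucket after the first gets a value, at the cost of an artificial drop and spike around the gap.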


@@ -0,0 +1,82 @@
[[search-aggregations-reducer-max-bucket-aggregation]]
=== Max Bucket Aggregation

A sibling reducer aggregation which identifies the bucket(s) with the maximum value of a specified metric in a sibling aggregation
and outputs both the value and the key(s) of the bucket(s). The specified metric must be numeric and the sibling aggregation must
be a multi-bucket aggregation.

The following snippet calculates the maximum of the total monthly `sales`:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                }
            }
        },
        "max_monthly_sales": {
            "max_bucket": {
                "buckets_paths": "sales_per_month>sales" <1>
            }
        }
    }
}
--------------------------------------------------

<1> `buckets_paths` instructs this max_bucket aggregation that we want the maximum value of the `sales` aggregation in the
`sales_per_month` date histogram.
Contributor

s/"/`/


And the following may be the response:

[source,js]
--------------------------------------------------
{
    "aggregations": {
        "sales_per_month": {
            "buckets": [
                {
                    "key_as_string": "2015/01/01 00:00:00",
                    "key": 1420070400000,
                    "doc_count": 3,
                    "sales": {
                        "value": 550
                    }
                },
                {
                    "key_as_string": "2015/02/01 00:00:00",
                    "key": 1422748800000,
                    "doc_count": 2,
                    "sales": {
                        "value": 60
                    }
                },
                {
                    "key_as_string": "2015/03/01 00:00:00",
                    "key": 1425168000000,
                    "doc_count": 2,
                    "sales": {
                        "value": 375
                    }
                }
            ]
        },
        "max_monthly_sales": {
            "keys": ["2015/01/01 00:00:00"], <1>
            "value": 550
        }
    }
}
--------------------------------------------------

<1> `keys` is an array of strings since the maximum value may be present in multiple buckets
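
The following Python sketch shows one way the `keys` and `value` fields could be derived from a response shaped like the example. It assumes a two-part path of the form `<multi_bucket_agg>><metric_agg>` and a single-value metric; the function `max_bucket` is a hypothetical helper for illustration, not the reducer implemented in this PR.

```python
from typing import Any, Dict, List


def max_bucket(aggregations: Dict[str, Any], buckets_path: str) -> Dict[str, Any]:
    """Find the maximum metric value across the buckets of a sibling aggregation.

    Returns the maximum value together with the keys of every bucket that holds it.
    """
    agg_name, metric_name = buckets_path.split(">", 1)
    max_value = None
    keys: List[str] = []
    for bucket in aggregations[agg_name]["buckets"]:
        value = bucket[metric_name]["value"]
        if max_value is None or value > max_value:
            # New maximum: restart the list of matching keys.
            max_value, keys = value, [bucket["key_as_string"]]
        elif value == max_value:
            # Tie: the maximum is present in more than one bucket.
            keys.append(bucket["key_as_string"])
    return {"keys": keys, "value": max_value}


# Trimmed copy of the `sales_per_month` buckets from the example response.
response = {
    "sales_per_month": {
        "buckets": [
            {"key_as_string": "2015/01/01 00:00:00", "sales": {"value": 550}},
            {"key_as_string": "2015/02/01 00:00:00", "sales": {"value": 60}},
            {"key_as_string": "2015/03/01 00:00:00", "sales": {"value": 375}},
        ]
    }
}
print(max_bucket(response, "sales_per_month>sales"))
# {'keys': ['2015/01/01 00:00:00'], 'value': 550}
```

Collecting keys in a list rather than returning a single key matches the note that the maximum value may be present in multiple buckets.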
