New Histogram field mapper that supports percentiles aggregations. (#…
…48580) (#49683) This commit adds a new histogram field mapper that consists in a pre-aggregated format of numerical data to be used in percentiles aggregations.
Showing 32 changed files with 2,127 additions and 72 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
[role="xpack"] | ||
[testenv="basic"] | ||
[[histogram]] | ||
=== Histogram datatype | ||
++++ | ||
<titleabbrev>Histogram</titleabbrev> | ||
++++ | ||
|
||
A field to store pre-aggregated numerical data representing a histogram. | ||
This data is defined using two paired arrays: | ||
|
||
* A `values` array of <<number, `double`>> numbers, representing the buckets for | ||
the histogram. These values must be provided in ascending order. | ||
* A corresponding `counts` array of <<number, `integer`>> numbers, representing how | ||
many values fall into each bucket. These numbers must be positive or zero. | ||
|
||
Because the elements in the `values` array correspond to the elements in the | ||
same position of the `counts` array, these two arrays must have the same length. | ||
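
The constraints above (equal lengths, ascending `values`, non-negative `counts`) can be checked on the client before indexing. The following is a minimal sketch only: `HistogramInputCheck` and `validate` are hypothetical names, not part of Elasticsearch, and accepting equal neighbouring values is an assumption, since the text only says "ascending order".

```java
import java.util.Objects;

/** Hypothetical client-side helper; not part of Elasticsearch. */
final class HistogramInputCheck {

    private HistogramInputCheck() {}

    /**
     * Returns null when the arrays satisfy the documented rules,
     * otherwise a short description of the first violation found.
     */
    static String validate(double[] values, int[] counts) {
        Objects.requireNonNull(values, "values");
        Objects.requireNonNull(counts, "counts");
        if (values.length != counts.length) {
            return "values and counts must have the same length";
        }
        for (int i = 0; i < values.length; i++) {
            // Assumption: equal neighbours are accepted; only a decrease is rejected.
            if (i > 0 && values[i] < values[i - 1]) {
                return "values must be in ascending order (index " + i + ")";
            }
            if (counts[i] < 0) {
                return "counts must be positive or zero (index " + i + ")";
            }
        }
        return null;
    }
}
```

For example, `HistogramInputCheck.validate(new double[] {0.1, 0.2}, new int[] {3, 7})` returns `null`, while passing `counts` of a different length returns an error message.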
|
||
[IMPORTANT] | ||
======== | ||
* A `histogram` field can only store a single pair of `values` and `counts` arrays | ||
per document. Nested arrays are not supported. | ||
* `histogram` fields do not support sorting. | ||
======== | ||
|
||
[[histogram-uses]] | ||
==== Uses | ||
|
||
`histogram` fields are primarily intended for use with aggregations. To make it | ||
more readily accessible for aggregations, `histogram` field data is stored as | ||
binary <<doc-values,doc values>> and not indexed. Its size in bytes is at most | ||
`13 * numValues`, where `numValues` is the length of the provided arrays. | ||
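
One encoding that would be consistent with the `13 * numValues` bound is a variable-length integer for each count (1 to 5 bytes) followed by an 8-byte double for each value. This layout is an assumption made for illustration and is not confirmed by the text above; `HistogramSizeBoundSketch` and its methods are hypothetical names.

```java
/** Hypothetical sketch of a size estimate; the byte layout is an assumption. */
final class HistogramSizeBoundSketch {

    private HistogramSizeBoundSketch() {}

    /** Bytes used by a non-negative int in a 7-bits-per-byte variable-length encoding (1 to 5). */
    static int varIntBytes(int v) {
        int bytes = 1;
        while ((v & ~0x7F) != 0) {
            v >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** Estimated encoded size: one variable-length count plus one 8-byte value per bucket. */
    static long encodedSize(int[] counts) {
        long size = 0;
        for (int c : counts) {
            size += varIntBytes(c) + Long.BYTES;
        }
        return size;
    }
}
```

Under this assumed layout, small counts cost 9 bytes per bucket and the largest possible `int` count costs 13, which matches the documented upper bound.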
|
||
Because the data is not indexed, you can only use `histogram` fields for the | ||
following aggregations and queries: | ||
|
||
* <<search-aggregations-metrics-percentile-aggregation,percentiles>> aggregation | ||
* <<search-aggregations-metrics-percentile-rank-aggregation,percentile ranks>> aggregation | ||
* <<query-dsl-exists-query,exists>> query | ||
|
||
[[mapping-types-histogram-building-histogram]] | ||
==== Building a histogram | ||
|
||
When using a histogram as part of an aggregation, the accuracy of the results will depend on how the | ||
histogram was constructed. It is important to consider the percentiles aggregation mode that will be used | ||
to build it. Some possibilities include: | ||
|
||
- For the <<search-aggregations-metrics-percentile-aggregation, T-Digest>> mode, the `values` array represents | ||
the mean centroid positions and the `counts` array represents the number of values that are attributed to each | ||
centroid. If the algorithm has already started to approximate the percentiles, this inaccuracy is | ||
carried over in the histogram. | ||
|
||
- For the <<_hdr_histogram,High Dynamic Range (HDR)>> histogram mode, the `values` array represents fixed upper | ||
limits of each bucket interval, and the `counts` array represents the number of values that are attributed to each | ||
interval. This implementation maintains a fixed worst-case percentage error (specified as a number of significant digits), | ||
therefore the value used when generating the histogram would be the maximum accuracy you can achieve at aggregation time. | ||
|
||
The histogram field is "algorithm agnostic" and does not store data specific to either T-Digest or HDRHistogram. While this | ||
means the field can technically be aggregated with either algorithm, in practice the user should choose one algorithm and | ||
index data in that manner (e.g. centroids for T-Digest or intervals for HDRHistogram) to ensure best accuracy. | ||
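
As a rough illustration of the interval-style layout, the sketch below counts raw samples against a fixed list of bucket upper limits, producing a `counts` array to pair with those limits as `values`. It is a simplification and not the HDRHistogram algorithm, which derives its own bucket boundaries. `FixedBucketHistogramSketch` and `countPerBucket` are hypothetical names, and placing samples above the last limit into the last bucket is an arbitrary choice made here.

```java
import java.util.Arrays;

/** Hypothetical helper that pre-aggregates raw samples into fixed intervals. */
final class FixedBucketHistogramSketch {

    private FixedBucketHistogramSketch() {}

    /**
     * Counts samples per interval. upperLimits must be non-empty, sorted ascending
     * and free of duplicates. Bucket i receives every sample s with
     * upperLimits[i - 1] < s <= upperLimits[i]; the first bucket also takes anything
     * below its limit, and the last bucket takes anything above the last limit.
     */
    static int[] countPerBucket(double[] samples, double[] upperLimits) {
        if (upperLimits.length == 0) {
            throw new IllegalArgumentException("at least one bucket limit is required");
        }
        int[] counts = new int[upperLimits.length];
        for (double s : samples) {
            int pos = Arrays.binarySearch(upperLimits, s);
            // Exact match: that bucket. Otherwise the insertion point is the first limit above s.
            int bucket = pos >= 0 ? pos : -pos - 1;
            if (bucket == upperLimits.length) {
                bucket = upperLimits.length - 1; // overflow goes to the last bucket
            }
            counts[bucket]++;
        }
        return counts;
    }
}
```

The resulting `counts` array has the same length as the limits array, so the pair can be indexed directly as `counts` and `values`.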
|
||
[[histogram-ex]] | ||
==== Examples | ||
|
||
The following <<indices-create-index, create index>> API request creates a new index with two field mappings: | ||
|
||
* `my_histogram`, a `histogram` field used to store percentile data | ||
* `my_text`, a `keyword` field used to store a title for the histogram | ||
|
||
[source,console] | ||
-------------------------------------------------- | ||
PUT my_index | ||
{ | ||
"mappings": { | ||
"properties": { | ||
"my_histogram": { | ||
"type" : "histogram" | ||
}, | ||
"my_text" : { | ||
"type" : "keyword" | ||
} | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
|
||
The following <<docs-index_,index>> API requests store pre-aggregated data for | ||
two histograms: `histogram_1` and `histogram_2`. | ||
|
||
[source,console] | ||
-------------------------------------------------- | ||
PUT my_index/_doc/1 | ||
{ | ||
"my_text" : "histogram_1", | ||
"my_histogram" : { | ||
"values" : [0.1, 0.2, 0.3, 0.4, 0.5], <1> | ||
"counts" : [3, 7, 23, 12, 6] <2> | ||
} | ||
} | ||
PUT my_index/_doc/2 | ||
{ | ||
"my_text" : "histogram_2", | ||
"my_histogram" : { | ||
"values" : [0.1, 0.25, 0.35, 0.4, 0.45, 0.5], <1> | ||
"counts" : [8, 17, 8, 7, 6, 2] <2> | ||
} | ||
} | ||
-------------------------------------------------- | ||
<1> Values for each bucket. Values in the array are treated as doubles and must be given in | ||
increasing order. For <<search-aggregations-metrics-percentile-aggregation-approximation, T-Digest>> | ||
histograms this value represents the mean value. In case of HDR histograms this represents the value iterated to. | ||
<2> Count for each bucket. Values in the arrays are treated as integers and must be positive or zero. | ||
Negative values will be rejected. The relation between a bucket and a count is given by the position in the array. | ||
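
To see how the positional pairing of `values` and `counts` feeds a percentile computation, here is a minimal nearest-rank sketch over one such pair. It is neither T-Digest nor HDRHistogram, so results from an actual percentiles aggregation may differ. `NearestRankPercentileSketch` and `percentile` are hypothetical names.

```java
/** Hypothetical nearest-rank percentile over a (values, counts) pair. */
final class NearestRankPercentileSketch {

    private NearestRankPercentileSketch() {}

    /**
     * Returns the bucket value at which the cumulative count first reaches
     * the rank ceil(percent / 100 * total). percent is expected in (0, 100].
     */
    static double percentile(double[] values, int[] counts, double percent) {
        if (values.length != counts.length) {
            throw new IllegalArgumentException("values and counts must have the same length");
        }
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        if (total == 0) {
            throw new IllegalArgumentException("histogram is empty");
        }
        long rank = Math.max(1, (long) Math.ceil(percent / 100.0 * total));
        long seen = 0;
        for (int i = 0; i < values.length; i++) {
            seen += counts[i];
            if (seen >= rank) {
                return values[i];
            }
        }
        return values[values.length - 1];
    }
}
```

For `histogram_1` above (51 observations in total), the 50th percentile under this rule is `0.3`: the rank is 26, and the cumulative counts 3, 10, 33 first reach it at the third bucket.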
|
||
|
||
|
34 changes: 34 additions & 0 deletions
server/src/main/java/org/elasticsearch/index/fielddata/AtomicHistogramFieldData.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
/* | ||
* Licensed to Elasticsearch under one or more contributor | ||
* license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright | ||
* ownership. Elasticsearch licenses this file to you under | ||
* the Apache License, Version 2.0 (the "License"); you may | ||
* not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
package org.elasticsearch.index.fielddata; | ||
|
||
|
||
import java.io.IOException; | ||
|
||
/** | ||
* {@link AtomicFieldData} specialization for histogram data. | ||
*/ | ||
public interface AtomicHistogramFieldData extends AtomicFieldData { | ||
|
||
/** | ||
* Return Histogram values. | ||
*/ | ||
HistogramValues getHistogramValues() throws IOException; | ||
|
||
} |
48 changes: 48 additions & 0 deletions
server/src/main/java/org/elasticsearch/index/fielddata/HistogramValue.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
/* | ||
* Licensed to Elasticsearch under one or more contributor | ||
* license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright | ||
* ownership. Elasticsearch licenses this file to you under | ||
* the Apache License, Version 2.0 (the "License"); you may | ||
* not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
|
||
package org.elasticsearch.index.fielddata; | ||
|
||
import java.io.IOException; | ||
|
||
/** | ||
* Per-document histogram value. Every entry of the histogram consists of | ||
* a value and a count. | ||
*/ | ||
public abstract class HistogramValue { | ||
|
||
/** | ||
* Advance this instance to the next value of the histogram | ||
* @return true if there is a next value | ||
*/ | ||
public abstract boolean next() throws IOException; | ||
|
||
/** | ||
* The current value of the histogram | ||
* @return the current value of the histogram | ||
*/ | ||
public abstract double value(); | ||
|
||
/** | ||
* The current count of the histogram | ||
* @return the current count of the histogram | ||
*/ | ||
public abstract int count(); | ||
|
||
} |
41 changes: 41 additions & 0 deletions
server/src/main/java/org/elasticsearch/index/fielddata/HistogramValues.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
/* | ||
* Licensed to Elasticsearch under one or more contributor | ||
* license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright | ||
* ownership. Elasticsearch licenses this file to you under | ||
* the Apache License, Version 2.0 (the "License"); you may | ||
* not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
|
||
package org.elasticsearch.index.fielddata; | ||
|
||
import java.io.IOException; | ||
|
||
/** | ||
* Per-segment histogram values. | ||
*/ | ||
public abstract class HistogramValues { | ||
|
||
/** | ||
* Advance this instance to the given document id | ||
* @return true if there is a value for this document | ||
*/ | ||
public abstract boolean advanceExact(int doc) throws IOException; | ||
|
||
/** | ||
* Get the {@link HistogramValue} associated with the current document. | ||
* The returned {@link HistogramValue} might be reused across calls. | ||
*/ | ||
public abstract HistogramValue histogram() throws IOException; | ||
|
||
} |
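
The two abstract classes above suggest a two-level iteration: position `HistogramValues` on a document with `advanceExact`, then walk that document's `HistogramValue` with `next()`. The sketch below illustrates that pattern under stated assumptions. The two abstract classes are re-declared as stand-ins so the example compiles on its own, and `ArrayHistogramValues` and `HistogramIterationSketch` are hypothetical names that are not part of this commit.

```java
import java.io.IOException;

// Stand-ins mirroring the abstract classes added in this commit.
abstract class HistogramValue {
    public abstract boolean next() throws IOException;
    public abstract double value();
    public abstract int count();
}

abstract class HistogramValues {
    public abstract boolean advanceExact(int doc) throws IOException;
    public abstract HistogramValue histogram() throws IOException;
}

/** Hypothetical in-memory implementation: one (values, counts) pair per doc id, null if absent. */
final class ArrayHistogramValues extends HistogramValues {
    private final double[][] values;
    private final int[][] counts;
    private int doc = -1;

    ArrayHistogramValues(double[][] values, int[][] counts) {
        this.values = values;
        this.counts = counts;
    }

    @Override
    public boolean advanceExact(int doc) {
        this.doc = doc;
        return doc >= 0 && doc < values.length && values[doc] != null;
    }

    @Override
    public HistogramValue histogram() {
        final double[] v = values[doc];
        final int[] c = counts[doc];
        return new HistogramValue() {
            private int i = -1; // positioned before the first entry until next() is called

            @Override public boolean next() { return ++i < v.length; }
            @Override public double value() { return v[i]; }
            @Override public int count() { return c[i]; }
        };
    }
}

final class HistogramIterationSketch {
    private HistogramIterationSketch() {}

    /** Sums the counts of one document's histogram, or returns 0 if the document has none. */
    static long totalCount(HistogramValues histograms, int doc) throws IOException {
        if (histograms.advanceExact(doc) == false) {
            return 0;
        }
        HistogramValue histogram = histograms.histogram();
        long total = 0;
        while (histogram.next()) {
            total += histogram.count();
        }
        return total;
    }
}
```

A consumer such as an aggregator would presumably follow the same shape, reading `value()` and `count()` inside the loop instead of only summing counts.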
34 changes: 34 additions & 0 deletions
server/src/main/java/org/elasticsearch/index/fielddata/IndexHistogramFieldData.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
/* | ||
* Licensed to Elasticsearch under one or more contributor | ||
* license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright | ||
* ownership. Elasticsearch licenses this file to you under | ||
* the Apache License, Version 2.0 (the "License"); you may | ||
* not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
|
||
package org.elasticsearch.index.fielddata; | ||
|
||
|
||
import org.elasticsearch.index.Index; | ||
import org.elasticsearch.index.fielddata.plain.DocValuesIndexFieldData; | ||
|
||
/** | ||
* Specialization of {@link IndexFieldData} for histograms. | ||
*/ | ||
public abstract class IndexHistogramFieldData extends DocValuesIndexFieldData implements IndexFieldData<AtomicHistogramFieldData> { | ||
|
||
public IndexHistogramFieldData(Index index, String fieldName) { | ||
super(index, fieldName); | ||
} | ||
} |