Aggs: Make it possible to configure missing values.
Most aggregations (terms, histogram, stats, percentiles, geohash-grid) now
support a new `missing` option, which defines the value to use when a
field does not have a value. This can be handy if, e.g., you want a terms
aggregation to treat documents that have "N/A" for a `tag` field the same
way as documents that have no value for it.

This works in a very similar way to the `missing` option on the `sort`
element.

One known issue is that this option sometimes cannot make the right decision
in the unmapped case: it needs to replace all values with the `missing` value
but might not know what kind of values source should be produced (numerics,
strings, geo points?). For this reason, we might want to add an `unmapped_type`
option in the future like we did for sorting.

Related to elastic#5324
jpountz committed May 15, 2015
1 parent 66921ff commit 32e23b9
Showing 25 changed files with 1,311 additions and 48 deletions.
@@ -123,3 +123,26 @@ settings and filter the returned buckets based on a `min_doc_count` setting (by
bucket that matches documents and the last one are returned). This histogram also supports the `extended_bounds`
setting, which enables extending the bounds of the histogram beyond the data itself (to read more on why you'd want to
do that please refer to the explanation <<search-aggregations-bucket-histogram-aggregation-extended-bounds,here>>).

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "publish_date" : {
            "date_histogram" : {
                "field" : "publish_date",
                "interval": "year",
                "missing": "2000-01-01" <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `publish_date` field will fall into the same bucket as documents that have the value `2000-01-01`.
23 changes: 23 additions & 0 deletions docs/reference/aggregations/bucket/histogram-aggregation.asciidoc
@@ -317,3 +317,26 @@ Response:
}
}
--------------------------------------------------

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "quantity" : {
            "histogram" : {
                "field" : "quantity",
                "interval": 10,
                "missing": 0 <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `quantity` field will fall into the same bucket as documents that have the value `0`.
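
As a rough illustration of the substitution described in this section (a standalone sketch, not the Elasticsearch implementation), the Java snippet below buckets long values by rounding down to a multiple of the interval and uses the `missing` value for documents that have none. The names `HistogramMissingSketch`, `bucketKey` and `histogram` are hypothetical, and a `null` array entry is assumed to stand for a document without a value.

```java
import java.util.Map;
import java.util.TreeMap;

final class HistogramMissingSketch {

    /** Rounds a value down to the start of its bucket. */
    static long bucketKey(long value, long interval) {
        return Math.floorDiv(value, interval) * interval;
    }

    /**
     * Counts documents per bucket. A null entry stands for a document without
     * a value: it is skipped when missing is null, otherwise it is replaced
     * by the missing value before bucketing.
     */
    static Map<Long, Integer> histogram(Long[] values, long interval, Long missing) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Long value : values) {
            Long effective = (value != null) ? value : missing;
            if (effective == null) {
                continue; // default behaviour: the document is ignored
            }
            counts.merge(bucketKey(effective, interval), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Long[] quantities = {3L, 12L, null, 17L, null};
        System.out.println(histogram(quantities, 10, null)); // prints {0=1, 10=2}
        System.out.println(histogram(quantities, 10, 0L));   // prints {0=3, 10=2}
    }
}
```

With `missing` set to `0`, the two documents without a `quantity` end up in the `0` bucket alongside the document whose value is `3`.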
22 changes: 22 additions & 0 deletions docs/reference/aggregations/bucket/terms-aggregation.asciidoc
@@ -655,3 +655,25 @@ in inner aggregations.
<1> experimental[] the possible values are `map`, `global_ordinals`, `global_ordinals_hash` and `global_ordinals_low_cardinality`

Please note that Elasticsearch will ignore this execution hint if it is not applicable and that there is no backward compatibility guarantee on these hints.

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "tags" : {
            "terms" : {
                "field" : "tags",
                "missing": "N/A" <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `tags` field will fall into the same bucket as documents that have the value `N/A`.
24 changes: 23 additions & 1 deletion docs/reference/aggregations/metrics/avg-aggregation.asciidoc
@@ -72,4 +72,26 @@ It turned out that the exam was way above the level of the students and a grade
}
}
}
--------------------------------------------------

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "grade_avg" : {
            "avg" : {
                "field" : "grade",
                "missing": 10 <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `grade` field will be treated as if they had the value `10`.
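
For a metric aggregation there are no buckets, so the effect of `missing` is simply that the substituted value takes part in the computation. The sketch below illustrates this for an average; it is a standalone approximation rather than Elasticsearch code. `AvgMissingSketch` and `avg` are hypothetical names, `null` is assumed to stand for a document without a value, and returning `NaN` for an empty input is a choice made for this sketch only.

```java
final class AvgMissingSketch {

    /**
     * Averages the grades. A null entry stands for a document without a value:
     * it is skipped when missing is null, otherwise the missing value is
     * counted in its place.
     */
    static double avg(Double[] grades, Double missing) {
        double sum = 0;
        long count = 0;
        for (Double grade : grades) {
            Double effective = (grade != null) ? grade : missing;
            if (effective == null) {
                continue; // default behaviour: the document is ignored
            }
            sum += effective;
            count++;
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        Double[] grades = {80.0, 90.0, null, null};
        System.out.println(avg(grades, null)); // prints 85.0
        System.out.println(avg(grades, 10.0)); // prints 47.5
    }
}
```

The second call shows why the choice of `missing` matters for metrics: two documents without a grade pull the average from 85.0 down to 47.5.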
@@ -155,3 +155,24 @@ however since hashes need to be computed on the fly.

TIP: The `script` parameter expects an inline script. Use `script_id` for indexed scripts and `script_file` for scripts in the `config/scripts/` directory.

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "tag_cardinality" : {
            "cardinality" : {
                "field" : "tag",
                "missing": "N/A" <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `tag` field will be treated as if they had the value `N/A`.
@@ -116,4 +116,26 @@ It turned out that the exam was way above the level of the students and a grade
}
}
}
--------------------------------------------------

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "grades_stats" : {
            "extended_stats" : {
                "field" : "grade",
                "missing": 0 <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `grade` field will be treated as if they had the value `0`.
21 changes: 21 additions & 0 deletions docs/reference/aggregations/metrics/max-aggregation.asciidoc
@@ -67,3 +67,24 @@ Let's say that the prices of the documents in our index are in USD, but we would
}
--------------------------------------------------

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "grade_max" : {
            "max" : {
                "field" : "grade",
                "missing": 10 <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `grade` field will be treated as if they had the value `10`.
22 changes: 22 additions & 0 deletions docs/reference/aggregations/metrics/min-aggregation.asciidoc
@@ -66,3 +66,25 @@ Let's say that the prices of the documents in our index are in USD, but we would
}
}
--------------------------------------------------

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "grade_min" : {
            "min" : {
                "field" : "grade",
                "missing": 10 <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `grade` field will be treated as if they had the value `10`.
@@ -190,3 +190,25 @@ A "node" uses roughly 32 bytes of memory, so under worst-case scenarios (large a
of data which arrives sorted and in-order) the default settings will produce a
TDigest roughly 64KB in size. In practice data tends to be more random and
the TDigest will use less memory.

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "grade_percentiles" : {
            "percentiles" : {
                "field" : "grade",
                "missing": 10 <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `grade` field will be treated as if they had the value `10`.
@@ -86,3 +86,25 @@ script to generate values which percentile ranks are calculated on
<2> Scripting supports parameterized input just like any other script

TIP: The `script` parameter expects an inline script. Use `script_id` for indexed scripts and `script_file` for scripts in the `config/scripts/` directory.

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "grade_ranks" : {
            "percentile_ranks" : {
                "field" : "grade",
                "missing": 10 <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `grade` field will be treated as if they had the value `10`.
24 changes: 23 additions & 1 deletion docs/reference/aggregations/metrics/stats-aggregation.asciidoc
@@ -78,4 +78,26 @@ It turned out that the exam was way above the level of the students and a grade
}
}
}
--------------------------------------------------

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "grades_stats" : {
            "stats" : {
                "field" : "grade",
                "missing": 0 <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `grade` field will be treated as if they had the value `0`.
22 changes: 22 additions & 0 deletions docs/reference/aggregations/metrics/sum-aggregation.asciidoc
@@ -77,3 +77,25 @@ Computing the sum of squares over all stock tick changes:
}
}
--------------------------------------------------

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "total_time" : {
            "sum" : {
                "field" : "took",
                "missing": 100 <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `took` field will be treated as if they had the value `100`.
21 changes: 13 additions & 8 deletions src/main/java/org/elasticsearch/common/geo/GeoUtils.java
@@ -409,19 +409,24 @@ public static GeoPoint parseGeoPoint(XContentParser parser, GeoPoint point) thro
             return point.reset(lat, lon);
         } else if(parser.currentToken() == Token.VALUE_STRING) {
             String data = parser.text();
-            int comma = data.indexOf(',');
-            if(comma > 0) {
-                lat = Double.parseDouble(data.substring(0, comma).trim());
-                lon = Double.parseDouble(data.substring(comma + 1).trim());
-                return point.reset(lat, lon);
-            } else {
-                return point.resetFromGeoHash(data);
-            }
+            return parseGeoPoint(data, point);
         } else {
             throw new ElasticsearchParseException("geo_point expected");
         }
     }
 
+    /** parse a {@link GeoPoint} from a String */
+    public static GeoPoint parseGeoPoint(String data, GeoPoint point) {
+        int comma = data.indexOf(',');
+        if(comma > 0) {
+            double lat = Double.parseDouble(data.substring(0, comma).trim());
+            double lon = Double.parseDouble(data.substring(comma + 1).trim());
+            return point.reset(lat, lon);
+        } else {
+            return point.resetFromGeoHash(data);
+        }
+    }
+
     private GeoUtils() {
     }
 }
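
The string handling in the `parseGeoPoint(String, GeoPoint)` overload can be tried without Elasticsearch on the classpath. The sketch below mirrors only the `lat,lon` branch of that method. `GeoPointStringSketch` and `parseLatLon` are hypothetical names, and returning `null` when the input is not of the `lat,lon` form is a simplification of this sketch: the real method hands such input to `point.resetFromGeoHash(data)` instead.

```java
final class GeoPointStringSketch {

    /**
     * Returns {lat, lon} for input of the form "lat,lon". Returns null when
     * there is no comma after the first character, which is the case the real
     * method treats as a geohash (geohash decoding is not reproduced here).
     */
    static double[] parseLatLon(String data) {
        int comma = data.indexOf(',');
        if (comma > 0) {
            double lat = Double.parseDouble(data.substring(0, comma).trim());
            double lon = Double.parseDouble(data.substring(comma + 1).trim());
            return new double[] {lat, lon};
        }
        return null;
    }

    public static void main(String[] args) {
        double[] point = parseLatLon("41.12, -71.34");
        System.out.println(point[0] + " / " + point[1]);
        System.out.println(parseLatLon("drm3btev3e86") == null); // prints true
    }
}
```

Note that the check is `comma > 0`, so a string that starts with a comma is not parsed as `lat,lon` either.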
@@ -34,6 +34,7 @@ public abstract class ValuesSourceAggregationBuilder<B extends ValuesSourceAggre
     private String script;
     private String lang;
     private Map<String, Object> params;
+    private Object missing;
 
     /**
      * Constructs a new builder.
@@ -117,6 +118,14 @@ public B params(Map<String, Object> params) {
         return (B) this;
     }
 
+    /**
+     * Configure the value to use when documents miss a value.
+     */
+    public B missing(Object missingValue) {
+        this.missing = missingValue;
+        return (B) this;
+    }
+
     @Override
     protected final XContentBuilder internalXContent(XContentBuilder builder, Params params) throws IOException {
         builder.startObject();
@@ -132,6 +141,9 @@ protected final XContentBuilder internalXContent(XContentBuilder builder, Params
         if (this.params != null) {
             builder.field("params").map(this.params);
         }
+        if (missing != null) {
+            builder.field("missing", missing);
+        }
 
         doInternalXContent(builder, params);
         return builder.endObject();
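
The builder change follows a simple pattern: a fluent setter stores the value, and the `missing` field is written only when a value was set, so requests that do not use the option are unchanged. The sketch below imitates that pattern with plain maps instead of `XContentBuilder`. `MissingBuilderSketch`, `field`, `missing` and `toMap` are hypothetical names made up for this illustration and are not part of the Elasticsearch API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MissingBuilderSketch {
    private final String type;
    private String field;
    private Object missing;

    MissingBuilderSketch(String type) {
        this.type = type;
    }

    MissingBuilderSketch field(String field) {
        this.field = field;
        return this;
    }

    /** Stores the value to use for documents without a value. */
    MissingBuilderSketch missing(Object missingValue) {
        this.missing = missingValue;
        return this;
    }

    /** Builds the request body; "missing" is emitted only when it was set. */
    Map<String, Object> toMap() {
        Map<String, Object> body = new LinkedHashMap<>();
        if (field != null) {
            body.put("field", field);
        }
        if (missing != null) {
            body.put("missing", missing);
        }
        Map<String, Object> request = new LinkedHashMap<>();
        request.put(type, body);
        return request;
    }

    public static void main(String[] args) {
        // prints {terms={field=tags, missing=N/A}}
        System.out.println(new MissingBuilderSketch("terms").field("tags").missing("N/A").toMap());
        // prints {terms={field=tags}}
        System.out.println(new MissingBuilderSketch("terms").field("tags").toMap());
    }
}
```

Accepting `Object` for the value, as the real builder does, lets the same setter carry strings, numbers or dates depending on the field type.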