New mapping parameters to annotate dimensions and metrics in timeseries data #74014

csoulios · 2021-06-10T14:23:50Z

As Elasticsearch embraces time series data, we must make sure that fundamental time series concepts become first-class citizens. We propose implementing two new mapping parameters for annotating dimension and metric fields in the index mapping.

Mapping dimensions

To mark a field as a dimension, we will create a mapping parameter named time_series_dimension that takes boolean values.

Mapping metrics

To mark a field as a metric, we will create a mapping parameter named time_series_metric. Its value will be a string that takes one of the following values: gauge, counter, histogram, or summary.

For each metric type there should be a set of supported downsampling aggregations. However, there are cases where users want to override the aggregations that are supported by default. To allow this, time_series_metric can also be an object containing a type field and an aggregations field.

An example to illustrate the index mapping can be found below:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "hostname": {
        "type": "keyword",
        "time_series_dimension": true
      },
      "free_memory": {
        "type": "double",
        "time_series_metric": "gauge"
      },
      "request_count": {
        "type": "double",
        "time_series_metric": {
          "type": "counter",
          "aggregations": ["min", "max", "value_count", "sum"]
        }
      }
    }
  }
}
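The two accepted shapes of time_series_metric (a plain string, or an object with type and aggregations) could be handled uniformly on the client side along the lines of the sketch below. The helper name normalize_metric is hypothetical and is not part of Elasticsearch; the allowed type names are taken from the proposal text, and the default aggregations per type are left unspecified because the proposal does not list them.

```python
# Hypothetical client-side helper; not an Elasticsearch API.
ALLOWED_METRIC_TYPES = {"gauge", "counter", "histogram", "summary"}


def normalize_metric(value):
    """Return the object form {"type": ..., "aggregations": ...} of a time_series_metric value."""
    if isinstance(value, str):
        # Shorthand form: only the type is given, so the (unspecified)
        # default aggregations for that type would apply.
        metric = {"type": value, "aggregations": None}
    elif isinstance(value, dict):
        metric = {"type": value.get("type"), "aggregations": value.get("aggregations")}
    else:
        raise TypeError("time_series_metric must be a string or an object")
    if metric["type"] not in ALLOWED_METRIC_TYPES:
        raise ValueError(f"unsupported metric type: {metric['type']!r}")
    return metric


print(normalize_metric("gauge"))
print(normalize_metric({"type": "counter", "aggregations": ["min", "max"]}))
```

Normalizing to the object form means downstream code only has to deal with one shape, whichever form the mapping author chose.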

This issue deprecates support for the metric_type key in the field mapping meta (#72536).

Also, we should expose these parameters through the field capabilities API so that Kibana can access this information. However, that feature will be described in a separate issue.


elasticmachine · 2021-06-10T14:23:53Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

elasticmachine · 2021-06-10T14:25:41Z

Pinging @elastic/es-search (Team:Search)

imotov · 2021-06-10T23:22:56Z

I feel like timeseries_dimension and timeseries_metric use case is pretty generic, but the use case for aggregations is pretty specific to rollups. So, I wonder if we should do:

timeseries_dimension -> dimension
timeseries_metric -> metric_type or even metric
aggregations -> rollup_aggregations

nik9000 · 2021-06-11T12:00:43Z

I like dimension and metric_type!

axw · 2021-06-14T02:50:27Z

> I feel like timeseries_dimension and timeseries_metric use case is pretty generic, but the use case for aggregations is pretty specific to rollups.

That works as long as it's only used for downsampling/rollups and doesn't bleed into restricting storage or aggregations generally. There's an issue with summary metrics I'd like to raise.

In Elastic APM (maybe eventually Metricbeat's Prometheus module? CC @exekias), we would like to store pre-aggregated "summary" metrics: always sum/count, and optionally min/max and possibly other quantiles. These metrics would always support value_count, sum, and avg; possibly min/max. Ideally we would use aggregate_metric_double, but you have to explicitly state up front which metric sub-fields to store. We don't necessarily know this up front.
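To make the constraint concrete, here is a minimal sketch of what declaring an aggregate_metric_double field involves today, and how avg falls out of sum and value_count. The helper names and the sample numbers are made up for illustration; the metrics and default_metric mapping parameters are those of the existing aggregate_metric_double field type.

```python
def aggregate_metric_mapping(metrics, default_metric):
    """Build an aggregate_metric_double mapping; the sub-metrics must be listed up front."""
    if default_metric not in metrics:
        raise ValueError("default_metric must be one of the stored metrics")
    return {
        "type": "aggregate_metric_double",
        "metrics": list(metrics),
        "default_metric": default_metric,
    }


def average(summary):
    """Derive avg from a pre-aggregated summary that holds sum and value_count."""
    return summary["sum"] / summary["value_count"]


# The always-present part of a summary metric: sum and value_count.
mapping = aggregate_metric_mapping(["sum", "value_count"], default_metric="sum")
summary_doc = {"sum": 120.0, "value_count": 4}  # illustrative values

print(mapping)
print(average(summary_doc))  # 30.0
```

If min and max only show up in some documents, they would still have to be included in metrics when the mapping is created, which is the up-front knowledge described above.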

csoulios · 2021-06-15T16:17:20Z

@imotov I am fine with renaming timeseries_dimension to dimension and timeseries_metric to just metric. They are less verbose and simpler.

However, I think that renaming aggregations to rollup_aggregations is too restrictive. Although any aggregation can be computed on the raw data, once the data has been rolled up only the aggregations listed for the metric will be supported. So, this field will be used to advertise to clients (Kibana, metrics, etc.) which aggregations are supported.

Finally, in my view the most important part is that aggregations is a subfield of the metric parameter and not a separate parameter. This way there will be no confusion that aggregations refers to the metric property of the field.

imotov · 2021-06-15T17:53:50Z

@csoulios and I had a chat about the aggregations field. The aggregations field in metrics plays two different roles here. Before the rollup it indicates the fields that we want to keep; after the rollup it indicates the fields that were kept. We agreed that rollup_aggregations, while appropriate for the first role, doesn't reflect the second role correctly. So, we should probably keep something generic like aggregations there.
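The second role, advertising what survives a rollup, could be consumed by a client roughly as in the sketch below. The function can_aggregate is hypothetical and only illustrates the rule discussed in this thread; it assumes the object form of the metric parameter with an explicit aggregations list.

```python
def can_aggregate(metric, aggregation, rolled_up):
    """Decide whether a client may run `aggregation` on a metric field.

    `metric` is the object form of the metric parameter, for example
    {"type": "counter", "aggregations": ["min", "max"]}.
    """
    if not rolled_up:
        # On raw data any aggregation can be computed.
        return True
    # After the rollup only the aggregations that were kept are available.
    return aggregation in metric.get("aggregations", [])


metric = {"type": "counter", "aggregations": ["min", "max", "value_count", "sum"]}
print(can_aggregate(metric, "percentiles", rolled_up=False))  # True
print(can_aggregate(metric, "percentiles", rolled_up=True))   # False
print(can_aggregate(metric, "sum", rolled_up=True))           # True
```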

csoulios · 2021-06-15T18:54:48Z

> In Elastic APM (maybe eventually Metricbeat's Prometheus module? CC @exekias), we would like to store pre-aggregated "summary" metrics: always sum/count, and optionally min/max and possibly other quantiles. These metrics would always support value_count, sum, and avg; possibly min/max. Ideally we would use aggregate_metric_double, but you have to explicitly state up front which metric sub-fields to store. We don't necessarily know this up front.

@axw I have created a separate issue, #74145, to discuss relaxing the constraints in aggregate_metric_double.

axw · 2021-06-17T02:24:18Z

A related problem has been brewing around histograms. I had been putting it off, but @benwtrent has just come across the same problem (see #74213).

The histogram field docs say:

> The histogram field is "algorithm agnostic" and does not store data specific to either T-Digest or HDRHistogram. While this means the field can technically be aggregated with either algorithm, in practice the user should choose one algorithm and index data in that manner (e.g. centroids for T-Digest or intervals for HDRHistogram) to ensure best accuracy.

The problem with this is that a consumer doesn't necessarily know how the data was stored. This certainly applies to custom application metrics; knowing it would, for example, let Lens automatically decide the most appropriate algorithm for calculating percentiles. I imagine this would also apply when rolling up histogram metrics. Moreover, for rollups you might want to reduce the histogram resolution to reduce storage cost.

Would it make sense to extend the metric mapping parameter with information about histograms? i.e. which algorithm (and parameters) were used to record them? If this is not the right place to discuss, maybe let's take it to #74213.
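As a rough illustration of why the recording algorithm matters for rollups, here is a naive sketch of reducing a histogram's resolution by merging adjacent buckets. It assumes the values are centroid-like, so a count-weighted mean is a reasonable merged value; for interval-style data (e.g. HDRHistogram) a different merge rule would be needed, and the consumer cannot tell which applies. The function halve_resolution is hypothetical, and the values/counts arrays follow the histogram field's document layout.

```python
def halve_resolution(values, counts):
    """Merge adjacent bucket pairs, using the count-weighted mean as the merged value."""
    merged_values, merged_counts = [], []
    for i in range(0, len(values), 2):
        pair_values = values[i:i + 2]
        pair_counts = counts[i:i + 2]
        total = sum(pair_counts)
        if total == 0:
            continue  # nothing was recorded in these buckets
        mean = sum(v * c for v, c in zip(pair_values, pair_counts)) / total
        merged_values.append(mean)
        merged_counts.append(total)
    return merged_values, merged_counts


values, counts = halve_resolution([0.1, 0.2, 0.3, 0.4, 0.5], [3, 7, 23, 12, 6])
print(values)
print(counts)  # [10, 35, 6]
```

The total count is preserved, but percentile accuracy after such a merge depends on how the original values were produced, which is the information the mapping currently does not carry.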

imotov · 2021-06-17T20:22:19Z

> Would it make sense to extend the metric mapping parameter with information about histograms?

I wouldn't mix these two. The issue in #74213 is quite specific to histogram fields. In a sense these two concerns are somewhat orthogonal; I would rather add an algorithm field to the histogram field mapper than overload metrics with it.

Added the dimension parameter to the following field types: keyword, ip, and numeric field types (integer, long, byte, short). The dimension parameter is of type boolean (default: false) and is used to mark that a field is a time series dimension field. Relates to #74014

This PR renames dimension mapping parameter to time_series_dimension to make it consistent with time_series_metric parameter (#76766). Relates to #74450 and #74014

Added the time_series_metric mapping parameter to the unsigned_long and scaled_float field types. Added the time_series_dimension mapping parameter to the unsigned_long field type. Fixes #78100. Relates to #76766, #74450 and #74014


…rameters (#78265)

Backports the following PRs:

* Add dimension mapping parameter (#74450): Added the dimension parameter to the following field types: keyword, ip, and numeric field types (integer, long, byte, short). The dimension parameter is of type boolean (default: false) and is used to mark that a field is a time series dimension field. Relates to #74014
* Add constraints to dimension fields (#74939): This PR adds the following constraints to dimension fields. It must be an indexed field and must have doc values. It cannot be multi-valued. The number of dimension fields in the index mapping must not be more than 16; this should be configurable through an index property (index.mapping.dimension_fields.limit). keyword fields cannot be more than 1024 bytes long. keyword fields must not use a normalizer. Based on the code added in PR #74450. Relates to #74660
* Expand DocumentMapperTests (#76368): Adds a test for setting the maximum number of dimensions setting and tests the names and types of the metadata fields in the index. Previously we just asserted the count of metadata fields. That made it hard to read failures.
* Fix broken test for dimension keywords (#75408): Test was failing because it was testing 1024 bytes long keyword and assertion was failing. Closes #75225
* Checkstyle
* Add time_series_metric parameter (#76766): This PR adds the time_series_metric parameter to the following field types: numeric field types, histogram, aggregate_metric_double
* Rename `dimension` mapping parameter to `time_series_dimension` (#78012): This PR renames dimension mapping parameter to time_series_dimension to make it consistent with time_series_metric parameter (#76766). Relates to #74450 and #74014
* Add time series params to `unsigned_long` and `scaled_float` (#78204): Added the time_series_metric mapping parameter to the unsigned_long and scaled_float field types. Added the time_series_dimension mapping parameter to the unsigned_long field type. Fixes #78100. Relates to #76766, #74450 and #74014

Co-authored-by: Nik Everett
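The dimension-field constraints listed for #74939 can be summarized as a small checker. This is only a sketch of the rules as stated in the commit message, not the actual Elasticsearch implementation; the function names are hypothetical, while index, doc_values and normalizer are ordinary mapping parameters.

```python
MAX_DIMENSION_FIELDS = 16           # default limit named in #74939
MAX_KEYWORD_DIMENSION_BYTES = 1024  # maximum length of a keyword dimension value


def check_dimension_mapping(properties, limit=MAX_DIMENSION_FIELDS):
    """Return a list of constraint violations for the dimension fields in a mapping."""
    errors = []
    dimensions = {
        name: field for name, field in properties.items()
        if field.get("time_series_dimension")
    }
    if len(dimensions) > limit:
        errors.append(f"{len(dimensions)} dimension fields exceed the limit of {limit}")
    for name, field in dimensions.items():
        if field.get("index") is False or field.get("doc_values") is False:
            errors.append(f"{name}: must be indexed and have doc values")
        if field.get("type") == "keyword" and "normalizer" in field:
            errors.append(f"{name}: keyword dimensions must not use a normalizer")
    return errors


def check_dimension_value(value):
    """Check one keyword dimension value: single-valued and at most 1024 bytes."""
    if isinstance(value, (list, tuple)):
        return "dimension fields cannot be multi-valued"
    if len(str(value).encode("utf-8")) > MAX_KEYWORD_DIMENSION_BYTES:
        return "keyword dimension value is longer than 1024 bytes"
    return None


properties = {
    "hostname": {"type": "keyword", "time_series_dimension": True},
    "region": {"type": "keyword", "time_series_dimension": True, "doc_values": False},
}
print(check_dimension_mapping(properties))
print(check_dimension_value("a" * 1024))  # None: exactly 1024 bytes is allowed
```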

csoulios added the >enhancement and :StorageEngine/Rollup labels Jun 10, 2021

elasticmachine added the Team:Analytics label Jun 10, 2021

csoulios added the :Search Foundations/Mapping label Jun 10, 2021

elasticmachine added the Team:Search label Jun 10, 2021

jsoriano mentioned this issue Jun 15, 2021

[Fleet] Add support for meta in fields.yml elastic/kibana#82273

Closed

yuliacech mentioned this issue Jun 17, 2021

[Index Management][Mappings Editor] Add mappings parameters for dimensions and metrics elastic/kibana#102531

Open

csoulios mentioned this issue Jun 22, 2021

Add dimension mapping parameter #74450

Merged

csoulios mentioned this issue Jul 5, 2021

Add better support for metric data types (TSDB) #74660

Closed

axw mentioned this issue Jul 6, 2021

Add units to metric duration fields elastic/apm-server#5395

Merged

1 task

jsoriano mentioned this issue Jul 9, 2021

[Discuss] Avoiding duplication of ECS field definitions elastic/package-spec#63

Closed

csoulios mentioned this issue Aug 20, 2021

Add time_series_metric parameter #76766

Merged

jsoriano mentioned this issue Aug 30, 2021

Add support for dimension fields elastic/package-spec#215

Closed

csoulios closed this as completed in #76766 Sep 20, 2021

csoulios mentioned this issue Sep 20, 2021

Rename dimension mapping parameter to time_series_dimension #78012

Merged

csoulios mentioned this issue Sep 22, 2021

Add time series params to unsigned_long and scaled_float #78204

Merged

This was referenced Oct 19, 2021

Add support for dimension fields elastic/package-spec#236

Merged

Add support for dimension fields defined in packages elastic/kibana#115620

Closed

Add support for new mapping parameters for metrics defined in packages elastic/kibana#115621

Closed

jsoriano mentioned this issue Nov 9, 2021

Add meta information to fields where this applied elastic/ecs#721

Open

2 tasks

juliaElastic mentioned this issue Feb 23, 2022

[Fleet] added support for dimension field elastic/kibana#126257

Merged

1 task

jsoriano mentioned this issue Feb 24, 2022

[Fleet] added time_series_metric mapping for metric_type package field elastic/kibana#126322

Merged

1 task

javanna added the Team:Search Foundations label and removed the Team:Search label Jul 16, 2024
