New mapping parameters to annotate dimensions and metrics in timeseries data #74014

Closed
csoulios opened this issue Jun 10, 2021 · 10 comments · Fixed by #76766
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@csoulios
Contributor

csoulios commented Jun 10, 2021

As Elasticsearch embraces time series data, we must make sure that fundamental timeseries concepts become first class citizens. We propose that we implement two new mapping parameters to be used for annotating dimension and metric fields in the index mapping.

Mapping dimensions

To mark a field as a dimension, we will create a mapping parameter named time_series_dimension that takes a boolean value.

Mapping metrics

To mark a field as a metric, we must create a mapping parameter named time_series_metric. Its value will be a string that takes one of the following values: gauge, counter, histogram, or summary.

For each metric type there should be a set of supported downsampling aggregations. However, there are cases where users want to override the aggregations that are supported by default. To allow this, time_series_metric can also be an object containing a type and an aggregations field.

An example to illustrate the index mapping can be found below:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "hostname": {
        "type": "keyword",
        "time_series_dimension": true
      },
      "free_memory": {
        "type": "double",
        "time_series_metric": "gauge"
      },
      "free_memory": {
        "type": "double",
        "time_series_metric": {
          "type": "counter",
          "aggregations": ["min", "max", "value_count", "sum"]
        }
      }
    }
  }
}
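
To make the two proposed forms of time_series_metric concrete, here is a minimal Python sketch that normalizes the string shorthand and the object form into one shape. It is not Elasticsearch code: the helper name normalize_metric and the choice to represent "use the defaults for this type" as None are assumptions made for illustration, since the proposal does not list the default aggregations per metric type.

```python
# Hypothetical helper, not part of Elasticsearch: accepts either proposed
# form of `time_series_metric` and returns a single normalized shape.
METRIC_TYPES = {"gauge", "counter", "histogram", "summary"}


def normalize_metric(value):
    """Return {"type": ..., "aggregations": ...} for either proposed form."""
    if isinstance(value, str):
        # String shorthand: no override, so aggregations stay None
        # (assumption: None means "use the defaults for this type").
        metric_type, aggregations = value, None
    elif isinstance(value, dict):
        unknown = set(value) - {"type", "aggregations"}
        if unknown:
            raise ValueError(f"unknown keys: {sorted(unknown)}")
        metric_type = value.get("type")
        aggregations = value.get("aggregations")
    else:
        raise ValueError("time_series_metric must be a string or an object")

    if not isinstance(metric_type, str) or metric_type not in METRIC_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    if aggregations is not None:
        aggregations = list(aggregations)
    return {"type": metric_type, "aggregations": aggregations}
```

Calling normalize_metric("gauge") and normalize_metric({"type": "counter", "aggregations": ["min", "max", "value_count", "sum"]}) covers both variants shown in the mapping above.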

This issue deprecates the support for metric_type key in the field mapping meta (#72536).

Also, we should expose those parameters through the field capabilities API so that Kibana can access this information. However, this feature will be described in a separate issue.

@csoulios csoulios added >enhancement :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data labels Jun 10, 2021
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 10, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@csoulios csoulios added the :Search Foundations/Mapping Index mappings, including merging and defining field types label Jun 10, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jun 10, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@imotov
Contributor

imotov commented Jun 10, 2021

I feel like timeseries_dimension and timeseries_metric use case is pretty generic, but the use case for aggregations is pretty specific to rollups. So, I wonder if we should do:

  • timeseries_dimension -> dimension
  • timeseries_metric -> metric_type or even metric
  • aggregations -> rollup_aggregations

@nik9000
Member

nik9000 commented Jun 11, 2021

I like dimension and metric_type!

@axw
Member

axw commented Jun 14, 2021

I feel like timeseries_dimension and timeseries_metric use case is pretty generic, but the use case for aggregations is pretty specific to rollups.

As long as it's only used for downsampling/rollups, and doesn't bleed into restricting storage or aggregations generally. There's an issue with summary metrics I'd like to raise.

In Elastic APM (maybe eventually Metricbeat's Prometheus module? CC @exekias), we would like to store pre-aggregated "summary" metrics: always sum/count, and optionally min/max and possibly other quantiles. These metrics would always support value_count, sum, and avg; possibly min/max. Ideally we would use aggregate_metric_double, but you have to explicitly state up front which metric sub-fields to store. We don't necessarily know this up front.
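
As a rough illustration of why sum and value_count are enough to support avg, and why optional min/max are awkward to declare up front, here is a small Python sketch that merges two pre-aggregated summaries. The function names and the dict layout are hypothetical; this is plain Python, not how aggregate_metric_double is implemented.

```python
def merge_summaries(a, b):
    """Combine two pre-aggregated summaries; min and max are optional."""
    merged = {
        "sum": a["sum"] + b["sum"],
        "value_count": a["value_count"] + b["value_count"],
    }
    # min/max can only be carried forward if both inputs recorded them.
    if "min" in a and "min" in b:
        merged["min"] = min(a["min"], b["min"])
    if "max" in a and "max" in b:
        merged["max"] = max(a["max"], b["max"])
    return merged


def average(summary):
    """avg is derived rather than stored: sum / value_count."""
    if summary["value_count"] == 0:
        return None
    return summary["sum"] / summary["value_count"]
```

If one producer reports min/max and another does not, the merged summary loses them, which is the situation where the set of sub-fields is not known in advance.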

@csoulios
Contributor Author

@imotov I am fine with renaming timeseries_dimension to dimension and timeseries_metric to just metric. They are less verbose and simpler.

However, I think that renaming aggregations to rollup_aggregations is too restrictive. Although any aggregation can be computed on the raw data, after the data has been rolled up only the set of aggregations listed for the metric will be supported. So, this field will be used for advertising to clients (Kibana, metrics etc.) which aggregations are supported.

Finally, in my view the most important part is that aggregations is a subfield of the metric parameter and not another parameter. This way there will be no confusion that aggregations refers to the metric property of the field.

@imotov
Contributor

imotov commented Jun 15, 2021

@csoulios and I had a chat about the aggregations field. The aggregations field in metrics plays two different roles here. Before the rollup it indicates the fields that we want to keep; after the rollup it indicates the fields that were kept. We agreed that rollup_aggregations, while appropriate for the first role, doesn't reflect the second role correctly. So, we should probably keep something generic like aggregations there.

@csoulios
Contributor Author

csoulios commented Jun 15, 2021

In Elastic APM (maybe eventually Metricbeat's Prometheus module? CC @exekias), we would like to store pre-aggregated "summary" metrics: always sum/count, and optionally min/max and possibly other quantiles. These metrics would always support value_count, sum, and avg; possibly min/max. Ideally we would use aggregate_metric_double, but you have to explicitly state up front which metric sub-fields to store. We don't necessarily know this up front.

@axw I have created a separate issue, #74145, to discuss relaxing the constraints in aggregate_metric_double.

@axw
Member

axw commented Jun 17, 2021

A related problem has been brewing around histograms. I had been putting it off, but @benwtrent has just come across the same problem (see #74213).

The histogram field docs say:

The histogram field is "algorithm agnostic" and does not store data specific to either T-Digest or HDRHistogram. While this means the field can technically be aggregated with either algorithm, in practice the user should chose one algorithm and index data in that manner (e.g. centroids for T-Digest or intervals for HDRHistogram) to ensure best accuracy.

The problem with this is that a consumer doesn't necessarily know how the data was stored. This certainly applies to custom application metrics, e.g. so Lens could automatically decide the most appropriate algorithm to use for calculating percentiles. I imagine this would also apply when rolling up histogram metrics. Moreover, for rollups you might want to reduce the histogram resolution to reduce storage cost.

Would it make sense to extend the metric mapping parameter with information about histograms? i.e. which algorithm (and parameters) were used to record them? If this is not the right place to discuss, maybe let's take it to #74213.

@imotov
Contributor

imotov commented Jun 17, 2021

Would it make sense to extend the metric mapping parameter with information about histograms?

I wouldn't mix these two. The issue in #74213 is quite specific to histogram fields. In a sense these two concerns are somewhat orthogonal. I would rather add an algorithm field to the histogram field mapper than overload metrics with it.

csoulios added a commit that referenced this issue Jun 24, 2021
Added the dimension parameter to the following field types:

    keyword
    ip
    Numeric field types (integer, long, byte, short)

The dimension parameter is of type boolean (default: false) and is used 
to mark that a field is a time series dimension field.

Relates to #74014
csoulios added a commit that referenced this issue Sep 20, 2021
This PR renames dimension mapping parameter to time_series_dimension to make it consistent with time_series_metric parameter (#76766)

Relates to #74450 and #74014
csoulios added a commit that referenced this issue Sep 23, 2021
    Added the time_series_metric mapping parameter to the unsigned_long and scaled_float field types
    Added the time_series_dimension mapping parameter to the unsigned_long field type

Fixes #78100

Relates to #76766, #74450 and #74014
csoulios added a commit to csoulios/elasticsearch that referenced this issue Sep 23, 2021
…tic#78012)

This PR renames dimension mapping parameter to time_series_dimension to make it consistent with time_series_metric parameter (elastic#76766)

Relates to elastic#74450 and elastic#74014
csoulios added a commit to csoulios/elasticsearch that referenced this issue Sep 23, 2021
…#78204)

    Added the time_series_metric mapping parameter to the unsigned_long and scaled_float field types
    Added the time_series_dimension mapping parameter to the unsigned_long field type

Fixes elastic#78100

Relates to elastic#76766, elastic#74450 and elastic#74014
csoulios added a commit that referenced this issue Sep 27, 2021
…rameters (#78265)

Backports the following PRs:

* Add dimension mapping parameter (#74450)

Added the dimension parameter to the following field types:

    keyword
    ip
    Numeric field types (integer, long, byte, short)

The dimension parameter is of type boolean (default: false) and is used
to mark that a field is a time series dimension field.

Relates to #74014

* Add constraints to dimension fields (#74939)

This PR adds the following constraints to dimension fields:

    It must be an indexed field and must have doc values
    It cannot be multi-valued
    The number of dimension fields in the index mapping must not be more than 16. This should be configurable through an index property (index.mapping.dimension_fields.limit)
    keyword fields cannot be more than 1024 bytes long
    keyword fields must not use a normalizer

Based on the code added in PR #74450
Relates to #74660
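
The constraints listed above can be sketched as a small Python check over a mapping's properties. This is an illustration only, not the Elasticsearch implementation: the function names are hypothetical, the sketch uses the final parameter name time_series_dimension, it assumes index and doc_values default to true when omitted, and it assumes keyword length is measured in UTF-8 bytes.

```python
DEFAULT_DIMENSION_LIMIT = 16        # index.mapping.dimension_fields.limit
MAX_KEYWORD_DIMENSION_BYTES = 1024  # limit for keyword dimension values


def check_dimension_mapping(properties, limit=DEFAULT_DIMENSION_LIMIT):
    """Return constraint violations for a flat `properties` dict."""
    errors = []
    dims = {
        name: field
        for name, field in properties.items()
        if field.get("time_series_dimension")
    }
    if len(dims) > limit:
        errors.append(f"{len(dims)} dimension fields exceed the limit of {limit}")
    for name, field in dims.items():
        # Assumption: both settings default to true when not specified.
        if not field.get("index", True):
            errors.append(f"{name}: must be indexed")
        if not field.get("doc_values", True):
            errors.append(f"{name}: must have doc values")
        if field.get("type") == "keyword" and "normalizer" in field:
            errors.append(f"{name}: keyword dimension must not use a normalizer")
    return errors


def check_dimension_value(name, field, value):
    """Return constraint violations for one document value of a dimension."""
    errors = []
    if isinstance(value, (list, tuple)):
        errors.append(f"{name}: must not be multi-valued")
    elif field.get("type") == "keyword":
        # Assumption: the 1024-byte limit applies to the UTF-8 encoding.
        if len(str(value).encode("utf-8")) > MAX_KEYWORD_DIMENSION_BYTES:
            errors.append(f"{name}: keyword longer than {MAX_KEYWORD_DIMENSION_BYTES} bytes")
    return errors
```

Splitting the check in two reflects that some constraints apply to the mapping itself (limit, indexing, doc values, normalizer) while others can only be checked against document values (multi-valued fields, keyword length).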

* Expand DocumentMapperTests (#76368)

Adds a test for setting the maximum number of dimensions setting and
tests the names and types of the metadata fields in the index.
Previously we just asserted the count of metadata fields. That made it
hard to read failures.

* Fix broken test for dimension keywords (#75408)

The test was failing because it used a keyword 1024 bytes long, which made the assertion fail.

Closes #75225

* Checkstyle

* Add time_series_metric parameter (#76766)

This PR adds the time_series_metric parameter to the following field types:

    Numeric field types
    histogram
    aggregate_metric_double

* Rename `dimension` mapping parameter to `time_series_dimension` (#78012)

This PR renames dimension mapping parameter to time_series_dimension to make it consistent with time_series_metric parameter (#76766)

Relates to #74450 and #74014

* Add time series params to `unsigned_long` and `scaled_float` (#78204)

    Added the time_series_metric mapping parameter to the unsigned_long and scaled_float field types
    Added the time_series_dimension mapping parameter to the unsigned_long field type

Fixes #78100

Relates to #76766, #74450 and #74014

Co-authored-by: Nik Everett <[email protected]>
@javanna javanna added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 16, 2024
6 participants