Create a default mapping template for metrics #72536

exekias · 2021-04-30T09:50:19Z

With dynamic templates in bulk requests (#69948) merged, we are considering the option of creating a default template to allow for dynamic metrics ingestion. This would be leveraged by many Elastic Agent integrations, and is specially interesting for the ones reporting dynamic metrics (where we only get to know the metric / field names at runtime, and hence, we cannot define a mapping before ingestion).

The overall idea is to map all possible combinations of <metric_type> and <unit>, so each combination gets a unique predictable name that can be referenced at ingest time. For example:

{
  "mappings": {
    "dynamic_templates": [{
      "gauge_double_s": {
        "mapping": {
          "type": "double",
          "meta": {
            "metric_type": "gauge",
            "unit": "s"
          }
        }
      }
    },
    {
      "gauge_double_byte": {
        "mapping": {
          "type": "double",
          "meta": {
            "metric_type": "gauge",
            "unit": "byte"
          }
        }
      }
    },
    ...]
  }
}

The obvious concern is: We would be creating a dynamic mapping entry per every combination of <metric_type> and , which will explode into many entries:

metric_type: gauge, counter, histogram. Not expected to grow much
unit: byte, percent, d, h, m, s, ms, micros, nanos, with more units to come as we onboard other metrics.
ES field type: long, double, integer, byte, float, scaled_float.

So the main question is: Would this be considered a good practice? Can it cause any issues because of a too large mapping template? Perhaps we should do this in a different way?

The text was updated successfully, but these errors were encountered:

elasticmachine · 2021-04-30T09:50:22Z

Pinging @elastic/es-search (Team:Search)

elasticmachine · 2021-04-30T09:50:22Z

Pinging @elastic/es-core-features (Team:Core/Features)

jpountz · 2021-05-03T08:13:52Z

@exekias I can imagine the number of dynamic templates growing very quickly using this approach so I'd like to take some time to look into whether Elasticsearch can make it easier.

One idea I'm considering is giving clients the ability to override metadata as part of bulk requests similarly to how we allow selecting a dynamic template, something like this:

POST /_bulk
{ "create" : { "_index" : "my_index", "_id" : "1", "dynamic_templates": {"system.cpu.user.pct": "scaled_float_100"}, "meta": {"system.cpu.user.pct": { "unit": "percent", "metric_type": "gauge" }}} }
{ "system": { "cpu": { "user": { "pct": "22.7" }}}}

With the dynamic_templates option, we avoided giving the ability to pass arbitrary mappings as otherwise users who only have the create_doc and auto_configure privileges could effectively put arbitrary mappings even though they don't have the manage privilege. Maybe it's ok to make an exception for the meta of mappings given that it never affects how fields get indexed. But then we'd need to make sure it remains that way.

cc @csoulios given the ongoing discussion about leveraging meta to configure rollups
cc @tvernum do you have an opinion about the security implications?

tvernum · 2021-05-04T05:33:41Z

Because other stack components rely on meta being meaningful, I think there are some implications if we allow all users to set the metadata too freely.
Or more accurately, I'm not confident that we can be sure there are no implications. It might be fine, but the system is sufficiently complex that it's hard to be sure.

I'd feel better if the definition of those dynamic templates had some way to indicate that they expected parameterized metadata, either a new flag (accept_custom_meta), or by making use of template variables in some way.

axw · 2021-05-04T05:48:18Z

or by making use of template variables in some way.

I really like this idea. I imagine a request would look something like:

POST /_bulk
{ "create" : { "_index" : "my_index", "_id" : "1", "dynamic_templates": {"system.cpu.user.pct": {"name": "scaled_float", "params": {"scaling_factor": 100, "metric_type": "gauge", "unit": "percent"}}} } }
{ "system": { "cpu": { "user": { "pct": "22.7" }}}}

and the dynamic_template (but probably with some conditional rendering of meta):

"scaled_float": {
  "mapping": {
    "type": "scaled_float",
    "scaling_factor": {{scaling_factor}},
    "meta": {
      "metric_type": {{metric_type}},
      "unit": {{unit}}
    }
  }
}

jpountz · 2021-07-19T14:43:46Z

I'm downgrading to team-discuss for the Search area to discuss whether to move forward with @tvernum + @axw 's proposal of templating.

joegallo · 2023-03-23T14:42:01Z

I'm removing the team-discuss label from some older Team:Data Management issues -- we've had plenty of time to discuss them, but we haven't, so the label isn't serving its purpose. Feel free to delete this comment and/or re-add the team-discuss label.

elasticsearchmachine · 2023-03-23T14:43:24Z

Pinging @elastic/es-data-management (Team:Data Management)

felixbarny · 2024-01-11T14:48:53Z

When working on #104037, this issue came up again. It is particularly relevant for dynamically mapping the unit for OTel metrics to Elasticsearch mapping metadata.

In addition to @axw's proposal (#72536 (comment)), I'd like to propose another alternative that is based around encoding the unit in the field path and then reading out the unit from the field name in the meta.unit field of the metric itself.

We could store metrics in a metrics.<type>.<unit>.* namespace, for example metrics.counter.ms.my_counter.

With a dynamic template like this, we could then read out the ms part and attach it to unit.meta for the counter:

{
  "mappings": {
    "dynamic_templates": [
      {
        "counter": {
          "path_match": "metrics.counter.*.*",
          "mapping": {
            "type": "{dynamic_type}",
            "time_series_metric": "counter",
            "meta": {
              "unit": "{parent.name}"
            }
          }
        }
      }
    ]
  }
}

This introduces a new {parent.name} template variable for dynamic templates, in addition to the existing ones {name} and {dynamic_type}. Similarly to {name}, it reads out the name of a mapper. But in this case, it reads out the name of the parent mapper, which holds the information about the duration.

To make sure that you can still just do aggregations on my_counter, we could map the parent object that encodes information about the unit (for example metrics.counter.ms) to the passthrough field type (see #103648). Maybe we'd also want to introduce a way to somehow hide the physical name metrics.counter.ms.my_counter and only expose my_counter in the field caps API.

The dynamic mapping for the unit field could look like the following:

{
  "mappings": {
    "dynamic_templates": [
      {
        "units": {
          "path_match": "metrics.*.*",
          "path_unmatch": "metrics.*.*.*",
          "match_mapping_type": "object",
          "mapping": {
            "subobjects": false,
            "type": "passthrough"
          }
        }
      }
    ]
  }
}

felixbarny · 2024-01-11T17:05:11Z

I had a chat about the two proposals with the @elastic/es-storage-engine team.

@kkrik-es felt like that we should maybe defer support for dynamically specifying the unit for now and rather build a dedicated metrics intake API in Elasticsearch first, possibly OTLP. This is not the only reason why a dedicated metrics API would make sense but it would probably also make it easier to support units compared to building something that relies on _bulk.

We've also discussed whether changing the internal structure of how we store metrics may be considered a breaking change. For example, if we store metric in metrics.<metric_name> or metrics.<metric_type>.<metric_name> now but change that to metrics.<metric_type>.<unit>.<metric_name> later on. Ideally, such changes would not constitute a breaking change but as we're exposing the object structure in _source and via ingest pipelines, users may depend on the structure of the documents.

Maybe we shouldn't allow any features for OTel metrics that rely on the physical structure of the ES documents (such as ingest processing, and exposing _source) and instead rely on OTLP as the stable interface. But that would imply a big shift to what we've done so far.

If we consider structural changes to be a breaking change, there's some urgency to get the mapping right soon.

axw · 2024-01-16T00:21:19Z

@kkrik-es felt like that we should maybe defer support for dynamically specifying the unit for now and rather build a dedicated metrics intake API in Elasticsearch first, possibly OTLP. This is not the only reason why a dedicated metrics API would make sense but it would probably also make it easier to support units compared to building something that relies on _bulk.

++ If I were to choose, I'd just implement OTLP support. Otherwise we'll still need to adapt OTel metrics to whatever we build -- so unless there's some additional metric features that don't exist in OTel, I don't see the point.

Maybe we shouldn't allow any features for OTel metrics that rely on the physical structure of the ES documents (such as ingest processing, and exposing _source) and instead rely on OTLP as the stable interface. But that would imply a big shift to what we've done so far.

++ A big shift indeed, and potentially far-reaching. e.g. it could also cover removing the need for _id by not exposing metric documents at all, and removing the ability to expose individual data points -- so in essence all searches over metrics/TSDS would be aggregations over time. That could allow things like automatic transformation of delta to cumulative metrics, and rolling up metrics at arbitrary points in time (e.g. at ingest, segment merge, or query time).

felixbarny · 2024-01-16T08:15:19Z

it could also cover removing the need for _id by not exposing metric documents at all, and removing the ability to expose individual data points -- so in essence all searches over metrics/TSDS would be aggregations over time.

I like that but it seems it would make hard to migrate to that because it breaks a lot of assumptions.

That could allow things like automatic transformation of delta to cumulative metrics, and rolling up metrics at arbitrary points in time (e.g. at ingest, segment merge, or query time).

Why can't we do that now and why does not exposing individual data points enable us to do that?

axw · 2024-01-17T00:42:40Z

Why can't we do that now and why does not exposing individual data points enable us to do that?

It comes down to user expectations: if you expect to be able to get out of ES exactly what you put in, then it precludes the kind of automatic transformations I'm thinking of. If on the other hand you are happy to record a data point in a time series, and only ever get back aggregations with some bounded granularity, then it's fine. It wouldn't really matter if the storage is cumulative, delta, rolled up at ingest time, etc., as it would be hidden from the user.

felixbarny · 2024-01-17T07:33:56Z

I agree that we should be able to do that but I don't think we're blocked to do that now. I'd argue that with rollups, we've kind of broken the glass already. Configuring rollups is currently a conscious choice made by the user. However, I don't think we'd be blocked from enabling rollups by default in integrations, for example.

felixbarny · 2024-02-22T13:46:22Z

Another idea that came to mind is that we could allow accessing field values of the document via template variables. We could then store the unit in a top-level unit field and access it like this in a dynamic template:

{
  "counter": {
    "mapping": {
      "type": "{dynamic_type}",
      "time_series_metric": "counter",
      "meta": {
        "unit": "{doc[unit]}"
      }
    }
  }
}

elasticsearchmachine · 2024-07-16T12:45:52Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

exekias added >enhancement discuss :Search Foundations/Mapping Index mappings, including merging and defining field types :Data Management/Data streams Data streams and their lifecycles labels Apr 30, 2021

elasticmachine added Team:Search Meta label for search team Team:Data Management Meta label for data/management team labels Apr 30, 2021

exekias mentioned this issue Apr 30, 2021

libbeat: support explicit dynamic templates elastic/beats#25422

Merged

4 tasks

axw mentioned this issue May 28, 2021

Add support for histograms to metrics intake elastic/apm-server#5360

Merged

1 task

csoulios mentioned this issue Jun 10, 2021

New mapping parameters to annotate dimensions and metrics in timeseries data #74014

Closed

jpountz added team-discuss and removed discuss labels Jul 19, 2021

axw mentioned this issue Oct 13, 2021

Add support for Cloudwatch metrics streams elastic/integrations#956

Closed

joegallo removed the team-discuss label Mar 23, 2023

felixbarny mentioned this issue Jan 10, 2024

Add support for time, byte, and percent units in NumberFieldMapper #104037

Open

javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024

javanna removed the Team:Search Meta label for search team label Jul 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create a default mapping template for metrics #72536

Create a default mapping template for metrics #72536

exekias commented Apr 30, 2021

elasticmachine commented Apr 30, 2021

elasticmachine commented Apr 30, 2021

jpountz commented May 3, 2021

tvernum commented May 4, 2021

axw commented May 4, 2021

jpountz commented Jul 19, 2021

joegallo commented Mar 23, 2023

elasticsearchmachine commented Mar 23, 2023

felixbarny commented Jan 11, 2024

felixbarny commented Jan 11, 2024 •

edited

Loading

axw commented Jan 16, 2024 •

edited

Loading

felixbarny commented Jan 16, 2024

axw commented Jan 17, 2024

felixbarny commented Jan 17, 2024

felixbarny commented Feb 22, 2024 •

edited

Loading

elasticsearchmachine commented Jul 16, 2024

Create a default mapping template for metrics #72536

Create a default mapping template for metrics #72536

Comments

exekias commented Apr 30, 2021

elasticmachine commented Apr 30, 2021

elasticmachine commented Apr 30, 2021

jpountz commented May 3, 2021

tvernum commented May 4, 2021

axw commented May 4, 2021

jpountz commented Jul 19, 2021

joegallo commented Mar 23, 2023

elasticsearchmachine commented Mar 23, 2023

felixbarny commented Jan 11, 2024

felixbarny commented Jan 11, 2024 • edited Loading

axw commented Jan 16, 2024 • edited Loading

felixbarny commented Jan 16, 2024

axw commented Jan 17, 2024

felixbarny commented Jan 17, 2024

felixbarny commented Feb 22, 2024 • edited Loading

elasticsearchmachine commented Jul 16, 2024

felixbarny commented Jan 11, 2024 •

edited

Loading

axw commented Jan 16, 2024 •

edited

Loading

felixbarny commented Feb 22, 2024 •

edited

Loading