Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a default mapping template for metrics #72536

Open
exekias opened this issue Apr 30, 2021 · 16 comments
Open

Create a default mapping template for metrics #72536

exekias opened this issue Apr 30, 2021 · 16 comments
Labels
:Data Management/Data streams Data streams and their lifecycles >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Data Management Meta label for data/management team Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@exekias
Copy link

exekias commented Apr 30, 2021

With dynamic templates in bulk requests (#69948) merged, we are considering the option of creating a default template to allow for dynamic metrics ingestion. This would be leveraged by many Elastic Agent integrations, and is specially interesting for the ones reporting dynamic metrics (where we only get to know the metric / field names at runtime, and hence, we cannot define a mapping before ingestion).

The overall idea is to map all possible combinations of <metric_type> and <unit>, so each combination gets a unique predictable name that can be referenced at ingest time. For example:

{
  "mappings": {
    "dynamic_templates": [{
      "gauge_double_s": {
        "mapping": {
          "type": "double",
          "meta": {
            "metric_type": "gauge",
            "unit": "s"
          }
        }
      }
    },
    {
      "gauge_double_byte": {
        "mapping": {
          "type": "double",
          "meta": {
            "metric_type": "gauge",
            "unit": "byte"
          }
        }
      }
    },
    ...]
  }
}

The obvious concern is: We would be creating a dynamic mapping entry per every combination of <metric_type> and , which will explode into many entries:

  • metric_type: gauge, counter, histogram. Not expected to grow much
  • unit: byte, percent, d, h, m, s, ms, micros, nanos, with more units to come as we onboard other metrics.
  • ES field type: long, double, integer, byte, float, scaled_float.

So the main question is: Would this be considered a good practice? Can it cause any issues because of a too large mapping template? Perhaps we should do this in a different way?

@exekias exekias added >enhancement discuss :Search Foundations/Mapping Index mappings, including merging and defining field types :Data Management/Data streams Data streams and their lifecycles labels Apr 30, 2021
@elasticmachine elasticmachine added Team:Search Meta label for search team Team:Data Management Meta label for data/management team labels Apr 30, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@jpountz
Copy link
Contributor

jpountz commented May 3, 2021

@exekias I can imagine the number of dynamic templates growing very quickly using this approach so I'd like to take some time to look into whether Elasticsearch can make it easier.

One idea I'm considering is giving clients the ability to override metadata as part of bulk requests similarly to how we allow selecting a dynamic template, something like this:

POST /_bulk
{ "create" : { "_index" : "my_index", "_id" : "1", "dynamic_templates": {"system.cpu.user.pct": "scaled_float_100"}, "meta": {"system.cpu.user.pct": { "unit": "percent", "metric_type": "gauge" }}} }
{ "system": { "cpu": { "user": { "pct": "22.7" }}}}

With the dynamic_templates option, we avoided giving the ability to pass arbitrary mappings as otherwise users who only have the create_doc and auto_configure privileges could effectively put arbitrary mappings even though they don't have the manage privilege. Maybe it's ok to make an exception for the meta of mappings given that it never affects how fields get indexed. But then we'd need to make sure it remains that way.

cc @csoulios given the ongoing discussion about leveraging meta to configure rollups
cc @tvernum do you have an opinion about the security implications?

@tvernum
Copy link
Contributor

tvernum commented May 4, 2021

Because other stack components rely on meta being meaningful, I think there are some implications if we allow all users to set the metadata too freely.
Or more accurately, I'm not confident that we can be sure there are no implications. It might be fine, but the system is sufficiently complex that it's hard to be sure.

I'd feel better if the definition of those dynamic templates had some way to indicate that they expected parameterized metadata, either a new flag (accept_custom_meta), or by making use of template variables in some way.

@axw
Copy link
Member

axw commented May 4, 2021

or by making use of template variables in some way.

I really like this idea. I imagine a request would look something like:

POST /_bulk
{ "create" : { "_index" : "my_index", "_id" : "1", "dynamic_templates": {"system.cpu.user.pct": {"name": "scaled_float", "params": {"scaling_factor": 100, "metric_type": "gauge", "unit": "percent"}}} } }
{ "system": { "cpu": { "user": { "pct": "22.7" }}}}

and the dynamic_template (but probably with some conditional rendering of meta):

"scaled_float": {
  "mapping": {
    "type": "scaled_float",
    "scaling_factor": {{scaling_factor}},
    "meta": {
      "metric_type": {{metric_type}},
      "unit": {{unit}}
    }
  }
}

@jpountz
Copy link
Contributor

jpountz commented Jul 19, 2021

I'm downgrading to team-discuss for the Search area to discuss whether to move forward with @tvernum + @axw 's proposal of templating.

@joegallo
Copy link
Contributor

I'm removing the team-discuss label from some older Team:Data Management issues -- we've had plenty of time to discuss them, but we haven't, so the label isn't serving its purpose. Feel free to delete this comment and/or re-add the team-discuss label.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@felixbarny
Copy link
Member

When working on #104037, this issue came up again. It is particularly relevant for dynamically mapping the unit for OTel metrics to Elasticsearch mapping metadata.

In addition to @axw's proposal (#72536 (comment)), I'd like to propose another alternative that is based around encoding the unit in the field path and then reading out the unit from the field name in the meta.unit field of the metric itself.

We could store metrics in a metrics.<type>.<unit>.* namespace, for example metrics.counter.ms.my_counter.

With a dynamic template like this, we could then read out the ms part and attach it to unit.meta for the counter:

{
  "mappings": {
    "dynamic_templates": [
      {
        "counter": {
          "path_match": "metrics.counter.*.*",
          "mapping": {
            "type": "{dynamic_type}",
            "time_series_metric": "counter",
            "meta": {
              "unit": "{parent.name}"
            }
          }
        }
      }
    ]
  }
}

This introduces a new {parent.name} template variable for dynamic templates, in addition to the existing ones {name} and {dynamic_type}. Similarly to {name}, it reads out the name of a mapper. But in this case, it reads out the name of the parent mapper, which holds the information about the duration.

To make sure that you can still just do aggregations on my_counter, we could map the parent object that encodes information about the unit (for example metrics.counter.ms) to the passthrough field type (see #103648). Maybe we'd also want to introduce a way to somehow hide the physical name metrics.counter.ms.my_counter and only expose my_counter in the field caps API.

The dynamic mapping for the unit field could look like the following:

{
  "mappings": {
    "dynamic_templates": [
      {
        "units": {
          "path_match": "metrics.*.*",
          "path_unmatch": "metrics.*.*.*",
          "match_mapping_type": "object",
          "mapping": {
            "subobjects": false,
            "type": "passthrough"
          }
        }
      }
    ]
  }
}

@felixbarny
Copy link
Member

felixbarny commented Jan 11, 2024

I had a chat about the two proposals with the @elastic/es-storage-engine team.

@kkrik-es felt like that we should maybe defer support for dynamically specifying the unit for now and rather build a dedicated metrics intake API in Elasticsearch first, possibly OTLP. This is not the only reason why a dedicated metrics API would make sense but it would probably also make it easier to support units compared to building something that relies on _bulk.

We've also discussed whether changing the internal structure of how we store metrics may be considered a breaking change. For example, if we store metric in metrics.<metric_name> or metrics.<metric_type>.<metric_name> now but change that to metrics.<metric_type>.<unit>.<metric_name> later on. Ideally, such changes would not constitute a breaking change but as we're exposing the object structure in _source and via ingest pipelines, users may depend on the structure of the documents.

Maybe we shouldn't allow any features for OTel metrics that rely on the physical structure of the ES documents (such as ingest processing, and exposing _source) and instead rely on OTLP as the stable interface. But that would imply a big shift to what we've done so far.

If we consider structural changes to be a breaking change, there's some urgency to get the mapping right soon.

@axw
Copy link
Member

axw commented Jan 16, 2024

@kkrik-es felt like that we should maybe defer support for dynamically specifying the unit for now and rather build a dedicated metrics intake API in Elasticsearch first, possibly OTLP. This is not the only reason why a dedicated metrics API would make sense but it would probably also make it easier to support units compared to building something that relies on _bulk.

++ If I were to choose, I'd just implement OTLP support. Otherwise we'll still need to adapt OTel metrics to whatever we build -- so unless there's some additional metric features that don't exist in OTel, I don't see the point.

Maybe we shouldn't allow any features for OTel metrics that rely on the physical structure of the ES documents (such as ingest processing, and exposing _source) and instead rely on OTLP as the stable interface. But that would imply a big shift to what we've done so far.

++ A big shift indeed, and potentially far-reaching. e.g. it could also cover removing the need for _id by not exposing metric documents at all, and removing the ability to expose individual data points -- so in essence all searches over metrics/TSDS would be aggregations over time. That could allow things like automatic transformation of delta to cumulative metrics, and rolling up metrics at arbitrary points in time (e.g. at ingest, segment merge, or query time).

@felixbarny
Copy link
Member

it could also cover removing the need for _id by not exposing metric documents at all, and removing the ability to expose individual data points -- so in essence all searches over metrics/TSDS would be aggregations over time.

I like that but it seems it would make hard to migrate to that because it breaks a lot of assumptions.

That could allow things like automatic transformation of delta to cumulative metrics, and rolling up metrics at arbitrary points in time (e.g. at ingest, segment merge, or query time).

Why can't we do that now and why does not exposing individual data points enable us to do that?

@axw
Copy link
Member

axw commented Jan 17, 2024

Why can't we do that now and why does not exposing individual data points enable us to do that?

It comes down to user expectations: if you expect to be able to get out of ES exactly what you put in, then it precludes the kind of automatic transformations I'm thinking of. If on the other hand you are happy to record a data point in a time series, and only ever get back aggregations with some bounded granularity, then it's fine. It wouldn't really matter if the storage is cumulative, delta, rolled up at ingest time, etc., as it would be hidden from the user.

@felixbarny
Copy link
Member

I agree that we should be able to do that but I don't think we're blocked to do that now. I'd argue that with rollups, we've kind of broken the glass already. Configuring rollups is currently a conscious choice made by the user. However, I don't think we'd be blocked from enabling rollups by default in integrations, for example.

@felixbarny
Copy link
Member

felixbarny commented Feb 22, 2024

Another idea that came to mind is that we could allow accessing field values of the document via template variables. We could then store the unit in a top-level unit field and access it like this in a dynamic template:

{
  "counter": {
    "mapping": {
      "type": "{dynamic_type}",
      "time_series_metric": "counter",
      "meta": {
        "unit": "{doc[unit]}"
      }
    }
  }
}

@javanna javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024
@javanna javanna removed the Team:Search Meta label for search team label Jul 16, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Data streams Data streams and their lifecycles >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Data Management Meta label for data/management team Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch
Projects
None yet
Development

No branches or pull requests

9 participants