[processor/transform] Efficiently interact with higher-scope contexts #14578
I like this, especially if we have a 1:1 mapping from statement to context in the config. So instead of grouping all metrics/traces/logs rules together, maybe we do something like …
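A hypothetical sketch of such a 1:1 statement-to-context mapping (the keys and layout here are illustrative guesses, not the syntax proposed in the discussion):

```yaml
transform:
  statements:
    # each entry pairs exactly one statement with the context it runs in
    - context: resource
      statement: set(attributes["host"], "new value")
    - context: datapoint
      statement: set(attributes["name"], "new value")
```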
Funny enough, that's the originally proposed syntax, haha. I need to think about what it would be like if the transform processor allowed stuff like:

```yaml
transform:
  statements:
    - from traces drop() where ...
    - from resource set(...) where ...
    - from datapoint set(...) where ...
    - from metric drop() where ...
```

The mixing of signals, and allowing functions to work on the lowest data level and then go back up to a higher data level, might be messy.
Thinking about this some more, introducing the concept of the context directly into the grammar has some complex implications, but it would significantly expand its capabilities. I think that's a longer-term goal, though, and it will rely on some of the other in-flight ottl PRs being fleshed out before we can get to this space. In the meantime, we can still make the specific contexts and update the transform processor to use them. This will allow for more interesting functions like …
Something we'll still have to solve (although it might be a transform processor issue, or it might not be needed at all) is how to provide access to the Resource's …. There are some custom functions in the transform processor that rely on access to ….
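A minimal sketch of one way a lower-level context could keep exposing its enclosing Resource (the type and field names here are assumptions for illustration, not the actual OTTL API):

```go
package transformsketch

import (
	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

// dataPointContext is a hypothetical transform context for a single datapoint
// that also carries references to its parents, so custom functions that need
// the Resource (or the enclosing Metric) can reach them without a second pass.
type dataPointContext struct {
	dataPoint pmetric.NumberDataPoint
	metric    pmetric.Metric
	scope     pcommon.InstrumentationScope
	resource  pcommon.Resource
}

// GetResource gives statement functions access to the parent Resource.
func (c dataPointContext) GetResource() pcommon.Resource { return c.resource }
```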
@bogdandrutu what do you think about contexts for SpanEvents and SpanLinks? Feels similar to Metrics and Datapoints, but at a smaller scale. It kind of feels like any nested object in a slice could benefit from a context.
@TylerHelmuth they may feel similar to that, but very rarely does someone need access to individual events/links. For datapoints, there are backends that have this as a first-class concept ("timeseries"), and lots of changes may be worth doing at the datapoint level.
@bogdandrutu #7151 is at least one situation where a context specific to SpanEvents would be useful. Without a SpanEvents context we cannot get the same level of granularity in a condition statement that would …
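As a hedged illustration, using the context-grouped syntax that appears later in this thread, a SpanEvents context might let a condition target individual events rather than whole spans (the `spanevent` context name is an assumption):

```yaml
trace_statements:
  - context: spanevent   # hypothetical context name
    statements:
      # matches a single event on a span, not the span itself
      - set(attributes["flagged"], true) where name == "exception"
```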
Proposal for how the transformprocessor's config can take advantage of the new contexts: we continue to separate by traces, metrics, and logs, and within each signal we allow specifying statements by context:

```yaml
transform:
  traces:
    resourceStatements:
      - set(attributes["host"], "new value")
    instrumentationScopeStatements:
      - set(name, "new value")
    spanStatements:
      - set(name, "new value")
    spanEventStatements:
      - set(name, "new value")
  metrics:
    resourceStatements:
      - set(attributes["host"], "new value")
    instrumentationScopeStatements:
      - set(name, "new value")
    metricStatements:
      - set(name, "new value")
    datapointStatements:
      - set(attributes["name"], "new value")
  logs:
    resourceStatements:
      - set(attributes["host"], "new value")
    instrumentationScopeStatements:
      - set(name, "new value")
    logsStatements:
      - set(body, "new value")
```

Statements would not be required for each section. Statements would always be processed from the top down, so for metrics it would be Resource -> InstrumentationScope -> Metric -> Datapoints. The transform processor would continue to loop through telemetry, checking for statements at each appropriate step:

```go
func (p *Processor) ProcessMetrics(_ context.Context, td pmetric.Metrics) (pmetric.Metrics, error) {
	for i := 0; i < td.ResourceMetrics().Len(); i++ {
		rmetrics := td.ResourceMetrics().At(i)
		// process resource statements here
		for j := 0; j < rmetrics.ScopeMetrics().Len(); j++ {
			smetrics := rmetrics.ScopeMetrics().At(j)
			// process instrumentation scope statements here
			metrics := smetrics.Metrics()
			for k := 0; k < metrics.Len(); k++ {
				metric := metrics.At(k)
				// process metric statements here
				// process datapoints here like we already do
				switch metric.Type() {
				case pmetric.MetricTypeSum:
					p.handleNumberDataPoints(metric.Sum().DataPoints(), metrics.At(k), metrics, smetrics.Scope(), rmetrics.Resource())
				case pmetric.MetricTypeGauge:
					p.handleNumberDataPoints(metric.Gauge().DataPoints(), metrics.At(k), metrics, smetrics.Scope(), rmetrics.Resource())
				case pmetric.MetricTypeHistogram:
					p.handleHistogramDataPoints(metric.Histogram().DataPoints(), metrics.At(k), metrics, smetrics.Scope(), rmetrics.Resource())
				case pmetric.MetricTypeExponentialHistogram:
					p.handleExponentialHistogramDataPoints(metric.ExponentialHistogram().DataPoints(), metrics.At(k), metrics, smetrics.Scope(), rmetrics.Resource())
				case pmetric.MetricTypeSummary:
					p.handleSummaryDataPoints(metric.Summary().DataPoints(), metrics.At(k), metrics, smetrics.Scope(), rmetrics.Resource())
				}
			}
		}
	}
	return td, nil
}
```

We can also get more efficient by checking for the presence of statements in the lower categories; if there aren't any more statements, we can return early and avoid looping through any more telemetry (a sketch of this early-exit check follows after this comment).

Pros:
Cons:
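Here is a minimal sketch of that early-exit check, assuming hypothetical per-context statement slices populated once at config-parse time:

```go
package transformsketch

import "go.opentelemetry.io/collector/pdata/pmetric"

// statement stands in for a parsed OTTL statement.
type statement func()

// processor groups parsed statements by context (hypothetical fields).
type processor struct {
	resourceStatements  []statement
	scopeStatements     []statement
	metricStatements    []statement
	datapointStatements []statement
}

// processMetrics walks the telemetry top down and bails out of the deeper
// loops as soon as no statements remain for the lower contexts.
func (p *processor) processMetrics(md pmetric.Metrics) {
	for i := 0; i < md.ResourceMetrics().Len(); i++ {
		rm := md.ResourceMetrics().At(i)
		_ = rm.Resource() // p.resourceStatements would run against this resource
		if len(p.scopeStatements)+len(p.metricStatements)+len(p.datapointStatements) == 0 {
			continue // nothing below the resource level: skip the inner loops entirely
		}
		for j := 0; j < rm.ScopeMetrics().Len(); j++ {
			sm := rm.ScopeMetrics().At(j)
			_ = sm.Scope() // p.scopeStatements would run against this scope
			if len(p.metricStatements)+len(p.datapointStatements) == 0 {
				continue // no metric- or datapoint-level statements: skip iterating metrics
			}
			// ... descend into metrics and datapoints as in the loop above ...
		}
	}
}
```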
@bogdandrutu I played around with an implementation for …. If we moved the context selection to be defined in the grammar, then we would have to be able to create a parser that is independent of any context, since the context it uses to parse statements would change. I think this is possible, but I think the impact is that …. Let me know if you have any ideas.
I spoke with Tyler today at some length and came up with a syntax that has a fairly efficient implementation while still allowing for a sequence of operations at different layers of the context. It would look something like this: …
Note that the outer layer of traces (and equivalently, metrics and logs) has a series of scopes. At each scope we can have a series of statements (one or more). At parse time, an ordered list of these is built, combining each scope with its sequence of statements; the iteration at each scope only needs to go as deep as the scope itself (so the resource scope need not iterate the instrumentation level, the instrumentation scope doesn't need to touch spans, etc.). The parser will bind the statements to the outer loop that iterates to the appropriate level. At runtime, this sequence of scopes and statements is executed in order.

This model allows for something like the above -- a span-level statement could set an attribute at the resource level, and a subsequent resource-level decision could be made to further modify or drop the entire trace. It also doesn't require any major refactoring of the systems that have already been built. @TylerHelmuth does this agree with what you understood from the conversation?
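A rough sketch of the parse-time structure this implies (the type and field names are assumptions for illustration): the parser builds an ordered list pairing each scope with its statements, and execution walks that list in order, iterating only as deep as each entry's scope requires.

```go
package transformsketch

// scopeLevel identifies how deep an entry's iteration must go.
type scopeLevel int

const (
	resourceLevel scopeLevel = iota
	instrumentationScopeLevel
	spanLevel
)

// scopedStatements pairs one scope with an ordered batch of statements.
type scopedStatements struct {
	level      scopeLevel
	statements []string // parsed OTTL statements in a real implementation
}

// plan is what the parser would build from the config. Entries run in order,
// and each entry's loop stops at its own level: resource entries never touch
// spans, while span entries iterate all the way down. The statements below
// are illustrative only.
var plan = []scopedStatements{
	{level: spanLevel, statements: []string{`set(resource.attributes["keep"], true) where name == "checkout"`}},
	{level: resourceLevel, statements: []string{`set(attributes["processed"], true)`}},
}
```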
Yes, I like this approach a lot. I think we could make the trace/metric/log sections named like …
Config format example proposed via #15381:

```yaml
transform:
  trace_statements:
    - context: trace
      statements:
        - set(name, "bear") where attributes["http.path"] == "/animal"
        - keep_keys(attributes, ["http.method", "http.path"])
    - context: resource
      statements:
        - set(attributes["name"], "bear")
  metric_statements:
    - context: datapoint
      statements:
        - set(metric.name, "bear") where attributes["http.path"] == "/animal"
        - keep_keys(attributes, ["http.method", "http.path"])
    - context: resource
      statements:
        - set(attributes["name"], "bear")
  log_statements:
    - context: log
      statements:
        - set(body, "bear") where attributes["http.path"] == "/animal"
        - keep_keys(attributes, ["http.method", "http.path"])
    - context: resource
      statements:
        - set(attributes["name"], "bear")
```

Since the Parser needs to know its TransformContext before parsing a statement, we can't include the context in the statement itself, but we can allow grouping of statements by context within the config. In either case the user must still know which context they want to use.
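A minimal sketch of why the grouping works, assuming a hypothetical parser interface: the processor can hold one parser per context and dispatch each config group to the matching parser, so the context is known before any statement is parsed.

```go
package transformsketch

import "fmt"

// contextStatements mirrors one entry of the proposed config.
type contextStatements struct {
	Context    string   `mapstructure:"context"`
	Statements []string `mapstructure:"statements"`
}

// parser stands in for a context-bound OTTL parser (hypothetical interface).
type parser interface {
	ParseStatements(statements []string) error
}

// buildTraceParsers picks the right parser per group; the context is fixed
// before parsing begins, which is exactly what the grouping buys us.
func buildTraceParsers(groups []contextStatements, parsers map[string]parser) error {
	for _, g := range groups {
		p, ok := parsers[g.Context]
		if !ok {
			return fmt.Errorf("unknown context %q", g.Context)
		}
		if err := p.ParseStatements(g.Statements); err != nil {
			return err
		}
	}
	return nil
}
```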
Very much looking forward to what this enables, and I wanted to offer my support for the most recent developments and decisions. I'm excited to use it.
Is your feature request related to a problem? Please describe.
At the moment, the transform processor only supports span, logrecord, and datapoint processing. This works for many use cases, but not all. There are times when transformation needs to occur on the Resource, InstrumentationScope, or Metric only, and not on the underlying telemetry.
Today this can be achieved, but not efficiently. For example, you can set a resource attribute associated with a Span, but the attribute will be set over and over again, once for each span associated with the resource.
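As a hedged illustration (config keys simplified from the real transform processor config), the statement below writes only to the shared Resource, yet it is evaluated once for every span under that resource:

```yaml
processors:
  transform:
    traces:
      statements:
        # evaluated once per span, but always sets the same Resource attribute
        - set(resource.attributes["host"], "new value")
```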
There are other use cases, such as filtering, that the transform processor cannot support at all.
Describe the solution you'd like
We need a way to improve the efficiency of the processor when interacting with these higher-scope fields. Here are a few ideas I currently have.
1. Add contexts in ottl for Resource, InstrumentationScope and Metric.
We can add new contexts that interact directly with that telemetry. In the transform processor config we can have a section per context. We would then process statements from the top down: for metrics, we'd process resource statements, then instrumentation scope statements, then metric statements, and finally datapoint statements.
2. Update transform processor to be more intelligent with its statements
Technically the Traces, DataPoint, and Logs contexts have all the logic we need to access all the different telemetry; it is the transform processor that forces the "for each span" logic. The transform processor could be updated to have different sections for the different hierarchies, and then process the statements from the top down, using the Traces, DataPoints, and Logs contexts accordingly.
Either way the transform processor's config gets a little more complex. I think option 1 is probably more reusable for other components.
Describe alternatives you've considered
No response
Additional context
related issues: