Status | |
---|---|
Stability | beta: traces, metrics, logs |
Distributions | contrib |
Issues | |
Code Owners | @rnishtala-sumo |
This processor re-associates spans, log records and metric datapoints to a Resource that matches with the specified attributes. As a result, all spans, log records or metric datapoints with the same values for the specified attributes are "grouped" under the same Resource.
Typical use cases:
- extract resources from "flat" data formats, such as Fluentbit logs or Prometheus metrics
- associate Prometheus metrics to a Resource that describes the relevant host, based on a label present on all metrics
- optimize data packaging by extracting common attributes
- compact multiple records that share the same Resource and InstrumentationLibrary attributes but are spread across multiple ResourceSpans/ResourceMetrics/ResourceLogs into a single ResourceSpans/ResourceMetrics/ResourceLogs (when an empty list of keys is provided). This can happen, for example, when the groupbytrace processor is used or when data arrives in multiple requests. Compacted data takes less memory, is processed and serialized more efficiently, and requires fewer export requests.
It is recommended to use the groupbyattrs processor together with the batch processor, as a consecutive step, as this reduces the fragmentation of data (by grouping records together under a matching Resource/InstrumentationLibrary).
Consider the below metrics, all originally associated to the same Resource:
Resource {host.name="localhost",source="prom"}
  Metric "gauge-1" (GAUGE)
    DataPoint {host.name="host-A",id="eth0"}
    DataPoint {host.name="host-A",id="eth0"}
    DataPoint {host.name="host-B",id="eth0"}
  Metric "gauge-1" (GAUGE) // Identical to previous Metric
    DataPoint {host.name="host-A",id="eth0"}
    DataPoint {host.name="host-A",id="eth0"}
    DataPoint {host.name="host-B",id="eth0"}
  Metric "mixed-type" (GAUGE)
    DataPoint {host.name="host-A",id="eth0"}
    DataPoint {host.name="host-A",id="eth0"}
    DataPoint {host.name="host-B",id="eth0"}
  Metric "mixed-type" (SUM)
    DataPoint {host.name="host-A",id="eth0"}
    DataPoint {host.name="host-A",id="eth0"}
  Metric "dont-move" (Gauge)
    DataPoint {id="eth0"}
With the below configuration, the groupbyattrs processor will re-associate the metrics with either host-A or host-B, based on the value of the host.name attribute.
processors:
  groupbyattrs:
    keys:
      - host.name
The output of the processor will therefore be:
Resource {host.name="localhost",source="prom"}
  Metric "dont-move" (Gauge)
    DataPoint {id="eth0"}
Resource {host.name="host-A",source="prom"}
  Metric "gauge-1"
    DataPoint {id="eth0"}
    DataPoint {id="eth0"}
    DataPoint {id="eth0"}
    DataPoint {id="eth0"}
  Metric "mixed-type" (GAUGE)
    DataPoint {id="eth0"}
    DataPoint {id="eth0"}
  Metric "mixed-type" (SUM)
    DataPoint {id="eth0"}
    DataPoint {id="eth0"}
Resource {host.name="host-B",source="prom"}
  Metric "gauge-1"
    DataPoint {id="eth0"}
    DataPoint {id="eth0"}
  Metric "mixed-type" (GAUGE)
    DataPoint {id="eth0"}
Notes:
- The DataPoints for the gauge-1 (GAUGE) metric were originally split under 2 Metric instances and have been merged in the output
- The DataPoints of the mixed-type (GAUGE) and mixed-type (SUM) metrics have not been merged under the same Metric, because their DataType is different
- The dont-move metric DataPoints don't have a host.name attribute and therefore remained under the original Resource
- The new Resources inherited the attributes from the original Resource (source="prom"), plus the specified attributes from the processed metrics (host.name="host-A" or host.name="host-B")
- The specified "grouping" attributes that are set on the new Resources are also removed from the metric DataPoints
- While not shown in the above example, the processor also merges collections of records under matching InstrumentationLibrary
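The grouping behaviour described in these notes can be modelled in a few lines of Python. This is only an illustrative sketch, not the processor's actual implementation: the function name `group_datapoints` and the tuple/dict representation of Resources, Metrics and DataPoints are assumptions made for the example.

```python
from collections import defaultdict


def group_datapoints(resource_attrs, metrics, keys):
    """Toy model of the grouping rules above (hypothetical helper).

    metrics: list of (name, data_type, [datapoint_attrs, ...]).
    Returns {resource attrs as a sorted tuple: {(name, data_type): [datapoint_attrs, ...]}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for name, data_type, datapoints in metrics:
        for dp in datapoints:
            # Attributes of this data point that match one of the grouping keys.
            matched = {k: dp[k] for k in keys if k in dp}
            # New Resource: original Resource attributes, overridden by the matched ones.
            new_resource = {**resource_attrs, **matched}
            # The grouping attributes are removed from the data point itself.
            remaining = {k: v for k, v in dp.items() if k not in matched}
            resource_key = tuple(sorted(new_resource.items()))
            # Metrics merge only when both name and data type are equal.
            grouped[resource_key][(name, data_type)].append(remaining)
    return grouped


# Same input as the example above.
resource = {"host.name": "localhost", "source": "prom"}
a = {"host.name": "host-A", "id": "eth0"}
b = {"host.name": "host-B", "id": "eth0"}
metrics = [
    ("gauge-1", "GAUGE", [a, a, b]),
    ("gauge-1", "GAUGE", [a, a, b]),
    ("mixed-type", "GAUGE", [a, a, b]),
    ("mixed-type", "SUM", [a, a]),
    ("dont-move", "GAUGE", [{"id": "eth0"}]),
]
result = group_datapoints(resource, metrics, ["host.name"])
```

In this model, a data point with no matching key produces an unchanged Resource key, which is why `dont-move` stays under the original Resource, and keying on `(name, data_type)` keeps the GAUGE and SUM variants of `mixed-type` apart.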
In some cases, the data might reach the collector in separate single requests or become fragmented due to the use of the groupbytrace processor. Even after batching, there might be multiple duplicated ResourceSpans/ResourceLogs/ResourceMetrics objects, which leads to additional memory consumption, increased processing costs, inefficient serialization and a higher number of export requests. As a remedy, the groupbyattrs processor can be used to compact the data with matching Resource and InstrumentationLibrary properties.
For example, consider the following input:
Resource {host.name="localhost"}
  InstrumentationLibrary {name="MyLibrary"}
    Spans
      Span {span_id=1, ...}
  InstrumentationLibrary {name="OtherLibrary"}
    Spans
      Span {span_id=2, ...}
Resource {host.name="localhost"}
  InstrumentationLibrary {name="MyLibrary"}
    Spans
      Span {span_id=3, ...}
Resource {host.name="localhost"}
  InstrumentationLibrary {name="MyLibrary"}
    Spans
      Span {span_id=4, ...}
Resource {host.name="otherhost"}
  InstrumentationLibrary {name="MyLibrary"}
    Spans
      Span {span_id=5, ...}
With the below configuration, which specifies no keys, the groupbyattrs processor will re-associate the spans with matching Resource and InstrumentationLibrary.
processors:
  batch:
  groupbyattrs:
pipelines:
  traces:
    processors: [batch, groupbyattrs]
    ...
The output of the processor will therefore be:
Resource {host.name="localhost"}
  InstrumentationLibrary {name="MyLibrary"}
    Spans
      Span {span_id=1, ...}
      Span {span_id=3, ...}
      Span {span_id=4, ...}
  InstrumentationLibrary {name="OtherLibrary"}
    Spans
      Span {span_id=2, ...}
Resource {host.name="otherhost"}
  InstrumentationLibrary {name="MyLibrary"}
    Spans
      Span {span_id=5, ...}
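The compaction step can be pictured as a merge keyed first by Resource attributes and then by InstrumentationLibrary name. The following Python sketch illustrates that idea under simplifying assumptions; `compact` and the nested list/tuple layout are hypothetical and do not mirror the processor's internal data structures.

```python
def compact(resource_spans):
    """Merge entries that share the same Resource and InstrumentationLibrary.

    resource_spans: list of (resource_attrs, [(library_name, [span_id, ...]), ...]).
    Returns {resource attrs as a sorted tuple: {library_name: [span_id, ...]}}.
    """
    merged = {}
    for resource_attrs, libraries in resource_spans:
        # Resources with equal attributes collapse into one entry.
        resource_key = tuple(sorted(resource_attrs.items()))
        by_library = merged.setdefault(resource_key, {})
        for library_name, span_ids in libraries:
            # Spans from the same library are appended to a single list.
            by_library.setdefault(library_name, []).extend(span_ids)
    return merged


# Same input as the compaction example above.
batch = [
    ({"host.name": "localhost"}, [("MyLibrary", [1]), ("OtherLibrary", [2])]),
    ({"host.name": "localhost"}, [("MyLibrary", [3])]),
    ({"host.name": "localhost"}, [("MyLibrary", [4])]),
    ({"host.name": "otherhost"}, [("MyLibrary", [5])]),
]
compacted = compact(batch)
```

Four input Resource entries become two, and spans 1, 3 and 4 end up together under `MyLibrary` on `localhost`, matching the output shown above.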
The configuration is very simple, as you only need to specify an array of attribute keys that will be used to "group" spans, log records or metric data points together, as in the below example:
processors:
  groupbyattrs:
    keys:
      - foo
      - bar
The keys property describes which attribute keys will be considered for grouping:
- If the processed span, log record or metric data point has at least one of the specified attribute keys, it will be moved to a Resource with the same values for these attributes. The Resource will be created if none exists with the same attributes.
- If none of the specified attribute keys is present in the processed span, log record or metric data point, it remains associated with the same Resource (no change), with multiple instances of the same Resource still compacted.
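To make the "at least one key" rule concrete, here is a small Python sketch of how the target Resource could be derived for a single record when several keys are configured. The helper `target_resource` and the sample attributes (`service`, `path`) are hypothetical and only illustrate the two rules above.

```python
def target_resource(resource_attrs, record_attrs, keys):
    """Return the attributes of the Resource a record would be associated with."""
    # Only the configured keys that are actually present on the record count.
    matched = {k: record_attrs[k] for k in keys if k in record_attrs}
    if not matched:
        # No configured key present: the record keeps its original Resource.
        return dict(resource_attrs)
    # At least one key present: original attributes plus the matched ones.
    return {**resource_attrs, **matched}


keys = ["foo", "bar"]
original = {"service": "api"}

only_foo = target_resource(original, {"foo": "1", "path": "/"}, keys)
# only_foo == {"service": "api", "foo": "1"}

both = target_resource(original, {"foo": "1", "bar": "2"}, keys)
# both == {"service": "api", "foo": "1", "bar": "2"}

neither = target_resource(original, {"path": "/"}, keys)
# neither == {"service": "api"}
```

Under this reading, records carrying only `foo` and records carrying both `foo` and `bar` land on different Resources, because the set of matched attribute values differs.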
Please refer to:
- config.go for the config spec
- config.yaml for detailed examples on using the processor
The following internal metrics are recorded by this processor:
Metric | Description |
---|---|
num_grouped_spans | the number of spans that had attributes grouped |
num_non_grouped_spans | the number of spans that did not have attributes grouped |
span_groups | distribution of groups extracted for spans |
num_grouped_logs | number of logs that had attributes grouped |
num_non_grouped_logs | number of logs that did not have attributes grouped |
log_groups | distribution of groups extracted for logs |
num_grouped_metrics | number of metrics that had attributes grouped |
num_non_grouped_metrics | number of metrics that did not have attributes grouped |
metric_groups | distribution of groups extracted for metrics |