[processor/groupbyattrsprocessor] allow empty keys for compaction #7793
I have 2 insights:
I have done some tests adding `groupbyattrs` with empty keys for the Jaeger exporter. The following screenshots compare 2 versions: the first window is without `groupbyattrs`, and the second is with this processor. The benchmark was conducted via the tracegen docker-compose setup, with this diff applied to the current branch.
I added `groupbytrace` to split the batches (possibly) sent by tracegen and make all
@pkositsyn @jpkrohling I updated the README, added some test cases and examples. Let me know what you think.
I'll add this to my review queue and should provide some feedback soon.
So basically, there wasn't a change in the processing itself, only in the constraint checking, right? I like it :-)
Because this feels like an esoteric feature, I recommend documenting it well, including when and how it should be used and when/how it should not be used.
@@ -83,6 +86,64 @@ Notes:
* The specified "grouping" attributes that are set on the new *Resources* are also **removed** from the metric *DataPoints*
* While not shown in the above example, the processor also merges collections of records under matching InstrumentationLibrary

### Compaction

In some cases, the data might arrive at the collector in individual requests, and even after batching there might be multiple duplicated ResourceSpans/ResourceLogs/ResourceMetrics objects, which leads to additional memory consumption and increased processing costs. As a remedy, the `groupbyattrs` processor might be used to compact the data that has matching Resource and InstrumentationLibrary properties.
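As a rough illustration of what compaction means here, the Go sketch below merges entries that share identical Resource attributes and, inside each merged Resource, span groups that share the same instrumentation library. It is a simplified model, not the processor's implementation: `ResourceSpans`, `LibrarySpans`, `resourceKey` and `Compact` are hypothetical names, the library is reduced to its name (no version), and spans are reduced to names instead of the collector's pdata types.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// LibrarySpans is a hypothetical, simplified stand-in for an
// InstrumentationLibrarySpans entry: the library is reduced to its name
// and each span to its name. This is not the collector's pdata API.
type LibrarySpans struct {
	Library string
	Spans   []string
}

// ResourceSpans is a hypothetical, simplified stand-in for a ResourceSpans
// entry: resource attributes plus the per-library span groups.
type ResourceSpans struct {
	Resource  map[string]string
	Libraries []LibrarySpans
}

// resourceKey builds an identity for resource attributes that does not
// depend on map iteration order: keys are sorted, keys and values quoted.
func resourceKey(attrs map[string]string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%q=%q;", k, attrs[k])
	}
	return b.String()
}

// Compact merges entries with identical resource attributes into one
// ResourceSpans and, inside it, merges span groups with the same library
// name. First-seen order of resources, libraries and spans is preserved.
func Compact(in []ResourceSpans) []ResourceSpans {
	var out []ResourceSpans
	resIdx := map[string]int{}  // resource key -> index in out
	var libIdx []map[string]int // per output resource: library -> index
	for _, rs := range in {
		key := resourceKey(rs.Resource)
		ri, ok := resIdx[key]
		if !ok {
			ri = len(out)
			resIdx[key] = ri
			out = append(out, ResourceSpans{Resource: rs.Resource})
			libIdx = append(libIdx, map[string]int{})
		}
		for _, ls := range rs.Libraries {
			li, ok := libIdx[ri][ls.Library]
			if !ok {
				li = len(out[ri].Libraries)
				libIdx[ri][ls.Library] = li
				out[ri].Libraries = append(out[ri].Libraries, LibrarySpans{Library: ls.Library})
			}
			out[ri].Libraries[li].Spans = append(out[ri].Libraries[li].Spans, ls.Spans...)
		}
	}
	return out
}

func main() {
	// Three single-span entries from the same service, as they might look
	// after batching requests that each carried one span.
	in := []ResourceSpans{
		{map[string]string{"service.name": "checkout"}, []LibrarySpans{{"http", []string{"GET /cart"}}}},
		{map[string]string{"service.name": "checkout"}, []LibrarySpans{{"http", []string{"POST /pay"}}}},
		{map[string]string{"service.name": "checkout"}, []LibrarySpans{{"db", []string{"SELECT orders"}}}},
	}
	out := Compact(in)
	fmt.Println(len(in), "->", len(out))   // 3 -> 1
	fmt.Println(len(out[0].Libraries))     // 2
	fmt.Println(out[0].Libraries[0].Spans) // [GET /cart POST /pay]
}
```

The three entries collapse into one Resource holding two library groups, which is the reduction in duplicated objects the paragraph describes.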
To me, the appealing aspect of this feature is getting better performance when sending data out. Without calling this out explicitly, people might not realize that the advantages extend to the transport side as well.
If I understand correctly, this will reduce the size of the message for many of the formats (e.g. OTLP). In some cases (Jaeger), this will also reduce the number of RPC calls, since the Jaeger model maps one batch to one ResourceSpans, if I got it right. Perhaps it would be worth calling out in `jaegerexporter` that `groupbyattrs` is recommended (or worth considering)?
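The RPC argument in the comment above can be put into numbers with a small Go sketch. It assumes, as the comment suggests but does not confirm, that the Jaeger exporter sends one Jaeger batch per ResourceSpans entry; each entry is reduced to its resource attributes, and `resourceKey`, `jaegerBatches` and `jaegerBatchesAfterCompaction` are hypothetical helpers, not exporter APIs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// resourceKey builds an order-independent identity for resource attributes.
func resourceKey(attrs map[string]string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%q=%q;", k, attrs[k])
	}
	return b.String()
}

// jaegerBatches estimates outgoing Jaeger batches under the assumption of
// one batch per ResourceSpans entry (entries reduced to their attributes).
func jaegerBatches(resourceSpans []map[string]string) int {
	return len(resourceSpans)
}

// jaegerBatchesAfterCompaction assumes entries with identical resources
// were merged first, leaving one batch per distinct resource.
func jaegerBatchesAfterCompaction(resourceSpans []map[string]string) int {
	distinct := map[string]struct{}{}
	for _, attrs := range resourceSpans {
		distinct[resourceKey(attrs)] = struct{}{}
	}
	return len(distinct)
}

func main() {
	// 100 spans from one service, each originally sent in its own request,
	// so batching alone still yields 100 ResourceSpans entries.
	var rs []map[string]string
	for i := 0; i < 100; i++ {
		rs = append(rs, map[string]string{"service.name": "checkout"})
	}
	fmt.Println(jaegerBatches(rs), jaegerBatchesAfterCompaction(rs)) // 100 1
}
```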
Benchmark results (compacting 100 spans in different layouts):
## Example
It is recommended to use the `groupbyattrs` processor together with the [batch](https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor) processor, as a consecutive step, as this will reduce the fragmentation of data (by grouping records together under matching Resource/Instrumentation Library).
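Why the order matters can be sketched in Go under simplifying assumptions: each ResourceSpans entry is reduced to a resource identity string, and `Request`, `batch` and `compact` are hypothetical helpers, not collector APIs. A single small request has nothing to merge, while a batch that combines several requests does.

```go
package main

import "fmt"

// Request is a hypothetical simplification of one incoming export request:
// each ResourceSpans entry is reduced to a resource identity string.
type Request []string

// batch concatenates the entries of several requests, roughly what a
// batching step does before passing data on.
func batch(reqs []Request) Request {
	var out Request
	for _, r := range reqs {
		out = append(out, r...)
	}
	return out
}

// compact keeps one entry per distinct resource (first-seen order),
// standing in for the merging done by groupbyattrs with empty keys.
func compact(r Request) Request {
	seen := map[string]bool{}
	var out Request
	for _, id := range r {
		if !seen[id] {
			seen[id] = true
			out = append(out, id)
		}
	}
	return out
}

func main() {
	// Four small requests, each carrying data for a single resource.
	reqs := []Request{{"svc=a"}, {"svc=a"}, {"svc=b"}, {"svc=a"}}

	// Compacting each request on its own: nothing to merge.
	perRequest := 0
	for _, r := range reqs {
		perRequest += len(compact(r))
	}

	// Batching first, then compacting: duplicates across requests merge.
	batched := compact(batch(reqs))

	fmt.Println(perRequest, len(batched)) // 4 2
	fmt.Println(batched)                  // [svc=a svc=b]
}
```

Per-request compaction still leaves 4 entries, while batching first leaves 2, which is the reduction in fragmentation the recommendation relies on.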
@jpkrohling @pkositsyn I put the note on the batch processor in this section, updated the wording, and included it in the examples. If you have any suggestions on how to express this better, you are more than welcome to share them :)
Looks good to me!
…used for compaction
Thank you @jpkrohling! Just rebased.
Description: Extension of `groupbyattrsprocessor` for compacting data spread across multiple ResourceSpans/ResourceMetrics/ResourceLogs with matching Resource and InstrumentationLibrary.
Link to tracking Issue: #2265 in core
Testing: Several unit tests added
Documentation: Clarified the docs on the usage of empty keys. Provided an example of compaction.
Benchmark results (compacting 100 spans in different layouts):