Optimize Resource Sharing Across Exporters with Arc Implementation #1526
Codecov Report
Coverage diff, main vs. #1526:
Coverage: 63.2% -> 63.3%
Files: 144 -> 144
Lines: 20299 -> 20315 (+16)
Hits: 12847 -> 12863 (+16)
Misses: 7452 -> 7452
Can you also add benchmarks for logs, so we can see how much this would improve the common use case (like 5-10 Resource attributes)?
Just to add, the stress test uses NoOpLogProcessor, so the performance is measured for creating the LogData structure for each event (which involves copying of resources in existing design), and not processing by the exporter.
With the PR, the throughput doesn't change as we increase the number of attributes, since we only increment the reference count of the shared resource and copy no real data. With the existing code, however, the throughput drops drastically as we increase the number of resource attributes.
[Throughput screenshots for 5, 10, and 100 resource attributes (main vs. PR) were not preserved.]
Modified the benchmark to include attributes in the common scenario:
KeyValue::new("service.name", "my-service"),
KeyValue::new("service.version", "1.0.0"),
KeyValue::new("service.environment", "production"),
KeyValue::new("service.instance.id", "1234"),
The results:
main: simple-log/no-context time: [254.13 ns 255.45 ns 257.16 ns]
PR: simple-log/no-context time: [167.73 ns 168.10 ns 168.54 ns]
Excellent gains
resource.attrs.insert(key, value);
// This call ensures that if the Arc is not uniquely owned,
// the data is cloned before modification, preserving safety.
// If the Arc is uniquely owned, it simply returns a mutable reference to the data.
This is a constructor, at what point would the inner not be uniquely owned?
That being said, is there a reason we shouldn't put the make_mut before the loop?
I believe it would be safe to move make_mut before the loop at this point. The inner is ref-incremented only during LogData/SpanData creation.
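To illustrate the suggestion in this thread, here is a minimal sketch of a constructor that calls Arc::make_mut once before the loop. The Resource and ResourceInner definitions are simplified, hypothetical stand-ins (String keys and values, a plain HashMap), not the actual code from the PR.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, simplified stand-in for the PR's inner struct.
#[derive(Clone, Debug, Default)]
struct ResourceInner {
    attrs: HashMap<String, String>,
}

// Hypothetical, simplified stand-in for the PR's Resource.
#[derive(Clone, Debug, Default)]
struct Resource {
    inner: Arc<ResourceInner>,
}

impl Resource {
    fn new<I: IntoIterator<Item = (String, String)>>(kvs: I) -> Self {
        let mut resource = Resource::default();
        // Obtain the mutable reference once, before the loop. The Arc was
        // created just above and is uniquely owned, so make_mut does not
        // clone the inner data here.
        let inner = Arc::make_mut(&mut resource.inner);
        for (key, value) in kvs {
            inner.attrs.insert(key, value);
        }
        resource
    }
}
```

Hoisting the call avoids repeating the uniqueness check on every iteration. If the Arc were ever shared at this point, make_mut would clone the inner data once and the loop would then mutate that private copy.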
Thank you for the work again!
Great performance improvements!
LGTM.
Before merging, I'd suggest adding a changelog entry to let users know of the perf gains.
Also, please update PR description with before/after numbers.
Changes
This PR refactors the Resource structure to make resource sharing across exporters more efficient for owned resource attributes. By encapsulating the Resource data within a ResourceInner struct and wrapping it in an Arc (an atomically reference-counted pointer), we eliminate the need for deep copies of resource data, thereby reducing memory overhead and improving performance.
[Diagrams comparing the existing design with the PR were not preserved.]
The stress test results (the default has 4 resource attributes) for the main and PR branches are listed below. The results are more pronounced if the resource is added with the LoggerProviderBuilder::with_resource and TracerProviderBuilder::with_resource methods as indicated here, as resources created this way are owned and can't be optimized by the compiler.
Spans:
main
Number of threads: 10
Throughput: 8,584,000 iterations/sec
Throughput: 8,560,000 iterations/sec
Throughput: 8,793,800 iterations/sec
Throughput: 8,465,200 iterations/sec
PR
Number of threads: 10
Throughput: 9,059,800 iterations/sec
Throughput: 8,719,200 iterations/sec
Throughput: 8,831,400 iterations/sec
Throughput: 8,767,800 iterations/sec
Logs:
main
Number of threads: 10
Throughput: 23,298,600 iterations/sec
Throughput: 21,358,600 iterations/sec
Throughput: 21,797,000 iterations/sec
PR
Number of threads: 10
Throughput: 25,054,600 iterations/sec
Throughput: 24,265,000 iterations/sec
Throughput: 24,576,600 iterations/sec
Throughput: 23,008,000 iterations/sec
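The refactor described under Changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the field types are simplified, and LogData and make_log here are hypothetical stand-ins for the per-event structure that holds a copy of the resource.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, simplified inner data: the attributes themselves.
#[derive(Debug, Default)]
struct ResourceInner {
    attrs: HashMap<String, String>,
}

// Cloning a Resource clones only the Arc, which increments a reference
// count instead of copying the attribute map.
#[derive(Clone, Debug, Default)]
struct Resource {
    inner: Arc<ResourceInner>,
}

// Hypothetical per-event record that carries the resource.
struct LogData {
    body: String,
    resource: Resource,
}

// Hypothetical helper: builds one record per event, sharing the resource.
fn make_log(body: &str, resource: &Resource) -> LogData {
    LogData {
        body: body.to_string(),
        resource: resource.clone(), // cost does not depend on attribute count
    }
}
```

With this shape, every record created from the same Resource points at the same ResourceInner allocation, which is consistent with the observation in the conversation that throughput stays flat as the number of resource attributes grows.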