Add cardinality limits to the Lightstep metrics SDK #385
Codecov Report
Base: 87.84% // Head: 87.76%
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #385      +/-   ##
==========================================
- Coverage   87.84%   87.76%   -0.09%
==========================================
  Files          74       74
  Lines        4205     4218      +13
==========================================
+ Hits         3694     3702       +8
- Misses        438      442       +4
- Partials       73       74       +1
looks great! one small comment typo.
Co-authored-by: Kristina Pathak <[email protected]>
The code reviewer in me wants to suggest this could be two PRs because the two limits are independent.
Description: This is a prototype for open-telemetry/opentelemetry-specification#2960.
This adds both `InstrumentCardinalityLimit` and `AggregatorCardinalityLimit`. The instrument-level limit is meant as a circuit-breaker, and the aggregator-level limit is meant to allow views to correct overflow problems. Both of these features use `otel.metrics.overflow=true` to indicate the loss of attributes due to overflow. Only the synchronous instrument code path uses `InstrumentCardinalityLimit` in this PR; the asynchronous path does not, which is a low-priority issue to be solved separately. The problem is described in #410.
The aggregator-level limit is applied after view-configured filters are applied, so it is useful for correcting overflow problems.
The aggregator-level limit is treated as a hard limit: when configured, the aggregator will produce no more than the configured number of aggregations. The instrument-level limit is considered a circuit-breaker. For delta-temporality outputs, it should be set sufficiently larger than the aggregator-level limit (considering inactivity periods) to ensure that instrument-level records are recycled fast enough for dynamic cardinality changes.
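To illustrate the hard-limit behavior described above, here is a minimal sketch in Go. It is not the code from this PR: `limitedSum`, `newLimitedSum`, `Add`, and `Collect` are hypothetical names, attribute sets are encoded as plain strings for brevity, and the choice to reserve one of the configured slots for the overflow series is an assumption of the sketch. A limit of at least 1 is assumed.

```go
package main

// overflowKey is the series that absorbs measurements once the limit is
// reached. The attribute comes from the PR description; encoding an
// attribute set as a string is an assumption of this sketch.
const overflowKey = "otel.metrics.overflow=true"

// limitedSum is a hypothetical sum aggregator with a hard cardinality limit.
type limitedSum struct {
	limit      int                // maximum number of output series (assumed >= 1)
	sums       map[string]float64 // regular series: attribute set -> sum
	overflow   float64            // sum of everything that did not fit
	overflowed bool               // true once any measurement has overflowed
}

func newLimitedSum(limit int) *limitedSum {
	return &limitedSum{limit: limit, sums: make(map[string]float64)}
}

// Add records value under attrs. A new attribute set is admitted only while
// fewer than limit-1 regular series exist, which keeps one slot free for the
// overflow series. Otherwise the value is folded into the overflow series.
func (s *limitedSum) Add(attrs string, value float64) {
	if _, known := s.sums[attrs]; known || len(s.sums) < s.limit-1 {
		s.sums[attrs] += value
		return
	}
	s.overflow += value
	s.overflowed = true
}

// Collect returns at most limit series, including the overflow series when
// any attribute set was dropped.
func (s *limitedSum) Collect() map[string]float64 {
	out := make(map[string]float64, len(s.sums)+1)
	for k, v := range s.sums {
		out[k] = v
	}
	if s.overflowed {
		out[overflowKey] = s.overflow
	}
	return out
}
```

With a limit of 3, adding measurements for `host=a`, `host=b`, `host=c`, and `host=d` would yield the series `host=a`, `host=b`, and the overflow series, with the values for `host=c` and `host=d` summed into the latter.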
Note there is a special case for overflow treatment in the async-delta output path (in viewstate); the cumulative overflow value is calculated so that the delta is correct. This is also low-priority, but as this is a prototype for an OTel specification, I considered it and tested it.
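One way to read the async-delta special case is sketched below in Go. This is an interpretation, not the viewstate implementation: `asyncDeltaState`, `newAsyncDeltaState`, and `Collect` are hypothetical names, attribute sets are encoded as strings, and the sticky first-come admission policy (with new keys admitted in sorted order) is an assumption. The idea is that the cumulative values of all non-admitted attribute sets are summed into one cumulative overflow value, and the delta is then taken against the prior cumulative overflow value.

```go
package main

import "sort"

// overflowSeries names the series that absorbs attribute sets beyond the
// limit. String-encoded attribute sets are an assumption of this sketch.
const overflowSeries = "otel.metrics.overflow=true"

// asyncDeltaState is a hypothetical converter from cumulative observations
// (as reported by an asynchronous callback) to delta outputs under a limit.
type asyncDeltaState struct {
	limit    int                // maximum number of output series (assumed >= 1)
	admitted map[string]bool    // attribute sets that own a regular series
	prior    map[string]float64 // cumulative value per output series at the last collection
}

func newAsyncDeltaState(limit int) *asyncDeltaState {
	return &asyncDeltaState{
		limit:    limit,
		admitted: make(map[string]bool),
		prior:    make(map[string]float64),
	}
}

// Collect takes the cumulative values observed in this collection and
// returns the delta for each output series.
func (d *asyncDeltaState) Collect(observed map[string]float64) map[string]float64 {
	// Sort keys so that admission of new attribute sets is deterministic.
	keys := make([]string, 0, len(observed))
	for k := range observed {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// Build the cumulative value per output series. Non-admitted attribute
	// sets contribute to a single cumulative overflow value.
	current := make(map[string]float64)
	for _, k := range keys {
		if !d.admitted[k] && len(d.admitted) < d.limit-1 {
			d.admitted[k] = true
		}
		if d.admitted[k] {
			current[k] = observed[k]
		} else {
			current[overflowSeries] += observed[k]
		}
	}

	// The delta is the change in cumulative value since the last collection,
	// computed the same way for regular series and for the overflow series.
	deltas := make(map[string]float64, len(current))
	for k, v := range current {
		deltas[k] = v - d.prior[k]
	}
	d.prior = current
	return deltas
}
```

For example, with a limit of 3 and cumulative observations `a=10, b=20, c=5, d=7` followed by `a=15, b=20, c=6, d=10`, the cumulative overflow value moves from 12 to 16, so the second collection reports an overflow delta of 4.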
Link to tracking Issue: open-telemetry/opentelemetry-specification#1891
Testing: Detailed new tests in syncstate, viewstate.
Documentation: Added.