Allow exporters to influence Aggregation #3762
- Remove all state from Aggregator but for exemplar reservoir factory.
- Resource/IL passed from MP/MeterSharedState
- MetricDescriptor already stored on MetricStorage.
- Give Aggregators explicit "diff" and "merge" operations, split Sum aggregator
- Update Storage to use explicit diff/merge in appropriate locations.
- Aggregators are now passed AggregationTemporality in as a parameter
- Update MetricStorage to take requested aggregation temporality as parameter, always storing data for either.
Note: Things compile, aggregator + storage tests pass, but e2e is broken until aggregation temporality is appropriately wired through to readers.
- Add supported/preferred temporality to export interfaces
- Update tracking of collection information through internal packages to pass one data class instead of many individual components.
- Update gauge tests to account for start time (as recommended in spec).
- Allow configured temporality to be null for views
- Propagate configured temporality from View API through to storage
Codecov Report
@@ Coverage Diff @@
## main #3762 +/- ##
============================================
+ Coverage 89.15% 89.30% +0.15%
- Complexity 3950 3957 +7
============================================
Files 476 473 -3
Lines 12324 12309 -15
Branches 1206 1207 +1
============================================
+ Hits 10987 10993 +6
+ Misses 924 909 -15
+ Partials 413 407 -6
- Add cumulative+delta tests
- Fix error messages from sum data assert
This interface was fully replaced by Aggregation, and offered little benefit post-refactor.
default AggregationTemporality getPreferedTemporality() {
  return null;
}
I would have thought that this method would take an instrument type as a parameter, since a given exporter might want this to vary by instrument.
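A minimal sketch of what a per-instrument signature could look like. Everything here is an assumption for illustration: the enums are stand-ins rather than the SDK's own types, and `PerInstrumentTemporality` and `DeltaHistogramsOnly` are hypothetical names that do not exist in this PR.

```java
/** Hypothetical stand-ins for the SDK types; not the real OpenTelemetry classes. */
enum AggregationTemporality { DELTA, CUMULATIVE }

enum InstrumentType { COUNTER, UP_DOWN_COUNTER, HISTOGRAM, OBSERVABLE_GAUGE }

/** Hypothetical variant of the exporter hook that receives the instrument type. */
interface PerInstrumentTemporality {
  /** Returns the preferred temporality for the given type, or null for "no preference". */
  default AggregationTemporality getPreferredTemporality(InstrumentType type) {
    return null;
  }
}

/** Example exporter preference: delta for histograms, cumulative for everything else. */
final class DeltaHistogramsOnly implements PerInstrumentTemporality {
  @Override
  public AggregationTemporality getPreferredTemporality(InstrumentType type) {
    return type == InstrumentType.HISTOGRAM
        ? AggregationTemporality.DELTA
        : AggregationTemporality.CUMULATIVE;
  }
}
```

An exporter with no opinion would simply inherit the default and return null for every instrument type.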
Interesting. Do you have an example of this (A backend/exporter that needs this level of sophistication)?
I thought Prometheus wanted cumulative for everything but histograms, but I might very well be making that up.
No, they need cumulative for everything. (Writing that spec now)
You may be thinking of Summary? However, in that case it's not DELTA; it's actually "last 10 minutes", which could be longer than the DELTA export interval.
ah, yeah, Summary is probably what I'm thinking of. But, I could imagine an exporter might want to impose a full view or set of views, in order to limit cardinality, etc. What would you think of handing the exporter some sort of Builder thingee that it could use to fully customize the output that is needed for a particular backend?
I'm on the fence.
I think for the first version, I wish we had decided to go with ALL delta or ALL cumulative and let OTLP pick one or the other via config. Now we have an expectation of flexibility that I'm not sure where it truly belongs.
My thinking is that I expect the VIEW SDK to eventually have a "consistent across SDK" configuration file, something like:
views:
  select:
    instrumentType: counter
    instrumentationLibrary: grpc
    name: request.bytes
  aggregation:
    type: Histogram
  attributes:
    drop: [route]
I'm not so certain we'll be exposing that kind of configuration on a per-exporter level. While I think the choice of aggregation temporality really is an exporter decision, I find myself torn, because I think a consistent View configuration file is more likely. (And what I suggest above is a really ugly version of what I think we'll eventually have, something halfway between prometheus-rewrite rules and otel-collector processor config).
...tlp/metrics/src/main/java/io/opentelemetry/exporter/otlp/metrics/OtlpGrpcMetricExporter.java
I understand the motivation for this, since right now you could use the view API to configure aggregations that are incompatible with the Prometheus exporter.
But I don't quite understand why both the view API and the exporters get a say in the aggregation. Should aggregation temporality configuration be removed from the view API?
...etrics/src/main/java/io/opentelemetry/exporter/otlp/http/metrics/OtlpHttpMetricExporter.java
sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/export/MetricExporter.java
sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/export/MetricReader.java
sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/aggregator/CountAggregator.java
...rc/main/java/io/opentelemetry/sdk/metrics/internal/aggregator/DoubleHistogramAggregator.java
...rc/main/java/io/opentelemetry/sdk/metrics/internal/aggregator/DoubleLastValueAggregator.java
sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/DefaultSdkMeterProvider.java
...rics/src/main/java/io/opentelemetry/sdk/metrics/internal/aggregator/DoubleSumAggregator.java
...metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/state/TemporalMetricStorage.java
Yes, not a huge fan of this right now, but I think one place needs to win in the event of conflict, and right now that's the View API.
I think (as you see in this PR) we should do our best to avoid having anyone force an aggregation temporality. This is a step towards trying to convince people to never specify it in the View API. If @jkwatson's suggestion of having metric-level configuration for aggregation temporality is taken, then absolutely, I'd remove it as a parameter to view aggregation. I don't think we have quite enough data behind mixed cumulative/delta use cases to remove all fine-grained controls, but it's something we should consider. Again, if you look at the goals listed in this PR, primarily, I want to stop 99% of usage of aggregation temporality in the Views API because exporter-based configuration should handle all of our known use cases.
Also, @jkwatson and @jack-berg, it would be good to get your thoughts on this PR: open-telemetry/opentelemetry-specification#2032
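The precedence described here (the View API wins when it specifies a temporality, otherwise the exporter's preference applies) could be sketched as follows. This is an illustration under stated assumptions, not the PR's `TemporalityUtils`: the `TemporalityResolver` class, the choice to throw on an unsupported view setting, and the cumulative fallback are all hypothetical.

```java
import java.util.EnumSet;

/** Hypothetical stand-in for the SDK enum. */
enum AggregationTemporality { DELTA, CUMULATIVE }

/** Hypothetical helper: picks the temporality storage should use for one reader. */
final class TemporalityResolver {
  private TemporalityResolver() {}

  /**
   * @param configured temporality set via the View API, or null if the view left it unset
   * @param preferred temporality the exporter prefers, or null for no preference
   * @param supported temporalities the exporter can handle (must not be empty)
   */
  static AggregationTemporality resolve(
      AggregationTemporality configured,
      AggregationTemporality preferred,
      EnumSet<AggregationTemporality> supported) {
    if (supported.isEmpty()) {
      throw new IllegalArgumentException("exporter supports no temporality");
    }
    if (configured != null) {
      // The view wins on conflict. Failing on an unsupported value is an assumption of this sketch.
      if (!supported.contains(configured)) {
        throw new IllegalArgumentException("view requested unsupported temporality: " + configured);
      }
      return configured;
    }
    if (preferred != null && supported.contains(preferred)) {
      return preferred;
    }
    // Assumed fallback: cumulative when the exporter supports it, otherwise delta.
    return supported.contains(AggregationTemporality.CUMULATIVE)
        ? AggregationTemporality.CUMULATIVE
        : AggregationTemporality.DELTA;
  }
}
```

With this shape, a view that leaves temporality as null defers entirely to the exporter, which is the behavior the comment above is arguing for.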
Generally makes sense to me, though it would indeed be great if the knob could be removed from the Views unless there is user demand for it.
/** Returns the preferred temporality for metrics. */
@Nullable
default AggregationTemporality getPreferredTemporality() {
  return null;
AggregationTemporality.CUMULATIVE? Just kidding (mostly) :-)
ha!
...rc/main/java/io/opentelemetry/sdk/metrics/internal/aggregator/DoubleHistogramAggregator.java
...etrics/src/main/java/io/opentelemetry/exporter/otlp/http/metrics/OtlpHttpMetricExporter.java
...esting/src/main/java/io/opentelemetry/sdk/testing/assertj/metrics/AbstractSumDataAssert.java
@@ -17,6 +20,17 @@
 */
public interface MetricExporter {

  /** Returns the set of all supported temporalities for this exporter. */
  default EnumSet<AggregationTemporality> getSupportedTemporality() {
Is the purpose of this to fail if someone configures a view for an exporter that doesn't support the specified temporality?
Theoretically.
However, the Metrics SIG is removing AggregationTemporality from the View API, so the purpose here is that an exporter can "handle everything" by default.
AbstractSumAggregator(InstrumentDescriptor instrumentDescriptor) {
  this.isMonotonic = MetricDataUtils.isMonotonicInstrument(instrumentDescriptor);
}
This seems, on its face, to be a refactoring unrelated to the title in the PR. Am I reading this wrong? It would be easier to understand these PRs if they were a bit smaller and focussed. :)
Ah, I forgot to remove the AbstractSumAggregator. I hear you about focused refactorings, and promise to do a better job going forward. If you want, I can try to pull the old code back in here and do the bug-fixing to histograms around monotonic instruments along with the fix to this.
super(
    resource, instrumentationLibraryInfo, instrumentDescriptor, metricDescriptor, temporality);
/**
 * Constructs a histogram aggregator.
copy pasta?
ugh, yeah good catch!
...metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/state/TemporalMetricStorage.java
if (temporality == AggregationTemporality.DELTA && !isSynchronous) {
  MetricStorageUtils.diffInPlace(last.getAccumlation(), currentAccumulation, aggregator);
  result = last.getAccumlation();
} else if (temporality == AggregationTemporality.CUMULATIVE && isSynchronous) {
there are definitely a few combinations missing here. If that's intentional, please add a comment as to why the others aren't needed.
It is, and will do.
Added a lot of docs (here and below) so PTAL.
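For readers following this thread, one way to see why only two of the four combinations need work is the simplified sketch below. It assumes that synchronous instruments hand storage the delta accumulated since the last collection, while asynchronous instruments report a running total. It tracks a single `long` sum instead of per-attribute accumulations, and `TemporalSumSketch` is a hypothetical name, not the PR's `TemporalMetricStorage`.

```java
/** Hypothetical stand-in for the SDK enum. */
enum AggregationTemporality { DELTA, CUMULATIVE }

/** Simplified, single-series illustration of the four temporality/instrument combinations. */
final class TemporalSumSketch {
  private final AggregationTemporality temporality;
  private final boolean isSynchronous;
  // Previous raw observation (async + DELTA) or running total (sync + CUMULATIVE).
  private long last;

  TemporalSumSketch(AggregationTemporality temporality, boolean isSynchronous) {
    this.temporality = temporality;
    this.isSynchronous = isSynchronous;
  }

  /** Converts the value produced by one collection into the value to report. */
  long report(long current) {
    long result;
    if (temporality == AggregationTemporality.DELTA && !isSynchronous) {
      // Async input is a running total: diff against the previous observation.
      result = current - last;
      last = current;
    } else if (temporality == AggregationTemporality.CUMULATIVE && isSynchronous) {
      // Sync input is a delta: merge it into the running total.
      result = last + current;
      last = result;
    } else {
      // DELTA + sync and CUMULATIVE + async already arrive in the requested form.
      result = current;
    }
    return result;
  }
}
```

The remaining two branches collapse into a pass-through, which is the kind of reasoning the requested comment would need to spell out.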
@Override
public AggregationTemporality getPreferredTemporality() {
  return AggregationTemporality.CUMULATIVE;
any particular reason for this choice?
Not a strong reason, it would pick this option even if we didn't prefer it. This is here to help reinforce that to folks attempting to use this for debugging.
...s/src/test/java/io/opentelemetry/sdk/metrics/testing/InMemoryMetricReaderCumulativeTest.java
Co-authored-by: John Watson <[email protected]>
Thanks!
@jack-berg Do you have anything left that needs to be resolved here?
👍 🚢
This is an implementation of open-telemetry/opentelemetry-specification#2013. This may guide changes in the overall design.
Sorry this is a bit larger than I'd like. I could tease out the Aggregator refactoring as a separate PR, but it's a bit odd without the motivation. LMK if you want that, also sending it out now for early review.
Goals
- MetricExporter and MetricReader should be able to denote what AggregationTemporality they support and prefer. This aligns w/ the .NET implementation, and matches the above specification.
Oddities
- null for "no temporality specified" on aggregators, and use this to denote when the reader-driven-aggregation-temporality will win. See TemporalityUtils for details
- AggregatorFactory was attempting to throw when an aggregator was configured improperly in the View API. This was broken by the new Aggregation public interface, so we moved the behavior forward.
Cleanups/Shifts
- Aggregator API to simplify creating new ones.
- merge and diff vs. merge and isStateful. Behavior is now more accurate overall.
- Resource, InstrumentationLibraryInfo and MetricDescriptor now stored outside the aggregator and passed in when needed.
- AggregatorFactory as Aggregation now serves that job.
- diff vs. merge and account for aggregation temporality. Aggregator implementations almost don't need to care outside of buildMetric
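As a rough illustration of the explicit merge and diff operations mentioned under Cleanups/Shifts, a stateless aggregator contract might look like the sketch below. The interface shape, the `SketchAggregator` name, and both implementations are assumptions made for this example; the real Aggregator in the PR works on accumulation types and also builds metric data.

```java
/** Hypothetical, pared-down aggregator contract with explicit merge and diff. */
interface SketchAggregator<T> {
  /** Combines a previous cumulative accumulation with a newer delta accumulation. */
  T merge(T previous, T current);

  /** Derives the delta between a previous and a current cumulative accumulation. */
  T diff(T previous, T current);
}

/** Sums: merging adds, diffing subtracts. */
final class LongSumSketch implements SketchAggregator<Long> {
  @Override
  public Long merge(Long previous, Long current) {
    return previous + current;
  }

  @Override
  public Long diff(Long previous, Long current) {
    return current - previous;
  }
}

/** Last value (gauge-like): the newest observation wins in both operations. */
final class LongLastValueSketch implements SketchAggregator<Long> {
  @Override
  public Long merge(Long previous, Long current) {
    return current;
  }

  @Override
  public Long diff(Long previous, Long current) {
    return current;
  }
}
```

Because neither implementation holds state, storage can decide when to call merge or diff based on the requested temporality, and the aggregator itself stays unaware of it.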
Remaining Tasks
- XYZInPlace private methods to helper class