-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AsyncCounter and AsyncUpDownCounter expected callback value changed in v0.32.0 #3278
Comments
The new SDK follows the OpenTelemetry specification in defining the default temporality to be cumulative for all instrument kinds. Setting a I'm still looking through the specification and other implementations to see if this is the expected behavior. |
It is not about For example, the following app reports package main
import (
"context"
"os"
"os/signal"
"syscall"
"time"
"go.opentelemetry.io/otel/exporters/stdout/stdoutmetric"
"go.opentelemetry.io/otel/metric/global"
"go.opentelemetry.io/otel/metric/instrument"
"go.opentelemetry.io/otel/sdk/metric"
)
var meter = global.MeterProvider().Meter("app_or_package_name")
func main() {
ctx := context.Background()
exp, err := stdoutmetric.New()
if err != nil {
panic(err)
}
reader := metric.NewPeriodicReader(
exp,
metric.WithInterval(time.Second),
)
provider := metric.NewMeterProvider(
metric.WithReader(reader),
)
global.SetMeterProvider(provider)
go upDownCounterObserver(ctx)
ch := make(chan os.Signal, 3)
signal.Notify(ch, syscall.SIGINT, syscall.SIGQUIT, syscall.SIGTERM)
<-ch
}
func upDownCounterObserver(ctx context.Context) {
counter, err := meter.AsyncInt64().UpDownCounter(
"some.prefix.up_down_counter_async",
)
if err != nil {
panic(err)
}
if err := meter.RegisterCallback(
[]instrument.Asynchronous{
counter,
},
func(ctx context.Context) {
counter.Observe(ctx, 3)
},
); err != nil {
panic(err)
}
} |
Setting the temporatlity selector like mentioned does indeed produce your requested behavior: package main
import (
"context"
"os"
"os/signal"
"syscall"
"time"
"go.opentelemetry.io/otel/exporters/stdout/stdoutmetric"
"go.opentelemetry.io/otel/metric/global"
"go.opentelemetry.io/otel/metric/instrument"
"go.opentelemetry.io/otel/sdk/metric"
"go.opentelemetry.io/otel/sdk/metric/metricdata"
"go.opentelemetry.io/otel/sdk/metric/view"
)
var meter = global.MeterProvider().Meter("app_or_package_name")
func delta(view.InstrumentKind) metricdata.Temporality {
return metricdata.DeltaTemporality
}
func main() {
ctx := context.Background()
exp, err := stdoutmetric.New()
if err != nil {
panic(err)
}
reader := metric.NewPeriodicReader(
exp,
metric.WithInterval(time.Second),
metric.WithTemporalitySelector(delta),
)
provider := metric.NewMeterProvider(
metric.WithReader(reader),
)
global.SetMeterProvider(provider)
go upDownCounterObserver(ctx)
ch := make(chan os.Signal, 3)
signal.Notify(ch, syscall.SIGINT, syscall.SIGQUIT, syscall.SIGTERM)
<-ch
}
func upDownCounterObserver(ctx context.Context) {
counter, err := meter.AsyncInt64().UpDownCounter(
"some.prefix.up_down_counter_async",
)
if err != nil {
panic(err)
}
if err := meter.RegisterCallback(
[]instrument.Asynchronous{
counter,
},
func(ctx context.Context) {
counter.Observe(ctx, 3)
},
); err != nil {
panic(err)
}
} go 1.18
require (
go.opentelemetry.io/otel/exporters/stdout/stdoutmetric v0.32.3
go.opentelemetry.io/otel/metric v0.32.3
go.opentelemetry.io/otel/sdk/metric v0.32.3
)
require (
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/go-logr/logr v1.2.3 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
go.opentelemetry.io/otel v1.11.0 // indirect
go.opentelemetry.io/otel/sdk v1.11.0 // indirect
go.opentelemetry.io/otel/trace v1.11.0 // indirect
golang.org/x/sys v0.0.0-20220919091848-fb04ddd9f9c8 // indirect
)
|
Well, if you mean that is how And it is not like I am requesting a new behavior. That is how it worked until v0.32.0. And right now async counters are broken. |
The v0.31 and prior SDK were correct. @MrAlias this has the appearance of a bug -- where the asynchronous form of aggregator should start from scratch (i.e., 0) on each collection round so that the cumulative output matches the instrumented value (if filtering, the sum of instrumented values) and the delta output is the difference as described in the API spec. Looking for your input on how the SDK specification could be improved to help clarify this, thanks. |
Thanks! This is the thing I was looking for. If we wanted to make this clearer in the specification, I was initially looking for this in the Asynchronous Counter operations section. Having this defined there would be helpful, but I don't think its critical. |
@MrAlias this issue was only fixed for cumulative temporality. Delta is still broken.
package main
import (
"context"
"os"
"os/signal"
"syscall"
"time"
"go.opentelemetry.io/otel/exporters/stdout/stdoutmetric"
"go.opentelemetry.io/otel/metric/global"
"go.opentelemetry.io/otel/metric/instrument"
"go.opentelemetry.io/otel/sdk/metric"
"go.opentelemetry.io/otel/sdk/metric/metricdata"
"go.opentelemetry.io/otel/sdk/metric/view"
)
var meter = global.MeterProvider().Meter("app_or_package_name")
func main() {
ctx := context.Background()
exp, err := stdoutmetric.New()
if err != nil {
panic(err)
}
reader := metric.NewPeriodicReader(
exp,
metric.WithInterval(time.Second),
metric.WithTemporalitySelector(statelessTemporalitySelector),
)
provider := metric.NewMeterProvider(
metric.WithReader(reader),
)
global.SetMeterProvider(provider)
go upDownCounterObserver(ctx)
ch := make(chan os.Signal, 3)
signal.Notify(ch, syscall.SIGINT, syscall.SIGQUIT, syscall.SIGTERM)
<-ch
}
func upDownCounterObserver(ctx context.Context) {
counter, err := meter.AsyncInt64().UpDownCounter(
"some.prefix.up_down_counter_async",
)
if err != nil {
panic(err)
}
if err := meter.RegisterCallback(
[]instrument.Asynchronous{
counter,
},
func(ctx context.Context) {
counter.Observe(ctx, 3)
},
); err != nil {
panic(err)
}
}
func statelessTemporalitySelector(kind view.InstrumentKind) metricdata.Temporality {
return metricdata.DeltaTemporality
} |
@vmihailenco This code snippet:
creates confusion with me because the implementation proposed for "stateless" in the Lightstep metrics SDK (and the otel specification) is that "stateless" means delta for synchronous instruments while it means cumulative for asynchronous instruments. Otherwise, using delta temporality for an asynchronous instrument requires state in the SDK. (Note: I personally argued that supporting delta temporality for asynchronous counters is unnecessary complexity, but several vendors (New Relic and DynaTrace) convinced us otherwise. That said, I think you your expectation is correct: configured for delta temporality, an asynchronous counter that reports a constant value should appear as a sequence like [3 0 0 0 0] as you suggest. |
Your expectation does not match my understanding of the specification. Each cycle your callback is reporting
|
The word "absolute" is potentially misleading--it's been debated in the past and we left that word in place, but other words we considered are "current" and "cumulative" and "total". Another way to describe this is that the callback reports the value that a cumulative-temporality reader will see. For asynchronous counters, a delta temporality reader should see the difference between successive "absolute" values. |
Ah, gotcha! I think I see now. I had understood it as the user will tell you the temporality of the value that is provided. However, it is that the instrumentation will always record a cumulative value and if a user overrides the temporality to be delta the exported data needs to be converted delta. I'll look into a PR to address this. Also, I'm going to take another look at the other implementation. I think I either misinterpreted their code or they are doing what is implemented by #3350. |
Confusingly, 1000 PRs prior I removed the I did that because I thought the specification would not require cumulative-to-delta translation; however the specification reversed itself on that topic. In my newer implementation, there is a |
@jmacd Is this also the way |
Yes, that's correct. @MrAlias I'd like your help pinpointing where the specification can improve--although now that we're on the topic, when faced with questions in this territory @reyang responded with supplementary guidelines. Here's a section on the asynchronous counter, delta temporality case: https://opentelemetry.io/docs/reference/specification/metrics/supplementary-guidelines/#asynchronous-example-delta-temporality |
I think having normative statement defining the required behavior in the specification would be helpful. For example, in the SDK definition of an asynchronous counter, having a section on the
In the top level definition it should also mention that the asynchronous counter types have restriction on the aggregations they support. Possiblly:
|
There is a note in https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#asynchronous-counter "Note: Unlike Counter.Add() which takes the increment/delta value, the callback function reports the absolute value of the counter. To determine the reported rate the counter is changing, the difference between successive measurements is used." I feel "the passed value MUST be interpreted as a cumulative sum" is asking too much, since one could decide to interpret it as Gauge (which is not cumulative) as well. |
For Uptrace/ClickHouse I need delta temp whenever possible, but I guess I should just avoid using func temporalitySelector(kind view.InstrumentKind) metricdata.Temporality {
switch kind {
case view.SyncCounter, view.AsyncCounter, view.SyncHistogram:
return metricdata.DeltaTemporality
default:
return metricdata.CumulativeTemporality
}
} In Python, they have |
I have verified #3398 to produce the expected output |
Before 0.32, the value in a callback function was expected to be an absolute value, but since 0.32 it must be a delta or metrics are incorrect:
Is that intentional? It is not documented anywhere and not mentioned in the changelog. Also, runtimemetrics still uses the old way to report observable values.
The text was updated successfully, but these errors were encountered: