Cache instruments so repeatedly creating identical instruments doesn't leak memory #4820

dashpole · 2024-01-10T20:28:33Z

Resolve #3527
Fixes open-telemetry/opentelemetry-go-contrib#4226.

This is one potential solution to the problem that creating multiple instances of an instrumentation library leaks memory. Alternatively, we could require users to cache the instrumentation library instance itself, as was done in googleapis/google-api-go-client#2329.
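A minimal sketch of the caching idea. It assumes a hypothetical `meter` with an `Int64Counter` method, a hypothetical `counter` type standing in for an SDK instrument, and a hypothetical `instrumentID` key built from the name, description, unit, and kind. It is not the code in this PR. Creating an instrument whose identity has already been seen returns the existing pointer instead of allocating and retaining a new one.

```go
package main

import "sync"

// instrumentID holds the fields that identify an instrument (hypothetical).
// Two creation calls with equal IDs are treated as the same instrument.
type instrumentID struct {
	Name        string
	Description string
	Unit        string
	Kind        string
}

// counter is a hypothetical stand-in for an SDK synchronous instrument.
type counter struct {
	mu  sync.Mutex
	sum int64
}

// Add records a measurement on the shared instrument.
func (c *counter) Add(v int64) {
	c.mu.Lock()
	c.sum += v
	c.mu.Unlock()
}

// meter caches instruments by identity so repeated creation does not
// allocate (and permanently retain) a new instrument on every call.
type meter struct {
	mu       sync.Mutex
	counters map[instrumentID]*counter
}

func newMeter() *meter {
	return &meter{counters: make(map[instrumentID]*counter)}
}

// Int64Counter returns the cached counter for this identity, creating it
// only on the first call.
func (m *meter) Int64Counter(name, desc, unit string) *counter {
	id := instrumentID{Name: name, Description: desc, Unit: unit, Kind: "Int64Counter"}
	m.mu.Lock()
	defer m.mu.Unlock()
	if c, ok := m.counters[id]; ok {
		return c
	}
	c := &counter{}
	m.counters[id] = c
	return c
}
```

Keying on the full identity rather than the name alone means two calls that differ only in, say, unit still get distinct instruments.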

This works well for synchronous instruments: observations made on the shared instrument by multiple instances of a library are correctly aggregated across instances. Asynchronous instruments are more confusing. For example, what should happen if I instantiate two instances of the runtime metrics library? Presumably the last observation should win?
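To make the sync/async distinction concrete, here is a small model under stated assumptions: `syncSum` is a hypothetical stand-in for a shared synchronous counter, and `lastValue`/`collect` model an observable gauge fed by one callback per library instance under a hypothetical last-observation-wins policy. That policy is the open question above, not settled SDK behavior.

```go
package main

// syncSum models a synchronous counter shared by several library instances:
// every Add contributes to one running total, so cross-instance aggregation
// is well defined. (Hypothetical type.)
type syncSum struct{ total int64 }

func (s *syncSum) Add(v int64) { s.total += v }

// lastValue models an observable gauge fed by several callbacks, one per
// library instance. Under a hypothetical last-observation-wins policy, a
// later Observe for the same attribute set overwrites the earlier one.
type lastValue struct{ values map[string]int64 }

func (g *lastValue) Observe(attrs string, v int64) {
	if g.values == nil {
		g.values = make(map[string]int64)
	}
	g.values[attrs] = v
}

// collect runs every callback in registration order and returns what the
// gauge would report for this collection cycle.
func collect(callbacks []func(*lastValue)) map[string]int64 {
	g := &lastValue{}
	for _, cb := range callbacks {
		cb(g)
	}
	return g.values
}
```

With two runtime-metrics instances observing the same attribute set, only the value from whichever callback runs last survives, which is why the asynchronous case is harder to reason about than the synchronous one.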

This also only solves the problem of caching the instruments themselves. Making multiple instances of an instrumentation library would still leak memory if each instance created and registered a new callback. If `RegisterCallback` is used, the callback can be unregistered with `Unregister()`, but if the callback is passed as an option to the instrument, it will exist forever, which also leaks memory.
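The difference between the two ways of attaching callbacks can be modeled with a hypothetical `pipeline` type (not the SDK's). Callbacks added through a `RegisterCallback`-style method come with a handle that removes them, loosely mirroring `metric.Registration.Unregister`, while callbacks passed as instrument options have no such handle and accumulate with every new library instance.

```go
package main

import "sync"

// pipeline is a hypothetical stand-in for the SDK's callback storage.
type pipeline struct {
	mu          sync.Mutex
	nextID      int
	registered  map[int]func() // added via a RegisterCallback-style API
	fromOptions []func()       // added via With*Callback-style options
}

func newPipeline() *pipeline {
	return &pipeline{registered: make(map[int]func())}
}

// RegisterCallback stores f and returns a function that removes it again.
func (p *pipeline) RegisterCallback(f func()) (unregister func()) {
	p.mu.Lock()
	defer p.mu.Unlock()
	id := p.nextID
	p.nextID++
	p.registered[id] = f
	return func() {
		p.mu.Lock()
		defer p.mu.Unlock()
		delete(p.registered, id)
	}
}

// addOptionCallback stores a callback passed at instrument creation.
// There is no handle to remove it, so it lives as long as the pipeline.
func (p *pipeline) addOptionCallback(f func()) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.fromOptions = append(p.fromOptions, f)
}

// size reports how many callbacks are currently retained.
func (p *pipeline) size() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.registered) + len(p.fromOptions)
}
```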

TODO:

- Benchmarks
- Cache callbacks as well

codecov · 2024-01-10T20:31:17Z

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (cef39a1) 82.3% compared to head (55829b3) 82.3%.

Additional details and impacted files

```diff
@@          Coverage Diff          @@
##            main   #4820   +/-   ##
=====================================
  Coverage   82.3%   82.3%           
=====================================
  Files        226     226           
  Lines      18481   18557   +76     
=====================================
+ Hits       15222   15286   +64     
- Misses      2973    2983   +10     
- Partials     286     288    +2
```

| Files | Coverage Δ |
|---|---|
| sdk/metric/cache.go | `100.0% <100.0%> (ø)` |
| sdk/metric/meter.go | `90.4% <81.6%> (-2.2%)` ⬇️ |

... and 1 file with indirect coverage changes

MrAlias · 2024-01-11T15:42:42Z

I think this looks like a decent proposal. Looking back through the record, we had originally tried to do this.

dashpole · 2024-01-11T15:44:47Z

Ah, cool. I'd definitely like feedback from @MadVikingGod, then

sdk/metric/meter.go

MadVikingGod · 2024-01-11T16:02:36Z

> Ah, cool. I'd definitely like feedback from @MadVikingGod, then

That work was pulled out because of added complexity, trying to get something working, and concerns over memory freeing.

At first glance, this looks to add caching in a simple way. I would suggest adding a benchmark to ensure that it performs better when instruments are reused and doesn't slow down single-use creation too much.

MrAlias · 2024-01-11T16:14:37Z

> It isn't clear if the aggregators cache is still needed if we are caching instruments.

I think this is still going to be needed to ensure that instruments which views modify so that they resolve to the same aggregators actually get the same aggregator instances. But I haven't tested it.
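A sketch of why a separate aggregator cache could still matter, assuming a hypothetical `view` that overwrites every instrument's description. Two instruments that differ only in description are distinct instruments (and would be cached separately), but after the view they resolve to the same hypothetical `streamID`, so an aggregator cache keyed on the stream hands both the same `sumAggregator`. All names here are illustrative, not the SDK's.

```go
package main

import "sync"

// streamID identifies an output stream after views are applied (hypothetical).
type streamID struct {
	Name        string
	Description string
	Unit        string
	Kind        string
}

// sumAggregator is a hypothetical aggregator.
type sumAggregator struct{ sum int64 }

// view is a hypothetical view that overwrites the description of every
// instrument it is applied to.
type view struct{ description string }

// apply resolves an instrument to its stream; desc is replaced by the view.
func (v view) apply(name, desc, unit, kind string) streamID {
	return streamID{Name: name, Description: v.description, Unit: unit, Kind: kind}
}

// aggregatorCache returns one aggregator per stream, even when the
// instruments feeding that stream are distinct.
type aggregatorCache struct {
	mu   sync.Mutex
	aggs map[streamID]*sumAggregator
}

func (c *aggregatorCache) lookup(id streamID) *sumAggregator {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.aggs == nil {
		c.aggs = make(map[streamID]*sumAggregator)
	}
	if a, ok := c.aggs[id]; ok {
		return a
	}
	a := &sumAggregator{}
	c.aggs[id] = a
	return a
}
```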

MrAlias · 2024-01-11T16:17:09Z

> The pattern of making multiple instances of an instrumentation library would still leak memory if each instance created and registered a new Callback.

This is a bit worrisome. Returning the same instrument instance may lead users to think they can safely make the same calls repeatedly.

Should we document that only the first call using `With*Callback` will be honored, and that users should use `RegisterCallback` after that?

dashpole · 2024-01-12T19:30:59Z

Looks like python logs a warning and ignores callbacks after the first: https://github.com/open-telemetry/opentelemetry-python/blob/975733c71473cddddd0859c6fcbd2b02405f7e12/opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/__init__.py#L171

MrAlias · 2024-01-18T18:59:48Z

This should resolve #3527

MrAlias · 2024-01-19T20:10:15Z

> Looks like python logs a warning and ignores callbacks after the first: https://github.com/open-telemetry/opentelemetry-python/blob/975733c71473cddddd0859c6fcbd2b02405f7e12/opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/__init__.py#L171

+1 to this behavior being implemented here.

sdk/metric/cache.go

dashpole · 2024-01-20T00:58:04Z

This now ignores callbacks after the first.

For benchmarking, if I run the existing `BenchmarkInstrumentCreation`, it shows a substantial performance improvement (roughly 3.7x faster, with 6 instead of 69 allocs/op) because it now returns the cached instrument instead of creating a new one each time:

From main

```
$ go test -bench=BenchmarkInstrumentCreation
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/metric
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkInstrumentCreation-24    	   98368	     11940 ns/op	    3043 B/op	      69 allocs/op
PASS
ok  	go.opentelemetry.io/otel/sdk/metric	1.379s
```

This PR:

```
$ go test -bench=BenchmarkInstrumentCreation
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/metric
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkInstrumentCreation-24    	  318759	      3187 ns/op	     480 B/op	       6 allocs/op
PASS
ok  	go.opentelemetry.io/otel/sdk/metric	1.144s
```

If I move `NewMeterProvider` and `Meter()` inside the loop (so that cached instruments are never reused), I get:

From main

```
$ go test -bench=BenchmarkInstrumentCreation
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/metric
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkInstrumentCreation-24    	   32074	     37227 ns/op	   18326 B/op	     225 allocs/op
PASS
ok  	go.opentelemetry.io/otel/sdk/metric	1.651s
```

This PR:

```
$ go test -bench=BenchmarkInstrumentCreation
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/metric
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkInstrumentCreation-24    	   27579	     44411 ns/op	   23228 B/op	     237 allocs/op
PASS
ok  	go.opentelemetry.io/otel/sdk/metric	1.745s
```

This is slightly slower (44411 vs 37227 ns/op, 237 vs 225 allocs/op), partly because the MeterProvider now does some extra instantiation of the caches.


…ith callbacks (#5606)

In #4820, I only added a comment describing the behavior to `Int64ObservableCounter`, but forgot other instruments. This adds the comment to all observable instruments. Fixes #5561

Co-authored-by: Robert Pająk
Co-authored-by: Tyler Yahn

dashpole requested review from MrAlias, Aneurysm9, evantorrie, XSAM, MadVikingGod, pellared, hanyuancheung and dmathieu as code owners January 10, 2024 20:28

dashpole mentioned this pull request Jan 10, 2024

Calling otelgrpc.UnaryClientInterceptor repeatedly creates multiple Int64Histogram resulting in a memory leak open-telemetry/opentelemetry-go-contrib#4226

Closed

MrAlias reviewed Jan 11, 2024


sdk/metric/meter.go (outdated, resolved)

dashpole force-pushed the duplicate_instruments branch from a805a5d to e87d2e5 January 19, 2024 19:28

dashpole force-pushed the duplicate_instruments branch from d27e99d to fd4d810 January 19, 2024 21:16

MrAlias reviewed Jan 19, 2024


sdk/metric/cache.go (outdated, resolved)

dashpole added 5 commits January 22, 2024 14:56

cache instruments to avoid leaking memory

c2e0710

add cacheWithErr to simplify error handling

414d8bc

add warning on repeated observable instrument creation with callbacks

68a29ca

documentation for new behavior

8e4ef1a

address feedback

80d90cf
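The `add cacheWithErr to simplify error handling` commit suggests a cache that stores a creation error alongside the value. One way such a helper could look, assuming errors should be cached too so that a failed creation is reported consistently on every later lookup. This is a sketch: `cache`, `valAndErr`, `cacheWithErr`, and `errInvalidName` are illustrative names, not necessarily the SDK's.

```go
package main

import (
	"errors"
	"sync"
)

// errInvalidName is a hypothetical creation error used in the example.
var errInvalidName = errors.New("invalid instrument name")

// cache is a minimal generic, concurrency-safe memoizing map.
type cache[K comparable, V any] struct {
	mu   sync.Mutex
	data map[K]V
}

// Lookup returns the cached value for key, calling f to create it on a miss.
func (c *cache[K, V]) Lookup(key K, f func() V) V {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.data == nil {
		c.data = make(map[K]V)
	}
	if v, ok := c.data[key]; ok {
		return v
	}
	v := f()
	c.data[key] = v
	return v
}

// valAndErr lets a (value, error) pair be stored as a single cache value.
type valAndErr[V any] struct {
	val V
	err error
}

// cacheWithErr caches both the value and the error returned by f.
type cacheWithErr[K comparable, V any] struct {
	cache[K, valAndErr[V]]
}

// Lookup returns the cached (value, error) for key, calling f only on a miss.
func (c *cacheWithErr[K, V]) Lookup(key K, f func() (V, error)) (V, error) {
	r := c.cache.Lookup(key, func() valAndErr[V] {
		v, err := f()
		return valAndErr[V]{val: v, err: err}
	})
	return r.val, r.err
}
```

Wrapping the pair in a single struct keeps the underlying cache unaware of errors, so instrument constructors that return `(instrument, error)` can use it directly.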

dashpole force-pushed the duplicate_instruments branch from fd4d810 to 80d90cf January 22, 2024 14:58

MrAlias approved these changes Jan 23, 2024


Merge branch 'main' into duplicate_instruments

55829b3

dmathieu approved these changes Jan 24, 2024


MrAlias added this to the v1.23.0 milestone Jan 24, 2024

MrAlias merged commit 1978044 into open-telemetry:main Jan 24, 2024
25 checks passed

dashpole deleted the duplicate_instruments branch January 24, 2024 15:48

MrAlias mentioned this pull request Feb 5, 2024

Release v1.23.0/v0.45.1 #4885

Merged

cpheps mentioned this pull request Feb 9, 2024

Duplicate Registered Metric with Exporter Helper and OTel Metrics open-telemetry/opentelemetry-collector#9514

Closed

strideynet mentioned this pull request Mar 1, 2024

Memory leak caused by otelhttp when no Meter Provider is configured. open-telemetry/opentelemetry-go-contrib#5190

Closed

dmitryax added a commit to dmitryax/opentelemetry-go that referenced this pull request Jun 28, 2024

Revert "Cache instruments so repeatedly creating identical instrument…

26a886c

…s doesn't leak memory (open-telemetry#4820)" This reverts commit 1978044.

dmitryax mentioned this pull request Jun 29, 2024

Observable metrics with different attributes reported only once #5561

Closed

dmitryax added a commit to dmitryax/opentelemetry-go that referenced this pull request Jul 2, 2024

Revert "Cache instruments so repeatedly creating identical instrument…

bd0dafd

…s doesn't leak memory (open-telemetry#4820)" This reverts commit 1978044.

dashpole mentioned this pull request Jul 10, 2024

Add comment to other observable instruments about repeated creation with callbacks #5606

Merged
