@seankhliao reported in the otel-operator channel in Slack that the TA was using around 200 GB of memory for 4k+ pods. This is far more than expected and suggests we have a memory leak and/or inefficient memory usage. I'm planning to go through the TA code, run some benchmarks and profiling, and do some refactoring to improve the performance of the service.
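For anyone reproducing this, here is a minimal sketch of how heap profiles could be captured from the TA during this kind of investigation (assuming the binary doesn't already expose a pprof endpoint; the port and wiring here are illustrative, not the TA's actual setup):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Serve pprof on a side port; heap snapshots can then be compared over time with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... target allocator startup would go here ...
	select {}
}
```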
I think most of what I experienced is in that Slack thread.
Some additional info:
- I think it's mostly from the map copies generated by TargetItems() in allocation; running the allocator without clients doesn't produce the increasing memory-usage profile (see the sketch after this list).
- I made some attempts at using sync.Pool and/or rate limiting on TargetItems(), but that only slowed the runaway memory usage rather than preventing it.
- I ran with 10 collector instances at a 10s refresh interval (later reduced to 30s).
- Setting GOMEMLIMIT did not prevent the process from exceeding the soft memory limit (see the note after this list).
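To illustrate the map-copy point: the snippet below is a simplified sketch (type and field names are stand-ins, not the actual allocator code) of how returning a fresh map from TargetItems() on every client poll allocates a full copy of the target set per request, which with 4k+ targets and 10 collectors polling every 10-30s adds up quickly:

```go
package allocation

import "sync"

// TargetItem is a stand-in for the allocator's per-target record.
type TargetItem struct {
	JobName   string
	TargetURL string
}

type Allocator struct {
	mu      sync.RWMutex
	targets map[string]*TargetItem
}

// TargetItems returns a copy of the internal map so callers cannot mutate
// shared state. Each call allocates a new map sized to the full target set,
// so frequent polling by many collectors produces a steady stream of large,
// short-lived allocations that the GC has to keep chasing.
func (a *Allocator) TargetItems() map[string]*TargetItem {
	a.mu.RLock()
	defer a.mu.RUnlock()
	out := make(map[string]*TargetItem, len(a.targets))
	for k, v := range a.targets {
		out[k] = v
	}
	return out
}
```

This would also be consistent with sync.Pool and rate limiting only slowing things down: a pooled map still has to hold the full target set while a response is being served, and rate limiting just spaces out the same allocations.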
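On the GOMEMLIMIT point: it is a soft limit, so once the live heap itself outgrows it the GC runs harder but cannot shrink the process below it, which would match the behavior above. For reference, it can be set via the environment or at runtime (the 2 GiB value here is just an example):

```go
package main

import "runtime/debug"

func main() {
	// Equivalent to running with GOMEMLIMIT=2GiB in the environment.
	// The GC works harder as usage approaches the limit, but a live heap
	// that genuinely needs more memory will still push past it.
	debug.SetMemoryLimit(2 << 30)
}
```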