Application crash caused by infinite concurrencyLimit during extended collector unavailability #4204
Labels
bug
Something isn't working
priority:p1
Bugs which cause problems in end-user applications such as crashes, data inconsistencies, etc
What happened?
Steps to Reproduce
Run the NodeJS application using auto-instrumentation as follows, but without starting the collector (to simulate extended collector unavailability).
Expected Result
The OTel agent should drop traces once a (configurable) queue size is reached.
Actual Result
The OTel agent keeps queuing traces indefinitely, ultimately causing the application process to crash with an out-of-memory (OOM) error.
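To illustrate the difference between the expected and actual behaviour, here is a minimal sketch of an exporter that refuses new batches once a limit of in-flight exports is reached. It is not the OpenTelemetry implementation: BoundedExporter, send, and the dropped counter are hypothetical names used only for illustration, and the default of 30 is an arbitrary placeholder.

```javascript
// Simplified stand-in for an exporter. NOT the OpenTelemetry code;
// all names here are hypothetical and chosen for illustration only.
class BoundedExporter {
  // send: function taking a batch and returning a Promise (hypothetical transport).
  // concurrencyLimit: maximum number of exports allowed to be pending at once.
  constructor(send, concurrencyLimit = 30) {
    this.send = send;
    this.concurrencyLimit = concurrencyLimit;
    this.inFlight = new Set(); // promises for exports that have not settled yet
    this.dropped = 0;          // number of batches rejected because the limit was hit
  }

  // Resolves to { ok: true } on success or { ok: false, error } on failure or drop.
  export(batch) {
    if (this.inFlight.size >= this.concurrencyLimit) {
      // Limit reached: drop the batch instead of holding it in memory.
      this.dropped += 1;
      return Promise.resolve({
        ok: false,
        error: new Error('Concurrent export limit reached'),
      });
    }
    const pending = this.send(batch)
      .then(
        () => ({ ok: true }),
        (error) => ({ ok: false, error })
      )
      // Free the slot once the export settles, whatever the outcome.
      .finally(() => this.inFlight.delete(pending));
    this.inFlight.add(pending);
    return pending;
  }
}
```

With a send function whose promise never settles (roughly what a hung connection to an unavailable collector looks like) and a limit of 2, five export calls leave two batches pending and drop the other three. With a limit of Infinity, all five stay in memory, which is the unbounded growth described above.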
Additional Details
#1708 implemented a concurrencyLimit option to limit the queue size. However, the default value is Infinity and there is no environment variable to set concurrencyLimit. It can be set via a constructor argument, but exporter construction happens deep inside the auto-instrumentation flow.
The OpenTelemetry specification says that all components should have bounded memory usage. Most components that use some sort of queue also specify reasonable default values for the max queue size. In my view:
- The default concurrencyLimit should be changed from Infinity to a reasonable finite number. [critical]
- There should be a way to set concurrencyLimit (e.g. environment variable) for auto-instrumentation use-cases. [nice-to-have]
OpenTelemetry Setup Code
package.json
@opentelemetry/auto-instrumentations-node@^0.38.0
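As a rough sketch of the environment-variable idea from Additional Details, a helper along the following lines could resolve a finite limit. Everything in it is an assumption: the variable name OTEL_EXPORTER_OTLP_CONCURRENCY_LIMIT is hypothetical (no such variable exists as of this issue), resolveConcurrencyLimit is an illustrative name, and the default of 30 is an arbitrary placeholder rather than a proposed value.

```javascript
// Arbitrary placeholder default; the issue does not propose a specific number.
const DEFAULT_CONCURRENCY_LIMIT = 30;

// Hypothetical variable name, used for illustration only.
const CONCURRENCY_LIMIT_ENV = 'OTEL_EXPORTER_OTLP_CONCURRENCY_LIMIT';

// Reads the limit from an environment-like object and always returns
// a finite positive integer, falling back to the default otherwise.
function resolveConcurrencyLimit(env = process.env) {
  const raw = env[CONCURRENCY_LIMIT_ENV];
  if (raw === undefined || raw.trim() === '') {
    return DEFAULT_CONCURRENCY_LIMIT;
  }
  const parsed = Number(raw);
  // Reject NaN, Infinity, fractions, zero, and negatives so the limit stays bounded.
  if (!Number.isInteger(parsed) || parsed <= 0) {
    return DEFAULT_CONCURRENCY_LIMIT;
  }
  return parsed;
}
```

When the SDK is constructed manually, the returned value could be passed as the concurrencyLimit constructor argument of the exporter. That route is not available here, because auto-instrumentation builds the exporter internally.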
Relevant log output
No response