[NO-TICKET] Fix breaking applications due to concurrency issue in tracing

**What does this PR do?**

This PR fixes #2851. As explained in the original bug report, there's a concurrency issue that can be triggered by the environment logger running concurrently with activating an instrumentation.

**Motivation:**

Fix a long-standing concurrency issue.

**Additional Notes:**

To fix this issue, I've made two changes:

1. Introduced a mutex to protect `@instrumented_integrations`, thus making sure no two threads can be touching it at the same time.
2. Took advantage of the fact that `#instrumented_integrations` was marked as private, and only being used by the environment logger and telemetry (i.e. read-only usage), to return a copy of the data. This way, we can safely iterate on the data while reconfiguration is happening concurrently.

I believe the overhead of this lock is negligible, since we don't need to read this information very often.

**How to test the change?**

I wrote a reproducer to be able to see this issue easily. The reproducer is in two parts -- `concurrent_bug_repo.rb`:

```ruby
require 'datadog'

Datadog.configure do |c|
  c.tracing.instrument(:http, split_by_domain: true)
end

$sync_with_logger = Queue.new

Thread.new do
  Datadog::Tracing::Diagnostics::EnvironmentLogger.collect_and_log!
  puts "Background thread finished!"
end

$sync_with_logger.pop

Datadog.configure do |c|
  c.tracing.instrument(:rake, split_by_domain: true)
end
```

and the following change to the environment logger:

```diff
diff --git a/lib/datadog/tracing/diagnostics/environment_logger.rb b/lib/datadog/tracing/diagnostics/environment_logger.rb
index d96cfa14c6..427861bff7 100644
--- a/lib/datadog/tracing/diagnostics/environment_logger.rb
+++ b/lib/datadog/tracing/diagnostics/environment_logger.rb
@@ -128,7 +128,15 @@ module Datadog
         end
         def collect_integrations_settings!
+          sync_once = $sync_with_logger
+
           instrumented_integrations.each_with_object({}) do |(name, integration), result|
+            if sync_once
+              sync_once << true
+              sync_once = nil
+              sleep 5
+            end
+
             integration.configuration.to_h.each do |setting, value|
               next if setting == :tracer # Skip internal objects
```

That is, I specifically injected a `sleep` in the environment logger, and made it run on a background thread, triggering the exact same issue that had been plaguing us for years.

Running the above change without my PR will trigger the issue, and with my PR it no longer does -- since the environment logger now gets its own copy of the instrumented integrations.
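
For illustration, the pattern behind the two changes listed under **Additional Notes** (a mutex around writes, and a copy handed out to readers) can be sketched in isolation. This is a minimal sketch, not the actual code from this PR: the `IntegrationRegistry` class, its `instrument` method, and the plain option hashes it stores are hypothetical stand-ins for whatever owns `@instrumented_integrations` in the library.

```ruby
# Hypothetical stand-in for the object that owns @instrumented_integrations.
# Class and method names are assumptions made for this sketch only.
class IntegrationRegistry
  def initialize
    @instrumented_integrations = {}
    @mutex = Mutex.new
  end

  # Writers hold the lock, so no two threads mutate the hash at the same time.
  def instrument(name, options = {})
    @mutex.synchronize { @instrumented_integrations[name] = options }
  end

  # Readers receive a shallow copy taken under the lock. They can iterate it
  # for as long as they like while writers keep changing the original.
  def instrumented_integrations
    @mutex.synchronize { @instrumented_integrations.dup }
  end
end

registry = IntegrationRegistry.new
registry.instrument(:http, split_by_domain: true)

snapshot = registry.instrumented_integrations

snapshot.each_key do |_name|
  # Reconfiguring mid-iteration only touches the registry's own hash.
  # Iterating the original hash directly here would raise, because Ruby
  # refuses to add a new key to a Hash while it is being iterated.
  registry.instrument(:rake, split_by_domain: true)
end

puts snapshot.keys.inspect                            # => [:http]
puts registry.instrumented_integrations.keys.inspect  # => [:http, :rake]
```

The copy is shallow: it protects the iteration over the collection itself, not the objects stored in it, which matches the read-only usage described above.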