Thread-local storage for Tracer.state #2475

markiz · 2024-03-02T18:21:31Z

Overview

Background

We were working on a Sidekiq job that generated a large (300+ MB) CSV file. To contain the growing memory usage, we implemented streaming, which relied on custom Ruby Enumerators.

When we launched it in production, we noticed that NewRelic no longer traced database calls within the Sidekiq job transaction. After some debugging, it turned out that NewRelic holds its tracer state in the Thread#[...] storage. However, Thread#[...] is actually fiber-local storage, and our implementation specifically used Enumerator#next, which runs the enumerator block in a new Fiber. As a result, NewRelic lost track of the current tracer state and attributed everything within that enumerator to CPU burn.
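To make the failure mode concrete, here is a minimal standalone sketch in plain Ruby, with no NewRelic code involved. The key name :demo_state is made up for illustration. It stores a value in both kinds of storage, then reads them back from inside an Enumerator block driven by Enumerator#next:

```ruby
# Thread#[] is fiber-local; thread_variable_get/set are shared by the whole thread.
Thread.current[:demo_state] = :outer                      # fiber-local slot
Thread.current.thread_variable_set(:demo_state, :outer)   # thread-local slot

rows = Enumerator.new do |yielder|
  # Under external enumeration (#next), this block runs in its own Fiber.
  yielder << [
    Thread.current[:demo_state],
    Thread.current.thread_variable_get(:demo_state)
  ]
end

fiber_local_seen, thread_local_seen = rows.next
# fiber_local_seen  => nil     (the value set on the outer fiber is not visible)
# thread_local_seen => :outer  (thread variables survive the fiber switch)

# Internal iteration (each/first/map) stays on the calling fiber, so both are visible.
internal_seen = rows.first
# internal_seen => [:outer, :outer]
```

This is why the problem only showed up once the job switched to Enumerator#next: the same enumerator consumed with each would have kept its tracer state.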

The support case where this issue originated is at https://support.newrelic.com/s/case-details?caseId=500Ph000007YRJX

Proposed Solution

This PR adds a small new abstraction, NewRelic::ThreadLocalStorage, that is used instead of Thread#[...] where applicable. Its behavior is controlled by a new configuration option, thread_local_tracer_state (default: false). When false, it keeps the original behavior; when true, it uses Thread#thread_variable_get and Thread#thread_variable_set in place of Thread#[] and Thread#[]= respectively. I'm not 100% sure how safe it would be to use the new behavior across the board, because there could be fiber-based server implementations that also use NewRelic and rely on that storage being fiber-local, hence the configuration option.
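The concern about fiber-based servers can be illustrated with a small sketch in plain Ruby. The key :demo_state and the request names are hypothetical and stand in for per-request tracer state; this is not agent code. Two fibers share one thread, each writes its own value, and each yields before reading it back:

```ruby
# Build a fiber that records its own "state", yields, then reads the state back.
make_worker = lambda do |name|
  Fiber.new do
    Thread.current[:demo_state] = name                      # fiber-local
    Thread.current.thread_variable_set(:demo_state, name)   # shared by every fiber on this thread
    Fiber.yield                                             # let the other fiber run
    [Thread.current[:demo_state], Thread.current.thread_variable_get(:demo_state)]
  end
end

a = make_worker.call(:request_a)
b = make_worker.call(:request_b)

a.resume              # a stores :request_a, then yields
b.resume              # b stores :request_b, then yields
result_a = a.resume   # a finishes and returns what it sees now
# result_a => [:request_a, :request_b]
```

With fiber-local storage, fiber a still sees its own value; with thread variables, it sees whatever the last fiber on the thread wrote. That is the scenario the default of false guards against.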

There is also one spot where I wasn't entirely sure whether a change from Thread.current[] to ThreadLocalStorage[] makes sense (NewRelic::Agent::Instrumentation::NotificationsSubscriber#segment_stack), so I haven't touched that.

Submitter Checklist:

Include a link to the related GitHub issue, if applicable
Include a security review link, if applicable

Reviewer Checklist

Perform code review
Add performance label
Perform appropriate level of performance testing
Confirm all checks passed
Add version label prior to acceptance

CLAassistant · 2024-03-02T18:21:37Z

All committers have signed the CLA.

markiz · 2024-03-06T10:44:50Z

Hey, regarding that test failure: so the test basically makes Agent.config throw an exception which doesn't get logged. I could code around that defensively, like this (the tests are passing on my machine after the change):

diff --git a/lib/new_relic/thread_local_storage.rb b/lib/new_relic/thread_local_storage.rb
index 0d3e161cb..2f2766ba9 100644
--- a/lib/new_relic/thread_local_storage.rb
+++ b/lib/new_relic/thread_local_storage.rb
@@ -5,7 +5,7 @@
 module NewRelic
   module ThreadLocalStorage
     def self.get(thread, key)
-      if Agent.config[:thread_local_tracer_state]
+      if use_thread_local_tracer_state?
         thread.thread_variable_get(key)
       else
         thread[key]
@@ -13,7 +13,7 @@ module NewRelic
     end

     def self.set(thread, key, value)
-      if Agent.config[:thread_local_tracer_state]
+      if use_thread_local_tracer_state?
         thread.thread_variable_set(key, value)
       else
         thread[key] = value
@@ -27,5 +27,11 @@ module NewRelic
     def self.[]=(key, value)
       set(::Thread.current, key, value)
     end
+
+    def self.use_thread_local_tracer_state?
+      Agent.config[:thread_local_tracer_state]
+    rescue StandardError
+      false
+    end
   end
 end

... or we could change the test itself, what do you think?

markiz · 2024-03-06T11:07:44Z

Here's a version that changes the test rather than the code:

--- a/test/multiverse/suites/rake/instrumentation_test.rb
+++ b/test/multiverse/suites/rake/instrumentation_test.rb
@@ -57,7 +57,7 @@ class RakeInstrumentationTest < Minitest::Test
     NewRelic::Agent::Instrumentation::Rake.stub :should_trace?, true, [instance.name] do
       error = RuntimeError.new('expected')
       # produce the exception we want to have the method rescue
-      NewRelic::Agent.stub :config, -> { raise error } do
+      NewRelic::Agent.stub :instance, -> { raise error } do
         logger = MiniTest::Mock.new
         NewRelic::Agent.stub :logger, logger do
           logger.expect :error, nil, [/^Exception/, error]

Also makes the tests pass on my machine. I like this one better — as long as we generally don't expect Agent.config to actually raise exceptions.

hannahramadan · 2024-03-07T00:12:08Z

lib/new_relic/agent/threading/agent_thread.rb

@@ -9,8 +9,7 @@ class AgentThread
        def self.create(label, &blk)
          ::NewRelic::Agent.logger.debug("Creating AgentThread: #{label}")
          wrapped_blk = proc do
-            if ::Thread.current[:newrelic_tracer_state] && Thread.current[:newrelic_tracer_state].current_transaction
-              txn = ::Thread.current[:newrelic_tracer_state].current_transaction
+            if (txn = ::NewRelic::ThreadLocalStorage[:newrelic_tracer_state]&.current_transaction)


Nice syntax update!

hannahramadan · 2024-03-07T00:21:52Z

Hi @markiz—thanks for your PR! This is a great improvement to how we handle thread state. Thanks for the extra effort of putting this change behind a configuration and adding tests.

For the failing test, we like your second option better too (changing the test to stubbing :instance). Once that test is updated, we will be happy to get this approved and merged!

markiz · 2024-03-07T13:54:24Z

@hannahramadan alright, I've amended the MR, let's check it out

kaylareopelle

Thank you, @markiz!

markiz requested review from fallwith, hannahramadan, kaylareopelle and tannalynn as code owners March 2, 2024 18:21

github-actions bot added the community To tag external issues and PRs submitted by the community label Mar 2, 2024

markiz force-pushed the ma/thread-local-storage branch from b84f934 to 95ead86 Compare March 2, 2024 18:33

kford-newrelic added the estimate Issue needing estimation label Mar 4, 2024

hannahramadan reviewed Mar 7, 2024

markiz added 2 commits March 7, 2024 17:50

Add ThreadLocalStorage abstraction to store thread-local data

bcb0d5c

Add thread_local_tracer_state configuration option that switches between thread-local and fiber-local storage

554a459

markiz force-pushed the ma/thread-local-storage branch from 3d2e7cf to 554a459 Compare March 7, 2024 13:52

hannahramadan approved these changes Mar 7, 2024

kaylareopelle approved these changes Mar 7, 2024

hannahramadan merged commit 6be912c into newrelic:dev Mar 7, 2024
25 checks passed

hannahramadan mentioned this pull request Mar 8, 2024

CHANGELOG update #2498

Merged
