-
Notifications
You must be signed in to change notification settings - Fork 600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thread-local storage for Tracer.state #2475
Conversation
b84f934
to
95ead86
Compare
Hey, regarding that test failure: so the test basically makes diff --git a/lib/new_relic/thread_local_storage.rb b/lib/new_relic/thread_local_storage.rb
index 0d3e161cb..2f2766ba9 100644
--- a/lib/new_relic/thread_local_storage.rb
+++ b/lib/new_relic/thread_local_storage.rb
@@ -5,7 +5,7 @@
module NewRelic
module ThreadLocalStorage
def self.get(thread, key)
- if Agent.config[:thread_local_tracer_state]
+ if use_thread_local_tracer_state?
thread.thread_variable_get(key)
else
thread[key]
@@ -13,7 +13,7 @@ module NewRelic
end
def self.set(thread, key, value)
- if Agent.config[:thread_local_tracer_state]
+ if use_thread_local_tracer_state?
thread.thread_variable_set(key, value)
else
thread[key] = value
@@ -27,5 +27,11 @@ module NewRelic
def self.[]=(key, value)
set(::Thread.current, key, value)
end
+
+ def self.use_thread_local_tracer_state?
+ Agent.config[:thread_local_tracer_state]
+ rescue StandardError
+ false
+ end
end
end ... or we could change the test itself, what do you think? |
Here's a version that changes the test rather than the code: --- a/test/multiverse/suites/rake/instrumentation_test.rb
+++ b/test/multiverse/suites/rake/instrumentation_test.rb
@@ -57,7 +57,7 @@ class RakeInstrumentationTest < Minitest::Test
NewRelic::Agent::Instrumentation::Rake.stub :should_trace?, true, [instance.name] do
error = RuntimeError.new('expected')
# produce the exception we want to have the method rescue
- NewRelic::Agent.stub :config, -> { raise error } do
+ NewRelic::Agent.stub :instance, -> { raise error } do
logger = MiniTest::Mock.new
NewRelic::Agent.stub :logger, logger do
logger.expect :error, nil, [/^Exception/, error] Also makes the tests pass on my machine. I like this one better — as long as we generally don't expect |
@@ -9,8 +9,7 @@ class AgentThread | |||
def self.create(label, &blk) | |||
::NewRelic::Agent.logger.debug("Creating AgentThread: #{label}") | |||
wrapped_blk = proc do | |||
if ::Thread.current[:newrelic_tracer_state] && Thread.current[:newrelic_tracer_state].current_transaction | |||
txn = ::Thread.current[:newrelic_tracer_state].current_transaction | |||
if (txn = ::NewRelic::ThreadLocalStorage[:newrelic_tracer_state]&.current_transaction) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice syntax update!
Hi @markiz—thanks for your PR! This is a great improvement to how we handle thread state. Thanks for the extra effort of putting this change behind a configuration and adding tests. For the failing test, we like your second option better too (changing the test to stubbing :instance). Once that test is updated, we will be happy to get this approved and merged! |
…een thread-local and fiber-local storage
3d2e7cf
to
554a459
Compare
@hannahramadan alright, I've amended the MR, let's check it out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @markiz!
Overview
Background
We were working on a certain sidekiq job that was generating a large (300+mb) CSV file, and to deal with the growing memory usage, we've implemented streaming. That implementation included using custom Ruby Enumerators.
When we launched it in production, we've noticed that NewRelic no longer traces database calls within the sidekiq job transaction. After some debugging, it turned out that the reason for that was that NewRelic holds its tracer state in the
Thread#[...]
storage. However,Thread#[...]
is actually a fiber-local storage, and our implementation specifically usedEnumerator#next
, which spawns a newFiber
. As a result, NewRelic lost track of the current tracer state and attributed everything within that enumerator to CPU burn.Support case where this issue originated is at https://support.newrelic.com/s/case-details?caseId=500Ph000007YRJX
Proposed Solution
This PR includes a small new abstraction
NewRelic::ThreadLocalStorage
that is used instead of theThread#[...]
where applicable. Its behavior is controlled by the new configuration option:thread_local_tracer_state
(default: false) — whenfalse
, keeps the original behavior, whentrue
, usesThread#thread_variable_get
andThread#thread_variable_set
in place ofThread#[]
andThread#[]=
respectively. I'm not 100% sure how safe it would be to use the new behavior across the board, because there could be fiber-based server implementations that also use NewRelic and rely on that storage being fiber-local, hence the configuration option.There is also one spot where I wasn't entirely sure whether a change from
Thread.current[]
toThreadLocalStorage[]
makes sense (NewRelic::Agent::Instrumentation::NotificationsSubscriber#segment_stack
), so I haven't touched that.Submitter Checklist:
Reviewer Checklist