-
Notifications
You must be signed in to change notification settings - Fork 747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix logging with interest cache #2165
base: v0.1.x
Are you sure you want to change the base?
Conversation
@@ -0,0 +1,101 @@ | |||
#![cfg(feature = "interest-cache")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test isn't run as part of the CI. Should I add it to the CI config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there should be a CI job that runs with --all-features
for every crate.
081c3f1
to
fa7b2c8
Compare
This should also be released directly in a new |
I haven't reviewed this yet, but can you make this PR target the master branch? We generally handle backports; it's easier for us if this targets master. |
On master the interests cache doesn't exist, so this is not an issue there? |
Ugh, that's an oversight. That should be on master. |
I saw the big fat warning in the readme that the v0.1.x is what's released on crates.io, I saw a bunch of other PRs made on v0.1.x too and for testing purposes it was easier to initially base it on v0.1.x, so I just made the PR for v0.1.x, and then no one complained/told me to make it on master... (: Sorry about that! Can we merge this PR as-is, and then as a followup PR I'll take this and my previous PR and port it to master? |
i wasn't paying attention, sorry. :) i'm fine with merging this change against i haven't had time to actually review this change yet, but hopefully i'll have time to get to it tomorrow or monday! |
Then I'm waiting for your feedback @hawkw. Ty you already! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've finally had time to take a look at this PR (sorry it took me a big longer than expected!). I understand the issue this resolves and I think the fix makes sense --- we should not be setting the local default to Dispatch::none
in the case where a global default has not been set yet.
However, running the benchmarks on this branch, I notice that this change has a rather significant performance impact in the subscriber.rs
benchmarks, in comparison to v0.1.x
(). In particular, the benchmarks that measure the overhead of get_default
have regressed significantly:
dispatch/no_dispatch_get_ref
time: [20.388 ns 20.458 ns 20.533 ns]
change: [+783.82% +788.94% +792.85%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low mild
4 (4.00%) high mild
dispatch/no_dispatch_get_clone
time: [26.754 ns 26.762 ns 26.769 ns]
change: [+141.86% +141.94% +142.04%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) low mild
dispatch/get_ref time: [5.2403 ns 5.2463 ns 5.2552 ns]
change: [+122.87% +123.72% +124.57%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
dispatch/get_clone time: [12.155 ns 12.156 ns 12.157 ns]
change: [+10.745% +10.878% +11.031%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) low severe
2 (2.00%) high mild
which effects the performance of recording an event or creating a new span.
Full `tracing` benchmark results
Running unittests (target/release/deps/no_subscriber-dc9048f119b3e39d)
Gnuplot not found, disabling plotting
no_subscriber/span time: [329.58 ps 331.20 ps 332.77 ps]
change: [-0.0178% +0.2168% +0.4588%] (p = 0.11 > 0.05)
No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
10 (10.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
no_subscriber/span_enter
time: [235.25 ps 235.32 ps 235.43 ps]
change: [+0.6917% +0.7828% +0.8696%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe
no_subscriber/empty_span
time: [233.82 ps 234.29 ps 234.64 ps]
change: [-0.5106% -0.1015% +0.3728%] (p = 0.66 > 0.05)
No change in performance detected.
no_subscriber/empty_struct
time: [676.61 ps 677.40 ps 678.50 ps]
change: [-2.5811% -2.2355% -1.8730%] (p = 0.00 < 0.05)
Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
2 (2.00%) low mild
1 (1.00%) high mild
17 (17.00%) high severe
no_subscriber/event time: [227.36 ps 230.21 ps 235.99 ps]
change: [-1.8480% +0.2089% +3.2513%] (p = 0.89 > 0.05)
No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) low mild
6 (6.00%) high mild
7 (7.00%) high severe
no_subscriber/relaxed_load
time: [466.51 ps 467.00 ps 467.58 ps]
change: [-0.0715% +0.0541% +0.1849%] (p = 0.43 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
no_subscriber/acquire_load
time: [464.25 ps 466.81 ps 469.31 ps]
change: [-0.7907% -0.4652% -0.1568%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
13 (13.00%) low mild
1 (1.00%) high mild
no_subscriber/log time: [235.08 ps 235.14 ps 235.23 ps]
change: [-0.0774% -0.0457% -0.0177%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
3 (3.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
no_subscriber_field/span
time: [1.4180 ns 1.4193 ns 1.4217 ns]
change: [+1.8668% +2.1892% +2.5196%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low severe
5 (5.00%) high severe
no_subscriber_field/event
time: [468.34 ps 469.12 ps 469.75 ps]
change: [-0.3684% +0.0417% +0.2853%] (p = 0.86 > 0.05)
No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild
no_subscriber_field/log time: [235.18 ps 235.23 ps 235.30 ps]
change: [+0.3970% +0.5570% +0.6734%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
6 (6.00%) high severe
Gnuplot not found, disabling plotting
Running unittests (target/release/deps/subscriber-b3c6395267c4f6e6)
Gnuplot not found, disabling plotting
span_no_fields time: [21.978 ns 22.031 ns 22.100 ns]
change: [+17.484% +18.044% +18.607%] (p = 0.00 < 0.05)
Performance has regressed.
enter_span time: [2.6206 ns 2.6273 ns 2.6358 ns]
change: [-1.3274% -0.4722% +0.3517%] (p = 0.27 > 0.05)
No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
1 (1.00%) low mild
2 (2.00%) high mild
17 (17.00%) high severe
span_repeatedly time: [3.8106 us 3.8113 us 3.8119 us]
change: [+6.8139% +6.8656% +6.9101%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) low mild
2 (2.00%) high mild
span_with_fields time: [58.965 ns 58.972 ns 58.979 ns]
change: [+3.8399% +3.8606% +3.8813%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low severe
3 (3.00%) low mild
2 (2.00%) high mild
span_with_fields_record time: [307.05 ns 307.61 ns 308.15 ns]
change: [+0.1574% +0.3189% +0.4827%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
dispatch/no_dispatch_get_ref
time: [20.388 ns 20.458 ns 20.533 ns]
change: [+783.82% +788.94% +792.85%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low mild
4 (4.00%) high mild
dispatch/no_dispatch_get_clone
time: [26.754 ns 26.762 ns 26.769 ns]
change: [+141.86% +141.94% +142.04%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) low mild
dispatch/get_ref time: [5.2403 ns 5.2463 ns 5.2552 ns]
change: [+122.87% +123.72% +124.57%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
dispatch/get_clone time: [12.155 ns 12.156 ns 12.157 ns]
change: [+10.745% +10.878% +11.031%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) low severe
2 (2.00%) high mild
Gnuplot not found, disabling plotting
I think it may be possible to resolve the issue that this branch fixes with less overhead. I'm going to look into some potential changes to this PR to see if they decrease the performance regression --- in particular, I think we can avoid generating unwinding code in Entered
. I'll get back to you once I've had the time to look into potential changes.
tracing-log/src/interest_cache.rs
Outdated
@@ -235,7 +235,7 @@ mod tests { | |||
|
|||
fn lock_for_test() -> impl Drop { | |||
// We need to make sure only one test runs at a time. | |||
static LOCK: Lazy<Mutex<()>> = Lazy::new(Mutex::new); | |||
static LOCK: Lazy<Mutex<()>> = Lazy::new(|| Mutex::new(())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this change necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For whatever reason, my rustc version didn't liked it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I revert this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Okay, I've made some additional changes to this branch and the performance regression is now...less severe:
|
@hawkw ty! :) |
9d1a308
to
78f3b01
Compare
@hawkw how do we want to move forward? |
I would like to reduce the performance regression in the |
Whoops, we were actually not running the benchmarks with a global default subscriber (:upside_down_face:). After fixing that (in 181bdbe), we see that the perf diff in that case is more severe:
Note that recording an event is about 25% slower on 20d9695 than on Going to keep seeing what I can do to reduce the overhead of this change. |
Okay, yeah that is a lot of overhead. Looking at the code, it is not really that much different. It could be probably the |
When the interest cache is configured as part of initing the `LogTracer` before the global dispatcher is set, the main thread fails to log. The problem is that the interests cache registers a dummy callsite. With tokio-rs#2083 the callsite registration changed to use `dispatcher::get_default`. This call internally sets the thread local `default` dispatcher to `Dispatch::none` when there isn't yet any dispatcher set. This `default` dispatcher isn't overwritten when setting the global dispatcher. The solution is to not store the `Dispatch::none` as `default` dispatcher.
this brings performance of most of the `subscriber.rs` benchmarks closer to where they were on `v0.1.x` prior to this change (commit cf98b73): ``` Running unittests (target/release/deps/subscriber-b3c6395267c4f6e6) Gnuplot not found, disabling plotting span_no_fields time: [18.814 ns 18.900 ns 18.967 ns] change: [-17.562% -17.045% -16.552%] (p = 0.00 < 0.05) Performance has improved. enter_span time: [2.8471 ns 2.8620 ns 2.8799 ns] change: [+7.5923% +8.2941% +8.9390%] (p = 0.00 < 0.05) Performance has regressed. Found 20 outliers among 100 measurements (20.00%) 2 (2.00%) low severe 2 (2.00%) high mild 16 (16.00%) high severe span_repeatedly time: [3.6959 us 3.6988 us 3.7017 us] change: [-3.0008% -2.9276% -2.8528%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild span_with_fields time: [56.854 ns 56.911 ns 56.975 ns] change: [-3.5756% -3.4955% -3.3997%] (p = 0.00 < 0.05) Performance has improved. Found 20 outliers among 100 measurements (20.00%) 1 (1.00%) low mild 3 (3.00%) high mild 16 (16.00%) high severe span_with_fields_record time: [317.15 ns 318.02 ns 318.94 ns] change: [+3.1700% +3.3617% +3.5555%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe dispatch/no_dispatch_get_ref time: [17.823 ns 17.966 ns 18.134 ns] change: [-10.826% -10.144% -9.4796%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild dispatch/no_dispatch_get_clone time: [24.984 ns 25.013 ns 25.041 ns] change: [-6.7511% -6.6349% -6.5020%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low severe 2 (2.00%) low mild 1 (1.00%) high mild 4 (4.00%) high severe dispatch/get_ref time: [3.8508 ns 3.8532 ns 3.8555 ns] change: [-26.641% -26.518% -26.404%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe dispatch/get_clone time: [10.737 ns 10.750 ns 10.765 ns] change: [-11.632% -11.429% -11.219%] (p = 0.00 < 0.05) Performance has improved. Found 17 outliers among 100 measurements (17.00%) 3 (3.00%) low mild 2 (2.00%) high mild 12 (12.00%) high severe Gnuplot not found, disabling plotting ``` performance of `get_default` where there is no current subscriber is still significantly regressed vs `v0.1.x`, due to `Arc` allocations for `Dispatch::none` occurring every time. i think this can be solved as well.
Signed-off-by: Eliza Weisman <[email protected]>
20d9695
to
45c477e
Compare
Hey @hawkw, sorry for the long delay. I pushed some commits. Could you take a look please. I also run the benchmarks. Besides the
|
@bkchr thanks for continuing to work on this, I'm excited to take a look at your recent changes. |
@bkchr just to clarify, are change percentages in the benchmark results you posted in #2165 (comment) relative to |
They are relative to |
Fantastic, thanks! |
tracing-core/src/dispatcher.rs
Outdated
GLOBAL_DEFAULT_DISPATCH.store( | ||
GLOBAL_DISPATCH.as_mut().unwrap() as *mut _, | ||
Ordering::SeqCst, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally, i think this would be a compare_exchange
so that it fails if the global dispatch value was something other than the initial default NO_DISPATCH
value. we could panic if the CAS fails (at least in debug mode)
tracing-core/src/dispatcher.rs
Outdated
fn with_current<T>(&self, f: impl FnOnce(&Dispatch) -> T) -> T { | ||
match &*self.0.default.borrow() { | ||
Some(default) => f(default), | ||
None => unsafe { f(&*GLOBAL_DEFAULT_DISPATCH.load(Ordering::SeqCst)) }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we sure this load needs to be sequentially consistent, or can it just be an Acquire
load? what other atomic value does this need to synchronize with?
tracing-core/src/dispatcher.rs
Outdated
// The global default dispatch that should be used by [`Enter::with_current`] | ||
// when there is no explicit local dispatch set. | ||
// | ||
// By default this is initialized to [`NO_DISPATCH`] and will be set to | ||
// [`GLOBAL_DISPATCH`] when this is being set. | ||
static GLOBAL_DEFAULT_DISPATCH: Lazy<AtomicPtr<Dispatch>> = | ||
Lazy::new(|| AtomicPtr::new(unsafe { &mut *NO_DISPATCH as *mut _ })); | ||
|
||
// This is only `mut` to make it work as default value of [`GLOBAL_DEFAULT_DISPATCH`]. | ||
// | ||
// We can not use `new` as we don't want this to be registered at the callsites. | ||
static mut NO_DISPATCH: Lazy<Dispatch> = Lazy::new(|| Dispatch { | ||
subscriber: Arc::new(NoSubscriber::default()), | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it occurs to me that we could potentially avoid the need to have two Lazy
cells and a static mut
if we made a way for a "no dispatcher" Dispatch
to be constructed in a const fn
. we could investigate changing Dispatch
to:
pub struct Dispatch {
subscriber: Option<Arc<dyn Subscriber + ...>>,
}
and having the None
variant always nop. Of course, we'd want to ensure that doesn't have a worse performance impact, but it might be worth considering.
It would also make cloning the NO_DISPATCH
dispatcher cheaper, as it wouldn't have to do an Arc
refcount bump...
It also rewrites the logic on how the `current` dispatcher is determined.
Back, with your comments included @hawkw :) Now it also looks cleaner IMO.
|
@hawkw would be nice if you could take a look when you have time |
@hawkw could I get some new review? 🙈 |
Sorry @hawkw for pinging you again 🙈 |
Motivation
When the interest cache is configured as part of initing the
LogTracer
beforethe global dispatcher is set, the main thread fails to log. The problem
is that the interests cache registers a dummy callsite. With
#2083 the callsite registration
changed to use
dispatcher::get_default
. This call internally sets thethread local
default
dispatcher toDispatch::none
when there isn'tyet any dispatcher set. This
default
dispatcher isn't overwritten whensetting the global dispatcher.
Solution
The solution is to not store the
Dispatch::none
asdefault
dispatcher.