Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uses try_recv() for the accounts read cache evictor #3867

Merged

Conversation

brooksprumo
Copy link

@brooksprumo brooksprumo commented Dec 2, 2024

Problem

Similar to #3813 and #3814, we want the foreground thread that calls send() on a channel to not need to grab a mutex to wake up the background thread doing recv().

For the Accounts Read Cache, the channel in question here is for handling evictions. These are purposely done in the background to unburden for the foreground. We don't want the foreground threads to now grab a mutex! The background evictor can lazily check the channel.

Summary of Changes

Change the evictor to use try_recv() instead of recv().

@brooksprumo brooksprumo self-assigned this Dec 2, 2024
@brooksprumo brooksprumo force-pushed the read-cache/evictor-uses-try-recv branch from fecc6d2 to 245e3b3 Compare December 2, 2024 16:20
@brooksprumo brooksprumo marked this pull request as ready for review December 2, 2024 17:04
Copy link

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice find.
lgtm!

Copy link

@alessandrod alessandrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good with 1 nit and 1 question to make sure we don't spike memory too much

Ok(_) => {
// Evict request received!
}
Err(crossbeam_channel::TryRecvError::Empty) => {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use crossbeam_channel::TryRecvError;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, originally I had that in my first iteration: fecc6d2. I removed it since in this file we explicitly call out crossbeam_channel types explicitly, so I wanted to follow that precedent. Wdyt, keep as is, or go back to v1 with the use?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh no yeah consistency is gud

// No requests to evict were received, so sleep and check again.
// Note: We don't need/want to evict often. 100 ms is already four
// times per slot, which should be plenty.
thread::sleep(Duration::from_millis(100));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how many evict request can be queued in 100ms? Is the channel unbounded? What's the upper bound on the memory that could be queued?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The channel is size 1. The eviction channel is therefore tiny. The main question is how much memory the read cache itself will use in that 100 milliseconds while the evictor is sleeping. Let me spin up a node a see how it compares.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over a 15 minute interval, here's how may times a v2.0 canary wakes up and does productive work. Note, each of these counters/datapoints is submitted once per second. So if there were 10 wakeups in a second, we'd see 10. Since the value is either 1 or 0, it means we are only waking up once per second max at the moment.

Screenshot 2024-12-02 at 7 23 21 PM

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the total wakeups on the same node. The difference is another thread likely queued a request to the evictor just after the evictor woke up and popped the channel, but before eviction had completed. So we see a second wakeup sometimes that doesn't have work to do.

Screenshot 2024-12-02 at 7 25 45 PM

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have watermarks for the size (high and low), and wakeups of at most 1/2 per second, I think that means a sleep of 100 ms is fine here too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original idea for the evictor was that it would run infrequently, and being awoken was better than polling and sleeping. Maybe that design should be revisited. We can remove the channel and the background evictor can wakeup and check the size at a fixed time interval.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this out on mnb, and it seems to be working fine.

Graphs are over 6 hours. Blue is running this PR, purple is running master.

total number of wakeups is up, which is expected as now we check ever 100 ms:
wakeups all

The number of productive wakeups is basically the same:
wakeups productive

The time spent storing new accounts into the read cache, aka the important one, is about the same? Maybe slightly better? Not by a lot though:
store time

And lastly the size of the cache itself. There's a spike at the end, but only 10 MB, so seems fine to me.
size

@brooksprumo brooksprumo merged commit 9067f3a into anza-xyz:master Dec 3, 2024
40 checks passed
@brooksprumo brooksprumo deleted the read-cache/evictor-uses-try-recv branch December 3, 2024 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants