-
Notifications
You must be signed in to change notification settings - Fork 973
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Background eviction #4191
Background eviction #4191
Conversation
I would recommend adding a flag of sorts that continues to perform eviction in the foreground:
|
9dbb1ae
to
2848532
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I'm late with reviewing, I only did a partial review so far. Thanks for taking a stab at this! I found it too difficult to reason about the changes unfortunately, due to a few things:
-
With the current interface, both background and main threads have access to basically every component of core, because both hold references to the Application class. This is very dangerous, and makes it nearly impossible to verify correctness of code. Maybe we should take a step back, and start with a diagram of interactions/dependencies between LedgerManager, BucketManager, etc followed by a sketch of interface changes (like for simplicity start with public API only, we can focus on intra-class synchronization later).
-
I think a design issue that makes things complicated is that the snapshot manages itself so it has to query LedgerManager for latest ledgers, and BucketManager for snapshot mutex (it feels weird that it has to acquire a mutex held by someone else to modify its own data). Why do we need state in snapshot class in the first place? It could be a const wrapper around buckets (with no access to any of the managers) maintained by some higher-level abstraction, which just swaps copies with the new snapshot once ledger close is done. We could introduce a new class that manages this hand-off with some invariance: e.g. only main thread can refresh the snapshot, while background threads can only retrieve a const copy of state to do whatever is needed.
-
As part of the interface cleanup, can we think about encapsulation and removal of unneeded mutexes. For example, with the changes suggested above, I think you won't need LCL mutex. We should encapsulate mutexes as much as possible, potentially inside the "handoff manager", so that callers don't need to worry about synchronization (note that we'll still need to keep track of ledger numbers, but conceptually it's part of "the snapshot", so it would all be guarded with one mutex).
-
I'd suggest incorporating additional invariance, so we can detect bugs faster and blow up rather than go into UB land. Some ideas here: assert main vs non-main thread,
future
state management like we do in FutureBucket, etc
src/bucket/BucketManager.h
Outdated
@@ -207,9 +207,11 @@ class BucketManager : NonMovableOrCopyable | |||
virtual void maybeSetIndex(std::shared_ptr<Bucket> b, | |||
std::unique_ptr<BucketIndex const>&& index) = 0; | |||
|
|||
virtual std::unique_ptr<SearchableBucketListSnapshot const> | |||
virtual std::unique_ptr<SearchableBucketListSnapshot> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the removal of const here doesn't look right: any consumer of snapshots shouldn't be able to modify the underlying object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree in principal, the only members of SearchableBucketListSnapshot
are mSnapshotManager
, which is already const, and std::unique_ptr<BucketListSnapshot const> mSnapshot{};
. Because SearchableBucketListSnapshot
checks and automatically updates mSnapshot
on each load, to be const correct mSnapshot
would have to be mutable. I don't think this makes much sense to have the only piece of non-const state be mutable, and I don't think it's dangerous because the underlying BucketListSnapshot
is const regardless.
src/bucket/BucketManagerImpl.cpp
Outdated
} | ||
|
||
auto iter = mBucketListDBBulkTimers.find(label); | ||
if (iter == mBucketListDBBulkTimers.end()) | ||
mEvictionFuture.wait(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is redundant: get
calls wait
already
if (numEntries != 0) | ||
ZoneScoped; | ||
|
||
if (!mEvictionFuture.valid()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't look right: why isn't the future valid if we get to this state (resolving the eviction scan)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is std::future::valid() returns false if the future has been default initialized or has had a call to get
. It returns true if the future has been moved from (i.e. mEvictionFuture = std::packaged_task<EvictionResult()>::get_future()
) but has not yet been called by get
. Essentially, valid
returns true if there is ready or pending work, false if the work is done and has been retrieved, or if no work has been set yet.
This particular state is possible on startup, when we're closing our first ledger and haven't had a chance to call startBackgroundEvictionScan
as part of closeLedger
yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is this only the case for tests then where we close v20 genesis ledger? In practice, genesis is always protocol 0, so it shouldn't need to do any eviction at all, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, by "closing our first ledger", I don't mean the genesis ledger, but the first ledger that a node closes since that node's startup. Like during catchup after assuming state, we'd hit this path on the first call to ledgerClose
, since the assume state path doesn't kick off the background eviction scan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like during catchup after assuming state, we'd hit this path on the first call to ledgerClose, since the assume state path doesn't kick off the background eviction scan.
why not? it should, right? otherwise, we'd stall on that first ledger and have to block until the scan is done. Seems like we should kick off the eviction scan in LedgerManagerImpl::setLastClosedLedger
which is called by catchup
void | ||
BucketManagerImpl::resolveBackgroundEvictionScan( | ||
AbstractLedgerTxn& ltx, uint32_t ledgerSeq, | ||
LedgerKeySet const& modifiedKeys) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use asserts like threadIsMain
in critical codepaths like this one
auto& metric = | ||
mApp.getMetrics().NewTimer({"bucketlistDB", "bulk", label}); | ||
iter = mBucketListDBBulkTimers.emplace(label, metric).first; | ||
startBackgroundEvictionScan(ledgerSeq); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're on the main thread here scanning synchronously, so why not just do it with scanForEvictionLegacySQL
that is much less of a foot gun?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While scanForEvictionLegacySQL
and the background eviction scan should behave identically, there implementations are fairly different. The goal is to have background scanning by default and eventually deprecate the synchronous scan. For this reason, I think it's safer to only call variant of the scan, and of the two I default to the background scan when it's enabled. Personally, I think it's less of a footgun only calling into one implementation instead of mixing and matching 2.
auto iter = mBucketListDBBulkTimers.find(label); | ||
if (iter == mBucketListDBBulkTimers.end()) | ||
mEvictionFuture.wait(); | ||
auto evictionCandidates = mEvictionFuture.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should reset the current future as soon as we retrieve the result: some compilers keep lambdas alive as long as the future is alive (even if it's invalidated). I'm not sure this is specifically a problem with what we capture at the moment, but it's definitely a footgun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean by reset here? Looking at the interface of std::future
, there's no functions I can see like reset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I mean stop referencing the underlying object so it's destroyed properly, like we do in FutureBucket
stellar-core/src/bucket/FutureBucket.cpp
Line 267 in 7fc7671
mOutputBucketFuture = std::shared_future<std::shared_ptr<Bucket>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should only be an issue in shared_future
, not regular future
though right? Like there's no shared state being referenced here that can be kept alive, mEvictionFuture.get()
calls move
on the underlying state into auto evictionCandidates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yea, you're right, I didn't see it's just std::future. in this case yeah, get should release shared state as per the standard (note that std::move doesn't actually guarantee a move here, and the object can still be copied, but it doesn't matter as long as the future releases its shared state)
src/bucket/BucketManagerImpl.cpp
Outdated
}); | ||
|
||
mEvictionFuture = task->get_future(); | ||
mApp.postOnBackgroundThread(bind(&task_t::operator(), task), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current worker threads are all low priority, so I think we need a high-priority thread for eviction scans, since they are time-sensitive.
actually something else related to this (as I am seeing reports in Discord of something that could be related to this): |
68ae3c5
to
05e5388
Compare
void | ||
runCurrentThreadWithLowPriority() | ||
{ | ||
constexpr auto const LOW_PRIORITY_NICE = 5; | ||
runCurrentThreadWithPriority(THREAD_PRIORITY_LOWEST); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about changing worker threads from THREAD_PRIORITY_BELOW_NORMAL
to THREAD_PRIORITY_LOWEST
. Any input from Windows folks would be appreciated.
05e5388
to
1941c3c
Compare
c9b2cbe
to
4bf9087
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks .. fairly plausible! I mean I didn't see earlier iterations of this but as it stands I don't see anything serious left to change. A few minor things:
- I'm a bit nervous about a thread / packaged task capturing an unsynchronized & to the statistics structure. I guess it's the only writer and the only reader looks at the stats once the task is complete, but it's a somewhat implicit / distant synchronization. I might like to see these change to atomics, or be bundled together with the result as a value type that gets returned from the packaged task rather than a lambda capture, or something.
- The number of tests that have an
isUsingBucketListDB
branch that basically skips subsequent integrity checking makes me uneasy; I realize we haven't got rid of the legacy SQL path yet so you're still getting logic coverage from the other branch, but I assume we're going to someday, and it'd be nice to actuall do the integrity checking against the bucketlist DB too! So .. maybe at least file a followup issue to refactor the integrity checks there so that these branches can go away? - It'd be good to figure out what happened with the test hashes removed from the json files and reconcile to what they're "supposed" to be on master currently.
4bf9087
to
3b8a91e
Compare
While probably not strictly necessary, to be on the safe side I've refactored the stats to be thread safe. The
Opened #4295.
I think something here got mixed up in a rebase, after rebasing on master and regenerating the meta files everything seems fine. |
Marta verbally agreed this was ready to land, but is on vacation
r+ 3b8a91e |
Unfortunately, I didn't have a chance to review/approve v2 of the PR since I was out. My previous review is on v1, and I've resolved some comments there. |
Description
This PR enables background eviction scans when running BucketListDB. This increases the time we have to scan for expired entries to evict and will improve eviction throughput.
It is currently in draft form, as it is based on #4172. This will also fail CI at the moment since it has not been updated with the new XDR changes.
TODO:
Additional test for entries that should be evicted in ledger N but are recreated during N tx application.DoneGraceful shutdown for background eviction thread.Not required, background eviction scan is sufficiently short lived.Rebase on updated XDR.DoneChecklist
clang-format
v8.0.0 (viamake format
or the Visual Studio extension)