Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evict Temporary Entries #3874

Merged
merged 16 commits into from
Aug 24, 2023
Merged

Evict Temporary Entries #3874

merged 16 commits into from
Aug 24, 2023

Conversation

SirTyson
Copy link
Contributor

@SirTyson SirTyson commented Aug 8, 2023

Description

Resolves #3784

This PR evicts expired TEMPORARY entries. When evicted, entries are deleted from the BucketList (and DB if enabled) and a new kind of meta is emitted to assist downstream systems with garbage collection.

While the eviction scan portion is well tested, I still need to add one more unit test to check the actual meta emitted. This turned out to be non-trivial and is coming soon. I also removed network config's lazy loading. Previously, network config were only loaded from disk if an upgrade occurred. Now that the network config stores an iterator that is updated on each ledger close, this lazy loading no longer makes sense since every ledger is essentially an upgrade. Ideally, we would break network configs and this sort of "validator metadata" (BucketList size snapshots and Eviction iterators) into a new LedgerEntry type, but this would require and XDR change and probably doesn't fit in our timeline.

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@SirTyson SirTyson requested review from dmkozh, sisuresh and graydon August 8, 2023 09:37
@SirTyson SirTyson force-pushed the evict-temp-v2 branch 3 times, most recently from ce2ee2e to abd64eb Compare August 15, 2023 17:24
if (isSorobanDataEntry(le.data) && isTemporaryEntry(le.data) &&
!isLive(le, ledgerSeq))
{
if (ltx.maybeEvict(le))
Copy link
Contributor

@sisuresh sisuresh Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of modifying LedgerTxn to handle eviction, can you just -

  1. Load the entry from ltx.
  2. Verify that the entry in ltx is expired.
  3. Call erase.

I'm trying to reason if LedgerTxn needs to track EntryChangeType. If you just want the meta for entries evicted, I think you should be able to created a nested LedgerTxn, call BucketManager::scanForEviction, and then get the changes from that, which should only contain the meta for the evicted entries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I was having is that whenever I tried to do some sort of nested LedgerTxn, the meta changes appeared as if they were TX based changes, which is incorrect. I'm not super knowledgeable on the LedgerTxn system though, so if there's a better way to hack it with a nested LedgerTxn I have no objections.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I have it working for the most part with a nested LedgerTxn, but testing is kinda weird. Another thing I noticed is that you assert for LEDGER_ENTRY_REMOVED, but when we scan to delete expired entries, we also emit a LEDGER_ENTRY_STATE (because I call erase in my version). The scan also updates the eviction iterator entry, so those changes will have to be ignored as well.

@sisuresh sisuresh mentioned this pull request Aug 18, 2023
6 tasks
Copy link
Contributor

@graydon graydon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really tidy! Just a small handful of nits, nothing huge.

@@ -88,6 +88,7 @@ class InMemoryLedgerTxn : public LedgerTxn
void eraseWithoutLoading(InternalLedgerKey const& key) override;

LedgerTxnEntry create(InternalLedgerEntry const& entry) override;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spurious diff, can remove

@@ -3,6 +3,7 @@
// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0

#include "herder/Upgrades.h"
#include "bucket/BucketList.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think spurious (no other edits in this file)

releaseAssert(mVersion == 2);
for (auto const& change : evictionChanges)
{
if (change.type() != LEDGER_ENTRY_REMOVED)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you replace this if-with-embedded-if-else with a switch on change.type()? I can't really tell what the intent is of this logic.

uint32_t ledgerVers) override;
void transferLedgerEntriesToBucketList(
AbstractLedgerTxn& ltx,
std::unique_ptr<LedgerCloseMetaFrame> const& ledgerCloseMeta,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit but could we use a std::optional here rather than a unique_ptr-that-might-be-nullptr? Optional sends a stronger "some callers pass nullopt" message than unique_ptr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but we already use std::unique for this type of interface in other places like processFeesSeqNums and applyTransactions. Also LedgerCloseMetaFrame stores XDR and is pretty big memory wise, and using std::optional would copy unless we also used std::reference_wrapper, which I'd rather not do.

@@ -441,6 +448,9 @@ class BucketList
// Returns true if at given `level` dead entries should be kept.
static bool keepDeadEntries(uint32_t level);

// Number of ledgers it takes a bucket to spill/receive an incoming spill
static uint32_t bucketChangeRate(uint32_t level, bool isCurr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to bucketUpdateInterval

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(or "bucketUpdatePeriod" as you are using in the implementation comment)

{
#ifdef ENABLE_NEXT_PROTOCOL_VERSION_UNSAFE_FOR_PRODUCTION
auto getBucketFromIter = [&levels = mLevels](EvictionIterator const& iter) {
auto& level = levels[iter.bucketListLevel];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: if you could avoid [...] and use .get(...), at least outside of fast paths, it does a bounds check and gives us an exception on failure rather than just scribbling on memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this is a very general comment about "any access to a std::vector", not specific to this site)

src/bucket/BucketList.cpp Show resolved Hide resolved
evictionIter.isCurrBucket);
if (changeRate * scanSize < b->getSize())
{
CLOG_WARNING(Bucket,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume nobody ever reads warnings -- is this a temporary / harmless thing, or a problem that needs to be addressed by a config setting change, or something else? Should we surface this occurrence in a metric?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be addressed by a config setting change and is most likely not temporary (essentially if you get this warning, you'll be stuck on the current level until a config upgrade). I can add a metric.

auto bytesRead = newPos - iter.bucketFileOffset;
iter.bucketFileOffset = newPos;
bytesScannedForEvictionMeter.inc(bytesRead);
if (bytesRead > bytesToScan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be >=, right? If you read 100 bytes and your goal was to scan 100 bytes then it's time to stop.

cfg.stateExpirationSettings().evictionScanSize >=
MinimumSorobanNetworkConfig::EVICTION_SCAN_SIZE &&
cfg.stateExpirationSettings().startingEvictionScanLevel >=
MinimumSorobanNetworkConfig::STARTING_EVICTION_LEVEL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and < BucketList::kNumLevels

// incoming_spill_frequency(i) = 2^(2i - 1) for i > 0
// incoming_spill_frequency(0) = 1
uint32_t
BucketList::bucketChangeRate(uint32_t level, bool isCurr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As much as I believe this is likely correct, literally ever time we've done BL arithmetic in the past we've got it slightly wrong; could you add some explicit test vectors on this calculation? just like .. either fish out some specific spills that happened in the history and confirm that they happened this-number-of-ledgers apart, or write a test that just calculates these numbers and then drives the BL forward and checks that the spills happen at the frequency predicted?

@graydon
Copy link
Contributor

graydon commented Aug 24, 2023

r+ 5ba6c3a

@graydon
Copy link
Contributor

graydon commented Aug 24, 2023

r+ 8250b7b

@latobarita latobarita merged commit c5b96ca into stellar:master Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

State Expiration: Evict Temporary Entries
4 participants