-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert swingstore from LMDB to Sqlite, phase 1 #6561
Conversation
I'm wondering, does that mean that the behavior of node with the PR applied, and one without would be 100% consistent (long term deterministic)? Even without this, how confident would we be in applying this change to |
There are three answers to this, which are respectively, yes, no, and it doesn't matter. Yes, I've verified that the sequence of database operations are the same and that the action hashes match. No, there is an exception to the action hashes matching, which is when there is an action in the crank containing a bundle reference and the bundle hash has changed because the code has changed. And of course changing the database engine is a code change. On the other hand, I observed this with tests that themselves had changes, and it's possible that since the database code changes are all in the kernel we'll be fine if the kernel bundle does not appear in any action hash (which, long term, we need it to not do if we want kernel upgrade to be possible). @warner? And it doesn't matter (I think) because IIUC we'd be switching database engines as part of the bulldozer upgrade. |
Right! This is an interesting case since this change has an impact on the kernel, mostly due to the swingstore api refactoring from what I gather. I thought however that we removed the kernel bundle from being saved in the DB? Or did this somehow impact the liveslot/supervisor bundle?
I was wondering in the context of potentially having to deal with state-sync before a bulldozer upgrade. It sounds like with some effort, we might be able to swap the DB implementation for release-pismo. In that case we'd need to do other surgery, e.g. extract local XS snapshot hashes and move them in a different section of the DB (after verifying they're deterministic of course), so it'd definitely not be for the faint of heart. |
85fd98a
to
4ef54e2
Compare
Short take: this looks great. I'll start reviewing properly now. We no longer store the kernel bundle anywhere. We still need to build a kernel bundle (because the kernel runs in its own Compartment, and I'll think about whether this could cause visible behavioral changes as I review. My intention was that we make this switch without worrying about such changes, especially because of my goal to replace @FUDCo and walked through what the crankBuffer commit/abort replacement will be (which can happen after this lands). We concluded that we will:
And @FUDCo points out:
|
c90d991
to
2e24974
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phew, ok sorry that took a while. Some small changes to make, some items for me to investigate further, and a few things to discuss.
Oh, an update on the WAL-mode and "checkpoint" operations. I just re-read https://sqlite.org/wal.html#performance_considerations , and realized there's a difference between the It says that if If we set Once we do that, we can stick to automatic checkpoints. These will be opportunistic: if a reader has a transaction open (e.g. to copy data out of the DB, maybe for state-sync purposes), the checkpoint/WAL-compaction might have to stop, but it will pick up where it left off at the next opportunity, and the application doesn't need to know about it. So:
|
f2643e2
to
47f1aba
Compare
@warner ready for re-review pending your final analyses of the test changes. |
Ok, that looks good. Maybe rebase one last time. Thanks! |
…h into swing-store
The refcount changes in "createVat holds refcount" were correct, but the comments gave the wrong reason. The original version (on trunk) was wrong, but happened to work because of a failure-to-commit bug/omission in kpResolution(). After using `c.kpResolution(kpid1)`, the refcount is indeed 3: one from v1-bootstrap, one from the pin added by kpResolution, and one from the kpid1 resolution value. Note that kpid1 itself has a zero refcount by this point, but has not yet been collected, because we only call processRefcounts() at the end of deliveries, not after the decrefs performed by kpResolution. This is arguably a bug, but fortunately we only call kpResolution() from tests. The original version thought the refcount was 2, and the test only passed because the incref performed by kpResolution was still sitting in the crankbuffer, and the test code looked directly at the kvStore (not the crankbuffer-wrapped version that kernelKeeper uses). The crankbuffer had refcount=3, the DB had refcount=2, and the test asserted that it was "2". The test comments didn't take into account the reference from kpid1 (and probably assumed that kpid1 was retired by that point). The subsequent delivery allowed the crankbuffer to be flushed (incrementing), but also allows processRefcounts() to run, which removes the kpid1 resolution value reference (decrementing), making it look like there was no net refcount change. After switching to SQLite and removing the crankbuffer, the test is correctly seeing the incref added by kpResolution, so it must assert that the count is 3. But the extra refcount didn't come from the 'getHeld' call: v1-bootstrap only has a single c-list entry for 'held', not two as the comments implied (even if multiple objects or Promises within v1-bootstrap held a reference, they all share a single valToSlot and c-list entry).
47f1aba
to
59de3ab
Compare
closeStreamStore(); | ||
await doCommit(true); | ||
await db.close(); | ||
commit(); | ||
db.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the behavior on close from abort
to commit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, huh, yeah the previous true
in doCommit(true)
triggered an LMDB abort. I must have missed this during review (also we don't have any tests of abort-on-close()
, and we never use it that way, applications just crash instead of calling close
). Yeah, we should make this abort.
txnFinish(abort ? lmdbAbort : undefined); | ||
return Promise.resolve(txnDone) | ||
.then(() => { | ||
trace(`${abort ? 'abort' : 'commit'}-tx`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This trace
was not kept in the change
This PR is the first phase of the conversion from LMDB to Sqlite (in partial satisfaction of #3087).
This PR includes the replacement of the underlying database engine and the downstream changes that flow from it. This is done in a series of four commits (individually reviewable for clarity), each of which realizes a different step of the transformation.
Step 1 - convert from LMDB to Sqlite, using Sqlite to realize a dumb key-value store that replicates the former LMDB semantics exactly, retaining all of the supporting implementation that assumes a dumb key-value store with LMDB's transaction model.
Step 2 - eliminate the
ephemeralSwingStore
implementation, used for testing, that realized the dumb key-value store in memory using a JavaScript Map object, replacing it with Sqlite's:memory:
pseudo-file which does essentially the same thing (i.e., stores everything ephemerally in RAM) but does it using Sqlite directly.Step 3 - use Sqlite savepoints for crank-level commits (rather than an in-memory change buffer), with full Sqlite transactions for block-level commits. This eliminates a couple of layers of wrapper objects. A necessary consequence of this is that the crank activity hash is now computed as part of the swingstore itself.
Step 4 - eliminate the rest of the storage wrappers, making the swingstore essentially self contained.
The system that results after each step is fully functional in a way that is compatible with the prior behavior of SwingSet.
If this PR is merged with the master branch, the result will be entirely usable.
I am pushing this PR now in order to get review started. However, there is more to be done. In the broadest possible strokes, this remaining work falls into two categories:
Short term changes to
SELECT
result into RAM before using it (which the implementation in this PR does but which exposes us to the risk of an unbounded memory suck from adversarial vat code), replacing them with something more like the vatstore'sgetAfter
machinery.Long term changes that exploit the capabilities of the SQL query mechanism in various ways. These include but are not limited to bulk operations for data deletion, improving GC of on-disk data, and removing the vatstore from the consensus SwingSet state. Most of these will require SwingStore API changes or additions, which argues for taking some care planning them before proceeding, hence their absence from this PR. In addition, we expect further exploration/experimentation to identify further optimization opportunities.