-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add "flight recorder", triggered by consensus failure #3742
Comments
The next proposal and results of the vote is what succeeds is what decides if the committed-to-store AppHash is in consensus, or if the node needs to halt. It would be even more useful if somehow the current state (Tendermint, Cosmos, and JS) wasn't committed until the AppHash was voted to be in consensus, sometime when the AppHash and the next block's transactions were confirmed before the |
That's probably the definition of "equivocation". Too bad. |
Rolling slog file with some retention is a possible short term solution. We can record crank hash for correlation. Could also time slice kernel database (but that is hard, maybe MN-2 or MN-3). |
One part is instructing the node to write the slog file, which we have an env variable for, can point to file or Unix named pipe. Could pipe to a tool which knows how to roll the files. Small code changes on the cosmic-swingset side (when build controller, here is a function to write to the slog file). Michael: would like to use slog hooks. Need to give instructions to validators in terms of best practicers (or give them a start script?). Initially, we should default to collection / rolling. |
I'll add a I might hijack |
We include the crankhash in the consensus state, which captures all changes to the kvstore. This ought to capture any kernel divergence within a single block, as well as any vat divergence that is visible as syscall divergence. Our validator guide instructions should direct most validators to retain a day or two of slog data, and to send us that data if their node ever falls out of sync. Then we can build a tool to correlate these slogfile fragments and identify the point of divergence. @michaelfig points out that in the longer run, we really want to be able to snapshot kernel kvStore (swingstore) state, for the benefit of "state sync" as applied to kernel state (#3769). If we had that now, we could ask the out-of-sync validators for a copy of their entire kernel DB, and compare those. But 1: we don't, and 2: the slog data is probably a lot smaller than the whole kernelDB. |
If we can allow validators to write slog entries out to files, and preserve so we can figure out failures. |
We should default to slog files are on, into a different file every hour, and the Validator can use whatever tool they want to not overflow disk. Cosmic Swingset needs to take an arg as to where the slogs should go. |
From today's meeting, our conclusion was:
We'll probably need to write some tools to compare two such files, remembering that every validator will be rotating the files at different times. We can probably defer writing those tools until we need them. We don't need any swingset-specific functionality for this: the host application above swingset (eg Given the inclusion of the crankhash in each block, it seems likely that any kernel divergence will be detected within a single block, so only the slog data from the last 10-ish seconds is likely to do any good. Knowing when the kernel was last restarted may be useful too, that is recorded in the slogfiles too. If divergence happens shortly after reboot, then the last few minutes of data may catch it. The case that might not be caught is if rebooting is the root cause, but the diverging vat doesn't get used for quite a while. In any case, we'd start with the most recent minute of slog data and then work backwards if necessary, so it'd be nice if we could go back to the validator and ask for older data. |
@michaelfig is working on a change to add a "write slog object" hook to the kernel, which is what cosmic-swingset will use to get at the slog data. This plan puts most of the work on the cosmic-swingset side, so I'm reassigning to mfig. Since he mentioned that Node.js seems to make immediate (non-async) We'd need to write a tool to read the circular buffer from the correct offsets and emit the slog records in the right order, but that should be trivial. The trickiest part I imagine is representing this within Node.js. Maybe with some sort of magic SharedArrayBuffer?? Another thought is to write the slog records to a pipe, and put a C/Rust/etc program on the other end of that pipe, which knows how to |
What is the Problem Being Solved?
The last three consensus-failure divergence bugs we've found were resolved by comparing the final block's slogfile data between the failing validator and one of the surviving ones. With the #3442 activityhash included in the Cosmos state, the divergence is discovered before moving on to the next block, and the slogfile records the divergent cranks even though Cosmos does not (necessarily?) commit the failing block, so the slogfile has always (so far) included fairly precise evidence of the nature of the divergence.
We need to be able to get this data from validators that aren't collecting slogfiles all the time.
I'm thinking of something like an airplane's "black box": the flight recorder that captures the last N minutes of activity, and is designed to survive a crash. For us, this would capture the slogfile entries of the most recent block. Once the validator proceeds past the point of a potential consensus failure, we no longer need to retain that data.
Description of the Design
I'm not sure about the cosmos-sdk code paths involved, but I'm thinking something like this:
flightRecorder
, which sits next tokvStore
andsnapStore
andstreamStore
flightRecorder
as wellflightRecorder
holds these entries in RAM, and deletes them uponcommit
.time
fields) and include$HEIGHT-$HASH
in the instructions, so that anyone else experiencing that same problem will get the same string and will know it's a duplicateI can think of a couple of reasons why we'd need to implement this differently:
commit
before checking the AppHash? If so,commit
would delete the flight recorder data too early. We'd need to add some additional APIs to manage the lifetime of that data.fsync
or two. To avoid this, it would be fine (and slightly preferable) to exclude the flight recorder file from thisfsync
, if possible, because we're not worried about the data being preserved against sudden kernel shutdown, merely against sudden process shutdown. As long as the flight recorder file writes are not buffered in userspace, and actualwrite(3)
syscalls can be buffered in the kernel without problems, as they'll still be written to disk when the process exits.systemd
-type supervisors)cc @michaelfig @JimLarson for thoughts about how to integrate this into cosmos-sdk
cc @FUDCo for thoughts about SwingStore (we should probably refactor/consolidate SwingStoreLMDB first)
Security Considerations
The flight recorder data contains the internal processing steps triggered by the shared input/transaction stream of a consensus-based SwingSet. A solo swingset would consider this to be private: even if the externally-supplied messages are known to another party (at least to the sender), the internal consequences are not. But in a consensus swingset, this data is known to all the other members.
So we should make it clear that the flight recorder data is privacy-equivalent to the chain's internal state, and should only be shared with someone else who could already know that state. Practically speaking, this is everybody, because chains are transparent. But when we add this feature, the swingset-side API should make a stronger statement, and the cosmic-swingset/chain -side should be where we acknowledge that the contents are effectively public.
Test Plan
We can write unit tests for the SwingStore portions. I don't know how to test the cosmic-swingset side. Ideally we'd have a unit test that somehow triggers the consensus failure and examines the output message for the "please upload the black box" string, and examines the contents of the named file, but I imagine that test would be hard to write. Maybe @michaelfig or @JimLarson have some better ideas.
The text was updated successfully, but these errors were encountered: