-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Light Client refactoring #237
Conversation
Just to ensure that I understand what we are trying to solve here. The previous light-spike defined the light-client process as an iterator where each iteration would handle a single event processed by a single component and multiple iterations were necessary to complete a “flow”. This granular process made it very testable but hard to read as your brain had to keep track of multiple iteration to see the whole flow. This is definitely a valid criticism. We defined it inductively which can be a challenge to read. If it takes three iterations through three different components to get anything meaningful done then we constantly have three things in our head. In this version we get rid of the iterative nature and just call a function which passes control through the different components. It seems similar although clearer than the current version on master. As you mentioned, we keep the components coupled here which I would say is a significant downside. I don’t know if async would help here or if we need more complex channel communication. Instead I think we just need to solve the granularity problem directly. With a demuxer (as outlined in ADR-006 which handles state as well routing control between components. This way we can have clear separation of concerns and execute entire flows as a single synchronous function calls. I hope this approach might be the best of both worlds in terms of testability and readability while using minimal language features. |
Here's take 3: https://github.com/informalsystems/tendermint-rs/tree/romac/light-spike-chan/light-spike This version uses async channels to decouple components, but can still be run synchronously on a single thread. Doing this gave me an idea for decoupling components without relying on async-await, so expect yet another take on this soon. |
This version removes all async/await from take 3, and replaces the |
Just updated this PR to take 4. |
e233d8d
to
0e58ab3
Compare
|
||
pub struct ProdIo { | ||
rpc_clients: HashMap<PeerId, rpc::Client>, | ||
peer_map: HashMap<PeerId, tendermint::net::Address>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to have the abstraction of peerIDs here from the onset 👍
#[pre(light_store.latest(VerifiedStatus::Verified).is_some())] | ||
#[post(valid_schedule(ret, target_height, next_height, light_store))] | ||
pub fn schedule( | ||
light_store: &dyn LightStore, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to pass the whole store if all we need is the latest trusted height?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to match the spec, as @josef-widder suggested the scheduler could perform some optimizations as to what height to verify next based on what headers of nearby heights are already available in the store, but this is not implemented yet.
// ----------------------------------------------------------------------------- | ||
// Everything below is a temporary workaround for the lack of `provider` field | ||
// in the light blocks serialized in the JSON fixtures. | ||
// ----------------------------------------------------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @greg-szabo @Shivani912 this is relevant to testing / serialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be implementation specific to me. Go code has a different approach to identify a node as a primary/secondary provider and does not deal with addresses. But also, this approach looks cleaner and reduces a lot of data from test files. Will need some more work around testing of Go code though. Need more thoughts here! @greg-szabo @liamsi @melekes
prost-amino = "0.5.0" | ||
contracts = "0.4.0" | ||
sled = "0.31.0" | ||
serde_cbor = "0.11.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet another binary encoding 😱 😄 I assume this is the most reasonable choice to use in combination with sled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to admit I didn't give it too much thought. I initially considered just serializing keys and values to JSON and then to bytes, but that seemed a bit wasteful, so I figured using a dedicated binary encoding was better. In the end, aside from the extra dependencies, the choice of encoding does not matter since it is internal to the light store and specific to the choice of sled
as a database. But I'd be happy to discuss alternatives, either for the binary encoding, or for sled
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal but couldnt prost work here for proto3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks amazing ❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really great. Let's do the rest in the follow ups 👍
Closes: #230
Closes: #229
Closes: #174
Here's take 2 of the light client spike.
This version follows pretty much the same decomposition as in the ADR (minus the Demuxer for lack of time), and I find It much more readable and understandable than the former one. It also drops the predicate library in favor of plain functions returning Result<(), Error> packed in a trait.
The big downside currently is that the components are now coupled (ie. the Scheduler has a reference to the Verifier and the Rpc components).
I haven’t found a way to decouple them without having to extract some of the logic into what would be the Demuxer router/event-loop.
I believe that the only way to get both decoupling + simple control-flow would be to go with async/await and have components communicate via channels with the Demuxer, and wait for specific responses. I will try to sketch something about that this weekend so that we can discuss both versions on Monday.