Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Node services - Brainstorm #123

Merged
merged 9 commits into from
Mar 16, 2019
Merged

WIP Node services - Brainstorm #123

merged 9 commits into from
Mar 16, 2019

Conversation

wemeetagain
Copy link
Member

WIP/brainstorming on organization of the various BeaconNode services.

Some questions:

  • Is this the right direction to push? Are these the right level of granularity to start with?

  • Are we missing any important services?
    I didn't copy prysm's services exactly, because 1. I didn't really see what each service was there for, 2. we may have different needs. But for the sake of brainstorming, do we need eg: a sync service, an operations service? (both are prysm services not represented explicitly here)

  • How do we best handle dependencies / data sharing/passing between various services?
    You may notice, some of the services instantiated in BeaconNode (intuitively-speaking) should rely in some way on other services (eg: "the beacon chain relies on the db")
    I think we want rough guidelines/patterns for how data is passed/shared that help us keep our services reasonably decoupled and help us stay sane as the codebase grows
    eg: do we

    • pass global object into each service's constructor?
    • pass dependent services into each service's constructor? (this seems to be how prysm operates)
    • emit events listened to at the 'node' or another level outside the service? (this seems like it could jibe well with how js works, but requires more detail)

@wemeetagain wemeetagain requested review from Mikerah, ansermino, GregTheGreek, noot and a team March 4, 2019 03:25
@Mikerah
Copy link
Contributor

Mikerah commented Mar 4, 2019

I think we should able to use Node's built-in EventEmitter API. It'll give us enough flexibility for supporting async communication, makes the code quite easy to maintain and read and we don't need to build out extra support.

As for passing global objects, I think we should minimize this as much as possible. It could potentially cause issues down the line. However, there are perhaps some cases in which we should use this approach.

As for the services you have, I think you covered most of them. What I would add would be services for light clients like in the ethereum-js client you posted. In the future, we might need shard services.

@GregTheGreek
Copy link
Member

I'm in a time bind and I'll contribute more in little bit, but a few things that came to my mind:

Events:

  • I'm a fan of using the event emitter, we should look for benchmarks...
  • We should come up with an API for event's I'm worried about absolutely crushing the node thread
  • Spin off v10 (maybe v11) child process and use event emitters to transmit data? (not sure if thats feasible). If it were we could de-load the node thread and would be super cool.

Global Objects:

  • It depends on what we're passing, we can structure this sort of how the sharding system looks with beacon chain and shards, having data be routed from our root class to relevant services, but I think thats a performance nightmare. None the less Higher order structuring will be our success.

DB:

  • This for sure i think can be isolated somehow (child process et al) because lets be real. I'm not sure how many reads we will need

@wemeetagain
Copy link
Member Author

I think we should able to use Node's built-in EventEmitter API. It'll give us enough flexibility for supporting async communication, makes the code quite easy to maintain and read and we don't need to build out extra support.

👍

As for passing global objects, I think we should minimize this as much as possible. It could potentially cause issues down the line.

Agreed, I threw it out there hoping there would be some consensus against going that route.

What I would add would be services for light clients like in the ethereum-js client you posted.

Good to keep in mind. Current eth2.0 light client proposal
I think we should focus on building out a full node for now, but make sure to keep our architecture open to light clients. If we keep things clean, we should be able to swap out the correct chain/syncing services once the spec is more locked down.

In the future, we might need shard services.

Oh yeah, the validator isn't represented anywhere here yet. I'm hoping that we can design that as a separate module that communicates w the beacon chain via rpc or some other such mechanism (eg: a MessageChannel/MessageChannel if they're running in different workers in the same deployment/process).

I'm worried about absolutely crushing the node thread

I feel like we should start simple, just assuming the single thread, and move to a more complicated multithreaded setup once its clear that we need it. I think we will probably need it, but I also think it will be easier to iterate on the codebase while its simpler. The needs of a multithreaded system will become more clear once we've gotten further. Also should give us time to research the relevant APIs and figure out how to best implement such a system for our needs. If we avoid gratuitous use of shared global objects between services and are careful about how we link dependent services, it should be a straightforward process.

Spin off v10 (maybe v11) child process and use event emitters to transmit data? (not sure if thats feasible). If it were we could de-load the node thread and would be super cool.

For multithreading, I think we should target use of Workers/Workers. The API looks the same between node and the web. I think in conjunction w the MessageChannel api linked above, we should be able to pass data pretty efficiently between worker threads.

@Mikerah
Copy link
Contributor

Mikerah commented Mar 4, 2019

One thing I would like to note is to perhaps take into account future dev tooling. I know this is super early but if we can design this to be quite modular and easy to use, then building out future dev tools become less of a hassle. However, unless you have a crystal ball, this will be very hard to predict and we can only try our best while designing out the architecture.

@GregTheGreek
Copy link
Member

Spin off v10 (maybe v11) child process and use event emitters to transmit data? (not sure if thats feasible). If it were we could de-load the node thread and would be super cool.

For multithreading, I think we should target use of Workers/Workers. The API looks the same between node and the web. I think in conjunction w the MessageChannel api linked above, we should be able to pass data pretty efficiently between worker threads.

Ah yes, thats actually what I was referencing. I'm very Pro to your earlier comment about not wanting to get too complicated.

One thing I would like to note is to perhaps take into account future dev tooling. I know this is super early but if we can design this to be quite modular and easy to use, then building out future dev tools become less of a hassle. However, unless you have a crystal ball, this will be very hard to predict and we can only try our best while designing out the architecture.

Thats ultimately the idea behind spinning off these "services"

@wemeetagain
Copy link
Member Author

or the sake of brainstorming, do we need eg: a sync service, an operations service?

I asked rauljordan about those services in the prysmatic discord:

Q:

whats the conceptual difference between the operations, sync and blockchain services?

A:

Sync handles everything related to receiving objects via p2p from other nodes
Such as blocks and the state
The blockchain service deals with processing incoming blocks, advancing a state transition, and applying the fork choice rule to update the chain head
The operations service deals with storing and retrieving block operations, which are the data contained within a beacon block’s body. They are attestations, validator exits, slashings, and more
We need some sort of pool like a tx pool to store them in, so the operations service is like that

@wemeetagain
Copy link
Member Author

Regarding a multithreaded / worker-based solution, what's tricky/interesting to think about is how to best (most ergonomically + efficiently) send data across the worker boundaries.

Each worker is a separate thread, but it seems that memory between workers isn't readily available, only available through the MessageChannel / MessagePort apis. Eg: data is sent using port1.postMessage(data) and data is retreived using port2.onmessage = function(event) {/* event.data */}. Both port1 and port2 can send and receive data.
Note that data is only passed asynchronously, and in a single-thread context, we usually get both a sync + async interface.

Data is mostly just copied/cloned across the channel, but ArrayBuffers (and MessagePorts) can be "transferred" across the channel (copied by reference, but no longer available in the source worker). In fact, in order to set up a data transfer using a custom MessageChannel, the first step is to transfer the channel's ports to the workers to be communicating.

MessageChannels only work for 1-to-1 communication, and can't be used to broadcast data to multiple workers at once, a channel to each worker would need to be opened.
One nice feature of the prysm codebase is their ability to subscribe/broadcast messages across a "feed" to send data between different services.

Another comment rauljordan made:

Yep we use event feeds and subscriptions
For communicating often across the repo
Theyre nice because when you broadcast to a feed you dont need to care about whos necessarily subscribing
So a feed can be used by multiple services that need an item

There does exist a BroadcastChannel which exposes a handle which can broadcast across multiple workers at once, but I don't really recommend relying on it directly, bc there is not good browser support yet and its not implemented in node (there's also a broadcast-channel npm package, but I don't really like the solution presented there).

We would probably want to research/develop a solution that provides a unified interface for broadcasting/subscribing that uses BroadcastChannels and falls back to MessageChannels.

@GregTheGreek
Copy link
Member

Hmm that is interesting. ... none the less I'm a fan of doing it naively, as long as its extremely modular

@Mikerah
Copy link
Contributor

Mikerah commented Mar 11, 2019

Another suggestion I got was from @jacobheun of libp2p. Instead of using node's EventEmitter API, we can use async iterators.

After doing a bit of research, it seems like the main advantage (depending on the context) of using async iterators over events is that async iterators uses a pull strategy and events use a push strategy. It is also more efficient in that if the generator is not being currently consumed, then it uses less resources. However, I'm not sure how this would look like with @wemeetagain's suggestion to use worker threads.

Any thoughts?

@wemeetagain
Copy link
Member Author

Not super familiar with how async iterators are created/used in practice, but I like the idea of using async iterators where we can.
Some thoughts:

  • I think that most/all of our events of our services will be consumed all of the time, so it may be less an issue of efficiency and more an issue of aesthetics.
  • We may want to 'broadcast' 'events' to multiple subscribers, allow async data to be used by multiple consumers. Eg: Perhaps both the rpc service and the sync service subscribe to a chain service new processed block feed.
  • We may want to support multiple 'event' types. Eg: the eth1 watcher should notify on new deposit logs and on new incoming eth1.0 blocks.
  • If we go the worker route, we are bound to communicate between threads using the MessagePort API, which is is EventEmitter/EventTarget based.
  • Awesome Endeavour: Async Iterators ipfs/js-ipfs#1670 (comment) and below have good libraries for working with async iterators, also the parent post has good info on async iterators

@wemeetagain wemeetagain marked this pull request as ready for review March 12, 2019 15:52
@codecov
Copy link

codecov bot commented Mar 12, 2019

Codecov Report

Merging #123 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #123   +/-   ##
=======================================
  Coverage   81.56%   81.56%           
=======================================
  Files          17       17           
  Lines         716      716           
  Branches       36       36           
=======================================
  Hits          584      584           
  Misses        130      130           
  Partials        2        2
Impacted Files Coverage Δ
...nChain/src/chain/helpers/stateTransitionHelpers.ts 38.74% <ø> (ø)
...tests/chain/helpers/stateTransitionHelpers.test.ts 100% <100%> (ø)
beaconChain/tests/utils/attestation.ts 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd8b4a2...bc83ea4. Read the comment docs.

@wemeetagain
Copy link
Member Author

I moved this PR from draft so we could try to get this skeleton code merged.
Hopefully it will be helpful for unblocking/providing rough structure for folks to start building out these underlying services.

RE worker threads:
It sounds like we have some consensus on starting simple and not going that route initially until we have a better sense of what our needs are.
I think its still useful to think about some questions surrounding it as we're building things out.

  • where do we want/need separate threads?
    • every service? certain services? certain hot tasks? not at all?
  • when identifying potential places for threads
    • how is data currently transferred into and out of that area?
    • what's needed to convert into a message passing / MessagePort style?
  • what sorts of patterns do we need wrt how we use MessagePorts?
    • do we need to broadcast messages? have a request/response -ish system?
    • are certain ports single purpose? multiplexed w messages of different types?

@wemeetagain
Copy link
Member Author

does anyone else want to review/approve this PR?

@wemeetagain
Copy link
Member Author

Gonna merge this. This stuff is not set in stone, just some skeleton code to help give a little more structure to the project.

@wemeetagain wemeetagain merged commit 6a67b5a into master Mar 16, 2019
@wemeetagain wemeetagain deleted the cayman/node0 branch March 16, 2019 18:00
wemeetagain added a commit that referenced this pull request Aug 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants