Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Support running full nodes with overseer enabled #4763

Closed
bkchr opened this issue Jan 21, 2022 · 8 comments
Closed

Support running full nodes with overseer enabled #4763

bkchr opened this issue Jan 21, 2022 · 8 comments
Assignees
Labels
I8-refactor Code needs refactoring. T4-parachains_engineering This PR/Issue is related to Parachains performance, stability, maintenance.

Comments

@bkchr
Copy link
Member

bkchr commented Jan 21, 2022

For parachains it would be nice to run full nodes with overseer enabled. We don't require a full overseer running, some stripped down version would be okay (mainly with networking and what is required for this). With paritytech/cumulus#927 we will require the overseer for full nodes to be able to recover povs. Currently that works because we accidentally have polkadot nodes always run as collators when run inside a parachain node. However, we would like to solve this properly.

CC @drahnr

@rphmeier
Copy link
Contributor

We could start to make subsystems aware of the type of node they're embedded in (validator, collator, regular full-node) so that they can internally disable code-paths that aren't relevant or are too heavy for the context. It might also let us finally remove the collator_side / validator_side distinction in collator-protocol.

@rphmeier rphmeier added the I8-refactor Code needs refactoring. label Jan 21, 2022
@drahnr
Copy link
Contributor

drahnr commented Jan 27, 2022

I assume we would separate the subset functionality based on the role a node has.

If so, this would be rather simple to integrate and actually simplify the overseer initialization in the polkadot package, since we could pass the role to the impl OverseerGen to which is then responsible to setup a concrete overseer instance.

One caveat: Something needs to drain the channels or they will eventually overflow, depending on which subsystems are being removed / left in place. If a separate overseer struct is intended for different roles, this could also work, yet increase work to integrate with the malus/nemesis implementations.

@drahnr drahnr self-assigned this Jan 27, 2022
@bkchr
Copy link
Member Author

bkchr commented Jan 27, 2022

Something needs to drain the channels or they will eventually overflow,

Can you not just have some dummy subsystem that drains the channels?

@drahnr
Copy link
Contributor

drahnr commented Jan 27, 2022

You can and that'd be the easiest, except for the case where backchannels are used. Then it depends how the cancellation errors are handled, but this has to be investigated if there are any such subsystems.

@rphmeier
Copy link
Contributor

If so, this would be rather simply to integrate and actually simplify the overseer initialization in the polkadot package, since we could pass the role to the impl OverseerGen to which is then responsible to setup a concrete overseer instance.

The fact that we have to drain channels with 'dummy' subsystems in certain places is a byproduct of trying to do runtime polymorphism this way. i.e. role at runtime determining the set of subsystems at runtime. I think it might still be easier to have all subsystems decide internally how to behave depending on role.

@bkchr
Copy link
Member Author

bkchr commented Jan 28, 2022

FYI:

As part of paritytech/cumulus#545 we will probably implement some subsystems in a way that they can work over rpc. This will mainly be required for runtime api and chain info. The idea is that we still run the overseer etc in the process of the parachain node, but in the best case only with the subsystems that we need. So, for a full node with networking, availability recovery and all the other subsystems that are essential for both of them (probably runtime api and chain state). For collators we will require the same subsystems as for full nodes + collation generation.

The idea is that we talk to a normal Polkadot full node via RPC, but don't want to introduce any new RPC apis or whatever for doing the collation sending sending for example.

CC @skunert he is the one implementing it. He will also come back with more request if something is missing or whatever.

@drahnr
Copy link
Contributor

drahnr commented Jan 28, 2022

@bkchr @skunert happy to give guidance as needed. As a side note, we area also using the OverseerGen to setup malicious code with "hacked" subsystems, so there is already precedent code how to modify subsystems as needed.
Adding additional subsystems is simple too, as long as we assure a) the channels are drained and b) the default behavior does not change.

Not really sure what the expectations are here for me or what work items there are left to do on my end?

@bkchr
Copy link
Member Author

bkchr commented Jan 28, 2022

Not really sure what the expectations are here for me or what work items there are left to do on my end?

The current service setup still runs the overseer only for collators/validators. We will also probably never really remove running the polkadot node inside the parachain node (at least not for the foreseeable future), so the initial feature request is still somewhat valid.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
I8-refactor Code needs refactoring. T4-parachains_engineering This PR/Issue is related to Parachains performance, stability, maintenance.
Projects
None yet
Development

No branches or pull requests

4 participants