Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for Minimal Relay #32

Merged
merged 20 commits into from
Oct 22, 2023
Merged
Changes from 19 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
196 changes: 196 additions & 0 deletions text/0032-minimal-relay.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,196 @@
# RFC-0032: Minimal Relay

| | |
| --------------- | ----------------------------------------------------------------------------- |
| **Start Date** | 20 September 2023 |
| **Description** | Proposal to minimise Relay Chain functionality. |
| **Authors** | Joe Petrowski, Gavin Wood |

## Summary

The Relay Chain contains most of the core logic for the Polkadot network. While this was necessary
prior to the launch of parachains and development of XCM, most of this logic can exist in
parachains. This is a proposal to migrate several subsystems into system parachains.

## Motivation

Polkadot's scaling approach allows many distinct state machines (known generally as parachains) to
operate with common guarantees about the validity and security of their state transitions. Polkadot
provides these common guarantees by executing the state transitions on a strict subset (a backing
group) of the Relay Chain's validator set.

However, state transitions on the Relay Chain need to be executed by _all_ validators. If any of
those state transitions can occur on parachains, then the resources of the complement of a single
backing group could be used to offer more cores. As in, they could be offering more coretime (a.k.a.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen this argument crop up but I will comment to lay out the counterarguments:

  1. This presumes that the resources spent on including user transactions in blocks can necessarily be spent on adding more cores.
  2. If (1) holds, this presumes that work done on cores will necessarily be more valuable than work done directly on the relay chain

re: (1) Current configurations specify resource requirements for cores (in data / compute) which are discrete and this implies that there is a high likelihood of some leftover resources which are sufficient to execute user transaction but insufficient to add another core. Furthermore, the actual data burden on the network can be estimated by the sizes of PoVs placed onto cores in recent blocks.
re: (2) Synchronous composability is often more valuable than asynchronous

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some level, I do think cores could usefully consume otherwise spent on user transactions. We do not necessarily have a straight line there, but it's still all bandwidth, cpu time, and disk space. The question is more: How much resources really?

I think the synchronous vs asynchronous dichotomy winds up being largely illusory here. We do know some computations parallelize well and some badly, ala npos, but that's not really your dichotomy. All of web2 is built upon sharded data models aka what you call asynchronous. It's true that sharded data models require more thought, with this RFC being one example, but they're so dominant in web2 that they're probably not blocking adoption much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really cannot see any reasonable argument to intentionally put work on an obvious bottleneck point of the network which can be moved into a non-bottleneck point.

Synchronous composability specifically on the Relay-chain, whose entire point under CoreJam is to join async compute results and accumulate state machine transitions, does not sound sensible to me at all.

I've said it before and I'll say it again: The Relay-chain should do one thing and do it well. That means being the head of a secure decentralised multicore computer. Sticking an EVM smart contract in there and/or continuing to allow end-user-application-level transactions and general interaction goes directly against this mantra.

blockspace) to the network.

By minimising state transition logic on the Relay Chain by migrating it into "system chains" -- a
set of parachains that, with the Relay Chain, make up the Polkadot protocol -- the Polkadot
Copy link
Contributor

@rphmeier rphmeier Oct 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid mixing the general desire to take functionality off of the relay chain with an implementation which gives each function its own chain. We should consider offloading each of these components onto a single chain for better synchronous composability between them. Of all the components currently being bundled into system chains, the only one I see a strong case for having its own is staking, as nomination, elections, and slashing are quite heavy.

The underlying goals are to maximize secure blockspace and effectively use blockspace. Anything which requires less than a single full core will have to either use coretime less frequently (poor system UX) or will use the core constantly but far below its capabilities (resource misallocation).

Copy link

@burdges burdges Oct 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we place balances + staking + governance together on one parachain. We'd still handle era points on the relay chain perhaps, but not dots. We should figure out if this is possible though, aka how governance controls the relay chain, and how era points and slashing work.

Accumulate being quite flexible and permissionless is one of the two new ideas in corejam. Accumulate becomes much safer if accumulate cannot perform logic based upon dot balances, but only based upon authorization conditions set by the parachains themselves.

I'd envision nominations using a chain-forked pattern whatever we do, i.e. the staking chain spins off an nomination chain, to which validators add their opinions. This requires running parachain blocks longer.

The other of the two new ideas in corejam is to have blocks run a long time by saving their memory state. Although nice, we think this cannot really help with state migrations or npos, due to how memory accesses work. We can likely use other techniques to do longer running npos computations on parachains though, like simply allowing them to run longer.

Anyways, if we "eat our own dog food" by moving dots off the relay chain, then we make more time available for user defined accumulate functions, reduce the number of dumb things they do, and establish patterns for them to follow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything that requires non-uniform work package execution times/weights seems impractical. It should be possible to extend computations over multiple blocks, by paging in the relevant data needed for each step of the algorithm at any point.

Yes, spinning off child chains/tasks for heavy computation such as elections or governance vote-tallying makes a lot of sense, especially since parachain execution is already stateless. The System product benefits a lot from having as many user-exposed functionalities as possible living in the same chain, especially when there is no clear impetus to scale horizontally.

@joepetrowski

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once two features are in the same chain, users and applications get very accustomed to their being in the same chain, which makes it difficult to split them into multiple chains. I understand that with today's usage, many of these subsystems could all go into a single system parachain. However, as usage and requirements increase (e.g. several thousand validators), it will hit a bottleneck and need to be split.

IMO it makes more sense to keep these subsystems in separate chains but take advantage of core scheduling to handle resource allocation. We could use one core to process staking, governance, and identity in round-robin fashion (perhaps prioritizing staking at certain times like session changes). You mention degraded system UX but we're still talking 18 second instead of 6 second block times. When a single core no longer meets the execution needs of a subsystem, it's much easier and more agile to allocate a dedicated core to that subsystem than it is to launch a new chain and migrate all the data/logic to it (plus getting parachains to re-address their XCM programs, applications to target new chains, etc.).

RE spinning off child chains for heavy computations, yes this is reasonable but I think it's an optimization. The first priority is to get this work off the Relay Chain. Then we can optimize with things like child chains.

Copy link
Contributor

@rphmeier rphmeier Oct 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mention degraded system UX but we're still talking 18 second instead of 6 second block times

With the amount of traffic that an identity chain currently handles, it'd take weeks to make a full block. UX challenges for users coordinating across many chains notwithstanding, even wasting 1/3 of a core on identity is clearly trading off sharding + lowish latency over consolidation and low latency.

I see the case for being inefficient with coretime in the face of a coretime surplus, though I expect that the pressures under load would point back to consolidation and the practical experience for end-users of navigating many chains tips it in favor of consolidation now for me.

Full support of moving these pallets off of the relay chain.

Ubiquitous Computer can maximise its primary offering: secure blockspace.

## Stakeholders

- Parachains that interact with affected logic on the Relay Chain;
- Core protocol and XCM format developers;
- Tooling, block explorer, and UI developers.

## Explanation

The following pallets and subsystems are good candidates to migrate from the Relay Chain:

- Identity
- Balances
- Staking
joepetrowski marked this conversation as resolved.
Show resolved Hide resolved
- Staking
- Election Provider
- Bags List
- NIS
- Nomination Pools
- Fast Unstake
joepetrowski marked this conversation as resolved.
Show resolved Hide resolved
- Governance
- Treasury and Bounties
- Conviction Voting
- Referenda

Note: The Auctions and Crowdloan pallets will be replaced by Coretime, its system chain and
interface described in RFC-1 and RFC-5, respectively.

### Migrations

Some subsystems are simpler to move than others. For example, migrating Identity can be done by
simply preventing state changes in the Relay Chain, using the Identity-related state as the genesis
for a new chain, and launching that new chain with the genesis and logic (pallet) needed.

Other subsystems cannot experience any downtime like this because they are essential to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Co-existing will be quite tricky for staking, and we would likely have to do a one-off migration. The migration is possible to happen within an era, but even if not, having one extended "super era" is not too bad.

With a one-off migration, all the staking data is moved to the system chain. Some historical staking data will be kept on the RC for retroactive rewards and slashes, and will be removed after a maximum of 84 eras.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this due to mutual exclusivity of freezing on the Relay-chain and the Staking chain?

Copy link

@Ank4n Ank4n Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed an alternative migration strategy to staking chain by running two parallel staking system for a while.
Roughly the way it would work is:

Staking system in the Relay chain works independently of Staking Chain. Relay Chain has a configuration that decides how many validators are provided from staking chain, the rest elected locally in the Relay Chain. We can then slowly scale up this number to 100% Staking Chain.

Validators who want to validate via Staking Chain, would need to unbond, teleport assets to Staking Chain, and set their intent to validate again (we could potentially improve this by allowing teleport of staked assets without unlocking). In the beginning it may be easy to get into the active validator set via Staking chain until we reach some equilibrium eventually getting to similar economic security as Relay Chain. Once we go to 100% from Staking Chain, at some point we can kick all validators/nominators who did not migrate.

The main opposition to this approach were

  • Total stake for active validators (economic security) may drop while in transition phase.
  • Validators have to move and re-acquire the approval score they gained on the Relay Chain.
  • It might be harder to move all nominators voluntarily to Staking Chain.

We (weakly) reached to the consensus that one-off migration might be simpler. We will need to migrate the staking ledgers and the staked balance to the Staking Chain and do this in one era (which can be an extended era). The era history can stay in Relay Chain allowing validators to claim historical rewards. Once 84 eras (history depth) have passed, we purge the historical data and clean up staking code in relay chain.

Keen to hear if you have some thoughts/feedbacks around these approaches (this probably deserves its own RFC but we do have an open issue where we have some more notes from our discussions so far).

Copy link

@Tbaut Tbaut Oct 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This migration will require a lot of upfront communication. The lesser effort on the validator and user side, the better. I believe that a one-off migration would be much easier to handle on the wallets side, and communication side. Wallets and partners (think CEXs etc) should be prepared, and when the block hits, they'll be able to switch to displaying/using the staking chain. With a gradual move however, I can see a period when users don't know if they are on the RC or the SC, UIs would have to show and handle both, which would likely be very confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wholeheartedly agree with a one-off migration approach, in both this case and as a general pattern for moving systems between chains or performing a large complex upgrade.

Copy link

@burdges burdges Oct 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should obviously not require that users change wallet software to make new accounts, transfer, stake, etc. That's a big no no.

We need domain separation material in signatures of course, but the exact semantics of that domain separation material can evolve if required. If the new chain is the staking chain, then it can have a "virtual" genesis hash of the original relay chain or whatever. We donno how to have multiple staking chains anyways.

I guess governance could be more nuanced.

As for migration, we could've the staking chain "fork" off from the relay chain at block height x, which freezes transactions on the relay chain, then the staking chain runs a migration pallet, and finally the staking chain unfreezes its own transactions. Anyways a chain "fork" operation should not be expensive and should be useful elsewhere.

We'll want a bunch of O(1) drops here, so those should be implemented first. We might clean up the state a bit first too, like fixing the slashing spans system, and droping that old state. As a backup, we proved a limited recovery sudo which kills the staking chain and unfreezes the relay chain accounts.

It's actually kinda unclear if it's easier to migrate once to separate governance and staking chains, or to migrate twice, once to a separate combined chain. and then split off governance later. The migrations involve somewhat different activities.

Copy link

@Ank4n Ank4n Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for migration, we could've the staking chain "fork" off from the relay chain at block height x, which freezes transactions on the relay chain, then the staking chain runs a migration pallet, and finally the staking chain unfreezes its own transactions. Anyways a chain "fork" operation should not be expensive and should be useful elsewhere.

What we really need to move is staking pallet state and part of user balance that is staked. Unstaked balances should eventually end up in AssetHub once we get rid of balances from relay chain (and should be part of separate migration I think).

Forking off relay chain actually sounds like a great idea. We can then have a specialised migration pallet that cleans/translates the data that we need, drop what we don't need. Relay chain will freeze all staking related transactions and kill the state that now belongs to staking chain once it knows the staking chain is operational. Does that make sense? This could be a template for similar migrations in future.

@kianenigma had the opposite idea to prepare data that we need to migrate, calculate the state root and securely send it to staking chain (root track or xcm), and then have a bot that fills the storage on staking chain through permissionless calls. I guess though forking off would be simpler?

network's functioning, like Staking and Governance. However, these can likely coexist with a
similarly-permissioned system chain for some time, much like how "Gov1" and "OpenGov" coexisted at
the latter's introduction.

Specific migration plans will be included in release notes of runtimes from the Polkadot Fellowship
when beginning the work of migrating a particular subsystem.

### Interfaces

The Relay Chain, in many cases, will still need to interact with these subsystems, especially
Staking and Governance. These subsystems will require making some APIs available either via
dispatchable calls accessible to XCM `Transact` or possibly XCM `Instruction`s in future versions.

For example, Staking provides a pallet-API to register points (e.g. for block production) and
offences (e.g. equivocation). With Staking in a system chain, that chain would need to allow the
Relay Chain to update validator points periodically so that it can correctly calculate rewards.

A pub-sub protocol may also lend itself to these types of interactions.
joepetrowski marked this conversation as resolved.
Show resolved Hide resolved

### Deployment

Actual migrations should happen based on some prioritization. This RFC proposes to migrate Identity,
Staking, and Governance as the systems to work on first. A brief discussion on the factors involved
in each one:

#### Identity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to confirm, we will have a system parachain just for identity?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you considered to merge identity for Kusama and Polkadot? like we have a single fellowship for both networks. I see no reason to have two set of identities on each networks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a reasonable idea. The parachain is for Identity to get it off the Relay Chain. Of course with more advanced Coretime scheduling it's probably reasonable to have slower scheduling.

The reason to keep them separate for now is that Kusama is a nice dress rehearsal for Polkadot. But I have nothing against a second migration that deprecates the Kusama chain and does all identities through Polkadot's system.


Identity will be one of the simpler pallets to migrate into a system chain, as its logic is largely
self-contained and it does not "share" balances with other subsystems. As in, any DOT is held in
reserve as a storage deposit and cannot be simultaneously used the way locked DOT can be locked for
multiple purposes.

Therefore, migration can take place as follows:

1. A runtime upgrade will block all calls to the pallet, preventing updates to identity info.
1. The frozen state will form the genesis of a new system parachain.
1. A Relay Chain runtime upgrade will allow migrating the deposit to the parachain. The parachain
joepetrowski marked this conversation as resolved.
Show resolved Hide resolved
deposit is on the order of 1/100th of the Relay Chain's. Therefore, this will result in freeing
up Relay State as well as most of each user's reserved balance.
1. The pallet and any leftover state can be removed from the Relay Chain.

User interfaces that render Identity information will need to source their data from the new system
parachain.

Note: In the future, it may make sense to decommission Kusama's Identity chain and do all account
identities via Polkadot's. However, the Kusama chain will serve as a dress rehearsal for Polkadot.

#### Staking

Migrating the staking subsystem will likely be the most complex technical undertaking, as the
Staking system cannot stop (the system MUST always have a validator set) nor run in parallel (the
system MUST have only _one_ validator set) and the subsystem itself is made up of subsystems in the
runtime and the node. For example, if offences are reported to the Staking parachain, validator
nodes will need to submit their reports there.

Handling balances also introduces complications. The same balance can be used for staking and
governance. Ideally, all balances stay on Asset Hub, and only report "credits" to system chains like
Staking and Governance. However, staking mutates balances by issuing new DOT on era changes and for
rewards. Allowing DOT directly on the Staking parachain would simplify staking changes.

Given the complexity, it would be pragmatic to include the Balances pallet in the Staking parachain
in its first version. Any other systems that use overlapping locks, most notably governance, will
need to recognise DOT held on both Asset Hub and the Staking parachain.

There is more discussion about staking in a parachain in [Moving Staking off the Relay
Chain](https://github.com/paritytech/polkadot-sdk/issues/491).

#### Governance

Migrating governance into a parachain will be less complicated than staking. Most of the primitives
needed for the migration already exist. The Treasury supports spending assets on remote chains and
collectives like the Polkadot Technical Fellowship already function in a parachain. That is, XCM
already provides the ability to express system origins across chains.

Therefore, actually moving the governance logic into a parachain will be simple. It can run in
parallel with the Relay Chain's governance, which can be removed when the parachain has demonstrated
sufficient functionality. It's possible that the Relay Chain maintain a Root-level emergency track
for situations like [parachains
halting](https://forum.polkadot.network/t/stalled-parachains-on-kusama-post-mortem/3998).

The only complication arises from the fact that both Asset Hub and the Staking parachain will have
DOT balances; therefore, the Governance chain will need to be able to credit users' voting power
based on balances from both locations. This is not expected to be difficult to handle.

## Drawbacks

These subsystems will have reduced resources in cores than on the Relay Chain. Staking in particular
may require some optimizations to deal with constraints.

## Testing, Security, and Privacy

Standard audit/review requirements apply. More powerful multi-chain integration test tools would be
useful in developement.

## Performance, Ergonomics, and Compatibility

Describe the impact of the proposal on the exposed functionality of Polkadot.

### Performance

This is an optimization. The removal of public/user transactions on the Relay Chain ensures that its
primary resources are allocated to system performance.

### Ergonomics

This proposal alters very little for coretime users (e.g. parachain developers). Application
joepetrowski marked this conversation as resolved.
Show resolved Hide resolved
developers will need to interact with multiple chains, making ergonomic light client tools
particularly important for application development.

For existing parachains that interact with these subsystems, they will need to configure their
runtimes to recognize the new locations in the network.

### Compatibility

Implementing this proposal will require some changes to pallet APIs and/or a pub-sub protocol.
Application developers will need to interact with multiple chains in the network.

## Prior Art and References

- [Transactionless Relay-chain](https://github.com/paritytech/polkadot/issues/323)
- [Moving Staking off the Relay Chain](https://github.com/paritytech/polkadot-sdk/issues/491)

## Unresolved Questions

There remain some implementation questions, like how to use balances for both Staking and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need identity as its own chain. Then we have the question of how to determine if a feature should lives on its own parachain or coexisting with another parachain. e.g. we could have the identity pallet also lives on the governance parachain.

Copy link
Contributor

@xlc xlc Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current migration plan results downtime. i.e. we have to freeze identity state, setup new genesis, run the new chain, and then it is up again. This is likely going to take weeks. Can we do better to reduce the downtime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you've said, a lot of the use of Identity is for reading. While there is downtime, it is only to updating identities and providing judgements. The service still exists for rendering identity info. We can try to coordinate upgrades/merging PRs, but IMO the complexity of a zero-downtime migration outweighs the benefits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need identity as its own chain. Then we have the question of how to determine if a feature should lives on its own parachain or coexisting with another parachain.

Also, deposits are much lower on parachains. Current identity with 21 DOT is very expensive for a lot of people. Having it on a parachain makes it much more accessible.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we use the current collectives chain as a hub for everything that's social/offchain related?

Governance. See, for example, [Moving Staking off the Relay
Chain](https://github.com/paritytech/polkadot-sdk/issues/491).

## Future Directions and Related Material

Ideally the Relay Chain becomes transactionless, such that not even balances are represented there.
With Staking and Governance off the Relay Chain, this is not an unreasonable next step.