Agile Coretime #1

Merged
merged 47 commits into from
Aug 11, 2023

@gavofyork (Contributor) commented Jun 30, 2023

This proposes a periodic, sale-based method for assigning Polkadot Coretime. The method takes into account the need for long-term capital expenditure planning for teams building on Polkadot, yet also provides a means to allow Polkadot to capture long-term value in the resource which it sells. It supports the possibility of building secondary markets to make resource allocation more efficient and largely avoids the need for parameterisation.

Implementation: paritytech/substrate#14568

@lucasvo left a comment

Initial review with minor comments and one question for clarity. I think the mechanism is simple and a much better model than the current slots. I will take another pass once we’ve discussed this a bit more.

RFC-0001-Agile Coretime.md (outdated)
}
```

Notably, if a region is split or transferred, then the `price` is reset to `None`.
@lucasvo Jun 30, 2023

I think it would be easier to understand if you explicitly said that splitting a region removes the ability to renew at the same price, which I think you are implicitly saying by specifying that splitting sets the price to “none”. This also implies, though, that “split cores” are not eligible for priority renewal, correct? You also don’t seem to mention that current owners of cores should be able to get priority renewal. Is that correct?

Contributor Author

This is written above:

Notably, if a region is split or transferred, then the price is reset to None.

Not enough?

Also, this is about split regions, i.e. taking the month-long piece and splitting it into smaller pieces.

Reply:

I don’t understand whether there is a specific reason the price needs to be set to none. Does that mean a split region can’t be renewed the same way a “full core” region would be?

@gavofyork (Contributor Author) Jun 30, 2023

Yeah - because it would then have two owners - which one could "renew"? We generally want to minimise renewals since they bias the market. I think it's ok when the core would be used consistently by the same paras in the same way from month to month, but it doesn't make sense when they're being carved up and, presumably, traded.
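A minimal sketch of the rule being discussed, assuming a simplified region record whose `price: Option<u128>` holds the price a renewal would be based on. `Region`, `split_at`, `transfer` and `is_renewable` are hypothetical names for illustration, not the RFC's or the implementation's API; "splitting" here means partitioning the region in time at a pivot timeslice.

```rust
/// Hypothetical, simplified region record used only for illustration.
#[derive(Clone, Debug, PartialEq)]
pub struct Region {
    pub begin: u32,          // first timeslice covered
    pub end: u32,            // timeslice at which the region ends (exclusive)
    pub owner: String,       // account controlling the region
    pub price: Option<u128>, // price a renewal would be based on, if any
}

impl Region {
    /// Splits the region into [begin, pivot) and [pivot, end).
    /// Both halves lose their renewal price: neither piece (nor either
    /// future owner) can unambiguously claim the renewal.
    pub fn split_at(self, pivot: u32) -> Option<(Region, Region)> {
        if pivot <= self.begin || pivot >= self.end {
            return None; // the pivot must fall strictly inside the region
        }
        let first = Region { end: pivot, price: None, ..self.clone() };
        let second = Region { begin: pivot, price: None, ..self };
        Some((first, second))
    }

    /// Hands the region to `new_owner`; the renewal price is dropped.
    pub fn transfer(self, new_owner: &str) -> Region {
        Region { owner: new_owner.to_string(), price: None, ..self }
    }

    /// Only a region that still carries a price may be renewed.
    pub fn is_renewable(&self) -> bool {
        self.price.is_some()
    }
}
```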

Reply:

My comment initially was to clarify that split cores shouldn't be able to get priority renewal. A careful reading of the spec does imply that, but making it more explicit would help people understand.

I think it's ok when the core would be used consistently by the same paras in the same way from month to month, but it doesn't make sense when they're being carved up and, presumably, traded.

I think this does not need to go into the initial spec. It should be possible to offer split cores for purchase at some point, at which point this can be added. I doubt the demand for them will be particularly high today, and it shouldn't delay a first implementation.

@lucasvo commented Jul 1, 2023

I believe this RFC makes transfer of regions lose the priority on core renewal but changing the allocation of the slot from one parachain to another does not. This would mean the system can quite easily be gamed to transfer ownership of a region without it being prevented from being renewed (for example by holding the region in a pure proxy and simply transferring ownership). Wouldn’t we want to tie the ability to renew to the paraid that the core is running and not which account controls the core?

@rphmeier (Contributor) left a comment

Regions Architecture

The regions here have a number of similarities with the regions I've proposed in #3 but lack a couple of significant details that will be important for implementing elastic scaling and other future mechanisms.

My proposal was focused on relay-chain coretime scheduling and is actually a replacement for all existing scheduling logic in the relay-chain. In my opinion, the mechanisms on the broker-chain should match the mechanisms on the relay-chain closely in order to avoid friction between those components.

Scheduling on the relay-chain, looking forward, needs to solve a few key problems:

  1. Gracefully handle tens of thousands of parachains without significant runtime scheduling overhead.
  2. In the case that a parachain has scheduled assignments on several cores, the mapping between upcoming blocks on the parachain and specific cores must be unambiguous in the near-term future to the validators working on those cores.
  3. When cores are highly shared, the time it takes to make up for a missed opportunity should be minimally dependent on the number of other applications also scheduled on the core.
  4. Accommodate a variety of different scheduling frequencies and overlapping durations, all on the same core

(A more detailed description of all of the above, and more, is in #3.)

The regions RFC is my attempt to solve all these problems. When we discussed offline, @gavofyork raised the point that handling splitting/transferring on the relay-chain is likely to incur too much load, which is fair. However, the relay-chain Region primitive itself is still important. The broker-chain should be able to use its own primitives, but they ought to be compatible with the direction of relay-chain scheduling.

To solve (1), the proposal focuses on having parachain candidates tagged with a deterministic and immutable region identifier, which is submitted along with the candidate by the relay-chain block author. The relay-chain logic needs only to lazily check that the region is in surplus and that the parachain assignment is correct, which is a single load-modify-store operation per core per relay-chain block. This way, all scheduling overhead is pushed to the node side.
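A hedged sketch of the lazy check, interpreting "in surplus" as "has unused allowance left". It assumes a per-core map from a hypothetical opaque `RegionId` to a single-assignee record; `RegionState`, `check_and_consume` and the error variants are illustrative names, not taken from #3 or the relay-chain code. Each backed candidate costs one lookup and one in-place update, i.e. the single load-modify-store described above.

```rust
use std::collections::HashMap;

type ParaId = u32;
type RegionId = u64; // hypothetical opaque identifier

/// Hypothetical relay-side record for a single-assignee region.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RegionState {
    assignee: ParaId,
    remaining: u32, // how many more candidates this region may still back
}

#[derive(Debug, PartialEq)]
enum CheckError {
    UnknownRegion,
    WrongParachain,
    Exhausted,
}

/// Checks the region a candidate is tagged with and consumes one unit of it.
fn check_and_consume(
    regions: &mut HashMap<RegionId, RegionState>,
    region: RegionId,
    para: ParaId,
) -> Result<(), CheckError> {
    // Load: a single lookup by the identifier submitted with the candidate.
    let state = regions.get_mut(&region).ok_or(CheckError::UnknownRegion)?;
    if state.assignee != para {
        return Err(CheckError::WrongParachain);
    }
    if state.remaining == 0 {
        return Err(CheckError::Exhausted);
    }
    // Modify and store: written back through the mutable reference.
    state.remaining -= 1;
    Ok(())
}
```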

To solve (2), we can build upon the solution to (1). In the future, this region identifier for a particular candidate may be included inside the CandidateReceipt by the collator, as collator-selection algorithms working across many regions must already figure out how to utilize their regions and asking validators to re-run this allocation logic with less information is a redundancy that can be avoided. This solves the elastic scaling candidate-group problem, as validators will know unequivocally which backing group is intended to work on the candidate.

To solve (3) and (4), the regions proposal schedules core-time somewhat probabilistically. P2P network systems experience variance in practice, and the time it takes to back a block or make its data available does not always fall within prescribed bounds. Stated otherwise: if Polkadot were to set backing and availability timeouts such that they were always met, or were met 99.9% of the time, those bounds would likely be too conservative and we'd be leaving significant performance on the table. By giving regions a single assignee, the probabilistic scheduling allows cores and parachains to "make up" recently missed opportunities by accepting more than one candidate at a time, up to configured per-core and per-relay-chain-block maxima that avoid massive per-block loads. Chains can never access more core-time than they've been allocated in total. With this solution, even with 1000 chains scheduled on a single core with varying frequencies, they cause minimal friction on each other's timing, and system load is both predictable and capped when averaged over any period longer than a minute or two.
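The "make up" behaviour can be illustrated with a small model under the following assumptions: a region earns `num/den` of a candidate per relay block, carries at most `max_backlog` missed candidates forward, may back at most `per_block_cap` candidates in any one block, and can never exceed its total allocation (`remaining`). `FreqRegion` and its fields are hypothetical, and the relay-chain-wide per-block maximum is not modelled.

```rust
/// Hypothetical frequency-based, single-assignee region.
struct FreqRegion {
    num: u64,         // entitlement earned per relay block, in 1/den units
    den: u64,
    max_backlog: u64, // most missed candidates that may be carried forward
    credit: u64,      // accumulated entitlement, in 1/den units
    remaining: u64,   // candidates left in the region's total allocation
}

impl FreqRegion {
    /// Advances one relay block with `ready` candidates available and
    /// returns how many of them the region may back in this block.
    fn on_block(&mut self, ready: u64, per_block_cap: u64) -> u64 {
        // Accrue this block's entitlement, capping the carried-over backlog.
        self.credit = (self.credit + self.num).min(self.max_backlog * self.den);
        let whole = self.credit / self.den;
        // Bounded by readiness, earned credit, the per-block cap and the total.
        let backed = ready.min(whole).min(per_block_cap).min(self.remaining);
        self.credit -= backed * self.den;
        self.remaining -= backed;
        backed
    }
}
```

With `num = 1, den = 2`, a chain that produces nothing for three blocks can back two candidates at once in the fourth and then falls back to its normal rate; the total never exceeds `remaining`.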

Applying Regions to this RFC

I suggest that my regions proposal be altered, removing the ability to split, transfer, and reassign regions in an unpermissioned way on the Polkadot relay-chain.

Instead, these actions would become permissioned, with the permission being held by the broker-chain. The broker-chain would then 'blit' the data structures it manages onto regions in the relay-chain to manage scheduling. This can be done with a single XCM to create the regions on the relay-chain, and the scheduling logic there would handle the rest.

The region records described in this RFC will be compatible, in that they could be transformed into single-assignee, frequency-based regions when blitting them up to the relay-chain, but I suggest we outline that intention in this RFC to commit to that as a plan. The regions in this RFC also have a few properties I'll comment on:

  1. The RegionId is not unique: every BULK_PERIOD, RegionIds will be reused, as they depend only on the core index and the timeslice within the BULK_PERIOD. That may make it harder for logic living on other chains to keep track of which regions they own.
  2. Having the allocation contain a Vec<ParaId>, rather than having multiple regions each with its own single ParaId, may discourage region-sharing among chains, as either they all have to renew together or none of them can. In my opinion, this is likely to detract from the value proposition of sharing regions altogether, as all parachains will want to stay on the RENEWAL_PRICE_CAP curve but some will be chained to sinking partners. That said, it does also give parachains very strong incentives to help their region companions survive. Let's discuss this property and alternatives.
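Point 1 can be seen with a toy identifier. Assuming, as the comment describes, that the ID depends only on the core index and the timeslice offset within the bulk period, the same ID recurs every `BULK_PERIOD`; including the absolute timeslice avoids that. The `BULK_PERIOD` value and both functions are hypothetical.

```rust
type CoreIndex = u16;
type Timeslice = u32;

/// Hypothetical bulk period length, in timeslices.
const BULK_PERIOD: Timeslice = 5_040;

/// ID built only from the core and the offset within the bulk period:
/// the same value recurs every BULK_PERIOD.
fn relative_id(core: CoreIndex, timeslice: Timeslice) -> (CoreIndex, Timeslice) {
    (core, timeslice % BULK_PERIOD)
}

/// ID built from the absolute timeslice: unique over time.
fn absolute_id(core: CoreIndex, timeslice: Timeslice) -> (CoreIndex, Timeslice) {
    (core, timeslice)
}
```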

Tight integration with the regions RFC would make it technically trivial to update the broker parachain for elastic scaling, or to add other mechanisms for accessing core-time, economic design notwithstanding.

Since full implementation of the regions RFC will likely take a while, it would probably make the most sense to include a new call on the relay-chain, BlitBrokerRegion, that the broker can invoke via XCM::Transact. This would take the broker region format as an argument and transform it into whatever scheduling mechanism the relay chain currently uses, whether that's the existing scheduling infrastructure quickly adapted for the purpose or the new regions architecture when that lands.
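A rough sketch of the transformation such a call might perform, assuming a broker region made of a core, a begin/end timeslice range, an 80-bit `CoreMask` and a `Vec<ParaId>` allocation, and assuming the masked share of the core is divided equally among the listed paras. `BrokerRegion`, `RelayRegion` and `blit` are hypothetical names; the relay-chain format was still open at the time of this discussion.

```rust
type ParaId = u32;
type CoreIndex = u16;
type Timeslice = u32;
type CoreMask = [u8; 10]; // 80-bit bitmap: one bit per 1/80th of the core

/// Hypothetical broker-chain region with a shared allocation.
struct BrokerRegion {
    core: CoreIndex,
    begin: Timeslice,
    end: Timeslice,
    mask: CoreMask,
    paras: Vec<ParaId>,
}

/// Hypothetical single-assignee, frequency-based relay-chain region.
#[derive(Debug, PartialEq)]
struct RelayRegion {
    core: CoreIndex,
    begin: Timeslice,
    end: Timeslice,
    assignee: ParaId,
    parts: u32,       // share of the core is parts / total_parts
    total_parts: u32,
}

/// Produces one relay region per para, each with an equal share of the
/// portion of the core selected by the mask.
fn blit(region: &BrokerRegion) -> Vec<RelayRegion> {
    let parts: u32 = region.mask.iter().map(|b| b.count_ones()).sum();
    let n = region.paras.len() as u32;
    region
        .paras
        .iter()
        .map(|&assignee| RelayRegion {
            core: region.core,
            begin: region.begin,
            end: region.end,
            assignee,
            parts,
            total_parts: 80 * n,
        })
        .collect()
}
```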

RFC-0001-Agile Coretime.md (outdated)
5. The design MUST work with a limited set of resources (cores on the Polkadot UC) whose properties and number may evolve over time.
6. The design MUST avoid creating additional dependency on functionality which the Relay-chain need not strictly provide for the delivery of the Polkadot UC. This includes any dependency on the Relay-chain hosting a DOT token.

Furthermore, the design SHOULD be implementable and deployable in a timely fashion; three months from the acceptance of this RFC would seem reasonable.
Contributor

This seems optimistic, especially the "deployable" part, given that the Root track takes one month and would eat up a third of this. Perhaps deployable to a testnet, with a concrete migration path proposed for existing parachains?

Reply:

Wouldn't the fellowship be able to whitelist this upgrade?

Contributor

Yes, but the track is still 28 days. And even though it is less restrictive to pass earlier on that track, a lot of parachain teams have said that they prefer a set block number (i.e. At over After) for runtime upgrades so that they can prepare for any breaking changes.

@gavofyork (Contributor Author) Jul 4, 2023

We'll see. As long as the Relay-chain support for core assignment exists, this really shouldn't be a big change. Three months should be notionally possible, even if it ends up being missed due to factors outside the scope of this RFC.

@gavofyork (Contributor Author)

I believe this RFC makes transfer of regions lose the priority on core renewal but changing the allocation of the slot from one parachain to another does not. This would mean the system can quite easily be gamed to transfer ownership of a region without it being prevented from being renewed (for example by holding the region in a pure proxy and simply transferring ownership). Wouldn’t we want to tie the ability to renew to the paraid that the core is running and not which account controls the core?

From the Renewals section:

[renew] has the same effect as purchase followed by allocate containing the same Vec<ParaId>...

Note "containing the same [set of parachains]": this prevents transfer using proxies.

Transfer of regions would indeed lose renewal rights since the price information would be dropped. This is intentional. The point of renewals isn't to attempt to give as many entities as possible a discount for the next month: it's to ensure that committed teams get some guarantees about price for predicting future costs.
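The eligibility condition can be sketched as below, assuming a hypothetical renewal record holding the price (dropped on split or transfer) and the exact `Vec<ParaId>` that was allocated. `RenewalRecord` and `can_renew` are illustrative names, not the RFC's storage layout. Because the requested allocation must match exactly, handing over a proxy that controls the region does not let a different parachain inherit the renewal.

```rust
type ParaId = u32;
type Balance = u128;

/// Hypothetical record left behind by an allocation that may be renewed.
struct RenewalRecord {
    price: Option<Balance>, // None once the region was split or transferred
    paras: Vec<ParaId>,     // exact allocation that earned the renewal right
}

/// Returns the previous price (on which the renewal price is based) only if
/// the price survived and the requested allocation is exactly the previous
/// one; otherwise the caller must buy in the regular sale.
fn can_renew(record: &RenewalRecord, requested: &[ParaId]) -> Option<Balance> {
    match record.price {
        Some(price) if record.paras.as_slice() == requested => Some(price),
        _ => None,
    }
}
```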


The present system of allocating time for parachains on the cores of the Polkadot Ubiquitous Computer (aka "Polkadot") is through a process known as *slot auctions*. These are on-chain candle auctions which proceed for several days and result in a core being assigned to a single parachain for six months at a time, up to 18 months in advance. Practically speaking, we only see two-year periods being bid upon and leased.

Funds behind the bids made in the slot auctions are merely locked, not consumed or paid, and become unlocked and returned to the bidder on expiry of the lease period. A means of sharing the deposit trustlessly, known as a *crowdloan*, is available, allowing token holders to contribute to the overall deposit of a chain without any counterparty risk.

Comment:

Funds behind the bids made in the slot auctions are merely locked, not consumed or paid

This wording and multiple other references to sales/purchases suggest that Bulk Coretime will be paid for in DOT, in contrast to the current system, where DOT is simply locked and the "cost" is the opportunity cost of not staking. The RFC should state explicitly whether this is the case.

It would also be useful to understand - if this is the case - where those DOT are sent, i.e. who is paid? Is it validators? Is it burned? If validators, would high reward rates through demand for blockspace impact inflation that currently forms the vast majority of their revenue?

Contributor Author

Bulk coretime will be paid for in DOT

That's right.

Any DOT recouped from sales of system resources (Coretime, in this case) would by default be placed in the treasury. Governance would be able to decide what to do with it.

Contributor

Maybe it's better to burn the DOT instead of diverting it to the Treasury. There are a few reasons for that:

  1. In the absence of 2y locks on DOT, the system might benefit from a permanent sink for DOT. We might also consider increasing the ideal staking rate. Non-interactive staking might serve well here, too.
  2. Inflow from coretime usage might, especially in the short-term, be very volatile. The Treasury conceptually benefits from predictable inflow, allowing for long-term budgeting. Inflation is the best way to do that. We'd counter that with burning for coretime.
  3. This mechanism would lead to high inflow in times with high coretime usage and low inflow in times of low usage. It seems to me that, if anything, it should be the opposite. With a steady inflow the Treasury would always have enough funds to respond to demand shocks in coretime when necessary (by funding good projects / initiatives).

Reply:

I totally agree with @jonasW3F: DOT from Coretime sales should be burnt instantly in order to slow down inflation.

@gavofyork (Contributor Author) Jul 13, 2023

I have no objection, though I would prefer to leave the specific economics (including this and the price adaptation) for other RFCs so that we can document the motivations properly. The implementation of RFC-1 can (and indeed does) just provide an OnUnbalanced<Credit> endpoint, which can just as easily burn as send to the treasury.

@jonasW3F perhaps you can write a short RFC expanding out your points above.
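To illustrate why the destination can be left to a later decision, here is a std-only analogue of such an endpoint: the sale logic hands its proceeds to a handler, and swapping the handler switches between burning and crediting the treasury. This is not FRAME's actual `OnUnbalanced` trait; `RevenueHandler`, `Ledger`, `Burn`, `ToTreasury` and `settle_sale` are hypothetical.

```rust
/// Hypothetical stand-in for an "unbalanced credit" endpoint (not FRAME's trait).
trait RevenueHandler {
    fn on_revenue(&self, ledger: &mut Ledger, amount: u128);
}

/// Minimal ledger: total issuance and the treasury's balance.
#[derive(Debug)]
struct Ledger {
    total_issuance: u128,
    treasury: u128,
}

/// Burning removes the proceeds from circulation.
struct Burn;

impl RevenueHandler for Burn {
    fn on_revenue(&self, ledger: &mut Ledger, amount: u128) {
        ledger.total_issuance -= amount;
    }
}

/// Sending to the treasury leaves issuance unchanged.
struct ToTreasury;

impl RevenueHandler for ToTreasury {
    fn on_revenue(&self, ledger: &mut Ledger, amount: u128) {
        ledger.treasury += amount;
    }
}

/// Debits the buyer, then passes the proceeds to whichever handler is
/// configured. Returns false, changing nothing, if the buyer cannot pay.
fn settle_sale<H: RevenueHandler>(
    handler: &H,
    ledger: &mut Ledger,
    buyer_balance: &mut u128,
    price: u128,
) -> bool {
    if *buyer_balance < price {
        return false;
    }
    *buyer_balance -= price;
    handler.on_revenue(ledger, price);
    true
}
```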

@lucasvo commented Jul 3, 2023

@gavofyork you write:

The point of renewals isn't to attempt to give as many entities as possible a discount for the next month: it's to ensure that committed teams get some guarantees about price for predicting future costs.

I think this point perhaps wasn't clear enough. Thanks for clarifying that this is implemented by having renew always allocate the core to the same ParaId. I would add a comment to the renew section pointing out that this is a specific design goal, or highlight it in the problem statement.

@eskimor left a comment

Very good read!

text/0001-agile-coretime.md
@@ -167,7 +167,7 @@ The Sale Price varies during an initial portion of the Purchasing Period called

At any time when there are remaining Regions of Bulk Coretime to be sold, *including during the Interlude Period*, then certain Bulk Coretime assignments may be *Renewed*. This is similar to a purchase in that funds must be paid and it consumes one of the Regions of Bulk Coretime which would otherwise be placed for purchase. However there are two key differences.

- Firstly, the price paid is exactly `RENEWAL_PRICE_CAP` more than what the purchase/renewal price was in the previous sale.
+ Firstly, the price paid is the minimum of `RENEWAL_PRICE_CAP` more than what the purchase/renewal price was in the previous renewal and the current (or initial, if yet to begin) regular Sale Price.
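The revised rule (the smaller of the capped increase and the regular Sale Price) can be sketched as below. It assumes `RENEWAL_PRICE_CAP` is a fractional increase expressed in parts per billion and that prices are integer balances; the 3% value and the function name are illustrative only.

```rust
type Balance = u128;

/// Renewal cap in parts per billion (hypothetical value: 3%).
const RENEWAL_PRICE_CAP_PPB: Balance = 30_000_000;
const BILLION: Balance = 1_000_000_000;

/// Previous renewal price raised by at most RENEWAL_PRICE_CAP, but never
/// above the current (or initial) regular Sale Price.
fn renewal_price(previous: Balance, sale_price: Balance) -> Balance {
    let capped = previous + previous * RENEWAL_PRICE_CAP_PPB / BILLION;
    capped.min(sale_price)
}
```

For a previous price of 1,000, a Sale Price of 2,000 yields 1,030 (the cap binds), while a Sale Price of 900 yields 900. Under the previous wording ("exactly `RENEWAL_PRICE_CAP` more"), the second case would have cost 1,030.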
Comment:

Thank you for incorporating this change! It significantly simplifies the purchase strategy that Centrifuge would choose and increases certainty over core availability.

type CoreMask = [u8; 10]; // 80-bit bitmap.

// 128-bit (16 bytes)
struct RegionId {
@rphmeier (Contributor) Aug 1, 2023

As an implementation note (out of scope for the RFC): these datatypes depend on using a packed representation, as opposed to standard alignments, if they're meant to have these exact sizes in memory.

In SCALE encoding they should be packed automatically.

Contributor Author

I did make a test to check the size (at least on my 64-bit M1 architecture) and it was indeed 128-bit as expected without any explicit packing.
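A quick way to reproduce this check, assuming the struct's fields are a `u32` timeslice, a `u16` core index and the 80-bit `CoreMask` (the excerpt above shows only the mask, so the other two field types are an assumption). On common targets, 4 + 2 + 10 bytes already fill 16 bytes with 4-byte alignment and no padding, so for this particular layout the size does not depend on packing. `encode` mimics SCALE's treatment of fixed-width integers (little-endian) and fixed-size arrays (no length prefix) without depending on the `parity-scale-codec` crate.

```rust
use std::mem::{align_of, size_of};

type Timeslice = u32; // assumed field type
type CoreIndex = u16; // assumed field type
type CoreMask = [u8; 10]; // 80-bit bitmap, as in the RFC excerpt

/// Assumed layout of RegionId; no #[repr(packed)] is used.
#[derive(Clone, Copy, Debug, PartialEq)]
struct RegionId {
    begin: Timeslice,
    core: CoreIndex,
    mask: CoreMask,
}

/// SCALE-style encoding: fields in declaration order, integers little-endian,
/// fixed-size arrays written without a length prefix.
fn encode(id: &RegionId) -> Vec<u8> {
    let mut out = Vec::with_capacity(16);
    out.extend_from_slice(&id.begin.to_le_bytes());
    out.extend_from_slice(&id.core.to_le_bytes());
    out.extend_from_slice(&id.mask);
    out
}

/// In-memory (size, alignment) of the assumed layout.
fn layout() -> (usize, usize) {
    (size_of::<RegionId>(), align_of::<RegionId>())
}
```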

@gavofyork merged commit c782a92 into main on Aug 11, 2023
@gavofyork deleted the gav-agile-coretime branch on August 11, 2023 23:48
@anaelleltd added the Implemented (Is merged or live as a feature/service.) label on Sep 10, 2024