Deterministic L1 starting block selection (Config in advance) #342
base: celo-rebase-12
Conversation
There is one failing case; it needs to be expanded upon.
Also adds checks to enforce using either the beaconchain flags or using l1StartingBlockTag and migration-block-time.
e83a9ed to 702e087
Left some comments.
In general the approach looks good to me. I'd like to have a backup for beaconcha.in API, is there one available?
Also, to test this further, could you write a little tool which runs MostRecentFinalizedL1BlockAtTime
every couple of seconds so that we can check the result?
return (unixTime - beaconChainGenesisTimeSeconds) / beaconChainSlotDurationSeconds
}

type BeaconBlock struct {
Is there some lib for that kind of stuff?
I found these two options:
- https://github.com/gobitfly/eth2-beaconchain-explorer
- https://github.com/prysmaticlabs/prysm
But they both just provide clients for the beacon-rpc, so I'd still need to write a client for the beaconcha.in requests, it didn't seem worth it to import all that stuff for one method.
We could use something like go-swagger to generate the schema (if you didn't use that already) directly from the swagger file, but I don't think it's worth it
Sounds interesting, I will have a look, where is the swagger file?
https://ethereum.github.io/beacon-APIs/releases/v3.0.0/beacon-node-oapi.json
The beacon-api spec is basically just a render of that.
Required: false,
}
l1BeaconchainURLFlag = &cli.StringFlag{
	Name: "l1-beaconcha.in-url",
Is there a fallback service that provides the same API?
Not as far as I'm aware, but you can host your own. AFAICS this is the project serving those pages (it's an indexer/explorer): https://github.com/gobitfly/eth2-beaconchain-explorer
Do you think there could be any reason to communicate with them ahead of the mainnet migration just to make sure they're ready to accommodate all the requests from people migrating and aren't planning any maintenance?
It would be good to check for any maintenance issues, in terms of load only two very small requests are being made per migration, so I don't think it would be an issue.
// migration script is run. We give them a 10 minute buffer in case of any
// issues, but if we are outside of this range we will fail here.
var maxSequencerDriftCelo uint64 = 2892 // Duplicate of the value defined at op-node/rollup.maxSequencerDriftCelo
if maxSequencerDriftCelo-(opts.migrationBlockTime-l1StartBlock.Time()) < 17*60 {
If we'd hit this, this would break the migration. Shouldn't we instead try to find a newer finalized block?
Yes, I think we can try to find a newer finalized block.
Actually, if we assume that blocks explicitly set by tag are correct, this check is needless, because L1 block times are determined by the slot time. That means the max diff between the L1 block time and the L2 start time (in the case that our L2 start time is the last second of an Eth2 epoch) is:
(32*12*3) + (12*9) - 1 = 1259s
≈ 20.98 minutes. We subtract one second because the L2 start block falls on the last second of the epoch in this scenario.
So given our maxSequencerDrift of 48.2m, that gives us at least 27 minutes to get the sequencer up and running.
So this check would never get triggered with the current code.
We can still search for other blocks, but that will be determined by whether we find a finalized block in the expected slot or not.
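The arithmetic above can be checked directly (the constants are the standard beacon-chain 12-second slots and 32-slot epochs):

```go
package main

import "fmt"

func main() {
	const secondsPerSlot = 12
	const slotsPerEpoch = 32
	// Three full epochs back plus up to 9 slots of lookback, minus one second
	// because the L2 start time falls on the last second of its epoch.
	maxDiff := slotsPerEpoch*secondsPerSlot*3 + secondsPerSlot*9 - 1
	fmt.Println(maxDiff) // prints 1259 (seconds, ~20.98 minutes)

	const maxSequencerDriftCelo = 2892 // seconds, ~48.2 minutes
	fmt.Println(maxSequencerDriftCelo - maxDiff) // prints 1633 (seconds, ~27.2 minutes of headroom)
}
```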
Co-authored-by: Paul Lange <[email protected]>
Also fix multiple linter errors.
// Wait for the L2 start time before we calculate the L1 starting block.
tillL2Start := int64(opts.migrationBlockTime) - time.Now().Unix()
if tillL2Start >= 0 {
	time.Sleep(time.Duration(tillL2Start+1) * time.Second)
Why do we need to wait here? If the beacon chain API call below takes a few seconds anyway, why not add this check after that returns?
We need to wait here so that we don't try to query the beacon chain api for epochs that have not yet completed, since we are relying on the completed epochs to determine the finalized block.
We don't actually need to wait till the L2 start time, we just need to wait till the epoch prior to the epoch that contains the l2 start time has completed. So we can shave a bit of time off here.
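A sketch of that shortened wait: instead of sleeping until migrationBlockTime, it is enough to sleep until the epoch containing it begins, because at that instant the prior epoch has completed. The genesis constant is the mainnet beacon-chain value; the helper name is hypothetical:

```go
package main

import "fmt"

const (
	beaconGenesisSeconds = 1606824023 // mainnet beacon chain genesis (unix seconds)
	secondsPerSlot       = 12
	slotsPerEpoch        = 32
)

// priorEpochCompleteAt returns the unix time at which the epoch preceding the
// epoch containing t has completed, i.e. the start of t's own epoch. Waiting
// until this instant (rather than until t itself) is sufficient before
// querying for completed epochs.
func priorEpochCompleteAt(t uint64) uint64 {
	epoch := (t - beaconGenesisSeconds) / (secondsPerSlot * slotsPerEpoch)
	return beaconGenesisSeconds + epoch*secondsPerSlot*slotsPerEpoch
}

func main() {
	t := uint64(beaconGenesisSeconds + 2*384 + 100) // 100s into epoch 2
	fmt.Println(t - priorEpochCompleteAt(t))        // prints 100: querying can start 100s earlier
}
```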
// migration script is run. We give them a 10 minute buffer in case of any
// issues, but if we are outside of this range we will fail here.
var maxSequencerDriftCelo uint64 = 2892 // Duplicate of the value defined at op-node/rollup.maxSequencerDriftCelo
if maxSequencerDriftCelo-(opts.migrationBlockTime-l1StartBlock.Time()) < 17*60 {
Given that opts.migrationBlockTime and l1StartBlock can be decided by flags, should we check for underflows here?
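Agreed that this deserves a guard: both values are uint64, so an L1 start block newer than the migration time makes the inner subtraction wrap around instead of going negative, and the drift check silently passes. A hedged sketch of an underflow-safe version (names mirror the diff; checkDrift itself is a hypothetical helper):

```go
package main

import "fmt"

const maxSequencerDriftCelo uint64 = 2892 // mirrors op-node/rollup.maxSequencerDriftCelo

// checkDrift validates the gap between the chosen L1 start block time and the
// migration block time, guarding against uint64 underflow before subtracting.
func checkDrift(migrationBlockTime, l1StartTime uint64) error {
	if migrationBlockTime < l1StartTime {
		return fmt.Errorf("migration block time %d is before L1 start block time %d", migrationBlockTime, l1StartTime)
	}
	elapsed := migrationBlockTime - l1StartTime
	if elapsed > maxSequencerDriftCelo {
		return fmt.Errorf("%d seconds elapsed already exceeds the max sequencer drift", elapsed)
	}
	if maxSequencerDriftCelo-elapsed < 17*60 {
		return fmt.Errorf("only %d seconds of sequencer drift headroom remain", maxSequencerDriftCelo-elapsed)
	}
	return nil
}

func main() {
	// A start block newer than the migration time now fails loudly instead
	// of wrapping around.
	fmt.Println(checkDrift(100, 200))
}
```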
Looks good! Very nice approach. I feel confident given the tests added here, but I agree with Paul that it wouldn't hurt to test this by running it many times and checking against live expected results to make sure there aren't any edge cases. Dry runs and local testing will help too. I'm planning to test this locally myself when I get a chance. Looks great; ready to approve once everyone has taken a look and all changes are in.
Looks good, I think the approach is sound!
The comments are mainly nitpicks or suggestions, but nothing substantial
case (opts.l1BeaconRPC != "") != (opts.l1BeaconchainURL != ""):
	// Check that either both l1BeaconRPC and l1BeaconchainURL are set or unset.
	return fmt.Errorf("if the l1-beacon-rpc flag is specified, the l1-beaconchain-url must also be specified and vice versa")
case (opts.migrationBlockTime == 0) == (opts.l1BeaconRPC == ""):
all of the switch cases are really hard to parse :)
Why is this so strict anyways? In the migration, the absence of the L1StartingBlockTag will trigger the l1 starting block selection, otherwise it will take the specified one.
You could simply check something like this here:
if config.L1StartingBlockTag == nil && (opts.l1BeaconRPC == "" || opts.l1BeaconchainURL == "") {
return fmt.Errorf("...")
}
And then, if you want, add a log warning in runStateMigration that the provided migration-block-time will be overwritten if set by the user.
I wanted to be explicit about the flags; it's a mistake to pass the migrationBlockTime if you've passed either the l1BeaconRPC or the l1BeaconchainURL. Admittedly it's pretty verbose, but I think that's better than letting people misuse the migration tool and then be surprised when its output is not what they expected.
Anyway, this will get simpler as I'm removing the requirement on beaconcha.in
if config.L1StartingBlockTag == nil {
	l1StartBlock, err = client.BlockByNumber(context.Background(), nil)
	// Find the L1 starting block, the L2 fork block occurs 1 minute after the last celo L1 block.
	opts.migrationBlockTime = celoL1Head.Time + 60
What speaks against getting rid of the parameter altogether, hardcoding the baklava and alfajores parameters, and using this 60-second offset everywhere else?
// given time falls and then looks back from there to find the first finalized
// block.
func (c *beaconClient) MostRecentFinalizedL1BlockAtTime(ctx context.Context, pointInTime uint64) (common.Hash, error) {
	// Find the epoch starting at or before the L2 start time.
you renamed the arg, but the comment still references the old name
var err error
names := [2]string{"completed", "justified"}

// Check the most recent completed and justified epochs had a participation rate of at least 0.67.
This and the following steps depend on the point-in-time having actually passed (in wall-time and on-chain, in terms of prior epoch justifications).
In the L1StartBlock-determining code you wait for that time to pass prior to calling this method.
I think it would make sense to include the waiting logic here instead, since the function basically doesn't do anything when called before that time anyway.
Additionally, you could make the point-in-time arg optional, and if it is nil have it take time.Now().
Updated to wait in this method, makes sense thanks!
if err != nil {
	return common.Hash{}, fmt.Errorf("error fetching epoch %d: %w", epochNumber-i, err)
}
if epoch.Data.Globalparticipationrate < 0.67 {
I would have preferred to use the BeaconAPI for this, but it seems like you would have to process the BLS signatures and the graph of source to target in the attestations yourself for each block in the epoch...
Actually, I think I can use the /eth/v1/beacon/states/{state_id}/finality_checkpoints endpoint for this. I wasn't able to test before since I hadn't found an RPC that supported historical requests for that endpoint, but https://ethereum-holesky-beacon-api.publicnode.com seems to. So I'm planning to remove the reliance on beaconcha.in
time.Sleep(time.Duration(tillL2Start+1) * time.Second)
}
// MostRecentFinalizedL1BlockAtTime can take a few seconds, so we provide a suitably long context.
ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
Do we need a timeout here? Is there a concern that the HTTP calls will block too long?
I think I'd rather risk blocking too long than exiting the migration early.
(Although I don't think the timeout will ever be reached, since there are no loops retrying on a non-200 response, so you'll rather exit because of an orderly unsuccessful response or a connection error.)
The Go http library will block forever waiting for a response from the server, so I think we need some sort of timeout. I've refactored the code a bit and now contexts are used closer to the network call sites; I also used more generous timeouts.
Makes sense!
By the way, http.Client{} also has a Timeout (total request time) field that you can set globally, if you don't want to use the timeout-context on each call.
Also removes reliance on beaconcha.in. Tests not yet updated.
Changes the way the migration script works so that it does not require the L2 fork block time as an input. This frees us from trying to predict a safe L2 fork block time that falls after the final Celo L1 block but not so far in the future that we end up with the L2 not producing blocks once started because it is waiting for that time.
Also changes the way we select the L1 starting block. Previously we selected the latest block of the L1, but this was not safe since that block would not have been finalized, and it was also not deterministic, since the value depended on when the migration script was run. Now we select the most recent L1 block that was finalized at the time of the L2 fork block, which is deterministic because it is based on the L2 fork block time, which in turn is based on the last Celo L1 block time.
These changes allow us to provide all required config for a migration well in advance of the migration time and also allows partners to perform the migration in parallel with our migration, as opposed to having to wait for our migration to finish before they can start their migration.
A description of ethereum's finalization process can be found here - https://eth2book.info/capella/part2/consensus/casper_ffg/
The approach we use here is to find the epoch that the L2 start block falls within. That epoch is guaranteed not to be complete at the time of the L2 fork block, so we look to the previous epoch, which is completed, and then to the epoch prior to that, which is justified, and take the first block of that epoch, since it will be finalized.
This is assuming that the completed and justified epochs reached at least 2/3 network participation. If they did not then we fail the process.
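The epoch walk described above reduces to slot arithmetic (constants are the standard beacon-chain parameters; the real code additionally verifies the participation/finality of each epoch via the API before trusting the result):

```go
package main

import "fmt"

const (
	secondsPerSlot = 12
	slotsPerEpoch  = 32
)

// firstSlotOfFinalizedEpoch walks back from the epoch containing l2StartTime:
// that epoch is incomplete, the one before it is completed, and the one
// before that is justified, so its first slot is finalized.
func firstSlotOfFinalizedEpoch(l2StartTime, genesis uint64) uint64 {
	epoch := (l2StartTime - genesis) / (secondsPerSlot * slotsPerEpoch)
	return (epoch - 2) * slotsPerEpoch
}

func main() {
	// With genesis at 0, an L2 start time inside epoch 5 yields the first
	// slot of epoch 3.
	fmt.Println(firstSlotOfFinalizedEpoch(5*384+17, 0)) // prints 96
}
```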
Fixes https://github.com/celo-org/celo-blockchain-planning/issues/908
Tested
Locally I was able to migrate an alfajores datadir with the new flags.