Deterministic L1 starting block selection (Config in advance) #342
base: celo-rebase-12
Conversation
There is one failing case; it needs to be expanded upon.
Also adds checks to enforce using either the beaconchain flags or using l1StartingBlockTag and migration-block-time.
e83a9ed to 702e087
Left some comments.
In general the approach looks good to me. I'd like to have a backup for beaconcha.in API, is there one available?
Also, to test this further, could you write a little tool which runs MostRecentFinalizedL1BlockAtTime
every couple of seconds so that we can check the result?
return (unixTime - beaconChainGenesisTimeSeconds) / beaconChainSlotDurationSeconds
}

type BeaconBlock struct {
Is there some lib for that kind of stuff?
I found these two options:
- https://github.com/gobitfly/eth2-beaconchain-explorer
- https://github.com/prysmaticlabs/prysm
But they both just provide clients for the beacon-rpc, so I'd still need to write a client for the beaconcha.in requests, it didn't seem worth it to import all that stuff for one method.
We could use something like go-swagger to generate the schema (if you didn't use that already) directly from the swagger file, but I don't think it's worth it
Sounds interesting, I will have a look, where is the swagger file?
https://ethereum.github.io/beacon-APIs/releases/v3.0.0/beacon-node-oapi.json
The beacon-api spec is basically just a render of that.
Required: false,
}
l1BeaconchainURLFlag = &cli.StringFlag{
	Name: "l1-beaconcha.in-url",
Is there a fallback service that provides the same API?
Not as far as I'm aware, but you can host your own. AFAICS this is the project serving those pages (it's an indexer/explorer): https://github.com/gobitfly/eth2-beaconchain-explorer
Do you think there could be any reason to communicate with them ahead of the mainnet migration just to make sure they're ready to accommodate all the requests from people migrating and aren't planning any maintenance?
It would be good to check for any maintenance issues, in terms of load only two very small requests are being made per migration, so I don't think it would be an issue.
// migration script is run. We give them a 10 minute buffer in case of any
// issues, but if we are outside of this range we will fail here.
var maxSequencerDriftCelo uint64 = 2892 // Duplicate of the value defined at op-node/rollup.maxSequencerDriftCelo
if maxSequencerDriftCelo-(opts.migrationBlockTime-l1StartBlock.Time()) < 17*60 {
If we'd hit this, this would break the migration. Shouldn't we instead try to find a newer finalized block?
Yes, I think we can try to find a newer finalized block.
Actually, if we assume that blocks explicitly set by tag are correct, this check is needless, because L1 block times are determined by the slot time. That means the max diff between the L1 block time and the L2 start time (in the case that our L2 start time is the last second of an Eth2 epoch) is:
(32*12*3) + (12*9) - 1 = 1259s
≈ 20.98 minutes. We subtract one second because the L2 start block falls on the last second of the epoch in this scenario.
So given our maxSequencerDrift of 48.2m, that gives us at least 27 minutes to get the sequencer up and running.
So this check would never get triggered with the current code.
We can still search for other blocks, but that will be determined by whether we find a finalized block in the expected slot or not.
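The arithmetic above can be checked directly (the constants are the standard beacon-chain 12-second slots and 32-slot epochs):

```go
package main

import "fmt"

func main() {
	const secondsPerSlot = 12
	const slotsPerEpoch = 32
	// Three full epochs back plus up to 9 slots of lookback, minus one second
	// because the L2 start time falls on the last second of its epoch.
	maxDiff := slotsPerEpoch*secondsPerSlot*3 + secondsPerSlot*9 - 1
	fmt.Println(maxDiff) // prints 1259 (seconds, ~20.98 minutes)

	const maxSequencerDriftCelo = 2892 // seconds, ~48.2 minutes
	fmt.Println(maxSequencerDriftCelo - maxDiff) // prints 1633 (seconds, ~27.2 minutes of headroom)
}
```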
Co-authored-by: Paul Lange <[email protected]>
Also fix multiple linter errors.
// Wait for the L2 start time before we calculate the L1 starting block.
tillL2Start := int64(opts.migrationBlockTime) - time.Now().Unix()
if tillL2Start >= 0 {
	time.Sleep(time.Duration(tillL2Start+1) * time.Second)
Why do we need to wait here? If the beacon chain API call below takes a few seconds anyway, why not add this check after that returns?
We need to wait here so that we don't try to query the beacon chain api for epochs that have not yet completed, since we are relying on the completed epochs to determine the finalized block.
We don't actually need to wait till the L2 start time, we just need to wait till the epoch prior to the epoch that contains the l2 start time has completed. So we can shave a bit of time off here.
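A sketch of that shortened wait: instead of sleeping until migrationBlockTime, it is enough to sleep until the epoch containing it begins, because at that instant the prior epoch has completed. The genesis constant is the mainnet beacon-chain value; the helper name is hypothetical:

```go
package main

import "fmt"

const (
	beaconGenesisSeconds = 1606824023 // mainnet beacon chain genesis (unix seconds)
	secondsPerSlot       = 12
	slotsPerEpoch        = 32
)

// priorEpochCompleteAt returns the unix time at which the epoch preceding the
// epoch containing t has completed, i.e. the start of t's own epoch. Waiting
// until this instant (rather than until t itself) is sufficient before
// querying for completed epochs.
func priorEpochCompleteAt(t uint64) uint64 {
	epoch := (t - beaconGenesisSeconds) / (secondsPerSlot * slotsPerEpoch)
	return beaconGenesisSeconds + epoch*secondsPerSlot*slotsPerEpoch
}

func main() {
	t := uint64(beaconGenesisSeconds + 2*384 + 100) // 100s into epoch 2
	fmt.Println(t - priorEpochCompleteAt(t))        // prints 100: querying can start 100s earlier
}
```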
// migration script is run. We give them a 10 minute buffer in case of any
// issues, but if we are outside of this range we will fail here.
var maxSequencerDriftCelo uint64 = 2892 // Duplicate of the value defined at op-node/rollup.maxSequencerDriftCelo
if maxSequencerDriftCelo-(opts.migrationBlockTime-l1StartBlock.Time()) < 17*60 {
Given that opts.migrationBlockTime and l1StartBlock can be decided by flags, should we check for underflows here?
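Agreed that this deserves a guard: both values are uint64, so an L1 start block newer than the migration time makes the inner subtraction wrap around instead of going negative, and the drift check silently passes. A hedged sketch of an underflow-safe version (names mirror the diff; checkDrift itself is a hypothetical helper):

```go
package main

import "fmt"

const maxSequencerDriftCelo uint64 = 2892 // mirrors op-node/rollup.maxSequencerDriftCelo

// checkDrift validates the gap between the chosen L1 start block time and the
// migration block time, guarding against uint64 underflow before subtracting.
func checkDrift(migrationBlockTime, l1StartTime uint64) error {
	if migrationBlockTime < l1StartTime {
		return fmt.Errorf("migration block time %d is before L1 start block time %d", migrationBlockTime, l1StartTime)
	}
	elapsed := migrationBlockTime - l1StartTime
	if elapsed > maxSequencerDriftCelo {
		return fmt.Errorf("%d seconds elapsed already exceeds the max sequencer drift", elapsed)
	}
	if maxSequencerDriftCelo-elapsed < 17*60 {
		return fmt.Errorf("only %d seconds of sequencer drift headroom remain", maxSequencerDriftCelo-elapsed)
	}
	return nil
}

func main() {
	// A start block newer than the migration time now fails loudly instead
	// of wrapping around.
	fmt.Println(checkDrift(100, 200))
}
```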
Looks good! Very nice approach. I feel confident given the tests added here, but I agree with Paul that it wouldn't hurt to test this by running it many times and checking against live expected results to make sure there aren't any edge cases. Dry runs and local testing will help too. I'm planning to test this locally myself when I get a chance. Looks great; ready to approve once everyone has taken a look and all changes are in.
Looks good, I think the approach is sound!
The comments are mainly nitpicks or suggestions, but nothing substantial
case (opts.l1BeaconRPC != "") != (opts.l1BeaconchainURL != ""):
	// Check that either both l1BeaconRPC and l1BeaconchainURL are set or unset.
	return fmt.Errorf("if the l1-beacon-rpc flag is specified, the l1-beaconchain-url must also be specified and vice versa")
case (opts.migrationBlockTime == 0) == (opts.l1BeaconRPC == ""):
all of the switch cases are really hard to parse :)
Why is this so strict anyways? In the migration, the absence of the L1StartingBlockTag will trigger the l1 starting block selection, otherwise it will take the specified one.
You could simply check something like this here:
if config.L1StartingBlockTag == nil && (opts.l1BeaconRPC == "" || opts.l1BeaconchainURL == "") {
return fmt.Errorf("...")
}
And then, if you want, add a log warning in runStateMigration that the provided migration-block-time will be overwritten if set by the user.
I wanted to be explicit about the flags; it's a mistake to pass the migrationBlockTime if you've passed either the l1BeaconRPC or the l1BeaconchainURL. Admittedly it's pretty verbose, but I think that's better than letting people misuse the migration tool and then be surprised when its output is not what they expected.
Anyway, this will get simpler as I'm removing the requirement on beaconcha.in
if config.L1StartingBlockTag == nil {
	l1StartBlock, err = client.BlockByNumber(context.Background(), nil)
	// Find the L1 starting block, the L2 fork block occurs 1 minute after the last celo L1 block.
	opts.migrationBlockTime = celoL1Head.Time + 60
What speaks against getting rid of the parameter altogether, hardcoding the baklava and alfajores parameters, and using this 60-second offset everywhere else?
// given time falls and then looks back from there to find the first finalized
// block.
func (c *beaconClient) MostRecentFinalizedL1BlockAtTime(ctx context.Context, pointInTime uint64) (common.Hash, error) {
	// Find the epoch starting at or before the L2 start time.
you renamed the arg, but the comment still references the old name
var err error
names := [2]string{"completed", "justified"}

// Check the most recent completed and justified epochs had a participation rate of at least 0.67.
This and the following steps depend on the point-in-time having actually passed (in wall-time and on-chain, in terms of prior epoch justifications).
In the L1StartBlock-determining code you wait for that time to pass prior to calling this method.
I think it would make sense to include the waiting logic here instead, since the function basically doesn't do anything when called before that time anyway.
Additionally, you could make the point-in-time arg optional, and if it is nil have it take time.Now().
Updated to wait in this method, makes sense thanks!
if err != nil {
	return common.Hash{}, fmt.Errorf("error fetching epoch %d: %w", epochNumber-i, err)
}
if epoch.Data.Globalparticipationrate < 0.67 {
I would have preferred to use the BeaconAPI for this, but it seems like you would have to process the BLS signatures and the graph of source to target in the attestations yourself for each block in the epoch...
Actually, I think I can use the /eth/v1/beacon/states/{state_id}/finality_checkpoints endpoint for this. I wasn't able to test before since I hadn't found an RPC that supported historical requests for that endpoint, but https://ethereum-holesky-beacon-api.publicnode.com seems to. So I'm planning to remove the reliance on beaconcha.in
time.Sleep(time.Duration(tillL2Start+1) * time.Second)
}
// MostRecentFinalizedL1BlockAtTime can take a few seconds, so we provide a suitably long context.
ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
Do we need a timeout here? Is there a concern that the HTTP calls will block too long?
I think I'd rather risk blocking too long than exiting the migration early.
(Although I don't think the timeout will ever be reached, since there are no loops retrying on a non-200 response, so you'll rather exit because of an orderly unsuccessful response or a connection error.)
The Go http library will block forever waiting for a response from the server, so I think we need some sort of timeout. I've refactored the code a bit and now contexts are used closer to the network call sites; I also used more generous timeouts.
Makes sense!
By the way, http.Client{} also has a Timeout (total request time) field that you can set globally, if you don't want to use the timeout-context on each call.
Also removes reliance on beaconcha.in. Tests not yet updated.
Changes the way the migration script works so that it does not require the L2 fork block time as an input. This frees us from trying to predict a safe L2 fork block time that falls after the final Celo L1 block but not so far in the future that we end up with the L2 not producing blocks once started because it is waiting for that time.
Also changes the way we select the L1 starting block. Previously we selected the latest block of the L1, but this was not safe since that block would not have been finalized, and it was also not deterministic, since the value depended on when the migration script was run. Now we select the most recent L1 block that was finalized at the time of the L2 fork block, which is deterministic because it is based on the L2 fork block time, which in turn is based on the last Celo L1 block time.
These changes allow us to provide all required config for a migration well in advance of the migration time and also allows partners to perform the migration in parallel with our migration, as opposed to having to wait for our migration to finish before they can start their migration.
A description of ethereum's finalization process can be found here - https://eth2book.info/capella/part2/consensus/casper_ffg/
The approach we use here is to find the epoch that the L2 start block falls within. That epoch is guaranteed not to be complete at the time of the L2 fork block, so we look to the previous epoch, which is completed, and then to the epoch prior to that, which is justified, and take the first block of that epoch, since it will be finalized.
This is assuming that the completed and justified epochs reached at least 2/3 network participation. If they did not then we fail the process.
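The epoch walk described above reduces to slot arithmetic (constants are the standard beacon-chain parameters; the real code additionally verifies the participation/finality of each epoch via the API before trusting the result):

```go
package main

import "fmt"

const (
	secondsPerSlot = 12
	slotsPerEpoch  = 32
)

// firstSlotOfFinalizedEpoch walks back from the epoch containing l2StartTime:
// that epoch is incomplete, the one before it is completed, and the one
// before that is justified, so its first slot is finalized.
func firstSlotOfFinalizedEpoch(l2StartTime, genesis uint64) uint64 {
	epoch := (l2StartTime - genesis) / (secondsPerSlot * slotsPerEpoch)
	return (epoch - 2) * slotsPerEpoch
}

func main() {
	// With genesis at 0, an L2 start time inside epoch 5 yields the first
	// slot of epoch 3.
	fmt.Println(firstSlotOfFinalizedEpoch(5*384+17, 0)) // prints 96
}
```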
Fixes https://github.com/celo-org/celo-blockchain-planning/issues/908
Tested
Locally I was able to migrate an alfajores datadir with the new flags.