Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manual finalization endpoint new pruning #7060

Conversation

michaelsproul
Copy link
Member

@michaelsproul michaelsproul commented Mar 1, 2025

Proposed Changes

Spicy mashup of:

This is BACKWARDS INCOMPATIBLE due to migrating the DB schema to v23 for Lion's improvements.

We might want to just go all in on this for v7.0.0 proper.

This is broken, do not use. See:

dapplion and others added 30 commits February 3, 2025 16:04
Currently we have very poor coverage of range sync with unit tests. With the event driven test framework we could cover much more ground and be confident when modifying the code.


  Add two basic cases:
- Happy path, complete a finalized sync for 2 epochs
- Post-PeerDAS case where we start without enough custody peers and later we find enough

⚠️  If you have ideas for more test cases, please let me know! I'll write them
Address misc PeerDAS TODOs that are not too big for a dedicated PR


  I'll justify each TODO on an inlined comment
Currently we track a key metric `PEERS_PER_COLUMN_SUBNET` in a getter `good_peers_on_sampling_subnets`. Another PR sigp#6922 deletes that function, so we have to move the metric anyway. This PR moves that metric computation to the metrics spawned task which is refreshed every 5 seconds.

I also added a few more useful metrics. The total set and intended usage is:

- `sync_peers_per_column_subnet`: Track health of overall subnets in your node
- `sync_peers_per_custody_column_subnet`: Track health of the subnets your node needs. We should track this metric closely in our dashboards with a heatmap and bar plot
- ~~`sync_column_subnets_with_zero_peers`: Is equivalent to the Grafana query `count(sync_peers_per_column_subnet == 0) by (instance)`. We may prefer to skip it, but I believe it's the most important metric as if `sync_column_subnets_with_zero_peers > 0` your node stalls.~~
- ~~`sync_custody_column_subnets_with_zero_peers`: `count(sync_peers_per_custody_column_subnet == 0) by (instance)`~~
- PR sigp#6497 made obsolete some consistency checks inside the batch

I forgot to remove the consumers of those errors


  Remove un-used batch sync error condition, which was a nested `Result<_, Result<_, E>>`
Addresses sigp#6854.

PeerDAS requires unsubscribing a Gossip topic at a fork boundary. This is not possible with our current topic machinery.


  Instead of defining which topics have to be **added** at a given fork, we define the complete set of topics at a given fork. The new start of the show and key function is:

```rust
pub fn core_topics_to_subscribe<E: EthSpec>(
fork_name: ForkName,
opts: &TopicConfig,
spec: &ChainSpec,
) -> Vec<GossipKind> {
// ...
if fork_name.deneb_enabled() && !fork_name.fulu_enabled() {
// All of deneb blob topics are core topics
for i in 0..spec.blob_sidecar_subnet_count(fork_name) {
topics.push(GossipKind::BlobSidecar(i));
}
}
// ...
}
```

`core_topics_to_subscribe` only returns the blob topics if `fork < Fulu`. Then at the fork boundary, we subscribe with the new fork digest to `core_topics_to_subscribe(next_fork)`, which excludes the blob topics.

I added `is_fork_non_core_topic` to carry on to the next fork the aggregator topics for attestations and sync committee messages. This approach is future-proof if those topics ever become fork-dependent.

Closes sigp#6854
@michaelsproul michaelsproul requested a review from jxs as a code owner March 1, 2025 12:54
@michaelsproul
Copy link
Member Author

michaelsproul commented Mar 1, 2025

Known issues:

  • HTTP heads endpoint returns blocks in fork choice that are descended from finalization but which have been pruned (because they don't descend from the fake finalization checkpoint). This is probably fine for now.

@michaelsproul michaelsproul added the backwards-incompat Backwards-incompatible API change label Mar 1, 2025
@michaelsproul
Copy link
Member Author

Usage:

curl -X POST --data '{"state_root": "0x7c0b6538b5e0a5b47f66168d72e476c6b9bc8a9882acf407c78d8000d8eee3ba", "block_root": "0xa67f0695d5ea7e8d2bcc01b6b7fbee0178c2a16cbf97f5b65496e0518deb5baf", "epoch": "116805"}' http://localhost:5052/lighthouse/finalize

Copy link

mergify bot commented Mar 3, 2025

This pull request has merge conflicts. Could you please resolve them @michaelsproul? 🙏

@michaelsproul
Copy link
Member Author

Closing in favour of:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backwards-incompat Backwards-incompatible API change do-not-merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants