Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(portal-bridge(beacon)): generate and gossip HistoricalSummariesWithProof content #1389

Merged
merged 1 commit into from
Aug 23, 2024

Conversation

ogenev
Copy link
Member

@ogenev ogenev commented Aug 20, 2024

What was wrong?

Fixes #1365.

How was it fixed?

  1. Add logic to generate and gossip HistoricalSummariesWithProof to the beacon network.
  2. Refactor how we spawn the tokio tasks to generate the beacon content in every slot. An issue was blocking the main loop if some of the tasks took longer to complete.
  3. Add some improvements to the BeaconState's deserializations from JSON strings. This was required to successfully deserialize the JSON responses from the pandaops node. Test asset was added to match the pandaops format.
  4. Update main and fallback consensus API endpoints (We are bypassing the load balancer on the main endpoint because there are some issues with it).

To-Do

@ogenev ogenev self-assigned this Aug 20, 2024
@ogenev ogenev added the beacon network Issue related to portal beacon network label Aug 20, 2024
@ogenev ogenev force-pushed the gossip-historical-summaries branch 2 times, most recently from 5e75579 to f0a1eef Compare August 21, 2024 07:17
@ogenev ogenev changed the title portal-bridge(beacon): generate and gossip HistoricalSummariesWithProof content feat(portal-bridge(beacon)): generate and gossip HistoricalSummariesWithProof content Aug 21, 2024
@ogenev ogenev force-pushed the gossip-historical-summaries branch 3 times, most recently from 842e4b8 to 47634a6 Compare August 21, 2024 08:58
@ogenev ogenev marked this pull request as ready for review August 21, 2024 09:10
@ogenev ogenev added the portal-bridge Portal bridge label Aug 21, 2024
Copy link
Collaborator

@morph-dev morph-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that I don't have much knowledge about beacon network and its bridge logic, I'm not sure how worth my approval is (at least I didn't notice anything too wrong with it).

I left couple of small suggestions and questions (mostly for you to think about and reevaluate, since I don't have a good overview of the whole logic).

In addition, considering that there is a lot of tasks happening in parallel and it's hard to have an overview of their progress and success, it would be nice to have better metrics. But this is probably better done in a separate PR (I could be wrong and it might not even be that important or urgent).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to discuss the future of the test_assets directory in Trin on some of the next calls. Currently, all of the test data for the beacon types is there and it was more consistent to put the new beacon state data there as well.

ethportal-api/src/types/consensus/beacon_state.rs Outdated Show resolved Hide resolved
@@ -30,6 +30,18 @@ impl ConsensusApi {
Ok(Self { primary, fallback })
}

/// Requests the `BeaconState` structure corresponding to the current head of the beacon chain.
pub async fn get_beacon_state(&self) -> anyhow::Result<String> {
let endpoint = "/eth/v2/debug/beacon/states/finalized".to_string();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: According to Beacon API spec, the debug endpoints (/eth/v2/debug/...) shouldn't be exposed publicly.
Just want to highlight in case you weren't aware.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but pandaops nodes expose that and it is the simplest way right now to download the beacon state.

@@ -118,17 +136,19 @@ impl BeaconBridge {
// If serving takes a little too long, then we want to serve the next one as soon as
// possible, but not serve any extras until the following slot. "Skip" gets this behavior.
interval.set_missed_tick_behavior(MissedTickBehavior::Skip);
interval.tick().await;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we have to wait for the first tick? Considering that we are already waiting for the next update few lines above.

If answer is still yes, than maybe move it inside loop and do it at the start, instead of the end of the loop block

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this is the case, but when I move the first interval.tick().await call in the loop, the loop runs 2 times in a row at the beginning without waiting for the tick. I haven't dived into this but it seems that calling interval.tick().await first before the loop fixes this issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This didn't make sense to me, so I checked the documentation. The first tick completes immediately.

So yes, if you just remove this line, it will execute the loop twice immediately. But if you do something like this:

let mut interval = interval(Duration::from_secs(12));
interval.set_missed_tick_behavior(MissedTickBehavior::Skip);

loop {
    interval.tick().await;

    let consensus_api = self.api.clone();
    let portal_client = self.portal_client.clone();
    Self::serve_latest(...).await;
}

I believe that you will have expected behavior.

But no big deal, up to you.

portal-bridge/src/bridge/beacon.rs Outdated Show resolved Hide resolved
portal-bridge/src/bridge/beacon.rs Outdated Show resolved Hide resolved
@@ -142,122 +162,127 @@ impl BeaconBridge {
async fn serve_latest(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the serve_latest will start several (5?) tasks to serve data and return.
All these tasks will proceed running in the background.

It can happen that the same task is running in parallel (from a different tick), expect for the serve_historical_summaries_with_proof, which uses in_progress field to prevent this.

Do I understand things correctly? Is this desired behavior? Is it ok with serve_historical_summaries_with_proof skips a slot (because previous one was running too long)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct, we want to serve only historical summaries from a newly finalized beacon state (every ~ 32 slots).

Because it takes a few minutes at best to download the beacon state data from the consensus nodes, we don't want to check and download the next finalized state if the current download is still in progress.

@ogenev ogenev force-pushed the gossip-historical-summaries branch from 47634a6 to aa83bda Compare August 22, 2024 08:04
Copy link
Collaborator

@morph-dev morph-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@@ -118,17 +136,19 @@ impl BeaconBridge {
// If serving takes a little too long, then we want to serve the next one as soon as
// possible, but not serve any extras until the following slot. "Skip" gets this behavior.
interval.set_missed_tick_behavior(MissedTickBehavior::Skip);
interval.tick().await;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This didn't make sense to me, so I checked the documentation. The first tick completes immediately.

So yes, if you just remove this line, it will execute the loop twice immediately. But if you do something like this:

let mut interval = interval(Duration::from_secs(12));
interval.set_missed_tick_behavior(MissedTickBehavior::Skip);

loop {
    interval.tick().await;

    let consensus_api = self.api.clone();
    let portal_client = self.portal_client.clone();
    Self::serve_latest(...).await;
}

I believe that you will have expected behavior.

But no big deal, up to you.

@ogenev ogenev force-pushed the gossip-historical-summaries branch from aa83bda to 5c6fd8c Compare August 23, 2024 08:13
@ogenev ogenev force-pushed the gossip-historical-summaries branch from 5c6fd8c to 7a2f484 Compare August 23, 2024 08:16
@ogenev ogenev merged commit d2f974f into ethereum:master Aug 23, 2024
8 checks passed
@ogenev ogenev deleted the gossip-historical-summaries branch August 23, 2024 08:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
beacon network Issue related to portal beacon network portal-bridge Portal bridge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

portal-bridge(beacon): Gossip HistoricalSummariesWithProof to beacon portal network
2 participants