Checkpoint Sync API #226

Closed
mkalinin wants to merge 8 commits

Conversation

@mkalinin mkalinin commented Jul 29, 2022

Specification

GET /eth/v1/checkpoint/finalized_state

  • Returns BeaconState object for a finalized checkpoint state from the WS period

GET /eth/v1/checkpoint/finalized_blocks/{slot}/root

  • Returns 404 if a block at a slot is either unavailable or not yet finalized
  • Otherwise, returns beacon block root

Motivation

Facilitates checkpoint sync adoption by simplifying the following scheme:

  • [Few] State provider(s) supply a state at a finalized checkpoint within WS period
  • [Many] Trust providers expose /eth/v1/checkpoint/finalized_blocks/{slot}/root endpoint to allow for checking that a block in the pulled state is finalized
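
A minimal sketch of the client-side check this scheme implies: the syncing node derives a block root from the state it pulled, then asks each trust provider for the finalized root at that slot. The `fetch_root` callable and the `min_agree` policy are assumptions made for illustration; the proposal does not say how many providers must agree.

```python
from typing import Callable, Iterable, Optional


def root_is_confirmed(
    expected_root: str,
    slot: int,
    providers: Iterable[str],
    fetch_root: Callable[[str, int], Optional[str]],
    min_agree: int = 1,
) -> bool:
    """Return True when at least `min_agree` providers report `expected_root`
    and no provider reports a different root for `slot`.

    `fetch_root(provider, slot)` is a hypothetical helper that calls
    GET {provider}/eth/v1/checkpoint/finalized_blocks/{slot}/root and returns
    the root, or None on a 404 or a network failure.
    """
    agree = 0
    for provider in providers:
        root = fetch_root(provider, slot)
        if root is None:
            # Provider has no finalized block at this slot, or is unreachable.
            continue
        if root.lower() != expected_root.lower():
            # Any conflicting answer is treated as a hard failure.
            return False
        agree += 1
    return agree >= min_agree
```

Treating a single conflicting answer as a failure is the conservative choice; a majority rule would be an equally plausible policy.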

The finalized_blocks/{slot}/root endpoint is a shortcut to the following actions:

  1. GET /eth/v1/beacon/blocks/{slot}/root
  2. GET /eth/v1/beacon/headers/finalized
  3. Check that slot <= finalized_header.slot
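
The three steps above can be sketched as a single function. This is an illustration only: `get_json` is a hypothetical helper that performs the GET and returns the decoded JSON body (or None on 404), and the response shapes (`data.root`, `data.header.message.slot`) are assumed to follow the existing Beacon API endpoints.

```python
from typing import Callable, Optional


def finalized_block_root(
    get_json: Callable[[str], Optional[dict]], slot: int
) -> Optional[str]:
    """Emulate GET /eth/v1/checkpoint/finalized_blocks/{slot}/root using
    the two existing endpoints. Returns None where the new endpoint
    would answer 404."""
    # Step 1: look up the block root at the slot.
    root_resp = get_json(f"/eth/v1/beacon/blocks/{slot}/root")
    if root_resp is None:
        return None  # no block available at this slot
    # Step 2: fetch the finalized header.
    header_resp = get_json("/eth/v1/beacon/headers/finalized")
    if header_resp is None:
        return None
    # Step 3: the block counts as finalized only if slot <= finalized slot.
    finalized_slot = int(header_resp["data"]["header"]["message"]["slot"])
    if slot > finalized_slot:
        return None
    return root_resp["data"]["root"]
```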

The finalized_state endpoint is an alias to GET /eth/v2/debug/beacon/states/finalized

cc @ajsutton @djrtwo

@mkalinin mkalinin changed the title Add getFinalizedBlockRoot Add getFinalizedBlockRoot and getFinalizedState Aug 3, 2022
@mkalinin mkalinin changed the title Add getFinalizedBlockRoot and getFinalizedState Checkpoint Sync API Aug 3, 2022
@MicahZoltu

MicahZoltu commented Aug 4, 2022

From call: These probably shouldn't be part of the existing APIs because they are meant for third parties to connect to, rather than the operator to connect to (like existing APIs). This means exposed on a different port so it is easy to expose to the internet without exposing everything else to the internet. It may have separate rate limiting and firewalling, which is easier if it is a separate port.

The trust endpoint in particular we want as widely available as possible, so we should make it very easy for people to expose that to the internet publicly with minimal effort/risk.

@samcm
Member

samcm commented Aug 4, 2022

Another point mentioned in the call was the additional functionality that client teams may need to implement in order to checkpoint sync with only a finalized state & a block root.

At the moment checkpoint sync requires differing sets of data depending on the client implementation. For example the complete set of routes to provide checkpoint sync for all clients is something like this:

  • GET /eth/v1/beacon/states/head/finality_checkpoints

  • GET /eth/v1/beacon/blocks/genesis/root

  • GET /eth/v2/beacon/blocks/{finalized/block_root}

  • GET /eth/v2/debug/beacon/states/{genesis/finalized/state_root}

@arnetheduck
Contributor

These probably shouldn't be part of the existing APIs because they are meant for third parties to connect to, rather than the operator to connect to

Notably, downloading a state is very expensive (100+mb in json) - nobody running a beacon node with validators attached would be advised to be supplying beacon states (for their own good) - conversely, for beacon nodes that don't support active validators, there is little harm in exposing all the REST API - the beacon state call is by far one of the most expensive ones you can expose anyway, making most of the rest benign.

@arnetheduck
Contributor

the complete

GET /eth/v2/beacon/blocks/{block_root} - in the libp2p protocol, clients are not required to keep a root -> block index - they are, however, required to keep a slot -> block index for the relevant WS period (to support by-range requests). By-root requests are only supported for the non-finalized period, and checkpoint syncing should likely follow the same constraints. Notably, fetching blocks for the entire WS period is the only way to not violate the libp2p spec (while backfilling, clients are in violation of the libp2p spec by not providing historical data and may be disconnected), so the block request is really "required" for a full checkpoint sync.

genesis_root is interesting in that it's usually part of the "metadata" of the chain (in addition to a state, a chain config is also needed), but not necessarily - getting the genesis state or root via API is one way of avoiding this having to be given at startup, but commonly, the genesis state is available from chain metadata, thus fetching it is not really needed.

GET /eth/v2/debug/beacon/states/{genesis/finalized/state_root}

The state_root lookup is problematic in that there is no requirement elsewhere in the protocol as to which states should be indexed by state root (most clients store only states at certain intervals, and some don't support by-state-root lookups at all because they're expensive) - thus a requirement to index by state roots would require specifying exactly which states should be "fetchable" - restraining this to the finalized state is one way to do this.

@ajsutton
Contributor

ajsutton commented Aug 8, 2022

Notably, downloading a state is very expensive (100+mb in json) - nobody running a beacon node with validators attached would be advised to be supplying beacon states (for their own good)

We should specify that these endpoints only return SSZ at least for the state. There's no reason to serialize to a much bigger json representation in this case. Even so I don't think we're targeting nodes that are running validators here - for security reasons alone I wouldn't recommend exposing any APIs from a node running validators.

conversely, for beacon nodes that don't support active validators, there is little harm in exposing all the REST API - the beacon state call is by far one of the most expensive ones you can expose anyway, making most of the rest benign.

This is definitely incorrect. The /eth/v1/beacon/states/:stateId/validators endpoint is actually the most expensive and problematic API currently. And the fact that it can request any state makes it dramatically more expensive than these proposed APIs even if both supported JSON since it means caching is ineffective. We've got a lot of experience exposing the REST API publicly and there are a lot of minefields there.

@ajsutton
Contributor

ajsutton commented Aug 8, 2022

Another point mentioned in the call was the additional functionality that client teams may need to implement in order to checkpoint sync with only a finalized state & a block root.

At the moment checkpoint sync requires differing sets of data depending on the client implementation.

I would say the point of these new APIs is to define a small set of APIs that clients will need to work with. It will mean clients needing to do additional work so they can start from just a BeaconState, but otherwise it's significantly harder to be a state provider, which makes it hard to convince people to provide them (and basically centralizes on Infura). While we don't need a lot of state providers, given that trust providers are separate, we still want more than one or two and we want to make it easy for them to be reliable.

@arnetheduck
Contributor

The /eth/v1/beacon/states/:stateId/validators endpoint is actually the most expensive and problematic API currently.

I guess this is client-dependent as far as implementation / expensiveness goes (ie how costly it is to generate a "filtered" response) - but given that the full state is a superset of the validators, the responses are certainly larger -> more expensive (from a bandwidth perspective).

+1 agree on the point that limiting to finalized is a good idea for cacheability - that said, this is equivalent to simply not responding to non-finalized requests in the current {:state-id} based API (ie just because by-state-root requests are specified does not mean that they need to be enabled) - therefore it might be more flexible to allow a full state id, but specify that clients wanting to do a checkpoint sync "SHOULD" use finalized.

As such, one would expose /eth/v1/beacon/states/{:state-id} (to make this a non-debug API) making this an entirely regular and consistent request in line with the existing calls - clients that wish to constrain this can respond 501 or 503 when state-id is not finalized and we add guidance documentation for "checkpoint-sync-consumption" instead, so that clients converge on a single way to get the state.

+1 that not exposing JSON for SSZ might be a good idea, but even the SSZ is ~60mb and growing - this is not a light request by any means - again, we can solve this with guidance docs for client implementers (which would only use SSZ or use quality preferences to prefer ssz etc).
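
As an illustration of "quality preferences", a consumer could build its state request like the sketch below. The base URL is a placeholder, the `allow_json` switch is an assumption, and `application/octet-stream` is the media type the Beacon API uses for SSZ bodies.

```python
import urllib.request

SSZ = "application/octet-stream"
JSON = "application/json"


def build_state_request(base_url: str, allow_json: bool = True) -> urllib.request.Request:
    """Build (but do not send) a request for the finalized state that
    prefers SSZ and, optionally, tolerates JSON as a lower-quality fallback."""
    accept = f"{SSZ};q=1.0, {JSON};q=0.9" if allow_json else SSZ
    url = base_url.rstrip("/") + "/eth/v2/debug/beacon/states/finalized"
    return urllib.request.Request(url, headers={"Accept": accept})
```

Passing `allow_json=False` corresponds to the stricter "only use SSZ" guidance.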

[Many] Trust providers expose

What is a "trust provider" in this context, and where do you get a list of them (since there are many)? If it's a well-known list, they are open to MITM, DoS, blocking etc, so we need to tread carefully - if instead they are random nodes on the internet used for a majority decision, it might be simpler to expose this via libp2p and get discovery for free.

@MicahZoltu

What is a "trust provider" in this context, and where do you get a list of them (since there are many)? If it's a well-known list, they are open to MITM, DoS, blocking etc, so we need to tread carefully - if instead they are random nodes on the internet used for a majority decision, it might be simpler to expose this via libp2p and get discovery for free.

I am envisioning people on the internet who run Beacon nodes just telling their social network where it is. This may be private between a group of friends, or it may be a tweet, or maybe you post it on the sidebar of your blog. Some institutional providers may make it available more widely as a way of building a positive reputation as well.

I'm definitely not envisioning some well known list of "these are the trusted trust providers". It should be up to each individual to define their trust network (or delegate that to someone they trust).

@arnetheduck
Contributor

I am envisioning people on the internet who run Beacon nodes just telling their social network where it is.

If this is the use case, they might as well provide a state - it's a one-off, and it's not that bloody (unless you run a public service, in which case you rate limit it and then you're done) - ie it's important to keep in mind that there are two kinds of beacon nodes: those with validators and those without - for the former, you will not want to expose any API and for the latter category, there is really very little / no harm in exposing the entire API (minus keymanager of course).

@MicahZoltu

I would be comfortable sharing my checkpoint endpoint on Twitter and linking in the sidebar of my blog or something, where the reach is not huge, but also unbounded. I would not be comfortable sharing my state endpoint with that broad of an audience though. As an anecdote: I previously exposed my execution JSON-RPC API but didn't advertise it (just a DNS record and referenced it internally in an app I built) and I find it being used by random dapps and people throughout the ecosystem to this day, despite me never publicly linking to it anywhere.

The problem with the state endpoint is that I have to do something to protect myself, but with the checkpoint endpoint I likely don't have to do anything to protect myself (besides normal stuff that I already have for any publicly facing service). Even just a little bit of discouragement is enough to make people not expose a thing, and we want as many checkpoint endpoints available as possible.

Comment on lines 20 to 24
properties:
  version:
    type: string
    enum: [ phase0, altair, bellatrix ]
    example: "phase0"
Contributor

This doesn't include the is_optimistic flag the existing getState has. Do we want to add that or should this endpoint simply refuse to return a state if it's optimistically syncing? We probably should be clear about that whichever way we go.

Also worth noting that the state being finalized would be affected by the chain head being optimistic even if the state returned itself is fully validated (because the optimistic head may have updated finalisation).

Author

IMO, we should be conservative on that and respond 404 if the node is optimistic. I'll add a respective note

Comment on lines 5 to 6
Returns full BeaconState object for a finalized checkpoint state from the WS period.
Depending on `Accept` header it can be returned either as json or as bytes serialized by SSZ
Contributor

I don't see a reason for this API to support returning JSON. It makes the request significantly more expensive to serve and use more bandwidth, and any client needs to be able to support an SSZ BeaconState. The debug endpoint is available for humans or other tools that want a more readable version of the spec.

apis/checkpoint/finalized_blocks/root.yaml (outdated)
get:
  operationId: getFinalizedBlockRoot
  summary: Get finalized block root
  description: Retrieves hashTreeRoot of finalized BeaconBlock/BeaconBlockHeader. |
Contributor

What should be returned if the specified slot is an empty slot? Should it return the last block before that slot or 404?

I don't think it overly matters because you can't calculate the correct block root from a state if it's had empty slots applied, so if you have a usable state it will be from a slot that contains a block. It's probably worth returning 404 for empty slots though just to be consistent with the /eth/v1/beacon/blocks/{blockId}/root API.
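
A minimal sketch of the response rule suggested here, with empty slots answered by 404 rather than by the previous block's root. The `roots_by_slot` mapping stands in for the node's slot -> block index, and the response body shape is assumed to mirror the existing /eth/v1/beacon/blocks/{blockId}/root endpoint.

```python
from typing import Mapping, Optional, Tuple


def finalized_block_root_response(
    slot: int,
    finalized_slot: int,
    roots_by_slot: Mapping[int, str],
) -> Tuple[int, Optional[dict]]:
    """Return (HTTP status, JSON body) for a finalized block root lookup."""
    if slot > finalized_slot:
        return 404, None  # not yet finalized
    root = roots_by_slot.get(slot)
    if root is None:
        return 404, None  # empty slot or block unavailable
    return 200, {"data": {"root": root}}
```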

operationId: "getFinalizedCheckpointState"
summary: "Get full BeaconState object for finalized checkpoint state"
description: |
  Returns full BeaconState object for a finalized checkpoint state from the WS period.
Contributor

It is probably worth explicitly stating that the state should not have empty slots applied. That is it should be the state that is referenced by the block some finalized checkpoint root actually points to.

@mkalinin
Author

mkalinin commented Sep 8, 2022

Alternatively, we may have GET /eth/v1/checkpoint/finalized_block_state returning bundled block and state at the finalized checkpoint, SSZ encoded:

class BeaconBlockAndState(Container):
  block: BeaconBlock
  state: BeaconState

@michaelsproul
Contributor

Alternatively, we may have GET /eth/v1/checkpoint/finalized_block_state returning bundled block and state at the finalized checkpoint, SSZ encoded

I would be in favour of this for Lighthouse as the assumption that blocks and states come in pairs is quite deeply embedded. I'm not saying it would be impossible to remove but it would likely represent a substantial amount of work, particularly testing that no assumptions about block existence are violated. We often use the existence of a block in our database to determine if it is known/canonical, and currently block processing expects to be able to verify each block against its parent block + state.

Another related issue for us is that we currently require the checkpoint block to be from a non-skipped slot, although I think this requirement would be easier to remove (see sigp/lighthouse#3210).

@MicahZoltu

Alternatively, we may have GET /eth/v1/checkpoint/finalized_block_state returning bundled block and state at the finalized checkpoint, SSZ encoded:

This is the Engine API, or a different API?

@michaelsproul
Contributor

This is the Engine API, or a different API?

Nah, the beacon API served by consensus clients (i.e. this repo). Spec is rendered online here: https://ethereum.github.io/beacon-APIs/

@MicahZoltu

Ah. As mentioned above, I think checkpoint stuff should be exposed on different ports because the expectation is that a different set of people will have access to these things.

Finalized Slot Root: Everyone
Checkpoint State: Friends & Family
Beacon API: Just Me

@mkalinin
Author

mkalinin commented Sep 9, 2022

I agree on using a port different from the one we have for the beacon APIs. I also think that CL clients should have at least the following set of flags:

  • --checkpoint-api-enabled[=<BOOLEAN>], default: false
  • --checkpoint-api-port=<INTEGER>, default: some_port
  • --checkpoint-api-state-enabled[=<BOOLEAN>], default: false

The state endpoint shouldn't be enabled by default as it's heavy. I guess putting an HTTP server with caching in front of the checkpoint sync state endpoint would be an optimal strategy for state providers. They could set, say, a 1h TTL for this cache.
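
The caching idea can be sketched as a small time-to-live wrapper. This is an in-process illustration under stated assumptions, not a replacement for a real caching proxy: `fetch` is a hypothetical callable that pulls the SSZ-encoded finalized state from the node, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Optional


class CachedState:
    """Serve one cached copy of the finalized state for `ttl_seconds`."""

    def __init__(
        self,
        fetch: Callable[[], bytes],
        ttl_seconds: float = 3600.0,  # the "say 1h" suggested in the thread
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[bytes] = None
        self._fetched_at = 0.0

    def get(self) -> bytes:
        now = self._clock()
        # Refresh on first use or once the cached copy has expired.
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

With a one-hour TTL the expensive state serialization runs at most once per hour, however many consumers ask for it.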

@mcdee
Contributor

mcdee commented Sep 9, 2022

I think that this conversation has veered out of scope for an endpoint definition. In my opinion, this endpoint should be added to the beacon APIs same as the others, and made available on the same basis as the rest of the endpoints.

If there is a desire for this endpoint to be exposed to external/untrusted parties it might require any number of features such as authentication, DDoS protection, rate limiting, internal caching etc. that don't fall under the remit of an API specification. These can all be accomplished better by a piece of middleware that provides these features than attempting to add them to multiple beacon client implementations.

So please: no dedicated port, no specific options in the CL for this. Let's build an endpoint definition here and handle presentation of it elsewhere.

@MicahZoltu

So please: no dedicated port, no specific options in the CL for this. Let's build an endpoint definition here and handle presentation of it elsewhere.

I think the chance of a user shooting themselves in the foot is higher, and the chance of users providing these APIs is significantly lower if we just throw them all onto the Beacon API and tell users "if you want to provide a public good, you can do a bunch of additional work to make that happen". At the least, the checkpoint API is incredibly lightweight and probably safe for the vast majority of people to just expose freely.

  • --checkpoint-api-enabled[=<BOOLEAN>], default: false
  • --checkpoint-api-port=<INTEGER>, default: some_port
  • --checkpoint-api-state-enabled[=<BOOLEAN>], default: false

I think there should be another for --checkpoint-api-state-port=<INTEGER>, default: some_other_port.
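
For illustration only, the four proposed flags could be parsed as in the sketch below. No client is known to implement them in this form; the program name is made up, and the port defaults are left unset because the thread only says "some_port" and "some_other_port".

```python
import argparse


def parse_bool(text: str) -> bool:
    """Parse the optional =<BOOLEAN> value of a flag."""
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="beacon-node")  # hypothetical name
    # --flag and --flag=true both enable; the default is disabled.
    for flag in ("--checkpoint-api-enabled", "--checkpoint-api-state-enabled"):
        parser.add_argument(flag, nargs="?", const=True, default=False, type=parse_bool)
    # Port defaults are deliberately unspecified, as in the discussion.
    for flag in ("--checkpoint-api-port", "--checkpoint-api-state-port"):
        parser.add_argument(flag, type=int, default=None)
    return parser
```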

@mcdee
Contributor

mcdee commented Sep 9, 2022

At the least, the checkpoint API is incredibly lightweight and probably safe for the vast majority of people to just expose freely.

It's a DoS vector, especially given the size of the state. But again, this is down to the implementation and has no impact on the inputs, processing or outputs of the endpoint which is what should be being discussed here.

@ajsutton
Contributor

To set the finalized state to the state from the start of epoch n+1 would be incorrect, and might allow an attacker to feed us blocks from n+1 that conflict with finalization without us knowing.

Why is that a problem? As long as the state is from within the weak subjectivity period they won't be able to make you finalise something incorrect and so you'll wind up finding the right chain and following it. You are basically in the same situation as you would have been had you been in sync and following the chain from epoch n onwards.

You would get a problem if n+2 was within weak subjectivity but the state was from before weak subjectivity period but the requirement is that the state itself be within the weak subjectivity period.

So to answer the specific question, you'd process slots to move the state forward to the start of the next epoch. You know there weren't any blocks in that period (or the state should have been from one of those blocks) but you can't tell if there were no blocks for the next epoch as well so can't take it any further forward.
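
The target slot for that slot processing is simple arithmetic. A sketch, assuming the mainnet value of 32 slots per epoch and assuming a state that already sits on an epoch boundary needs no advancing:

```python
SLOTS_PER_EPOCH = 32  # mainnet preset; other networks may differ


def epoch_aligned_target_slot(state_slot: int) -> int:
    """Slot to which a checkpoint state would be advanced with empty slots:
    the start of the next epoch, unless it is already epoch-aligned."""
    remainder = state_slot % SLOTS_PER_EPOCH
    if remainder == 0:
        return state_slot
    return state_slot + (SLOTS_PER_EPOCH - remainder)
```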

@michaelsproul
Contributor

@ajsutton I think my example is actually too conservative, if there's a gap with no blocks that spans the weak subjectivity period then the epoch following block B could be old enough for an attacker to equivocate without penalty. It would be very bad to have the canonical chain lacking blocks for this long, but it could happen as a result of a major internet outage or an inactivity leak + block proposal bugs. This kind of relates to what @arnetheduck was saying about the finalized block being >8192 slots prior to the finalized checkpoint.

I hadn't made this connection before in relation to checkpoint sync, so figured it was worth noting. I agree in practice it's unlikely to be relevant, but if we're adding a new endpoint that bundles the state and a block, we may as well throw the finalized epoch in there too.

@ajsutton
Contributor

ajsutton commented Sep 13, 2022

@ajsutton I think my example is actually too conservative, if there's a gap with no blocks that spans the weak subjectivity period then the epoch following block B could be old enough for an attacker to equivocate without penalty. It would be very bad to have the canonical chain lacking blocks for this long, but it could happen as a result of a major internet outage or an inactivity leak + block proposal bugs. This kind of relates to what @arnetheduck was saying about the finalized block being >8192 slots prior to the finalized checkpoint.

For this to be a problem we'd need to have a block at, say, slot 10,000, followed by a long gap of multiple weeks, so that it's longer than the weak subjectivity period, up to, say, slot 110,000, during which there are no blocks at all. Then we'd need to include some blocks that contain attestations from that offline period such that we finalise the empty slots but don't finalise any of the new blocks, and you'd have to be wanting to do a checkpoint sync before finalisation updates to include any new blocks. The attacker meanwhile has to create the alternative chain, with validators that exited and became withdrawable during the period of empty blocks, and perform a sybil attack to get you to follow their chain instead of the correct chain from the state you sync'd from. That would be quite impressive and the fix would be to wait until the chain finalises a new block.

I'm not convinced this is a viable threat let alone that it's a bigger risk than managing to introduce a bug in the extra code required to handle the checkpoint epoch being provided (because you just know you're going to wind up getting a checkpoint epoch that's actually from before the state due to some race condition).

@arnetheduck
Contributor

More importantly the API proposed here to return a state allows the node to select which state to return whereas the existing API allows any state to be requested

We would white-list only the finalized state id in the checkpoint consumer documentation and servers would expose only that - there is no additional burden or risk for the server compared to the proposal in this PR: as long as consumers agree to use only finalized, and use only SSZ, that is sufficient for the "servers" to standardise on exposing state/finalized (and maybe block with a general "finalized" flag that is useful for other consumers too)

@mkalinin
Author

More importantly the API proposed here to return a state allows the node to select which state to return whereas the existing API allows any state to be requested

We would white-list only the finalized state id in the checkpoint consumer documentation and servers would expose only that - there is no additional burden or risk for the server compared to the proposal in this PR: as long as consumers agree to use only finalized, and use only SSZ, that is sufficient for the "servers" to standardise on exposing state/finalized (and maybe block with a general "finalized" flag that is useful for other consumers too)

Altering the path to .../checkpoint/finalized_state does seem to be a minor change relating to the separate port capability and to the SSZ-only and finalized-only requirements that the server will have to check (isn't a separate controller already needed to make this work?)

@mkalinin
Author

if we only allow checkpoint states that have a block in the same slot, the problem is simplified by constraining the spec

I think we can defer it to implementations. Currently the spec is agnostic to that, so, implementations may choose to return the most recent finalized state that does have a block in the first slot of the epoch

@arnetheduck
Contributor

Currently the spec is agnostic to that, so, implementations may choose to return the most recent finalized state that does have a block in the first slot of the epoch

it's a problem for the client, not the server: if the server returns empty slots, clients must be able to deal with it, because with only finalized available, they can't pick a slot (what we do now is that when the slot is empty, we jump back an epoch and try again)
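
The "jump back an epoch and try again" fallback can be sketched as below. This is not Nimbus code: `has_block` is a hypothetical lookup, the retry bound `max_epochs_back` is an assumption, and 32 slots per epoch is the mainnet value.

```python
from typing import Callable, Optional

SLOTS_PER_EPOCH = 32  # mainnet preset


def find_checkpoint_slot(
    start_slot: int,
    has_block: Callable[[int], bool],
    max_epochs_back: int = 4,
) -> Optional[int]:
    """Walk back one epoch at a time from the epoch start at or before
    `start_slot` until an epoch start slot containing a block is found."""
    slot = start_slot - (start_slot % SLOTS_PER_EPOCH)
    for _ in range(max_epochs_back + 1):
        if has_block(slot):
            return slot
        if slot < SLOTS_PER_EPOCH:
            return None  # reached the first epoch without finding a block
        slot -= SLOTS_PER_EPOCH
    return None
```

This only works when the client can choose the slot it requests, which is the limitation the comment points out for a finalized-only endpoint.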

@arnetheduck
Contributor

Altering the path to .../checkpoint/finalized_state does seem to be a minor change relating to the separate port capability

we still have to support the old debug path in general, meaning we have to support the exact same request on two paths - this is .. well, redundant - doable, just.. ugly.

arnetheduck added a commit to status-im/nimbus-eth2 that referenced this pull request Oct 21, 2022
Currently, we require genesis and a checkpoint block and state to start
from an arbitrary slot - this PR relaxes this requirement so that we can
start with a state alone.

The current trusted-node-sync algorithm works by first downloading
blocks until we find an epoch aligned non-empty slot, then downloads the
state via slot.

However, current
[proposals](ethereum/beacon-APIs#226) for
checkpointing prefer finalized state as
the main reference - this allows more simple access control and caching
on the server side - in particular, this should help checkpoint-syncing
from sources that have a fast `finalized` state download (like infura
and teku) but are slow when accessing state via slot.

Earlier versions of Nimbus will not be able to read databases created
without a checkpoint block and genesis. In most cases, backfilling makes
the database compatible except where genesis is also missing (custom
networks).

* backfill checkpoint block from libp2p instead of checkpoint source,
when doing trusted node sync
* allow starting the client without genesis / checkpoint block
* perform epoch start slot lookahead when loading tail state, so as to
deal with the case where the epoch start slot does not have a block
* replace `--blockId` with `--state-id` in TNS command line
* when replaying, also look at the parent of the last-known-block (even
if we don't have the parent block data, we can still replay from a
"parent" state) - in particular, this clears the way for implementing
state pruning
* deprecate `--finalized-checkpoint-block` option (no longer needed)
@arnetheduck
Contributor

As of status-im/nimbus-eth2#4251, Nimbus will no longer require a block for checkpoint syncing, rather it will download just a state (that has to be epoch-aligned) - we'll also support syncing from a checkpoint whose slot is empty (up to 1 epoch of empty slots supported)

@zah

zah commented Oct 26, 2022

I think any up-to-date proposal for checkpoint sync API should take into account EIP-4881, which describes a process similar to checkpoint sync, but related to the state of the deposit contract.

Nimbus and Lighthouse already have the necessary mechanisms for consuming the DepositTreeSnapshot values discussed in the EIP. Other clients that store the full history of deposits will still be able to produce snapshots through relatively simple code additions.

The advantage of bundling all checkpoint-like structures in a single response is that this improves the simplicity of the client which no longer needs to handle the contradictory responses that might arise due to race conditions when the checkpoint sync is executed exactly when the serving node is finalizing a new epoch.

@rolfyone
Contributor

Now that checkpointz exists, is it still our intent to merge this to beacon-apis? There was active discussion which has died off, I'm not sure what the status of this PR is...

arnetheduck added a commit to arnetheduck/eth2.0-APIs that referenced this pull request Oct 28, 2022
Whether or not a request pertains to the finalized section of the chain
(per the view of the client fork choice) is somewhat cumbersome to
discover.

This PR adds a boolean that allows clients to distinguish a response
that has been finalized and thus is unlikely to change from one that may
still change over time (specially when using slot-based requests).

A flag like this can also be used for the purpose of verifying that a
checkpoint root indeed is part of chain history and is likely to remain
as such, as discussed in
ethereum#226
@arnetheduck
Contributor

one can imagine that having is_finalized on getBlocksV2 and its more detailed friends is something any consumer would want to know.

#254

@ajsutton
Contributor

I think any up-to-date proposal for checkpoint sync API should take into account EIP-4881, which describes a process similar to checkpoint sync, but related to the state of the deposit contract.

For the record, the plan is to remove the need for deposit snapshots entirely as part of removing the follow distance and updating deposit handling to take advantage of the post-merge world. So I probably wouldn't design APIs to use deposit snapshots at this stage.

arnetheduck added a commit to status-im/nimbus-eth2 that referenced this pull request Nov 2, 2022
Currently, we require genesis and a checkpoint block and state to start
from an arbitrary slot - this PR relaxes this requirement so that we can
start with a state alone.

The current trusted-node-sync algorithm works by first downloading
blocks until we find an epoch aligned non-empty slot, then downloads the
state via slot.

However, current
[proposals](ethereum/beacon-APIs#226) for
checkpointing prefer finalized state as
the main reference - this allows more simple access control and caching
on the server side - in particular, this should help checkpoint-syncing
from sources that have a fast `finalized` state download (like infura
and teku) but are slow when accessing state via slot.

Earlier versions of Nimbus will not be able to read databases created
without a checkpoint block and genesis. In most cases, backfilling makes
the database compatible except where genesis is also missing (custom
networks).

* backfill checkpoint block from libp2p instead of checkpoint source,
when doing trusted node sync
* allow starting the client without genesis / checkpoint block
* perform epoch start slot lookahead when loading tail state, so as to
deal with the case where the epoch start slot does not have a block
* replace `--blockId` with `--state-id` in TNS command line
* when replaying, also look at the parent of the last-known-block (even
if we don't have the parent block data, we can still replay from a
"parent" state) - in particular, this clears the way for implementing
state pruning
* deprecate `--finalized-checkpoint-block` option (no longer needed)
rolfyone added a commit that referenced this pull request Nov 9, 2022
Whether or not a request pertains to the finalized section of the chain
(per the view of the client fork choice) is somewhat cumbersome to
discover.

This PR adds a boolean that allows clients to distinguish a response
that has been finalized and thus is unlikely to change from one that may
still change over time (especially when using slot-based requests).

A flag like this can also be used for the purpose of verifying that a
checkpoint root indeed is part of chain history and is likely to remain
as such, as discussed in
#226

* fix dots

Co-authored-by: Paul Harris <[email protected]>
@mkalinin
Author

I was revisiting this proposal recently and came to the conclusion that it is no longer relevant. I am leaving it open to give those who are interested enough time to chime in and express their opinion. If there are no objections, I will close this PR in a week. My justification is provided below.

The state provider API has been successfully implemented as part of the Checkpointz tool. The tool also provides caching and DoS protection for endpoints serving the state and other pieces of data that different clients require to bootstrap a node. Once every CL client becomes capable of starting from a checkpoint state only, the tool can be equipped with GET /eth/v1/checkpoint/finalized_state or a similar API if desired. It would be great to see more tools implementing the state provider API, but here diversity is not as critical as in the trust provider use case.

After #254 was merged, CL clients are already equipped with an API that provides verification of a given block/state, similar to the trust provider API proposed in this PR. The main obstacle for an inexperienced user in exposing this API to the internet is the configuration needed to make access to it secure: adding DoS protection and limiting access to this exact endpoint only, or to a subset of endpoints that are safe to expose. Using Checkpointz as a trust provider could lower the barrier, but in that case the tool would become a single point of failure in a critical part of the system.
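
For illustration, here is a minimal sketch of how a node operator might check a checkpoint block root against several trust providers through the existing getBlockRoot endpoint. It assumes the boolean added in #254 appears as a top-level `finalized` field next to `data.root` in the JSON response; the helper names (`is_finalized_root`, `fetch_block_root`, `confirmations`) are hypothetical and not part of any client.

```python
import json
from urllib.request import urlopen


def is_finalized_root(response_body: str, expected_root: str) -> bool:
    """Return True only if the provider reports expected_root as finalized."""
    payload = json.loads(response_body)
    root = payload.get("data", {}).get("root", "")
    # Assumption: the flag from #254 is a top-level boolean named `finalized`.
    # A missing flag is treated as "not finalized".
    return payload.get("finalized") is True and root.lower() == expected_root.lower()


def fetch_block_root(base_url: str, slot: int, timeout: float = 10.0) -> str:
    """Fetch the raw getBlockRoot response body for a slot from one provider.

    urlopen raises urllib.error.HTTPError on a 404 (no block at that slot).
    """
    url = f"{base_url.rstrip('/')}/eth/v1/beacon/blocks/{slot}/root"
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def confirmations(bodies: list[str], expected_root: str) -> int:
    """Count how many provider responses confirm the root as finalized."""
    return sum(is_finalized_root(body, expected_root) for body in bodies)
```

A caller would collect one response body per provider with `fetch_block_root` and accept the checkpoint only if `confirmations` reaches a threshold of its own choosing; the threshold policy is left open here.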

There are a couple of other solutions that can alleviate the configuration issue described above:

  • Support exposing the getBlockRoot API on a different port in client software. This option has almost perfect UX, but comes with additional engineering complexity, as it may require DoS protection embedded in the client.
  • Another option, which we have discussed with @samcm in Discord, is to provide installation and configuration guides for a few HTTP servers to set them up as a trust provider proxy for the BN. Note that caching trust provider API responses is potentially dangerous and should be avoided if possible.
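
To make the two options above more concrete, the following is a rough sketch of the logic such a proxy or dedicated port would need: an allowlist that admits only GET requests for getBlockRoot, a response header that disables caching, and a simple token bucket for rate limiting. All names here (`is_allowed`, `response_headers`, `TokenBucket`) are hypothetical, and the limits are placeholders rather than recommendations.

```python
import re

# Only getBlockRoot is exposed; block_id may be a named id, a slot, or a root.
_ALLOWED_PATH = re.compile(
    r"/eth/v1/beacon/blocks/(head|genesis|finalized|\d+|0x[0-9a-fA-F]{64})/root"
)


def is_allowed(method: str, path: str) -> bool:
    """Admit only GET requests whose path (without query string) is getBlockRoot."""
    return method.upper() == "GET" and _ALLOWED_PATH.fullmatch(path) is not None


def response_headers() -> dict:
    """Ask intermediaries not to cache trust provider responses."""
    return {"Cache-Control": "no-store"}


class TokenBucket:
    """Per-client rate limiter; the caller supplies timestamps in seconds."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In practice the same rules would more likely be written as configuration for an existing HTTP server, as the second option suggests; the Python form is only meant to show which checks are involved.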

Having more tools like Checkpointz can also be considered, but I personally feel that it would create yet another diversity problem that we would have to care about.

@dapplion
Member

dapplion commented Dec 5, 2022

> I was revisiting this proposal recently and came to the conclusion that it is no longer relevant. I am leaving it open to give those who are interested enough time to chime in and express their opinion. If there are no objections, I will close this PR in a week. My justification is provided below.

Fine for me to close. Checkpointz provides a better solution that I feel OK recommending to users.

@mkalinin
Author

mkalinin commented Dec 5, 2022

Closing the PR, see #226 (comment) for details

@mkalinin mkalinin closed this Dec 5, 2022
Ingridmichelledev added a commit to Ingridmichelledev/API-beacons that referenced this pull request Mar 3, 2023
Whether or not a request pertains to the finalized section of the chain
(per the view of the client fork choice) is somewhat cumbersome to
discover.

This PR adds a boolean that allows clients to distinguish a response
that has been finalized and thus is unlikely to change from one that may
still change over time (especially when using slot-based requests).

A flag like this can also be used for the purpose of verifying that a
checkpoint root indeed is part of chain history and is likely to remain
as such, as discussed in
ethereum/beacon-APIs#226

* fix dots

Co-authored-by: Paul Harris <[email protected]>
10 participants