Checkpoint Sync API #226
From call: These probably shouldn't be part of the existing APIs because they are meant for third parties to connect to, rather than the operator to connect to (like existing APIs). This means exposed on a different port so it is easy to expose to the internet without exposing everything else to the internet. It may have separate rate limiting and firewalling, which is easier if it is a separate port. The trust endpoint in particular we want as widely available as possible, so we should make it very easy for people to expose that to the internet publicly with minimal effort/risk. |
Another point mentioned in the call was the additional functionality that client teams may need to implement in order to checkpoint sync with only a finalized state & a block root. At the moment checkpoint sync requires differing sets of data depending on the client implementation. For example the complete set of routes to provide checkpoint sync for all clients is something like this:
|
Notably, downloading a state is very expensive (100+ MB in JSON) - nobody running a beacon node with validators attached would be advised to supply beacon states (for their own good) - conversely, for beacon nodes that don't support active validators, there is little harm in exposing all of the REST API - the beacon state call is by far one of the most expensive ones you can expose anyway, making most of the rest benign.
The |
We should specify that these endpoints only return SSZ at least for the state. There's no reason to serialize to a much bigger json representation in this case. Even so I don't think we're targeting nodes that are running validators here - for security reasons alone I wouldn't recommend exposing any APIs from a node running validators.
This is definitely incorrect. The |
I would say the point of these new APIs is to define a small set of APIs that clients will need to work with. It will mean clients needing to do additional work so they can start from just a BeaconState, but otherwise it's significantly harder to be a state provider, which makes it hard to convince people to provide them (and basically centralizes on Infura). While we don't need a lot of state providers, given that trust providers are separate, we still want more than one or two and we want to make it easy for them to be reliable.
I guess this is client-dependent as far as implementation / expensiveness goes (i.e. how costly it is to generate a "filtered" response) - but given that the full state is a superset of the validators, the responses are certainly larger -> more expensive (from a bandwidth perspective). +1 agree on the point that limiting to finalized is a good idea for cacheability - that said, this is equivalent to simply not responding to non-finalized requests in the current As such, one would expose +1 that not exposing JSON for SSZ might be a good idea, but even the SSZ is ~60 MB and growing - this is not a light request by any means - again, we can solve this with guidance docs for client implementers (which would only use SSZ or use quality preferences to prefer SSZ etc).
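As a rough illustration of the "quality preferences to prefer SSZ" idea, here is a minimal sketch of a client building a request whose `Accept` header ranks SSZ above JSON. The base URL is a placeholder and `build_state_request` is a hypothetical helper, not part of any client; the path is the existing debug state route mentioned in this PR. The request is only constructed, not sent.

```python
from urllib.request import Request

# SSZ responses are served as raw bytes; JSON is kept only as a lower-priority fallback.
SSZ_MEDIA_TYPE = "application/octet-stream"
JSON_MEDIA_TYPE = "application/json"


def build_state_request(base_url: str) -> Request:
    """Build (but do not send) a finalized-state request that prefers SSZ."""
    url = base_url.rstrip("/") + "/eth/v2/debug/beacon/states/finalized"
    # Higher q-value means higher preference during content negotiation.
    accept = f"{SSZ_MEDIA_TYPE};q=1.0, {JSON_MEDIA_TYPE};q=0.5"
    return Request(url, headers={"Accept": accept})


# Placeholder address of a beacon node; substitute your own.
req = build_state_request("http://localhost:5052/")
print(req.full_url)
print(req.get_header("Accept"))
```

A client that only understands SSZ could drop the JSON entry entirely, so a server that cannot produce SSZ fails the request instead of sending the much larger JSON body.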
What is a "trust provider" in this context, and where do you get a list of them (since there are many)? If it's a well-known list, they are open to MITM, DoS, blocking etc, so we need to tread carefully - if instead they are random nodes on the internet used for a majority decision, it might be simpler to expose this via libp2p and get discovery for free. |
I am envisioning people on the internet who run Beacon nodes just telling their social network where it is. This may be private between a group of friends, or it may be a tweet, or maybe you post it on the sidebar of your blog. Some institutional providers may make it available more widely as a way of building a positive reputation as well. I'm definitely not envisioning some well known list of "these are the trusted trust providers". It should be up to each individual to define their trust network (or delegate that to someone they trust). |
If this is the use case, they might as well provide a state - it's a one-off, and it's not that bloody (unless you run a public service, in which case you rate limit it and then you're done) - ie it's important to keep in mind that there are two kinds of beacon nodes: those with validators and those without - for the former, you will not want to expose any API and for the latter category, there is really very little / no harm in exposing the entire API (minus keymanager of course). |
I would be comfortable sharing my checkpoint endpoint on Twitter and linking in the sidebar of my blog or something, where the reach is not huge, but also unbounded. I would not be comfortable sharing my state endpoint with that broad of an audience though. As an anecdote: I previously exposed my execution JSON-RPC API but didn't advertise it (just a DNS record and referenced it internally in an app I built) and I find it being used by random dapps and people throughout the ecosystem to this day, despite me never publicly linking to it anywhere. The problem with the state endpoint is that I have to do something to protect myself, but with the checkpoint endpoint I likely don't have to do anything to protect myself (besides normal stuff that I already have for any publicly facing service). Even just a little bit of discouragement is enough to make people not expose a thing, and we want as many checkpoint endpoints available as possible. |
apis/checkpoint/finalized_state.yaml
Outdated
properties:
  version:
    type: string
    enum: [ phase0, altair, bellatrix ]
    example: "phase0"
This doesn't include the is_optimistic flag the existing getState has. Do we want to add that or should this endpoint simply refuse to return a state if it's optimistically syncing? We probably should be clear about that whichever way we go.
Also worth noting that the state being finalized would be affected by the chain head being optimistic even if the state returned itself is fully validated (because the optimistic head may have updated finalisation).
IMO, we should be conservative on that and respond 404 if the node is optimistic. I'll add a respective note
apis/checkpoint/finalized_state.yaml
Outdated
Returns full BeaconState object for a finalized checkpoint state from the WS period.
Depending on `Accept` header it can be returned either as json or as bytes serialized by SSZ
I don't see a reason for this API to support returning JSON. It makes the request significantly more expensive to serve and use more bandwidth, and any client needs to be able to support an SSZ BeaconState. The debug endpoint is available for humans or other tools that want a more readable version of the spec.
get:
  operationId: getFinalizedBlockRoot
  summary: Get finalized block root
  description: Retrieves hashTreeRoot of finalized BeaconBlock/BeaconBlockHeader.
What should be returned if the specified slot is an empty slot? Should it return the last block before that slot or 404?
I don't think it overly matters because you can't calculate the correct block root from a state if it's had empty slots applied, so if you have a usable state it will be from a slot that contains a block. It's probably worth returning 404 for empty slots though just to be consistent with the /eth/v1/beacon/blocks/{blockId}/root API.
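To make the 404 cases discussed in this thread concrete, here is a minimal sketch of how a server might decide the response for the proposed root endpoint. It assumes the conservative choices suggested above (404 while the node is optimistic, 404 for empty or non-finalized slots); `finalized_block_root` and its `block_roots` mapping are hypothetical names, not any client's actual code.

```python
from typing import Dict, Optional, Tuple


def finalized_block_root(
    slot: int,
    block_roots: Dict[int, str],  # hypothetical store: slot -> root, only for slots that contain a block
    finalized_slot: int,
    node_is_optimistic: bool,
) -> Tuple[int, Optional[str]]:
    """Return (http_status, root) for a finalized-block-root lookup."""
    # Conservative choice from the review: serve nothing while optimistically syncing.
    if node_is_optimistic:
        return 404, None
    # Blocks after the finalized slot are not yet finalized.
    if slot > finalized_slot:
        return 404, None
    # Empty slots have no block of their own, so they also yield 404.
    root = block_roots.get(slot)
    if root is None:
        return 404, None
    return 200, root


roots = {96: "0xaa", 97: "0xbb"}
print(finalized_block_root(96, roots, finalized_slot=96, node_is_optimistic=False))
print(finalized_block_root(95, roots, finalized_slot=96, node_is_optimistic=False))
```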
apis/checkpoint/finalized_state.yaml
Outdated
operationId: "getFinalizedCheckpointState"
summary: "Get full BeaconState object for finalized checkpoint state"
description: |
  Returns full BeaconState object for a finalized checkpoint state from the WS period.
It is probably worth explicitly stating that the state should not have empty slots applied. That is it should be the state that is referenced by the block some finalized checkpoint root actually points to.
Co-authored-by: Adrian Sutton <[email protected]>
Alternatively, we may have
class BeaconBlockAndState(Container):
    block: BeaconBlock
    state: BeaconState
I would be in favour of this for Lighthouse as the assumption that blocks and states come in pairs is quite deeply embedded. I'm not saying it would be impossible to remove but it would likely represent a substantial amount of work, particularly testing that no assumptions about block existence are violated. We often use the existence of a block in our database to determine if it is known/canonical, and currently block processing expects to be able to verify each block against its parent block + state. Another related issue for us is that we currently require the checkpoint block to be from a non-skipped slot, although I think this requirement would be easier to remove (see sigp/lighthouse#3210). |
This is the Engine API, or a different API? |
Nah, the beacon API served by consensus clients (i.e. this repo). Spec is rendered online here: https://ethereum.github.io/beacon-APIs/ |
Ah. As mentioned above, I think checkpoint stuff should be exposed on different ports because the expectation is that a different set of people will have access to these things. Finalized Slot Root: Everyone |
I agree on the port different to what we have for beacon APIs. And I also think that CL clients should have at least the following set of flags:
State endpoint shouldn't be enabled by default as it's heavy. I guess setting an HTTP server with caching in front of checkpoint sync state endpoint would be an optimal strategy for state providers. They could set say 1h TTL for this cache |
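The caching idea can be sketched as a tiny in-process TTL cache. This is only an illustration of the 1h TTL suggestion, assuming a single cached value (the latest finalized state bytes); `TtlCache` and the `fetch` callback are hypothetical, and a real state provider would more likely put an HTTP cache or reverse proxy in front of the node.

```python
import time
from typing import Callable, Optional


class TtlCache:
    """Cache a single expensive value and refresh it once the TTL has elapsed."""

    def __init__(
        self,
        fetch: Callable[[], bytes],
        ttl_seconds: float = 3600.0,  # the 1h TTL suggested above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[bytes] = None
        self._fetched_at = 0.0

    def get(self) -> bytes:
        now = self._clock()
        # Refetch on first use or when the cached copy is older than the TTL.
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Usage with a stand-in fetch function; a real one would ask the beacon node for the state.
cache = TtlCache(fetch=lambda: b"ssz-encoded-state")
print(len(cache.get()))
```

Injecting the clock keeps the expiry logic easy to test without waiting an hour.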
I think that this conversation has veered out of scope for an endpoint definition. In my opinion, this endpoint should be added to the beacon APIs same as the others, and made available on the same basis as the rest of the endpoints. If there is a desire for this endpoint to be exposed to external/untrusted parties it might require any number of features such as authentication, DDoS protection, rate limiting, internal caching etc. that don't fall under the remit of an API specification. These can all be accomplished better by a piece of middleware that provides these features than attempting to add them to multiple beacon client implementations. So please: no dedicated port, no specific options in the CL for this. Let's build an endpoint definition here and handle presentation of it elsewhere. |
I think the chance of a user shooting themselves in the foot is higher, and the chance of users providing these APIs is significantly lower if we just throw them all onto the Beacon API and tell users "if you want to provide a public good, you can do a bunch of additional work to make that happen". At the least, the checkpoint API is incredibly lightweight and probably safe for the vast majority of people to just expose freely.
I think there should be another for |
It's a DoS vector, especially given the size of the state. But again, this is down to the implementation and has no impact on the inputs, processing or outputs of the endpoint which is what should be being discussed here. |
Why is that a problem? As long as the state is from within the weak subjectivity period they won't be able to make you finalise something incorrect and so you'll wind up finding the right chain and following it. You are basically in the same situation as you would have been had you been in sync and following the chain from epoch n onwards. You would get a problem if n+2 was within weak subjectivity but the state was from before weak subjectivity period but the requirement is that the state itself be within the weak subjectivity period. So to answer the specific question, you'd process slots to move the state forward to the start of the next epoch. You know there weren't any blocks in that period (or the state should have been from one of those blocks) but you can't tell if there were no blocks for the next epoch as well so can't take it any further forward. |
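The "process slots to move the state forward to the start of the next epoch" step comes down to simple slot arithmetic. A minimal sketch, assuming the mainnet value of 32 slots per epoch and assuming that a state already sitting on an epoch boundary needs no advancing; `target_slot_for_checkpoint_state` is a hypothetical helper name.

```python
SLOTS_PER_EPOCH = 32  # mainnet preset value


def epoch_of(slot: int) -> int:
    """Epoch that contains the given slot."""
    return slot // SLOTS_PER_EPOCH


def target_slot_for_checkpoint_state(state_slot: int) -> int:
    """Slot to advance a checkpoint state to: the next epoch boundary.

    Assumption for this sketch: an epoch-aligned state is left where it is.
    """
    if state_slot % SLOTS_PER_EPOCH == 0:
        return state_slot
    return (epoch_of(state_slot) + 1) * SLOTS_PER_EPOCH


for slot in (64, 70, 95):
    print(slot, "->", target_slot_for_checkpoint_state(slot))
```

Going any further than that boundary would require knowing that the following epoch was also empty, which the state alone cannot tell you.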
@ajsutton I think my example is actually too conservative, if there's a gap with no blocks that spans the weak subjectivity period then the epoch following block I hadn't made this connection before in relation to checkpoint sync, so figured it was worth noting. I agree in practice it's unlikely to be relevant, but if we're adding a new endpoint that bundles the state and a block, we may as well throw the finalized epoch in there too. |
For this to be a problem we'd need to have a block at say slot 10,000, followed by a long gap of multiple weeks so it's longer than the weak subjectivity period to say 110,000 where there are no blocks at all. Then we'd need to include some blocks that contain attestations from that offline period such that we finalise the empty slots but we don't finalize any of the new blocks, and you'd have to be wanting to do a checkpoint sync before finalisation updates to include any new blocks. The attacker meanwhile has to create the alternative chain, with validators that exited and became withdrawable during the period of empty blocks, and perform a sybil attack to get you to follow their chain instead of the correct chain from the state you sync'd from. That would be quite impressive and the fix would be to wait until the chain finalizes a new block. I'm not convinced this is a viable threat, let alone that it's a bigger risk than managing to introduce a bug in the extra code required to handle the checkpoint epoch being provided (because you just know you're going to wind up getting a checkpoint epoch that's actually from before the state due to some race condition).
We would white-list only the |
Altering the path to |
I think we can defer it to implementations. Currently the spec is agnostic to that, so, implementations may choose to return the most recent finalized state that does have a block in the first slot of the epoch |
it's a problem for the client, not the server: if the server returns empty slots, clients must be able to deal with, because with only |
we still have to support the old debug path in general, meaning we have to support the exact same request on two paths - this is .. well, redundant - doable, just.. ugly. |
Currently, we require genesis and a checkpoint block and state to start from an arbitrary slot - this PR relaxes this requirement so that we can start with a state alone.

The current trusted-node-sync algorithm works by first downloading blocks until we find an epoch aligned non-empty slot, then downloads the state via slot. However, current [proposals](ethereum/beacon-APIs#226) for checkpointing prefer finalized state as the main reference - this allows more simple access control and caching on the server side - in particular, this should help checkpoint-syncing from sources that have a fast `finalized` state download (like infura and teku) but are slow when accessing state via slot.

Earlier versions of Nimbus will not be able to read databases created without a checkpoint block and genesis. In most cases, backfilling makes the database compatible except where genesis is also missing (custom networks).

* backfill checkpoint block from libp2p instead of checkpoint source, when doing trusted node sync
* allow starting the client without genesis / checkpoint block
* perform epoch start slot lookahead when loading tail state, so as to deal with the case where the epoch start slot does not have a block
* replace `--blockId` with `--state-id` in TNS command line
* when replaying, also look at the parent of the last-known-block (even if we don't have the parent block data, we can still replay from a "parent" state) - in particular, this clears the way for implementing state pruning
* deprecate `--finalized-checkpoint-block` option (no longer needed)
As of status-im/nimbus-eth2#4251, Nimbus will no longer require a block for checkpoint syncing, rather it will download just a state (that has to be epoch-aligned) - we'll also support syncing from a checkpoint whose slot is empty (up to 1 epoch of empty slots supported) |
I think any up-to-date proposal for checkpoint sync API should take into account EIP-4881, which describes a process similar to checkpoint sync, but related to the state of the deposit contract. Nimbus and Lighthouse already have the necessary mechanisms for consuming the The advantage of bundling all checkpoint-like structures in a single response is that this improves the simplicity of the client which no longer needs to handle the contradictory responses that might arise due to race conditions when the checkpoint sync is executed exactly when the serving node is finalizing a new epoch. |
Now that checkpointz exists, is it still our intent to merge this to beacon-apis? There was active discussion which has died off, I'm not sure what the status of this PR is... |
Whether or not a request pertains to the finalized section of the chain (per the view of the client fork choice) is somewhat cumbersome to discover. This PR adds a boolean that allows clients to distinguish a response that has been finalized and thus is unlikely to change from one that may still change over time (especially when using slot-based requests). A flag like this can also be used for the purpose of verifying that a checkpoint root indeed is part of chain history and is likely to remain as such, as discussed in ethereum#226
For the record, the plan is to remove the need for deposit snapshots entirely as part of removing the follow distance and updating deposit handling to take advantage of the post-merge world. So I probably wouldn't design APIs to use deposit snapshots at this stage. |
Whether or not a request pertains to the finalized section of the chain (per the view of the client fork choice) is somewhat cumbersome to discover. This PR adds a boolean that allows clients to distinguish a response that has been finalized and thus is unlikely to change from one that may still change over time (especially when using slot-based requests). A flag like this can also be used for the purpose of verifying that a checkpoint root indeed is part of chain history and is likely to remain as such, as discussed in #226 * fix dots Co-authored-by: Paul Harris <[email protected]>
I was revisiting this proposal recently and came to the conclusion that it is no longer relevant. I am leaving it open to give those who are interested enough time to chime in and express their opinion. If there are no objections I will close this PR in a week. My justification is provided below. The state provider API is successfully implemented as a part of the Checkpointz tool. The tool also provides caching and DoS protection for endpoints serving the state and other pieces of data that different clients require to bootstrap a node. Once every CL client becomes capable of starting from a checkpoint state only, the tool can be equipped with After #254 got merged, CL clients are already equipped with an API providing verification of a given block/state similar to the trust provider's API proposed in this PR. The main obstacle for an inexperienced user in exposing this API to the internet is the configuration that would make access to it secure. That is, adding DoS protection and limiting access to this exact endpoint only, or to a subset of endpoints which are safe to be exposed. Using Checkpointz as a trust provider could reduce the barrier, but in this case the tool would become a single point of failure in a critical part of the system. There are a couple of other solutions that can alleviate the configuration issue described above:
Having more tools like Checkpointz can also be considered, but I personally feel like it will create yet another diversity problem that we will have to care about. |
Fine for me to close, checkpointz provides a better solution that I feel ok recommending to users |
Closing the PR, see #226 (comment) for details |
Specification
GET /eth/v1/checkpoint/finalized_state
BeaconState object for a finalized checkpoint state from the WS period
GET /eth/v1/checkpoint/finalized_blocks/{slot}/root
slot is either unavailable or not yet finalized
Motivation
Facilitates checkpoint sync adoption by simplifying the following scheme:
/eth/v1/checkpoint/finalized_blocks/{slot}/root endpoint to allow for checking that a block in the pulled state is finalized
The finalized_blocks/{slot}/root endpoint is a shortcut to the following actions:
GET /eth/v1/beacon/blocks/{slot}/root
GET /eth/v1/beacon/headers/finalized
slot <= finalized_header.slot
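The three actions above can be sketched as a small client-side check. This is a minimal illustration that takes the already-decoded results of the two existing calls as plain values; `finalized_root_or_none` is a hypothetical helper and does no networking itself.

```python
from typing import Optional


def finalized_root_or_none(
    slot: int,
    root_at_slot: Optional[str],  # result of GET /eth/v1/beacon/blocks/{slot}/root, None if it returned 404
    finalized_header_slot: int,   # slot of the header from GET /eth/v1/beacon/headers/finalized
) -> Optional[str]:
    """Return the block root only if the block exists and is finalized."""
    if root_at_slot is None:
        return None
    # The check from the list above: slot <= finalized_header.slot
    if slot > finalized_header_slot:
        return None
    return root_at_slot


print(finalized_root_or_none(6400, "0xabc", finalized_header_slot=6432))
print(finalized_root_or_none(6500, "0xdef", finalized_header_slot=6432))
```

Doing this with two separate requests leaves a window in which finalization can advance between the calls, which is part of why a single dedicated endpoint was proposed.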
The finalized_state endpoint is an alias to GET /eth/v2/debug/beacon/states/finalized
cc @ajsutton @djrtwo