
Gateway crashes while bootstrapping from out of sync AN #488

Closed
peterargue opened this issue Aug 28, 2024 · 7 comments · Fixed by #616

@peterargue
Contributor

Problem

I am testing the process for running a gateway with a local dedicated access node (AN). When the gateway is started while the AN is still catching up with the network, it syncs up to the AN's latest indexed block and then panics.

Here's the log output from the GW during my testing:

{"level":"info","component":"ingestion","hash":"0xfffc5551f456fce67bdba6c253a26d51bf61f311c3b8361860a4dfcfe6d48c7e","evm-height":2200940,"cadence-height":213377610,"cadence-id":"3ae50bc701976b72430ca702139267e9af7aa8209f851c93befcb1eee409a7a5","parent-hash":"0x7234d6cfe3e85d79f66715aee3db45980952d99ca19da35a9d2e48e4cf9673cf","tx-hashes-root":"0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421","time":"2024-08-28T18:55:42Z","message":"new evm block executed event"}
{"level":"error","error":"failed to create EVM requester: could not fetch the configured COA account: 62631c28c9fc5a91 make sure it exists: client: rpc error: code = OutOfRange desc = failed to get account from the execution node: 3 errors occurred:\n\t* rpc error: code = OutOfRange desc = state for block ID 075346f6d8cc582093e309136241914fa8fbf16af02ddfe08b076b66d00938a6 not available\n\t* rpc error: code = OutOfRange desc = state for block ID 075346f6d8cc582093e309136241914fa8fbf16af02ddfe08b076b66d00938a6 not available\n\t* rpc error: code = OutOfRange desc = state for block ID 075346f6d8cc582093e309136241914fa8fbf16af02ddfe08b076b66d00938a6 not available\n\n","time":"2024-08-28T18:55:42Z","message":"failed to start the API server"}
panic: failed to create EVM requester: could not fetch the configured COA account: 62631c28c9fc5a91 make sure it exists: client: rpc error: code = OutOfRange desc = failed to get account from the execution node: 3 errors occurred:
        * rpc error: code = OutOfRange desc = state for block ID 075346f6d8cc582093e309136241914fa8fbf16af02ddfe08b076b66d00938a6 not available
        * rpc error: code = OutOfRange desc = state for block ID 075346f6d8cc582093e309136241914fa8fbf16af02ddfe08b076b66d00938a6 not available
        * rpc error: code = OutOfRange desc = state for block ID 075346f6d8cc582093e309136241914fa8fbf16af02ddfe08b076b66d00938a6 not available

It appears that the first request to the AN for a block it has not yet indexed is forwarded to an execution node (EN). However, since the AN is behind, the EN has already pruned the data for this block, which results in an OutOfRange response code. The gateway then panics.

In this case, I think the gateway should pause and retry.

@sideninja
Contributor

At this point I don't think the EVM GW should communicate with ANs that are not synced. The problem you are experiencing is that the account you configured as the COA is not found, which makes the gateway panic. I don't think it's beneficial for the GW to handle this case gracefully or retry; it would just add complexity. Please correct me if I'm wrong, but I don't see a real-world benefit in the GW communicating with an out-of-sync AN.

@peterargue
Contributor Author

It seems that it panics when it fails to get an account from the AN because the AN has not yet indexed the data. It's definitely possible for an AN to fall behind on indexing, or to be restarted. Are you saying that this panic only happens when the node's COA has not yet been loaded?

If the GW requires the AN to be fully synced with the network before starting, I think that should be stated explicitly in the setup docs, since I expect a common use case will be running the GW with a local AN.

@m-Peter
Collaborator

m-Peter commented Aug 29, 2024

> It seems that it panics when it fails to get an account from the AN, because the AN has not yet indexed the data. It's definitely possible for an AN to fall behind on indexing, or be restarted. Are you saying that this panic only happens in the case when node's COA has not yet been loaded?

The error comes from this check: https://github.com/onflow/flow-evm-gateway/blob/main/services/requester/requester.go#L130-L137. It can occur because the AN has not yet indexed the required data, or simply because a wrong Flow address was provided in the corresponding bootstrap flag.

@m-Peter
Collaborator

m-Peter commented Aug 29, 2024

> If the GW requires that the AN is fully synced with the network before starting, I think that should be explicitly stated somewhere in the setup docs since I think a common use case will be to run the GW with a local AN.

It does not strictly have to be fully synced with the network before starting, but to be operational it should be synced at least to the block height at which the configured COA was created. For testnet specifically, it should have indexed block height 211176670, because that is where the EVM contract was first deployed to testnet.

@peterargue
Contributor Author

OK. I'll leave it up to you what does or doesn't need to be documented. I know the GW is still in development and may have some rough edges. This came up while I was testing the setup, and it was not obvious what went wrong or what I should do to resolve it.

@m-Peter
Collaborator

m-Peter commented Aug 30, 2024

We'll certainly add dedicated sections to the README, including the specific block height at which the EVM contract was first deployed and any other non-obvious details, especially for connecting to a self-run AN. I have already opened #500 so that we don't ignore errors coming from COA creation, which is an important part of bootstrapping.

@sideninja
Contributor

I believe PR #500 improves this problem, so the only remaining item is to add this to the documentation.

@m-Peter m-Peter self-assigned this Oct 18, 2024
@m-Peter m-Peter mentioned this issue Oct 18, 2024
@github-project-automation github-project-automation bot moved this to ✅ Done in 🌊 Flow 4D Oct 21, 2024