Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Beacon not syncing #1590

Closed
GeoJesse opened this issue Sep 5, 2020 · 8 comments
Closed

Beacon not syncing #1590

GeoJesse opened this issue Sep 5, 2020 · 8 comments

Comments

@GeoJesse
Copy link

GeoJesse commented Sep 5, 2020

Description

See issue #1560
Upgraded to https://github.com/sigp/lighthouse/releases/tag/v0.2.8. Purged the DB and now it won't sync at all.

Version

0.2.8

Present Behaviour

Beacon not syncing. After a while the est_time goes to --.
Example log is:
Sep 05 14:05:50 ubuntu lighthouse[2351]: Sep 05 14:05:50.000 INFO Syncing est_time: --, distance: 127464 slots (2 weeks 3 days), peers: 16, service: slot_notifier

Expected Behaviour

It should sync.

Steps to resolve

Unknown. With 0.2.7 I was able to purge the DB and it would sync eventually with est_time decreasing. Now it never catches up.

@tobisako
Copy link
Contributor

tobisako commented Sep 9, 2020

ref) I'm tried this issue with lighthouse-docker
TESTNET=medalla -> Non-reproducible
TESTNET=altona -> reproduced

medalla

beacon_node_1       | Sep 09 06:48:21.811 INFO Syncing                                 est_time: 0 mins, speed: 17.31 slots/sec, distance: 62 slots (12 mins), peers: 43, service: slot_notifier

@GeoJesse
Copy link
Author

GeoJesse commented Sep 9, 2020

I reverted back to 0.2.7 and it still won't sync. I've purged the db several times but no luck.

Latest logs read as:
Sep 09 12:01:50 ubuntu lighthouse[5932]: Sep 09 12:01:50.001 INFO Syncing est_time: --, distance: 151740 slots (3 weeks 0 days), peers: 25, service: slot_notifier

@james-c
Copy link

james-c commented Sep 9, 2020

I'm getting the same problem with the docker image (sigp/lighthouse:latest). Get's to about 3 weeks to 3 weeks 5 days out and the chain stops syncing.

@ghost ghost added A1 Networking labels Sep 10, 2020
@AgeManning
Copy link
Member

AgeManning commented Sep 10, 2020

This is a known problem. There is a race condition syncing the period of long non-finality on medalla. Faster computers should be able to sync easily, slower computers have a probablistic chance (it continually retries in the background and eventually you should make it through the period of non-finality)

Even though the logs report no syncing time. Your node is still attempting to sync in the background. If you let it run, you should eventually sync (although it may take a while).

We have a fix coming very soon. #1551 will correct this

@GeoJesse
Copy link
Author

Glad to hear. You all rock!

@AgeManning
Copy link
Member

Hey guys. While we wait for the correct sync PR to land. I've made a hack-fix which should help most peers sync.

You can use the sync-hack branch to sync to head :)

bors bot pushed a commit that referenced this issue Sep 13, 2020
## Issue Addressed

#1590 

## Proposed Changes

This is a temporary workaround that prevents finalized chain sync from swapping chains. I'm merging this in now until the full solution is ready.
bors bot pushed a commit that referenced this issue Sep 13, 2020
## Issue Addressed

#1590 

## Proposed Changes

This is a temporary workaround that prevents finalized chain sync from swapping chains. I'm merging this in now until the full solution is ready.
@AgeManning
Copy link
Member

This is resolved in #1615. The correct full fix comes in #1628

@GeoJesse
Copy link
Author

Sync'd in record time and hitting 100% attestation effectiveness! Thanks!

michaelsproul added a commit that referenced this issue Sep 28, 2020
Squashed commit of the following:

commit 74ed1be
Author: pawan <[email protected]>
Date:   Tue Sep 22 19:27:14 2020 +0530

    fmt

commit d61f956
Author: pawan <[email protected]>
Date:   Tue Sep 22 19:03:16 2020 +0530

    Fix merge issues

commit 282a973
Merge: 6759bd1 a97ec31
Author: pawan <[email protected]>
Date:   Tue Sep 22 18:30:10 2020 +0530

    Merge branch 'master' into directory-restructure

commit a97ec31
Author: Pawan Dhananjay <[email protected]>
Date:   Tue Sep 22 07:29:34 2020 +0000

    Subscribe to subnets an epoch in advance (#1600)

    ## Issue Addressed

    N/A

    ## Proposed Changes

    Subscibe to subnet an epoch in advance of the attestation slot instead of 4 slots in advance.

commit 7aceff4
Author: Michael Sproul <[email protected]>
Date:   Tue Sep 22 05:40:04 2020 +0000

    Add `safe_sum` and use it in state_processing (#1620)

    ## Issue Addressed

    Closes #1098

    ## Proposed Changes

    Add a `SafeArithIter` trait with a `safe_sum` method, and use it in `state_processing`. This seems to be the only place in `consensus` where it is relevant -- i.e. where we were using `sum` and the integer_arith lint is enabled.

    ## Additional Info

    This PR doesn't include any Clippy linting to prevent `sum` from being called. It seems there is no existing Clippy lint that suits our purpose, but I'm going to look into that and maybe schedule writing one as a lower-priority task.

    This theoretically _is_ a consensus breaking change, but it shouldn't impact Medalla (or any other testnet) because `slashings` shouldn't overflow!

commit 4fca306
Author: Michael Sproul <[email protected]>
Date:   Tue Sep 22 05:40:02 2020 +0000

    Update BLST, add force-adx support (#1595)

    ## Issue Addressed

    Closes #1504
    Closes #1505

    ## Proposed Changes

    * Update `blst` to the latest version, which is more portable and includes finer-grained compilation controls (see below).
    * Detect the case where a binary has been explicitly compiled with ADX support but it's missing at runtime, and report a nicer error than `SIGILL`.

    ## Known Issues

    * None. The previous issue with `make build-aarch64` (supranational/blst#27), has been resolved.

    ## Additional Info

    I think we should tweak our release process and our Docker builds so that we provide two options:

    Binaries:

    * `lighthouse`: compiled with `modern`/`force-adx`, for CPUs 2013 and newer
    * `lighthouse-portable`: compiled with `portable` for older CPUs

    Docker images:

    * `sigp/lighthouse:latest`: multi-arch image with `modern` x86_64 and vanilla aarch64 binary
    * `sigp/lighthouse:latest-portable`: multi-arch image with `portable` builds for x86_64 and aarch64

    And relevant Docker images for the releases (as per #1574 (comment)), tagged `v0.x.y` and `v0.x.y-portable`

commit d85d5a4
Author: Paul Hauner <[email protected]>
Date:   Tue Sep 22 04:45:15 2020 +0000

    Bump to v0.2.11 (#1645)

    ## Issue Addressed

    NA

    ## Proposed Changes

    - Bump version to v0.2.11
    - Run `cargo update`.

    ## Additional Info

    NA

commit bd39cc8
Author: Paul Hauner <[email protected]>
Date:   Tue Sep 22 02:06:10 2020 +0000

    Apply hotfix for inconsistent head (#1639)

    ## Issue Addressed

    - Resolves #1616

    ## Proposed Changes

    If we look at the function which persists fork choice and the canonical head to disk:

    https://github.com/sigp/lighthouse/blob/1db8daae0c7bb34bf2e05644fa6bf313c2bea98e/beacon_node/beacon_chain/src/beacon_chain.rs#L234-L280

    There is a race-condition which might cause the canonical head and fork choice values to be out-of-sync.

    I believe this is the cause of #1616. I managed to recreate the issue and produce a database that was unable to sync under the `master` branch but able to sync with this branch.

    These new changes solve the issue by ignoring the persisted `canonical_head_block_root` value and instead getting fork choice to generate it. This ensures that the canonical head is in-sync with fork choice.

    ## Additional Info

    This is hotfix method that leaves some crusty code hanging around. Once this PR is merged (to satisfy the v0.2.x users) we should later update and merge #1638 so we can have a clean fix for the v0.3.x versions.

commit 14ff385
Author: Pawan Dhananjay <[email protected]>
Date:   Tue Sep 22 01:12:36 2020 +0000

    Add trusted peers (#1640)

    ## Issue Addressed

    Closes #1581

    ## Proposed Changes

    Adds a new cli option for trusted peers who always have the maximum possible score.

commit 5d17eb8
Author: Michael Sproul <[email protected]>
Date:   Mon Sep 21 11:53:53 2020 +0000

    Update LevelDB to v0.8.6, removing patch (#1636)

    Removes our dependency on a fork of LevelDB now that skade/leveldb-sys#17 is merged

commit 1db8daa
Author: Age Manning <[email protected]>
Date:   Mon Sep 21 02:00:38 2020 +0000

    Shift metadata to the global network variables (#1631)

    ## Issue Addressed

    N/A

    ## Proposed Changes

    Shifts the local `metadata` to `network_globals` making it accessible to the HTTP API and other areas of lighthouse.

    ## Additional Info

    N/A

commit 7b97c4a
Author: Pawan Dhananjay <[email protected]>
Date:   Mon Sep 21 01:06:25 2020 +0000

    Snappy additional sanity checks (#1625)

    ## Issue Addressed

    N/A

    ## Proposed Changes

    Adds the following check from the spec

    > A reader SHOULD NOT read more than max_encoded_len(n) bytes after reading the SSZ length-prefix n from the header.

commit 371e1c1
Author: Paul Hauner <[email protected]>
Date:   Fri Sep 18 06:41:29 2020 +0000

    Bump version to v0.2.10 (#1630)

    ## Issue Addressed

    NA

    ## Proposed Changes

    Bump crate version so we can cut a new release with the fix from #1629.

    ## Additional Info

    NA

commit a17f748
Author: Paul Hauner <[email protected]>
Date:   Fri Sep 18 05:14:31 2020 +0000

    Fix bad assumption when checking finalized descendant (#1629)

    ## Issue Addressed

    - Resolves #1616

    ## Proposed Changes

    Fixes a bug where we are unable to read the finalized block from fork choice.

    ## Detail

    I had made an assumption that the finalized block always has a parent root of `None`:

    https://github.com/sigp/lighthouse/blob/e5fc6bab485fa54d7e518b325f4eb9201ba5c6a1/consensus/fork_choice/src/fork_choice.rs#L749-L752

    This was a faulty assumption, we don't set parent *roots* to `None`. Instead we *sometimes* set parent *indices* to `None`, depending if this pruning condition is satisfied:

    https://github.com/sigp/lighthouse/blob/e5fc6bab485fa54d7e518b325f4eb9201ba5c6a1/consensus/proto_array/src/proto_array.rs#L229-L232

    The bug manifested itself like this:

    1. We attempt to get the finalized block from fork choice
    1. We try to check that the block is descendant of the finalized block (note: they're the same block).
    1. We expect the parent root to be `None`, but it's actually the parent root of the finalized root.
    1. We therefore end up checking if the parent of the finalized root is a descendant of itself. (note: it's an *ancestor* not a *descendant*).
    1. We therefore declare that the finalized block is not a descendant of (or eq to) the finalized block. Bad.

    ## Additional Info

    In reflection, I made a poor assumption in the quest to obtain a probably negligible performance gain. The performance gain wasn't worth the risk and we got burnt.

commit 49ab414
Author: Age Manning <[email protected]>
Date:   Fri Sep 18 02:05:36 2020 +0000

    Shift gossipsub validation (#1612)

    ## Issue Addressed

    N/A

    ## Proposed Changes

    This will consider all gossipsub messages that have either the `from`, `seqno` or `signature` field as invalid.

    ## Additional Info

    We should not merge this until all other clients have been sending empty fields for a while.

    See ethereum/consensus-specs#1981 for reference

commit 2074bec
Author: Age Manning <[email protected]>
Date:   Fri Sep 18 02:05:34 2020 +0000

    Gossipsub message id to shortened bytes (#1607)

    ## Issue Addressed

    ethereum/consensus-specs#2044

    ## Proposed Changes

    Shifts the gossipsub message id to use the first 8 bytes of the SHA256 hash of the gossipsub message data field.

    ## Additional Info

    We should merge this in once the spec has been decided on. It will cause issues with gossipsub scoring and gossipsub propagation rates (as we won't receive IWANT) messages from clients that also haven't made this update.

commit e5fc6ba
Author: Michael Sproul <[email protected]>
Date:   Mon Sep 14 10:58:15 2020 +0000

    Remove redundant decompression in process_deposit (#1610)

    ## Issue Addressed

    Closes #1076

    ## Proposed Changes

    Remove an extra unnecessary decompression of the deposit public key from `process_deposit`. The key is decompressed and used to verify the signature in `verify_deposit_signature`, making this initial decompression redundant.

    ## Additional Info

    This is _not_ a consensus-breaking change because keys which previously failed the early decompression check will not be found in the pubkey cache (they are invalid), and will be checked and rejected as part of `verify_deposit_signature`.

commit c9596fc
Author: Age Manning <[email protected]>
Date:   Sun Sep 13 23:58:49 2020 +0000

    Temporary Sync Work-Around (#1615)

    ## Issue Addressed

    #1590

    ## Proposed Changes

    This is a temporary workaround that prevents finalized chain sync from swapping chains. I'm merging this in now until the full solution is ready.

commit c6abc56
Author: Age Manning <[email protected]>
Date:   Fri Sep 11 02:33:36 2020 +0000

    Prevent large step-size parameters (#1583)

    ## Issue Addressed

    Malicious users could request very large block ranges, more than we expect. Although technically legal, we are now quadraticaly weighting large step sizes in the filter. Therefore users may request large skips, but not a large number of blocks, to prevent requests forcing us to do long chain lookups.

    ## Proposed Changes

    Weight the step parameter in the RPC filter and prevent any overflows that effect us in the step parameter.

    ## Additional Info

commit 7f1b936
Author: blacktemplar <[email protected]>
Date:   Fri Sep 11 01:43:15 2020 +0000

    ignore too early / too late attestations instead of penalizing them (#1608)

    ## Issue Addressed

    NA

    ## Proposed Changes

    This ignores attestations that are too early or too late as it is specified in the spec (see https://github.com/ethereum/eth2.0-specs/blob/v0.12.1/specs/phase0/p2p-interface.md#global-topics first subpoint of `beacon_aggregate_and_proof`)

commit 810de2f
Author: Daniel Schonfeld <[email protected]>
Date:   Fri Sep 11 01:43:13 2020 +0000

    Static testnet configs (#1603)

    ## Issue Addressed

    #1431

    ## Proposed Changes

    Added an archived zip file with required files manually

    ## Additional Info

    1) Used zip, instead of tar.gz to add a single dependency instead of two.
    2) I left the download from github code for now, waiting to hear if you'd like it cleaned up or left to be used for some tooling needs.

commit 0525876
Author: Pawan Dhananjay <[email protected]>
Date:   Fri Sep 11 00:52:27 2020 +0000

    Dial cached enr's before making subnet discovery query (#1376)

    ## Issue Addressed

    Closes #1365

    ## Proposed Changes

    Dial peers in the `cached_enrs` who aren't connected, aren't banned and satisfy the subnet predicate before making a subnet discovery query.

commit d79366c
Author: Age Manning <[email protected]>
Date:   Thu Sep 10 04:43:22 2020 +0000

    Prevent printing binary in RPC errors (#1604)

    ## Issue Addressed

    #1566

    ## Proposed Changes

    Prevents printing binary characters in the RPC error response from peers.

commit b19cf02
Author: Age Manning <[email protected]>
Date:   Thu Sep 10 03:51:06 2020 +0000

    Penalise bad peer behaviour (#1602)

    ## Issue Addressed

    #1386

    ## Proposed Changes

    Penalises peers in our scoring system that produce invalid attestations or blocks.

commit dfe5077
Author: Paul Hauner <[email protected]>
Date:   Thu Sep 10 00:24:41 2020 +0000

    Remove references to rust-docs (#1601)

    ## Issue Addressed

    - Resolves #897
    - Resolves #821

    ## Proposed Changes

    Removes references to the rust docs that we're no long maintaining.

    ## Additional Info

    NA

commit 0821e6b
Author: Paul Hauner <[email protected]>
Date:   Wed Sep 9 02:28:35 2020 +0000

    Bump version to v0.2.9 (#1598)

    ## Issue Addressed

    NA

    ## Proposed Changes

    - Bump version tags
    - Run `cargo update`

    ## Additional Info

    NA

commit 6759bd1
Author: pawan <[email protected]>
Date:   Tue Sep 8 12:39:22 2020 +0530

    Add `strict-slashing-protection` flag

commit 1644717
Author: pawan <[email protected]>
Date:   Fri Sep 4 16:46:43 2020 +0530

    Remove `get_default_base_dir`

commit e4783bf
Author: pawan <[email protected]>
Date:   Fri Sep 4 16:24:45 2020 +0530

    Address review comments

commit 89b832c
Author: pawan <[email protected]>
Date:   Thu Sep 3 00:30:53 2020 +0530

    Fix tests

commit 5940a52
Author: pawan <[email protected]>
Date:   Wed Sep 2 21:27:57 2020 +0530

    Print out path information in account manager

commit 5ca1341
Author: pawan <[email protected]>
Date:   Wed Sep 2 20:49:29 2020 +0530

    Use datadir flag for account manager

commit 8c82769
Author: pawan <[email protected]>
Date:   Wed Sep 2 20:10:44 2020 +0530

    Rename BASE_DIR_FLAG to WALLET_DIR_FLAG

commit ad34a0f
Author: pawan <[email protected]>
Date:   Wed Sep 2 18:17:45 2020 +0530

    Add validator-dir cli option

commit 04fd9f7
Merge: 0e3ccba 8718120
Author: pawan <[email protected]>
Date:   Wed Sep 2 14:53:27 2020 +0530

    Merge branch 'master' into directory-restructure

commit 0e3ccba
Author: pawan <[email protected]>
Date:   Wed Sep 2 14:40:36 2020 +0530

    Minor fixes

commit ad22178
Author: pawan <[email protected]>
Date:   Tue Sep 1 19:14:44 2020 +0530

    Migration for ValidatorDefinitions

commit 1022f58
Author: pawan <[email protected]>
Date:   Tue Sep 1 16:02:07 2020 +0530

    Fix lints and udeps

commit 4ddbbfe
Author: pawan <[email protected]>
Date:   Tue Sep 1 12:56:12 2020 +0530

    Remove more hardocoded values

commit 4170e4c
Author: pawan <[email protected]>
Date:   Tue Sep 1 12:50:43 2020 +0530

    Remove some hardcoded values

commit 5c46823
Merge: ae533f3 8301a98
Author: pawan <[email protected]>
Date:   Tue Sep 1 12:02:35 2020 +0530

    Merge branch 'master' into directory-restructure

commit ae533f3
Author: pawan <[email protected]>
Date:   Mon Aug 17 20:47:20 2020 +0530

    Docs fixes

commit fe13b1b
Author: pawan <[email protected]>
Date:   Mon Aug 17 19:58:09 2020 +0530

    lint

commit e4a7e8c
Merge: 01e6166 9a97a0b
Author: pawan <[email protected]>
Date:   Mon Aug 17 19:47:27 2020 +0530

    Merge branch 'master' into directory-restructure

commit 01e6166
Author: pawan <[email protected]>
Date:   Mon Aug 17 19:23:57 2020 +0530

    All binaries get directory info from common::directory crate

commit 8a7c130
Author: pawan <[email protected]>
Date:   Mon Aug 17 17:18:44 2020 +0530

    Add migration code for restructure

commit a4c33cd
Author: pawan <[email protected]>
Date:   Mon Aug 17 15:51:24 2020 +0530

    Fix bug in secrets dir

commit f4b8529
Author: pawan <[email protected]>
Date:   Mon Aug 17 13:05:28 2020 +0530

    Restrucure validator directory paths

commit 54b4ae6
Author: pawan <[email protected]>
Date:   Mon Aug 17 12:20:40 2020 +0530

    Restructure account_manager directory paths

commit 53892af
Author: pawan <[email protected]>
Date:   Thu Aug 13 19:43:44 2020 +0530

    Move get_testnet_dir to clap utils

commit d1d269a
Author: pawan <[email protected]>
Date:   Thu Aug 13 19:33:34 2020 +0530

    Move beacon directory
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants