Bootstrapping blocks slowly crawl to a halt #3294

Closed
DarrenAlex opened this issue May 17, 2021 · 29 comments
Labels
issues_fixed_by_ascending_bootstrap Issues that will become irrelevant once ascending bootstrap is in operation and the only bootstrap

Comments

@DarrenAlex

Summary

While bootstrapping, the number of blocks fetched/downloaded per second gradually goes down.
As an example, when I first started fetching blocks from other peers, I was downloading about 2,000 blocks/second.
After hitting 21.3 million blocks (60 million for someone else), it slowed down to 20-30 blocks/second.

Local device resources seem to be fine

Node version

V22.0

Build details

Docker container, nothing out of the ordinary.

OS and version

Linux 20.04

Steps to reproduce the behavior

  1. Follow the installation instructions in the docs
  2. Start bootstrapping
  3. Monitor the blocks/sec progress (a small monitoring sketch follows this list)
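
One rough way to do step 3 (a sketch, not from the docs; it assumes jq is installed and the RPC is enabled on 127.0.0.1:7076, as in the block_count examples below):

# Poll the block_count RPC once a minute and print the average blocks/sec gained.
PREV=0
while true; do
  COUNT=$(curl -s -d '{"action": "block_count"}' 127.0.0.1:7076 | jq -r '.count')
  if [ "$PREV" -gt 0 ]; then
    echo "$(date -u +%T)  $(( (COUNT - PREV) / 60 )) blocks/sec"
  fi
  PREV=$COUNT
  sleep 60
done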

Expected behavior

The software continues to sync blocks at a steady or increasing rate, not a decreasing one

Actual behavior

The software syncs blocks at a steadily decreasing rate

Possible solution

Honestly, I don't have any idea.

Supporting files

No response

@qwahzi
Collaborator

qwahzi commented May 17, 2021

Same issue for me with multiple nodes/bootstrapping attempts (CentOS 8). The slowdown happens around ~59-60M blocks for me, though. In the last 4 hours, I've only gained ~100 counted blocks.

$ cat startTime.txt
           Universal time: Thu 2021-05-13 20:49:04 UTC

$ curl -d '{"action": "block_count"}' 127.0.0.1:7076
{
    "count": "60680998",
    "unchecked": "60623590",
    "cemented": "37743403"
}

$ timedatectl
           Universal time: Mon 2021-05-17 16:15:28 UTC

Not sure if related, but I see a bunch of these messages in the logs:

[2021-May-17 16:12:33.072637]: Error initiating bootstrap connection to [::ffff:143.198.231.110]:54182: Connection reset by peer
[2021-May-17 16:12:33.839897]: 27416 accounts in pull queue
[2021-May-17 16:12:44.109434]: Error initiating bootstrap connection to [::ffff:194.195.243.169]:57930: No route to host

@DarrenAlex
Author

DarrenAlex commented May 17, 2021

I also receive those messages in the log file. I've uploaded the log file (only the lines that contain the word "bootstrap") to https://node.erawan.me/log.txt

The only difference is that the counted and cemented blocks are stuck at 1 for me.
If you sum them all up, the total is close to the network's total block count. Is your cemented block number going up?

@atomlab

atomlab commented May 20, 2021

The same problem!

@smithy9901

+1

@qwahzi
Collaborator

qwahzi commented May 20, 2021

My CentOS 8 node (RocksDB) suddenly caught up on count (from 70M to 117M) overnight. On V21.3 with LMDB, the same node was stuck at <60M count for over a month.

curl -d '{"action": "block_count"}' 127.0.0.1:7076
{
    "count": "117825631",
    "unchecked": "972886",
    "cemented": "52043280"
}

My Windows 10 RocksDB + pruning node is still stuck:

{
    "count": "51852881",
    "unchecked": "77711715",
    "cemented": "39891722",
    "full": "13068695",
    "pruned": "38784186"
}

@michaeljs1990

michaeljs1990 commented May 26, 2021

Running into the same issue.

{
    "count": "50864256",
    "unchecked": "65442030",
    "cemented": "33401959"
}

It gets to this point and then slows to a crawl from which it will never catch up.

I did catch this in the logs, but I'm not sure if it matters.

[2021-May-26 04:10:40.444040]: Starting legacy bootstrap attempt with ID auto_bootstrap_0
[2021-May-26 04:10:40.444231]: Bootstrap attempt stopped because there are no peers
[2021-May-26 04:10:40.444312]: Exiting legacy bootstrap attempt with ID auto_bootstrap_0
[2021-May-26 04:10:40.452590]: Beginning pending block search
[2021-May-26 04:10:40.452708]: IPC: server started

In addition, I'm only at 43% synced according to the Nano node stats monitor, and my database is already 100 GB. Is Nano's blockchain really already over 200 GB?

My node has had consistent memory, CPU, and disk usage throughout the entire sync; however, I'm not really sure what it's doing with all of the data it's writing. I started from scratch with a v22 node, so it shouldn't have to do any conversion between node database versions.

@qwahzi
Collaborator

qwahzi commented May 26, 2021

I had the same message in my logs when my Windows 10 node disconnected:


[2021-May-22 10:31:30.217942]: UPnP local address: 192.168.86.25, discovery: 0, IGD search: 1
[2021-May-22 10:32:34.046196]: Total recently pruned block count: 0
[2021-May-22 10:33:02.575300]: Beginning pending block search
[2021-May-22 10:37:54.012282]: Total recently pruned block count: 0
[2021-May-22 10:38:02.575701]: Beginning pending block search
[2021-May-22 10:41:25.958997]: Starting legacy bootstrap attempt with ID auto_bootstrap_85
[2021-May-22 10:41:25.958997]: Bootstrap attempt stopped because there are no peers
[2021-May-22 10:41:25.958997]: Exiting legacy bootstrap attempt with ID auto_bootstrap_85
[2021-May-22 10:43:02.576639]: Beginning pending block search
[2021-May-22 10:43:13.742616]: Total recently pruned block count: 0
[2021-May-22 10:45:16.239424]: UPnP local address: 192.168.86.25, discovery: 0, IGD search: 1
[2021-May-22 10:48:02.577391]: Beginning pending block search
[2021-May-22 10:48:33.333862]: Total recently pruned block count: 0
[2021-May-22 10:53:02.577717]: Beginning pending block search
[2021-May-22 10:53:54.688580]: Total recently pruned block count: 0
[2021-May-22 10:56:25.959882]: Starting legacy bootstrap attempt with ID auto_bootstrap_86
[2021-May-22 10:56:25.959882]: Bootstrap attempt stopped because there are no peers
[2021-May-22 10:56:25.959882]: Exiting legacy bootstrap attempt with ID auto_bootstrap_86
[2021-May-22 10:58:02.578748]: Beginning pending block search
[2021-May-22 10:59:02.242813]: UPnP local address: 192.168.86.25, discovery: 0, IGD search: 1
[2021-May-22 10:59:19.093811]: Total recently pruned block count: 0
[2021-May-22 11:03:02.579804]: Beginning pending block search
[2021-May-22 11:04:39.309938]: Total recently pruned block count: 0
[2021-May-22 11:08:02.581172]: Beginning pending block search

After restarting the node again and leaving it for another week, it seems to be making some progress:

{
    "count": "61293347",
    "unchecked": "67421914",
    "cemented": "51092686",
    "full": "12016287",
    "pruned": "49277060"
}

CentOS 8 node full sync status update:

{
    "count": "118174879",
    "unchecked": "902500",
    "cemented": "74119823",
    "full": "50116606",
    "pruned": "68058273"
}

@michaeljs1990

michaeljs1990 commented May 27, 2021

You can download the latest bootstrap database file from here and then start the node. It took me about an hour to download and extract the 7z file, and I was at 100% a few minutes after starting. This doesn't fix the underlying issue, but if you are trying to get a node up and running, it's the fastest way right now.
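
A rough sketch of that route for an LMDB node (the URL, data directory, and container name below are placeholders, not values from this comment; only use a snapshot from a source you trust, and note that the node still has to confirm/cement the blocks itself):

LEDGER_URL="https://example.com/ledger.7z"   # placeholder, not an official link
NANO_DATA_DIR="$HOME/Nano"                   # placeholder; Docker setups use their mounted volume

docker stop nano-node                        # placeholder container name; use however your node is run
wget -O /tmp/ledger.7z "$LEDGER_URL"
7z x /tmp/ledger.7z -o/tmp/ledger
cp /tmp/ledger/data.ldb "$NANO_DATA_DIR/data.ldb"   # replace the LMDB ledger file
docker start nano-node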

@DarrenAlex
Author

downloading the latest bootstrap database file from here

The whole point of bootstrapping is to not use the ledger download.

@regfaker

regfaker commented May 28, 2021

My CentOS 8 node (RocksDB) suddenly caught up on count (from 70M to 117M) overnight. On V21.3 with LMDB, the same node was stuck at <60M count for over a month

Same jump here: I was at 72M/70M count/cemented at 7:13 today and am now at 119M/77M count/cemented at 14:04. In the days before, it was crawling, with the count increasing by only a few hundred over several hours.

(RocksDB)

@michaeljs1990

michaeljs1990 commented May 28, 2021

downloading the latest bootstrap database file from here

The whole point of bootstrapping is to not use the ledger download.

I agree, but if your goal is to get a node running, that looked to be the only way to do it before the network started moving a bit faster. In the end, if the bootstrap file were bad, its blocks would be rejected by the network anyway.

@qwahzi
Collaborator

qwahzi commented May 30, 2021

CentOS 8 node status update, after 17 days:

$ cat startTime2.txt
           Universal time: Thu 2021-05-13 20:49:04 UTC

$ curl -d '{"action": "block_count"}' 127.0.0.1:7076
{
    "count": "118491518",
    "unchecked": "956586",
    "cemented": "98261377",
    "full": "33622456",
    "pruned": "84869062"
}

$ timedatectl
           Universal time: Sun 2021-05-30 14:11:20 UTC

Windows 10 node status update (still pretty stuck):

{
    "count": "63138031",
    "unchecked": "65450762",
    "cemented": "57874107",
    "full": "7837324",
    "pruned": "55300707"
}

@Joohansson

Joohansson commented Jun 2, 2021

Same here, on Windows. 14 days in, confirming at 30 cps.


@qwahzi
Collaborator

qwahzi commented Jun 2, 2021

Both of my nodes are now unstuck and getting pretty close:

CentOS 8:

{
    "count": "119027602",
    "unchecked": "791562",
    "cemented": "108980055",
    "full": "28467965",
    "pruned": "90559637"
}

Windows 10:

{
    "count": "118949710",
    "unchecked": "1785684",
    "cemented": "86111699",
    "full": "39533449",
    "pruned": "79416261"
}

@Joohansson

Joohansson commented Jun 3, 2021

Mine started to move now as well, at 5,000 bps. Block count is 92M.
Edit (day 16): still 31M unconfirmed.

@qwahzi
Collaborator

qwahzi commented Jun 6, 2021

My CentOS 8 node officially finished syncing from scratch last night (sometime in the last 12 hours). For some reason it kept restarting itself roughly once a day over the last week, but there were no details in the logs.

$ cat startTime.txt
           Universal time: Thu 2021-05-13 20:49:04 UTC
          
$ curl -d '{"action": "block_count"}' 127.0.0.1:7076
{
    "count": "120612378",
    "unchecked": "28987",
    "cemented": "120612370",
    "full": "25587436",
    "pruned": "95024942"
}

$ timedatectl
           Universal time: Sun 2021-06-06 13:10:32 UTC

$ du -xhs /nano
22G     /nano

$ docker ps -a
CONTAINER ID   IMAGE                     COMMAND                  CREATED       STATUS        PORTS                                                                                                                                      NAMES
XXXX nanocurrency/nano:V22.0   "/usr/bin/entry.sh n…"   2 weeks ago   Up 16 hours   127.0.0.1:7076->7076/tcp, 0.0.0.0:7075->7075/tcp, 0.0.0.0:7075->7075/udp, :::7075->7075/tcp, :::7075->7075/udp, 127.0.0.1:7078->7078/tcp   nano-container

My Windows 10 node is still going, but making progress:

{
    "count": "118992721",
    "unchecked": "2149929",
    "cemented": "95323574",
    "full": "34485057",
    "pruned": "84507664"
}

@Joohansson

I still have 21M blocks to confirm and am mostly doing 25-30 cps, so I'm giving up. It was just an experiment, but it's simply taking too long for me.

@zhyatt zhyatt added this to the Research for Future Release milestone Jun 22, 2021
@JohnGarnham

Same here. Syncing my node was taking way too long, and after a few days I just gave up and manually downloaded the ledger. Glad it's not just me!

@Exxenoz
Contributor

Exxenoz commented Jul 18, 2021

My experience with V22.1: it took a few days to request all the unchecked blocks, but cementing was stuck at around 1-2 blocks per second. After a week and around 2 million cemented blocks, I gave up and downloaded the MyNanoNinja ledger. My node was synchronized in less than two hours, even though it still had to request and cement a few hundred thousand blocks.

@ghost

ghost commented Sep 8, 2021

I think I was able to find some information on the bootstrapping problem.
My node started churning as well:

  • block_count's unchecked figure would go up, then down
  • frontier_count would go up slowly
  • the logs cycle like this:
27416 accounts in pull queue
...
0 accounts in pull queue
...
Flushing unchecked blocks
Finished flushing unchecked blocks
...
27416 accounts in pull queue

So I think what is happening is that RocksDB is garbage collecting the unchecked blocks after "accounts in pull queue" reaches zero, whereas LMDB only garbage collects when you explicitly ask for it.
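
A rough way to observe the cycle described above (the log path is an assumption; adjust it to wherever your node writes its logs):

# Print the count/unchecked figures and the latest pull-queue/flush log lines every minute.
while true; do
  curl -s -d '{"action": "block_count"}' 127.0.0.1:7076 | jq '{count, unchecked}'
  grep -hiE "accounts in pull queue|flushing unchecked" ~/Nano/log/log_*.log | tail -n 3
  sleep 60
done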

@zhyatt
Collaborator

zhyatt commented Sep 10, 2021

So I think what is happening is that RocksDB is garbage collecting the unchecked blocks after "accounts in pull queue" reaches zero, whereas LMDB only garbage collects when you explicitly ask for it.

@theohax @clemahieu Although this note from @coranos (thanks!) may be RocksDB-specific, I wanted to highlight it for you as you continue to look into bootstrapping activities.

@ghost

ghost commented Sep 12, 2021

I did some more investigation, and I believe I have found a clue.

Using frontiers, I was able to identify an account that took several hours to appear. It was taking a long time with both RocksDB and LMDB, and this is on Banano's v22, so YMMV.

These accounts were in a chunk of accounts that took 8 hours of 100% CPU churning to appear:
[
  "ban_1ah4xni57wd9m3ub9b6u59714ngntyugm5689gamthgibzr1nrqn1j15rbxr",
  "ban_1tyjsn7ra7z7qb6a9akfxrsonqtzfgnwggkzhodccnuss4ms5bieacnkeftk",
  "ban_3aegykcqtuggoc98ypm7q5aicx7t6s1exh7nxhyycocac1amg8j8noi4gxjr",
  "ban_3fb7gkrncm3ojo96fnpwhkrj1km4dh8gzo4sdxzjkmqcbtynqd94z7rua1nz",
  "ban_3fqcofp1yempyttg6pnouwd685w8t3aoo78trmqkkk9q6gr3xinastt5yo5s"
]

Most are 2-3 hops from faucets (and the faucets would be close to genesis), so they may be false positives.

The account ban_1tyjsn7ra7z7qb6a9akfxrsonqtzfgnwggkzhodccnuss4ms5bieacnkeftk
was fed from the account ban_1kshbcgqqzwwx8wmm994xke7bzrngt4thhxhhf49ybxth9tesgtj6acog63e,
which isn't a known faucet and has a high block height of 10993.
So it may be that ban_1tyjsn took a while because it is far from genesis.
I don't have exact telemetry to prove that, but that is what I have so far.

https://creeper.banano.cc/explorer/block/BA9FADD1F3FC347D32DA3FFCFCBC07C21E4D9FAA440D3F71236AB5C7A2B89251
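
For reference, the account_info RPC shows how long each of those chains is (just a sketch; the Banano RPC port used here, 7072, is an assumption):

for a in ban_1ah4xni57wd9m3ub9b6u59714ngntyugm5689gamthgibzr1nrqn1j15rbxr \
         ban_1tyjsn7ra7z7qb6a9akfxrsonqtzfgnwggkzhodccnuss4ms5bieacnkeftk; do
  curl -s -d "{\"action\": \"account_info\", \"account\": \"$a\"}" 127.0.0.1:7072 \
    | jq --arg acct "$a" '{account: $acct, block_count, open_block}'
done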

@jarviscodes

FYI:
My node was completely synced in less than 12 hours.

Start time: Sep 13 13:39
End time: Sep 14 11:58

I have no "legacy bootstrap attempts" in my logs.
It's probably worth mentioning that I bootstrapped v20 instead of v22.

@ghost

ghost commented Sep 17, 2021

Was the fork block resolution code changed in v22?

I did some tests on Banano, since it has a smaller list of frontiers. A $40/mo server with 8 GB of RAM and 4 cores syncs to 24M of 27M blocks in 24 hours.

If I then compare the frontiers of a fully synced node and work back through the link and previous blocks, I find hash 2405BCECCFECFDA6E72A72445BBAB2D0411C1B6315685E65D869734402647223 in the fully synced Banano node, but it is absent from the out-of-sync node.

If I try to process that block, I get a fork block on the out-of-sync node.

If I force the fork to resolve in favor of the 2405..7223 block (so that the out-of-sync node matches the fully synced node), it then syncs another ~1000 blocks.

I'm now rerunning the scan, looking for another fork block.

@ghost

ghost commented Sep 18, 2021

No more fork blocks so far, but I have found a handful of other missing blocks using the following algorithm:

  1. Get the frontiers from the fully synced node (1,078,223 accounts).
  2. Get the frontiers from the desynced node (1,064,159 accounts).
  3. Take every frontier that is in the fully synced set but not in the desynced set (~25,000 hashes).
  4. Using those frontiers as a set of hashes, pull each block from the fully synced node and look for its link and previous blocks on the desynced node.
  5. For any block whose link and previous are both on the desynced node, call 'process' to submit the fully synced block to the desynced node.
  6. Repeat steps 4-5, using the set of not-found link and previous hashes as the new set of hashes.

So I start at the gap blocks and try to work my way back towards genesis. (A rough script for this is sketched below.)
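
A rough shell sketch of that procedure (assumptions: both nodes expose RPC locally on the ports shown, jq is installed, the frontiers are state blocks, and step 6's recursion is left out; shown for a Nano node, so a Banano node would use its own ports and account prefix):

SYNCED=127.0.0.1:7076     # fully synced node (assumed port)
DESYNCED=127.0.0.1:7077   # out-of-sync node (assumed port)
BURN=nano_1111111111111111111111111111111111111111111111111111hifc8npp  # zero public key, so frontier iteration starts at the first account

# Steps 1-2: dump all frontier hashes from both nodes, sorted for comparison.
curl -s -d "{\"action\": \"frontiers\", \"account\": \"$BURN\", \"count\": \"2000000\"}" $SYNCED \
  | jq -r '.frontiers[]' | sort > synced.txt
curl -s -d "{\"action\": \"frontiers\", \"account\": \"$BURN\", \"count\": \"2000000\"}" $DESYNCED \
  | jq -r '.frontiers[]' | sort > desynced.txt

# Step 3: frontier hashes present on the synced node but missing from the desynced one.
comm -23 synced.txt desynced.txt > missing.txt

# Steps 4-5: for each missing hash, fetch the block from the synced node; if its
# previous and link already exist on the desynced node, submit it there.
while read -r hash; do
  block=$(curl -s -d "{\"action\": \"block_info\", \"json_block\": \"true\", \"hash\": \"$hash\"}" $SYNCED)
  prev=$(echo "$block" | jq -r '.contents.previous')
  link=$(echo "$block" | jq -r '.contents.link')
  if curl -s -d "{\"action\": \"block_info\", \"hash\": \"$prev\"}" $DESYNCED | jq -e '.contents' > /dev/null && \
     curl -s -d "{\"action\": \"block_info\", \"hash\": \"$link\"}" $DESYNCED | jq -e '.contents' > /dev/null; then
    echo "$block" | jq '{action: "process", json_block: "true", block: .contents}' | curl -s -d @- $DESYNCED
  fi
done < missing.txt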

@ghost

ghost commented Sep 20, 2021

Update from Banano: it appears we have resolved the problem on Banano.

The problem was this: we did not have over 50% of the rep weight on v22; most of it was still on v20.

The resolution was this: when over 50% of the rep weight moved to v22, a new node synced successfully with no perceptible slowdown, and the fork that was detected earlier did not occur. In addition, block cementing happened at a much more rapid pace.

Since Nano is on v22 and has been for months, someone may want to bootstrap Nano from genesis and see whether the issue still occurs or has resolved itself.
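
A quick way to see how much of the network has upgraded is the raw telemetry RPC, which reports each peer's version (a sketch; note this counts peers, not voting weight, which would also need representative data):

curl -s -d '{"action": "telemetry", "raw": "true"}' 127.0.0.1:7076 \
  | jq -r '.metrics[].major_version' | sort | uniq -c | sort -rn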

@dakshsingh

I had this issue too. I'm new to Nano and wanted to set up a representative using Docker. I started bootstrapping on 22.1 about 6 days ago, with plenty of RAM and cores and a fast NVMe drive. It reached about 50M cemented blocks in about 2 days and then slowed to a crawl, with intense disk activity continuing. It reached just 61M blocks over the next 4 days, and I decided to switch over to a downloaded ledger. Interestingly, the downloaded data.ldb was about 50 GB uncompressed, while the partially synced data.ldb it replaced was over 140 GB.

Currently cementing blocks at a high rate. Will post back once done.

@dakshsingh

Yes. Synced up in about an hour.

@qwahzi qwahzi added the issues_fixed_by_ascending_bootstrap Issues that will become irrelevant once ascending bootstrap is in operation and the only bootstrap label Dec 28, 2022
@qwahzi
Collaborator

qwahzi commented May 25, 2023

This appears to have been resolved by ascending bootstrap in V25. Let me know if you're still having issues and we can reopen the issue.

@qwahzi qwahzi closed this as completed May 25, 2023