Bootstrapping blocks slowly crawl to a halt #3294
Comments
Same issue for me with multiple nodes/bootstrapping attempts (CentOS 8). The slowdown happens around ~59-60M blocks for me, though. In the last 4 hours, I've only gained ~100 counted blocks.
Not sure if related, but I see a bunch of these messages in the logs:
I also receive those messages in the log file. I've uploaded the log files (only lines which contain the word "bootstrap") to https://node.erawan.me/log.txt. The only difference is that the counted and cemented blocks are stuck at 1 for me.
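For readers who want to put numbers on the slowdown described in these comments, here is a minimal monitoring sketch (not from the thread). It assumes the node's RPC server is enabled on the default 127.0.0.1:7076 and uses the block_count action, which reports the count and cemented totals.

```python
# Minimal bootstrap progress monitor.
# Assumes the node's RPC server is enabled on the default 127.0.0.1:7076.
import json
import time
import urllib.request

RPC_URL = "http://127.0.0.1:7076"  # adjust if your RPC address/port differs


def rpc(payload: dict) -> dict:
    req = urllib.request.Request(
        RPC_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def block_count() -> tuple[int, int]:
    # "count" and "cemented" are fields of the block_count RPC action.
    result = rpc({"action": "block_count"})
    return int(result["count"]), int(result.get("cemented", 0))


INTERVAL = 60  # seconds between samples
prev_count, prev_cemented = block_count()
while True:
    time.sleep(INTERVAL)
    count, cemented = block_count()
    print(f"count={count} (+{(count - prev_count) / INTERVAL:.1f}/s)  "
          f"cemented={cemented} (+{(cemented - prev_cemented) / INTERVAL:.1f}/s)")
    prev_count, prev_cemented = count, cemented
```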
The same problem!
+1
My CentOS 8 node (RocksDB) suddenly caught up on count (from 70M to 117M) overnight. On V21.3 with LMDB, the same node was stuck at <60M count for over a month.
My Windows 10 RocksDB + pruning node is still stuck:
Running into the same issue.
It gets to this point and now it's slowed to a crawl that will never catch up. I did catch this in the logs, but I'm not sure if it matters.
In addition, I'm only at 43% synced as shown by the Nano node stats monitor and have a database size of 100 GB. Is Nano really already at over 200 GB for its blockchain size? My node has had consistent memory, CPU, and disk usage throughout the entire sync; however, I'm not really sure what it's doing with all the data it's writing. I did start from scratch with a v22 node, so it shouldn't have to be doing any conversion between node database versions.
I had the same message in my logs when my Windows 10 node disconnected:
After restarting the node again and leaving it for another week, it seems to be making some progress:
CentOS 8 node full sync status update:
You can download the latest bootstrap database file from here and then start up the node. It took me about an hour to download and extract the 7z file, and I was at 100% a few minutes after starting. This doesn't fix the underlying issue, but if you are trying to get a node up and running, it's the fastest way right now.
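A rough sketch of that download-and-replace flow, for illustration only: the URL and data directory below are placeholders (the actual snapshot link is not reproduced here), the 7z command-line tool is assumed to be on PATH, and the node should be stopped before the ledger file is replaced.

```python
# Illustrative only: LEDGER_URL is a placeholder, not the link referenced above.
import subprocess
import urllib.request
from pathlib import Path

LEDGER_URL = "https://example.com/ledger-snapshot.7z"  # placeholder snapshot URL
DATA_DIR = Path.home() / "Nano"                        # default Linux data dir; adjust as needed
archive = DATA_DIR / "ledger-snapshot.7z"

DATA_DIR.mkdir(parents=True, exist_ok=True)
urllib.request.urlretrieve(LEDGER_URL, str(archive))   # download the snapshot
# Extract data.ldb into the data directory (requires the 7z CLI to be installed).
subprocess.run(["7z", "x", str(archive), f"-o{DATA_DIR}"], check=True)
# Start (or restart) the node afterwards; it only has to catch up the blocks
# created since the snapshot was taken.
```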
The whole point of bootstrapping is to not use the ledger download.
Same jump here: I was at 72M/70M count/cemented at 7:13 today and am now at 119M/77M count/cemented at 14:04. In the days before, it was crawling slowly, with the count increasing by only a few hundred over several hours. (RocksDB)
I agree, but if your goal is to get a node running, that looked to be the only way to get one up before the network started moving a bit faster. In the end, if the bootstrap file were bad, your blocks would be rejected by the network anyway.
CentOS 8 node status update, after 17 days:
Windows 10 node status update (still pretty stuck):
Both of my nodes are now unstuck and getting pretty close: CentOS 8:
Windows 10:
Mine has started to move now as well, at 5000 bps. Block count is 92M.
My CentOS 8 node officially finished syncing from scratch last night (sometime in the last 12 hours). For some reason it kept restarting itself roughly once a day over the last week, but there were no details in the logs.
My Windows 10 node is still going, but making progress:
I still have 21M blocks to confirm and am mostly doing 25-30 cps, so I give up. It was just an experiment, but it's simply taking too long for me.
Same here. Syncing my node was taking way too long, and after a few days I just gave up and manually downloaded the ledger. Glad it's not just me!
My experience with V22.1: it took a few days to request all unchecked blocks, but cementing was stuck at around 1-2 blocks per second. After a week and around 2 million cemented blocks, I gave up and downloaded the MyNanoNinja ledger. My node was synchronized after less than two hours, even though it still had to request and cement a few hundred thousand blocks.
I think I was able to find some information on the bootstrapping problem.
So I think what is happening is that RocksDB is garbage-collecting the unchecked blocks after "accounts in pull queue" reaches zero, whereas LevelDB only garbage-collects when you explicitly ask for it.
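One way to probe this hypothesis (the garbage-collection theory is the commenter's; this check is only a suggestion) is to sample the unchecked figure reported by the block_count RPC over time and see whether it collapses once the pull queue empties. A minimal sketch, again assuming RPC is enabled on the default 127.0.0.1:7076:

```python
# Sample the unchecked-table size periodically; a sudden drop toward zero while
# "count" stops rising would be consistent with the garbage-collection theory.
import json
import time
import urllib.request


def rpc(payload: dict, url: str = "http://127.0.0.1:7076") -> dict:
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


while True:
    counts = rpc({"action": "block_count"})
    print(f'count={counts["count"]} unchecked={counts.get("unchecked", "n/a")}')
    time.sleep(300)  # sample every 5 minutes
```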
@theohax @clemahieu Although this note from @coranos (thanks!) may be RocksDB-specific, I wanted to highlight it for you as you continue to look into bootstrapping activities.
I did some more investigation, and I believe I have found a clue. Using frontiers, I was able to determine an account that took several hours to appear. It was taking a long time in both RocksDB and LevelDB, and it's on Banano v22, so YMMV. These accounts were in a chunk of accounts that took 8 hours of 100% CPU churning to appear. Most are 2-3 hops from faucets (and the faucets would be close to genesis), so they may be false positives. One of these accounts is ban_1tyjsn7ra7z7qb6a9akfxrsonqtzfgnwggkzhodccnuss4ms5bieacnkeftk
FYI: Start time: I have no "legacy bootstrap attempts" in my logs. |
Was the fork block resolution code changed in v22? I did some tests in Banano, since it has a smaller list of frontiers. A $40/mo 8 GB 4-core server syncs to 24M of 27M blocks in 24h. If I then compare frontiers of a fully synced node, and work through the link and previous blocks, I find hash 2405BCECCFECFDA6E72A72445BBAB2D0411C1B6315685E65D869734402647223 in the fully synced Banano node but absent from the out-of-sync node. If I try to process the block, I get a fork block on the out-of-sync node. If I force the fork to resolve in favor of the 2405..7223 block (so the out-of-sync node matches the fully synced node), it then syncs another ~1000 blocks. I'm now rerunning the scan looking for another fork block.
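A rough sketch of this frontier-comparison approach, assuming both the fully synced node and the lagging node expose RPC (the addresses below are placeholders). It diffs the frontiers output of the two nodes, then walks the previous chain on the synced node to find the earliest block the lagging node is missing:

```python
# Diff frontiers between a synced and a lagging node, then walk "previous"
# on the synced node to find the earliest block the lagging node lacks.
import json
import urllib.request

SYNCED = "http://127.0.0.1:7076"   # fully synced node's RPC (placeholder)
LAGGING = "http://127.0.0.1:7077"  # out-of-sync node's RPC (placeholder)


def rpc(url: str, payload: dict) -> dict:
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def frontiers(url: str, count: int = 1_000_000) -> dict:
    # Iterates accounts starting from the burn address (zero public key).
    res = rpc(url, {"action": "frontiers",
                    "account": "nano_1111111111111111111111111111111111111111111111111111hifc8npp",
                    "count": str(count)})
    return res.get("frontiers", {})


def has_block(url: str, block_hash: str) -> bool:
    res = rpc(url, {"action": "block_info", "json_block": "true", "hash": block_hash})
    return "error" not in res


def previous_of(url: str, block_hash: str) -> str:
    info = rpc(url, {"action": "block_info", "json_block": "true", "hash": block_hash})
    # Legacy open blocks carry no "previous"; treat that as the end of the chain.
    return info["contents"].get("previous", "0" * 64)


synced_heads, lagging_heads = frontiers(SYNCED), frontiers(LAGGING)
for account, head in synced_heads.items():
    if lagging_heads.get(account) == head:
        continue  # both nodes agree on this account's frontier
    current, missing = head, None
    while int(current, 16) != 0 and not has_block(LAGGING, current):
        missing = current
        current = previous_of(SYNCED, current)
    if missing:
        print(f"{account}: earliest block absent from the lagging node: {missing}")
```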
No more fork blocks so far, but I have found a handful of other blocks using the following algorithm:
So I start at the gap blocks, and try to work my way towards genesis. |
Update from Banano: it appears we have resolved the problem on Banano. The problem was this: we did not have over 50% of rep weight on v22; most of it was on v20. The resolution was this: when over 50% of rep weight moved to v22, a new node synced successfully with no perceptible slowdown, and the fork that was detected earlier did not occur. In addition, block cementing happened at a much more rapid pace. Since Nano is on v22 and has been for months, someone may want to try to bootstrap Nano from genesis and see if the issue still occurs or has resolved itself.
I had this issue too. I'm new to Nano and wanted to set up a representative using Docker. I started bootstrapping on 22.1 about 6 days back, with plenty of RAM and cores and a fast NVMe drive. It reached about 50M blocks cemented in about 2 days and then slowed to a crawl, with intense disk activity continuing. It reached just 61M blocks over the next 4 days, and I decided to switch over to a downloaded ledger. Interestingly, the downloaded data.ldb was about 50 GB uncompressed, while the existing semi-done data.ldb that I replaced was over 140 GB. Currently cementing blocks at a high rate. Will post back once done.
Yes. Synced up in about an hour. |
This appears to have been resolved by ascending bootstrap in V25. Let me know if you're still having issues and we can reopen the issue. |
Summary
When bootstrapping, the number of blocks fetched/downloaded per second slowly goes down.
As an example, when I first started fetching blocks from other peers, I downloaded 2,000 blocks/second.
After hitting 21.3 million blocks (60 million for someone else), it slowed down to 20-30 blocks/second.
Local device resources seem to be fine
Node version
V22.0
Build details
Docker container, nothing out of the ordinary.
OS and version
Linux 20.04
Steps to reproduce the behavior
Expected behavior
The software continues to sync blocks at a similar/ascending rate, not a descending rate
Actual behavior
The software syncs blocks at a descending rate
Possible solution
Honestly, I don't have any idea
Supporting files
No response