DSN sync can get stuck indefinitely #2729
Comments
I think we should start by tracking all libp2p queries and when they started, then periodically print those that have not completed for a long time. By knowing which kind of query didn't finish and for how long, we can narrow down the problem more accurately. As for DSN sync itself, I think we can add generous timeouts and, assuming the networking stack doesn't get stuck completely, we should be able to simply restart DSN sync and it will hopefully succeed on the second attempt. |
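The query tracking idea above could look roughly like the sketch below. It is only an illustration under stated assumptions: `QueryTracker`, `QueryKind`, and their methods are hypothetical names, not part of libp2p or of this codebase, and real code would call `start`/`finish` from wherever queries are issued and completed, then log the result of `stalled` on a timer.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical query kinds; the real set depends on which protocols are in use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryKind {
    KademliaBootstrap,
    GetClosestPeers,
}

/// Hypothetical tracker that remembers when each in-flight query started.
#[derive(Debug, Default)]
pub struct QueryTracker {
    next_id: u64,
    in_flight: HashMap<u64, (QueryKind, Instant)>,
}

impl QueryTracker {
    /// Record a newly started query and return an ID to finish it with later.
    pub fn start(&mut self, kind: QueryKind, now: Instant) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.in_flight.insert(id, (kind, now));
        id
    }

    /// Mark a query as completed; returns `false` if the ID was not in flight.
    pub fn finish(&mut self, id: u64) -> bool {
        self.in_flight.remove(&id).is_some()
    }

    /// Queries that have been running for at least `threshold`, ordered by ID,
    /// each with its kind and how long it has been in flight.
    pub fn stalled(&self, now: Instant, threshold: Duration) -> Vec<(u64, QueryKind, Duration)> {
        let mut out: Vec<_> = self
            .in_flight
            .iter()
            .map(|(&id, &(kind, started))| (id, kind, now.saturating_duration_since(started)))
            .filter(|&(_, _, age)| age >= threshold)
            .collect();
        out.sort_by_key(|&(id, _, _)| id);
        out
    }
}
```

Passing `now` in explicitly rather than calling `Instant::now()` inside the tracker keeps it easy to test; a periodic task would call `stalled(Instant::now(), threshold)` and print each entry so the kind and age of any stuck query show up in the logs.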
Researching existing timeouts makes sense. I also have a suspicion that |
@shamil-gadelshin I narrowed it down to at least Kademlia bootstrap getting stuck sometimes: libp2p/rust-libp2p#5432 It is possible that other requests might get stuck too, but that is the only one I reproduced so far. |
If you start node with |
I've already overwritten the db with one from a synced node. |
I'll close this for now, we had workaround and a fix included in |
@shamil-gadelshin, not sure how to debug this, but there must still be a bug in libp2p that causes requests to get stuck sometimes.
I just had my node rebooted after a few minutes of being offline and it was not able to finish DSN sync in 30 minutes.
After a restart it synced successfully in a few minutes.
Users report the same thing from time to time and we should look for a way to:
While a restart helps, it is a suboptimal experience, and for Space Acres users who don't read logs all the time it is even more confusing.