-
Notifications
You must be signed in to change notification settings - Fork 782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Client Problems: Max Listener Memory Leak / Log Output Format / Snappy Errors / Invalid Trie Sync Stops #1539
Comments
And the extended error messages from #1540 directly in action in context of issue 4.: (even directly taking over the coloring 🙂) |
Client is not yet stable - especially regarding 4. - everyone who wants to debug please try to sync a couple of 1000 blocks (> 5.000 or so). |
nice the new errors already coming in handy, yes looks like we are inserting blocks out of order, this INFO [10-25|18:02:53] Imported blocks count=50 number=79866 hash=cba85ba8... hardfork=chainstart peers=9
finished: 10
INFO [10-25|18:02:53] Executed blocks count=50 first=79865 hash=da065ba6... td=105538793080819005 hardfork=chainstart last=79915 hash=f310c295... with txs=37
INFO [10-25|18:02:53] Executed blocks count=2 first=79915 hash=f310c295... td=105541141406504755 hardfork=chainstart last=79917 hash=de1755a1... with txs=38
WARN [10-25|18:02:53] Error along storing received block or header result: Error: invalid uncle hash (block number=79923 hash=0x818faf022269da8c3f20ba3630a8b90e3b6ba2c1769e74509e40e325114d2112 hf=chainstart baseFeePerGas=none txs=0 uncles=1)
INFO [10-25|18:02:57] New sync target height number=13489962 hash=29820ffe...
WARN [10-25|18:02:57] Error along storing received block or header result: Error: could not find parent header (block header number=80643 hash=0x4fd6cce23237ea6377c86ac084d9ded97bda7cf773b36795fae315953d13d103 hf=chainstart baseFeePerGas=none) I have some fixes for the snappy uncompress coming in a PR. Here's another clue, this one was properly executed in order, but when I look up this block on etherscan it says there is 1 tx and below says txs=0. So perhaps the body is returning with no result )or timing out) and still being inserted instead of invalidated or the information re-requested. Will look into it at blockfetcher.ts:L42 WARN [10-25|18:18:19] Error along storing received block or header result: Error: invalid transaction trie (block number=92789 hash=0x05d00c31f341c74e3103f7f49930dc5b8ef55dd3a5508d9be14399e09260b2fd hf=chainstart baseFeePerGas=none txs=0 uncles=0) |
Issue 4 still present, see e.g. the following screenshot from the latest client run on the server (actually needed to fix the service script for this, this is now working): Should get easier debuggable with Fetcher debug logger from #1544 activated. |
Here is some debug output on another fail, failing block at the end is 672044: Associated job is re-enqueued in the fetcher ("undefined or emtpy result set returned"). The renewedly executed request can also be seen in the screenshot from above. Then - somewhat further down the line - this happens: Respective block is shown as a 0-transactions block - which is wrong since due to Etherscan the block contains 1 transaction. This leads to the transaction trie being wrong (respectively the root not match) (guess a similar observation has already been done by Ryan @ryanio). Hmm, still a bit hard to debug, not yet clear to me why what's the connection between the re-request and the block falsely having (or being instantiated with) no transactions. 🤔 |
All somewhat addressed, client is syncing stable again, will close. |
Various problems with our beloved client 🙂 making everything a bit production-unready atm, with some which might have been introduced along the latest merge work.
Not sure if it's wise to throw all this into one issue, but I also didn't want to blow things up too much, guess (most) things can be fixed along the sideline (I will actually directly start after posting here).
1. Max Listener Memory Leak
We still have a memory leak caused by too much event listeners added and not removed, this is from the server log:
Should be easily fixable though.
2. Log Output Format Error
The normal log output (so running from the console with
npm run client:start
) is now always adding a39m
at the end of the colored labels from log lines, see e.g.:This has close-to-for-sure been introduced along the log file work in the latest Merge PR and should be very close located to these changes. Had some look into that yesterday but couldn't completely locate and fix yet.
3. Snappy Errors
There are new snappy errors looking like this:
These don't seem to break the control flow, but might have bad consequences under the hood (like a normally good peer getting disconnected or something) and therefore should be examined. Might be that my latest Snappy fix at #1484 didn't get it in the full turn (this was somewhat hacky in the first place).
Errors occur regularly so should be relatively easily be reproducable.
4. Invalid Trie Sync Stops
Last thing, likely the most severe/annoying one: sync (tested on
mainnet
on the server) breaks regularly (after a couple of 10.000 up to 100.000 blocks) with aninvalid receipt trie
orinvalid tx trie
or similar message, see e.g.:This completely breaks the sync flow and sync is never picked up after that (bad 🙃). (Somewhat) good thing is is that sync will continue once the client gets restarted. So this rather hints at something like that we process a block in the wrong order or maybe handle a block or block common or the like in some occasions in a way which breaks the block setup and or structure.
Anyhow. To be analyzed.
Not totally easy because this can only be triggered relatively irregularly (it does happen though on every client start after maximally 30-60 minutes).
Guess the best attempt might be to really carefully craft in some meaningful debugging output additions and then just let it run on the sideline. Might give this a try during the day.
The text was updated successfully, but these errors were encountered: