
Calls for block/transaction information sometimes don't return #21430

Closed
wbt opened this issue Aug 7, 2020 · 13 comments · Fixed by #25457
@wbt
Contributor

wbt commented Aug 7, 2020

This is intended to be a migration of this issue originally filed on web3. However, I dove all the way into the W3CWebSocket used by web3 and found that the requests are being sent but the responses are not being received from geth at all, so I don't think it's a web3 issue.

Additionally, I have an odd observation from Geth which might help in debugging.

System information

Geth version: 1.9.18
OS & Version: Windows 10

Steps to reproduce the behaviour

Follow the setup here. Use a WebSocket URL when connecting to geth (in truffle-config), and in migrations/3_increment_counter.js you can set batchSize to 100 without an issue. This should produce a blockchain with >100K transactions in just over 1000 blocks, with most blocks containing 100 transactions (sometimes a batch is split across a couple of blocks).

The expected and actual behavior are as listed there. If the requests are not going to be answered, some error is expected; failing that, at least some logging and documentation about how to configure the threshold would be better than the status quo of silent failure.

Now here's the twist. If I make all three of the following changes (from the code as of the 1.9.18 release tag), I get the expected results. The changes are all Geth console logging and shouldn't affect the correctness of operation, yet in combination these three log statements change the outcome. The changes are:

  • In internal/ethapi/api.go, in the GetBlockByNumber function, add log.Info("In api.go GetBlockByNumber", "number", number); at the start of the function.
  • In core/blockchain.go, in the GetBlock function, add log.Info("In getBlock for number", "number", number); at the start of the function.
  • In core/blockchain.go, in the GetBlock function, add log.Info("Returning from getBlock (cached) for number", "number", number); at the start of the if block that checks the cache.

The third one gets lots of use because of how the transaction generation and querying process works; if yours isn't doing that you could add a similar log statement (without "cached") just before the return statement.
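For readers who want to see the shape of the change, here is a toy, runnable stand-in for the instrumented lookup path. Everything in it (the chain type, the map-backed storage, log.Printf) is a stand-in for illustration only; the real code lives in internal/ethapi/api.go and core/blockchain.go and uses go-ethereum's log.Info.

```go
package main

import (
	"fmt"
	"log"
)

// chain is a toy stand-in for the blockchain with a block cache in front
// of the database, mirroring where the three log statements go.
type chain struct {
	cache map[uint64]string // recently touched blocks
	db    map[uint64]string // everything else
}

// GetBlock mirrors core/blockchain.go's GetBlock, with log points 2 and 3.
func (c *chain) GetBlock(number uint64) string {
	log.Printf("In getBlock for number number=%d", number)
	if b, ok := c.cache[number]; ok {
		log.Printf("Returning from getBlock (cached) for number number=%d", number)
		return b
	}
	return c.db[number]
}

// GetBlockByNumber mirrors the internal/ethapi/api.go handler, with log point 1.
func (c *chain) GetBlockByNumber(number uint64) string {
	log.Printf("In api.go GetBlockByNumber number=%d", number)
	return c.GetBlock(number)
}

func main() {
	c := &chain{
		cache: map[uint64]string{1: "block-1"},
		db:    map[uint64]string{2: "block-2"},
	}
	fmt.Println(c.GetBlockByNumber(1), c.GetBlockByNumber(2))
}
```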

Comment out any of those three and Geth only returns data for roughly 325 blocks and 0 transactions out of an expected 100K of each. Include all three, and you get the expected 100K block and 100K transactions returned.

I suspect that there may be some buffer or limit that is being overwhelmed, causing some glitch that leads Geth to stop sending responses to queries for at least a little while, and that these log statements slow down processing just enough that the issue threshold is never exceeded. However, I'm not yet quite sure where that threshold is set.

@GriffGreen

We were thinking this issue was with web3... but maybe it's geth...? web3/web3.js#3671

@GregTheGreek
Contributor

@GriffGreen I don't think those two issues are related...

@GriffGreen

GriffGreen commented Aug 9, 2020

Cool :-D Thx! @GregTheGreek we are scrambling over here chasing the problem down :-D We tried both the Consensys abi-decoder and ethers.js and neither has worked... but now we see that they both use ethers.js's abi encoder! So we made an issue there ethers-io/ethers.js#992

@GregTheGreek
Contributor

@GriffGreen amazing! I did a little debugging and noticed that as well... best of luck!

@MariusVanDerWijden
Member

@wbt I've been trying to repro the issue today but I'm not sure what the problem is.
The only problem I encounter is in truffle migrate

Counter value estimate: 202471
Returned error: already known

which can be mitigated by yarn truffle migrate -f 3 as you wrote.

The successful completions tick by normally.

Successful completions:  308798
Successful completions:  308799
So far have received 308799 events and successfully queried for 308799 blocks and 308799 transactions.

Am I missing something?
I'm using geth 1.9.20.

@MariusVanDerWijden
Member

I've now run it for 2.4 million contract calls across 570,000 blocks and I haven't seen a problem yet.

@ricmoo

ricmoo commented Sep 8, 2020

I've added support in ethers 5.0.12 for legacy Solidity 0.4 external event data.

For more details, please see ethers-io/ethers.js#891.

@wbt
Contributor Author

wbt commented Sep 8, 2020

@ricmoo Thanks for your efforts on that other issue around log parsing, which definitely caused us a lot of problems. However, I don't think this issue is one of them: here the blockchain node isn't sending anything back to web3 at all, rather than returning an incorrectly formatted log that is then mishandled in parsing.

@MariusVanDerWijden It is possible that your computer is significantly more powerful than mine; maybe try running it with a compute-intensive background job? Slower is also a possibility, if your machine's resource limits cause the requests to be spaced out enough to "fix" the issue, similar to what happened when I added those three log statements. I'm not quite sure what to suggest there.

I also noticed what I think is the same issue when subscribing to events, where there are many distinct events, some of which have many instances, and the subscription goes back to the beginning of the chain: sometimes I didn't even get the events back.

I wound up implementing a workaround which limits the number of requests outstanding to the blockchain node at any given time. This mostly works but requires a dedicated node and/or enough other controls so that requests from other clients/processes, combined with the limited one, remain under the yet-unknown limit for the node as a whole. It's not an elegant or complete solution, just usually practical for this particular application. I would like to see a better solution as described in the OP above.

@holiman
Contributor

holiman commented May 27, 2021

I wound up implementing a workaround which limits the number of requests outstanding to the blockchain node at any given time.

If that solves it, it sounds like the node was overloaded and there was an RPC timeout. According to @fjl, RPC timeouts do not issue a proper error response, which is something that could be improved.

@holiman
Contributor

holiman commented May 27, 2021

Interesting.

func (api *PublicDebugAPI) TimeMeOut(ctx context.Context) (string, error) {
	log.Info("TimeMeOut sleeping for 10 minutes...")
	time.Sleep(time.Minute * 10)
	log.Info("TimeMeOut waking up!")
	return "Good morning", nil
}

When invoked, the node says:

INFO [05-27|11:36:46.314] TimeMeOut sleeping for 10 minutes... 
INFO [05-27|11:46:46.315] TimeMeOut waking up! 

On the caller-side (taken at a different time):

date && echo "{\"method\": \"debug_timeMeOut\", \"params\": [], \"id\": \"123\"}" | nc -U /home/user/.ethereum/geth.ipc  && date
Thu May 27 12:02:26 CEST 2021
{"jsonrpc":"2.0","id":"123","result":"Good morning"}
Thu May 27 12:12:26 CEST 2021

So the IPC connection does not time out, and it does return the data.

For http, however

$ date && curl --data '{"method":"debug_timeMeOut","params":[],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json" -X POST localhost:8545 ; date
Thu May 27 12:50:37 CEST 2021


curl: (52) Empty reply from server
Thu May 27 13:00:46 CEST 2021

It doesn't time out early either, but when the handler finally replies after the 10 minutes, the client just gets an empty reply. cc @fjl

@holiman
Copy link
Contributor

holiman commented Jun 3, 2021

diff --git a/internal/ethapi/api.go b/internal/ethapi/api.go
index 02368aac14..ea7e0347fb 100644
--- a/internal/ethapi/api.go
+++ b/internal/ethapi/api.go
@@ -1792,11 +1792,11 @@ func (api *PublicDebugAPI) GetBlockRlp(ctx context.Context, number uint64) (stri
        return fmt.Sprintf("%x", encoded), nil
 }
 
-func (api *PublicDebugAPI) TimeMeOut(ctx context.Context) (bool, error) {
-       log.Info("TimeMeOut sleeping for 10 minutes...")
-       time.Sleep(time.Minute * 10)
+func (api *PublicDebugAPI) TimeMeOut(ctx context.Context, seconds uint64) (string, error) {
+       log.Info("TimeMeOut sleeping...", "seconds", seconds)
+       time.Sleep(time.Second * time.Duration(seconds))
        log.Info("TimeMeOut waking up!")
-       return false, nil
+       return "Oll korrekt!", nil
 }
 
 // TestSignCliqueBlock fetches the given block number, and attempts to sign it as a clique header with the

After 30 seconds, we hit it:

[user@work go-ethereum]$ curl --data '{"method":"debug_timeMeOut","params":[25],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json" -X POST localhost:8545
{"jsonrpc":"2.0","id":1,"result":"Oll korrekt!"}
[user@work go-ethereum]$ curl --data '{"method":"debug_timeMeOut","params":[30],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json" -X POST localhost:8545
curl: (52) Empty reply from server

@holiman
Contributor

holiman commented Jun 3, 2021

The parameter that decides this is h.server.WriteTimeout in node/rpcstack.go. Setting it to 1 second makes the issue easier to debug.

@holiman
Contributor

holiman commented Jun 3, 2021

I don't think we need to set a timeout there. The jsonCodec has its own internal timeout for the writes.

	defaultWriteTimeout = 10 * time.Second // used if context has no deadline
