You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Mirror node acceptance tests sometimes run indefinitely in CI as documented in hashgraph/hedera-mirror-node#4610. We pass a timeout value of 10s and it is not respected in this scenario. It occurs sporadically and may be an indication of some issue with the node or proxy, so may not be easy to reproduce. But a bad node should not cause the client to hang.
It seems the timeout in Executable.execute(Client client, Duration timeout) is only used between retry attempts and not the underlying gRPC request. For that there is a grpcDeadline property which is unset by default. We will set this property in our code on every request to workaround, but in my opinion the SDK should do this by default if a timeout is provided. The timeout should be the overall timeout for all attempts, but on each attempt it should calculate the timeout remaining and pass that as the grpcDeadline in the gRPC CallOptions.
steven-sheehy
changed the title
Query.execute(client, timeout) can block indefinitely
Execute with a timeout can ignore timeout and block indefinitely
Oct 5, 2022
Turns out the actual underlying cause was the change in #1091. The change to ForkJoinPool.commonPool() seems to have unintended side effects and reverting this change in a local copy of the SDK unblocks the acceptance tests in CI. Probably because in containers there are less cores (1 or even fractional) so ForkJoinPool may only have at most 1 thread and will block on the second task.
So to fix we need to revert the changes to Client.createExecutor() in that PR and then fix the resulting thread leak as I suggested in my original ticket.
I do think also fixing Executable.execute to use grpcDeadline is useful, so let me know if you'd like a separate ticket for any of the above and I can split things as necessary.
Description
Mirror node acceptance tests sometimes run indefinitely in CI as documented in hashgraph/hedera-mirror-node#4610. We pass a timeout value of
10s
and it is not respected in this scenario. It occurs sporadically and may be an indication of some issue with the node or proxy, so may not be easy to reproduce. But a bad node should not cause the client to hang.It seems the
timeout
inExecutable.execute(Client client, Duration timeout)
is only used between retry attempts and not the underlying gRPC request. For that there is agrpcDeadline
property which is unset by default. We will set this property in our code on every request to workaround, but in my opinion the SDK should do this by default if atimeout
is provided. Thetimeout
should be the overall timeout for all attempts, but on each attempt it should calculate the timeout remaining and pass that as thegrpcDeadline
in the gRPCCallOptions
.Steps to reproduce
Additional context
No response
Hedera network
testnet
Version
v2.17.3
Operating system
No response
The text was updated successfully, but these errors were encountered: