Execute with a timeout can ignore timeout and block indefinitely #1175

steven-sheehy · 2022-10-05T21:48:32Z

Description

Mirror node acceptance tests sometimes run indefinitely in CI, as documented in hashgraph/hedera-mirror-node#4610. We pass a timeout of 10s, and it is not respected in this scenario. The hang occurs sporadically and may indicate an issue with the node or proxy, so it may not be easy to reproduce. Either way, a bad node should not cause the client to hang.

It seems the timeout in Executable.execute(Client client, Duration timeout) is only checked between retry attempts and is not applied to the underlying gRPC request. For that there is a grpcDeadline property, which is unset by default. We will set this property in our code on every request as a workaround, but in my opinion the SDK should do this by default when a timeout is provided. The timeout should be the overall timeout for all attempts, and on each attempt the SDK should calculate the time remaining and pass that as the grpcDeadline in the gRPC CallOptions.
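A minimal sketch of the proposed behavior, assuming nothing about the SDK's internals: `DeadlineRetry`, `Attempt`, and `executeWithTimeout` are hypothetical names used only for illustration. One overall deadline is fixed up front, and each attempt is handed only the time still remaining. In the real SDK that remaining time would presumably be applied to the gRPC call (grpc-java offers `CallOptions.withDeadlineAfter(long, TimeUnit)` for this).

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper, not SDK code: one overall time budget shared by all attempts. */
final class DeadlineRetry {
    /** A single attempt; it receives the time left in the overall budget. */
    interface Attempt<T> {
        T run(Duration remaining) throws Exception;
    }

    static <T> T executeWithTimeout(Attempt<T> attempt, Duration timeout, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        // Fix the overall deadline once, before the first attempt.
        long deadlineNanos = System.nanoTime() + timeout.toNanos();
        Exception last = null;
        for (int i = 0; i < maxAttempts; i++) {
            long remainingNanos = deadlineNanos - System.nanoTime();
            if (remainingNanos <= 0) {
                TimeoutException timedOut = new TimeoutException("overall timeout exceeded");
                if (last != null) {
                    timedOut.initCause(last);
                }
                throw timedOut;
            }
            try {
                // Assumption: a real attempt would use this value as its gRPC deadline
                // so that a single hung request cannot outlive the overall timeout.
                return attempt.run(Duration.ofNanos(remainingNanos));
            } catch (Exception e) {
                last = e; // remember the failure and retry while time remains
            }
        }
        throw last; // attempts exhausted before the deadline
    }
}
```

Note that the sketch only bounds the call if each attempt actually honors the `remaining` value it is given, which is the role the gRPC deadline would play.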

Steps to reproduce

    new AccountBalanceQuery()
            .setAccountId(nodeAccountId)
            .setNodeAccountIds(List.of(nodeAccountId))
            .execute(client, Duration.ofSeconds(10L));

Additional context

No response

Hedera network

testnet

Version

v2.17.3

Operating system

No response


steven-sheehy · 2022-10-06T20:37:30Z

Turns out the actual underlying cause was the change in #1091. The switch to ForkJoinPool.commonPool() seems to have unintended side effects, and reverting it in a local copy of the SDK unblocks the acceptance tests in CI. This is probably because containers have fewer cores (1 or even fractional), so the ForkJoinPool may have at most 1 thread and will block on the second task.
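The suspected mechanism can be illustrated in isolation. This is a sketch under stated assumptions, not the SDK's code: `PoolStarvationDemo` and `innerTaskRuns` are hypothetical names, and an explicit `ForkJoinPool(parallelism)` stands in for the common pool on a small container. An outer task submits an inner task to the same pool and then blocks waiting for it; with a single worker thread, the inner task cannot start while the outer task occupies that worker.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo, not SDK code: a blocked task starving a small ForkJoinPool. */
final class PoolStarvationDemo {
    /**
     * Returns true if an inner task, submitted from inside an outer task on the
     * same pool, completes within waitMillis.
     */
    static boolean innerTaskRuns(int parallelism, long waitMillis) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() -> {
                CountDownLatch innerDone = new CountDownLatch(1);
                // Queue the inner task on the same pool.
                pool.execute(innerDone::countDown);
                // Plain blocking wait: the pool does not know this worker is stalled,
                // so it does not add a compensating thread.
                return innerDone.await(waitMillis, TimeUnit.MILLISECONDS);
            }).get();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Calling `innerTaskRuns(1, 300)` should report that the inner task never ran within the wait, while `innerTaskRuns(2, 2000)` should succeed because a second worker is free to pick it up. Without the bounded wait used here, the single-threaded case would hang indefinitely.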

So to fix this, we need to revert the changes to Client.createExecutor() in that PR and then fix the resulting thread leak as I suggested in my original ticket.

I do think fixing Executable.execute to use grpcDeadline is also useful, so let me know if you'd like a separate ticket for any of the above and I can split things as necessary.

steven-sheehy added the bug label Oct 5, 2022

steven-sheehy changed the title ~~Query.execute(client, timeout) can block indefinitely~~ Execute with a timeout can ignore timeout and block indefinitely Oct 5, 2022

ochikov assigned dikel Oct 6, 2022

SimiHunjan added this to the 2.18.2 milestone Oct 17, 2022

dikel mentioned this issue Oct 19, 2022

Fix: execute can ignore timeout and block indefinitely #1188

Merged

ochikov closed this as completed in #1188 Oct 21, 2022

steven-sheehy mentioned this issue Nov 3, 2022

Execute timeout should use gRPC deadline #1226

Closed
