Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execute with a timeout can ignore timeout and block indefinitely #1175

Closed
steven-sheehy opened this issue Oct 5, 2022 · 1 comment · Fixed by #1188
Closed

Execute with a timeout can ignore timeout and block indefinitely #1175

steven-sheehy opened this issue Oct 5, 2022 · 1 comment · Fixed by #1188
Assignees
Labels
bug Something isn't working
Milestone

Comments

@steven-sheehy
Copy link
Contributor

Description

Mirror node acceptance tests sometimes run indefinitely in CI as documented in hashgraph/hedera-mirror-node#4610. We pass a timeout value of 10s and it is not respected in this scenario. It occurs sporadically and may be an indication of some issue with the node or proxy, so may not be easy to reproduce. But a bad node should not cause the client to hang.

It seems the timeout in Executable.execute(Client client, Duration timeout) is only used between retry attempts and not the underlying gRPC request. For that there is a grpcDeadline property which is unset by default. We will set this property in our code on every request to workaround, but in my opinion the SDK should do this by default if a timeout is provided. The timeout should be the overall timeout for all attempts, but on each attempt it should calculate the timeout remaining and pass that as the grpcDeadline in the gRPC CallOptions.

Steps to reproduce

    new AccountBalanceQuery()
            .setAccountId(nodeAccountId)
            .setNodeAccountIds(List.of(nodeAccountId))
            .execute(client, Duration.ofSeconds(10L));

Additional context

No response

Hedera network

testnet

Version

v2.17.3

Operating system

No response

@steven-sheehy steven-sheehy added the bug Something isn't working label Oct 5, 2022
@steven-sheehy steven-sheehy changed the title Query.execute(client, timeout) can block indefinitely Execute with a timeout can ignore timeout and block indefinitely Oct 5, 2022
@steven-sheehy
Copy link
Contributor Author

steven-sheehy commented Oct 6, 2022

Turns out the actual underlying cause was the change in #1091. The change to ForkJoinPool.commonPool() seems to have unintended side effects and reverting this change in a local copy of the SDK unblocks the acceptance tests in CI. Probably because in containers there are less cores (1 or even fractional) so ForkJoinPool may only have at most 1 thread and will block on the second task.

So to fix we need to revert the changes to Client.createExecutor() in that PR and then fix the resulting thread leak as I suggested in my original ticket.

I do think also fixing Executable.execute to use grpcDeadline is useful, so let me know if you'd like a separate ticket for any of the above and I can split things as necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants