GRPC keep alive need a nice error message #5136

Closed
samsja opened this issue Sep 6, 2022 · 5 comments · Fixed by #5325

@samsja
Contributor

samsja commented Sep 6, 2022

Context

When the keep-alive mechanism is triggered, we need to catch the exception and display a nice error message.

@Adesoji1

Adesoji1 commented Oct 6, 2022

@samsja what are the steps to reproduce, so I can fix this issue?

@samsja
Contributor Author

samsja commented Oct 7, 2022

Hey @Adesoji1, thank you for wanting to contribute!

This issue is actually not that easy to reproduce.

Let me give you some context. The keep-alive mechanism is only triggered when the connection between the Jina Client and the Jina Gateway (the server) is killed abruptly. In this case we just raise the raw exception from the grpc library, so the Client ends up displaying a weird message. The goal is to catch the exception and display a nice message.
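A minimal sketch of the "catch and display a nice message" idea, not the actual Jina fix. It assumes the raw error exposes `code()` and `details()`, as `grpc.aio.AioRpcError` does. So that it runs without grpcio installed, it uses hypothetical stand-ins (`FakeStatusCode`, `FakeRpcError`); the helper names `friendly_message` and `call_with_nice_errors` and the message wording are also illustrative. The `UNAVAILABLE` status and the `keepalive watchdog timeout` details string are the ones reported further down in this thread.

```python
from enum import Enum


class FakeStatusCode(Enum):
    """Hypothetical stand-in for grpc.StatusCode (only the members used here)."""
    INTERNAL = 13
    UNAVAILABLE = 14


class FakeRpcError(Exception):
    """Hypothetical stand-in exposing code() and details() like grpc.aio.AioRpcError."""

    def __init__(self, code, details):
        super().__init__(details)
        self._code = code
        self._details = details

    def code(self):
        return self._code

    def details(self):
        return self._details


def friendly_message(err, host):
    """Turn a raw RPC error into a short, readable message (wording is illustrative)."""
    code_name = err.code().name
    details = err.details() or ""
    if code_name == "UNAVAILABLE" and "keepalive watchdog timeout" in details:
        return (
            f"Connection to the Gateway at {host} was lost (keepalive timeout). "
            "Check your network and retry."
        )
    if code_name == "UNAVAILABLE":
        return f"Could not reach the Gateway at {host}: {details}"
    return f"Request to {host} failed with {code_name}: {details}"


def call_with_nice_errors(rpc, host):
    """Run rpc() and re-raise RPC failures as ConnectionError with a readable message."""
    try:
        return rpc()
    except FakeRpcError as err:  # a real client would catch grpc.aio.AioRpcError here
        # "from err" keeps the raw gRPC error attached for debugging.
        raise ConnectionError(friendly_message(err, host)) from err
```

Chaining with `from err` keeps the original gRPC error available in the traceback, so the short message does not hide the details needed for debugging.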

Now the messy part starts. Simulating the abrupt kill is not easy at all (at least I don't have a good way to do it). It mainly happens when the Flow is running in the cloud in a production environment.

My ugly way to make it happen is to have a remote Flow somewhere (on a VM on AWS, for example), connect to it from my laptop, and then ... turn off my Wi-Fi. This simulates the behavior, but it is not easy to iterate on.

If you manage to reproduce the ugly error message, you will probably be able to fix it. Otherwise I will try to come up with a simpler way of reproducing it.

@Jackmin801 Jackmin801 assigned Jackmin801 and unassigned Jackmin801 Oct 26, 2022
@Jackmin801
Contributor

Is this the "ugly" error?

<AioRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "keepalive watchdog timeout"
        debug_error_string = "{"created":"@1666942824.531216000","description":"Error received from peer ipv4:192.168.100.56:12345","file":"src/core/lib/surface/call.cc","file_line":967,"grpc_message":"keepalive watchdog timeout","grpc_status":14}"

@Jackmin801
Contributor

If the port rejects the connection, you get this:

<AioRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses"
        debug_error_string = "{"created":"@1666943053.384556000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3261,"referenced_errors":[{"created":"@1666943053.384555000","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"

@samsja
Contributor Author

samsja commented Oct 28, 2022

yes

3 participants