received context error while waiting for new LB policy update: context deadline exceeded #7983
Comments
Hey @mayurkale22, this error message alone doesn't say much about what is causing it. The cause could be anything from a slow network to an error in the application or a problem with name resolution. To help identify it, you can enable debug logs, which will give us a better idea of the root cause.
Hi @mayurkale22, are you using the DNS resolver? I am getting the same error on the latest gRPC client. I can see the resolver resolving 3 endpoints that point to my NLB, and they are correct. The only difference I can see is that the LB policy is set to "pick_first" instead of round-robin.
I have more information on this issue. I got it fixed for my use case SOMEHOW, though I'm not sure whether this really fixed it or just mitigated it for the time being. In my setup I had a DNS endpoint resolving to multiple A records of the NLB, and my client was using ... I am still poking around the library to understand the flow and how it creates a client connection. I will update here if I find anything else.
@mayurkale22 could you provide more information in following format https://github.com/grpc/grpc-go/issues/new?template=bug.md with debugging enabled? Please specify if you are using a different load balancing policy or name resolver than the default one. Mention the grpc version you are on and if you are using |
As for your other question about how to interpret this error: it can happen either when there is no picker available to connect to a backend, or when there was a valid picker that has since become invalid (because the balancer detected a change in backend availability). So it's more likely a connectivity issue, since you mentioned it happens intermittently. Do you have a single backend or multiple? To give some background: the Picker is used by gRPC to pick a SubConn (backend) to send an RPC on. The Balancer takes input from gRPC, manages SubConns, and collects and aggregates their connectivity states. It is expected to generate a new Picker from a snapshot of its state every time that state changes, and gRPC then uses the updated Picker to pick SubConns (backends) for RPCs.
Note that there was a recent change to output this message instead of a more generic "deadline exceeded" error. If this is happening at startup, then it almost always means that we are still waiting for connections to be established. Maybe the RPC's deadline is too short? If there were errors connecting, then those errors would be given to the RPC instead. I wonder if we can further improve this error so that users aren't confused by it and don't need to file issues to learn more.
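To make the picker wait described above concrete, here is a minimal, standard-library-only sketch. It is a toy model and not grpc-go's actual code: `pickerHolder`, `update`, `pick`, `simulate`, the backend address, and the error wording are all hypothetical names chosen for illustration. It assumes only what the comments state: an RPC blocks until the balancer publishes a usable picker, and if the RPC's deadline expires first, the RPC fails with the context error.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// pickerHolder is a toy stand-in for the component that hands the current
// picker to RPCs. It is an illustration only, not grpc-go's implementation.
type pickerHolder struct {
	mu      sync.Mutex
	backend string        // empty means "no usable picker yet"
	updated chan struct{} // closed each time a new picker is published
}

func newPickerHolder() *pickerHolder {
	return &pickerHolder{updated: make(chan struct{})}
}

// update simulates the balancer publishing a new picker after a state change.
func (h *pickerHolder) update(backend string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.backend = backend
	close(h.updated) // wake every RPC that is waiting for a picker
	h.updated = make(chan struct{})
}

// pick blocks until a usable picker exists or ctx expires.
func (h *pickerHolder) pick(ctx context.Context) (string, error) {
	for {
		h.mu.Lock()
		backend, updated := h.backend, h.updated
		h.mu.Unlock()
		if backend != "" {
			return backend, nil
		}
		select {
		case <-updated:
			// A new picker was published; loop and check it.
		case <-ctx.Done():
			return "", fmt.Errorf("context error while waiting for new picker: %w", ctx.Err())
		}
	}
}

// simulate models a backend that becomes ready after readyAfterMs while the
// RPC carries a deadline of deadlineMs.
func simulate(readyAfterMs, deadlineMs int) string {
	h := newPickerHolder()
	time.AfterFunc(time.Duration(readyAfterMs)*time.Millisecond, func() {
		h.update("10.0.0.1:443") // hypothetical backend address
	})
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(deadlineMs)*time.Millisecond)
	defer cancel()
	backend, err := h.pick(ctx)
	if err != nil {
		return "error: " + err.Error()
	}
	return "picked " + backend
}

func main() {
	// Connection is ready well before the deadline: the RPC proceeds.
	fmt.Println(simulate(20, 500))
	// Deadline expires while still waiting for a connection: the RPC fails.
	fmt.Println(simulate(500, 20))
}
```

In this model the second call fails even though nothing is wrong with the backend; the connection is simply not ready yet. That matches the startup scenario described above, where a longer RPC deadline gives connections time to be established.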
This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.
@purnesh42H is my guess correct? I think in |
No. pick_first also handles failover. It will not give up after a single connection failure or a deadline exceeded on a single backend. It will attempt to connect to the next backend in the resolved list.
Not really. round_robin sends each request to a single backend, chosen in round-robin order. It does maintain connections to multiple backends, but it doesn't send the same request to all of them.
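The difference between the two policies can be sketched with the standard library alone. This is a simplified illustration of the behaviour described in the two replies above, not grpc-go's balancer code: `dialFunc`, `pickFirst`, `roundRobin`, `errUnreachable`, and the addresses are hypothetical names, and real balancers also track connectivity state changes over time.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnreachable is a hypothetical connection failure used by the demo.
var errUnreachable = errors.New("backend unreachable")

// dialFunc is a hypothetical connection attempt; it returns nil on success.
type dialFunc func(addr string) error

// pickFirst tries the resolved addresses in order and returns the first one
// that connects. A failure on one address moves on to the next one.
func pickFirst(addrs []string, dial dialFunc) (string, error) {
	lastErr := errors.New("no addresses to try")
	for _, addr := range addrs {
		if err := dial(addr); err != nil {
			lastErr = fmt.Errorf("%s: %w", addr, err)
			continue
		}
		return addr, nil
	}
	return "", lastErr
}

// roundRobin keeps several ready backends and hands out exactly one of them
// per request, rotating through the list.
type roundRobin struct {
	ready []string
	next  int
}

// pick returns the backend for the next request, or "" if none is ready.
func (r *roundRobin) pick() string {
	if len(r.ready) == 0 {
		return ""
	}
	backend := r.ready[r.next%len(r.ready)]
	r.next++
	return backend
}

func main() {
	addrs := []string{"10.0.0.1:443", "10.0.0.2:443", "10.0.0.3:443"}

	// The first address refuses connections; the others accept.
	dial := func(addr string) error {
		if addr == "10.0.0.1:443" {
			return errUnreachable
		}
		return nil
	}

	chosen, err := pickFirst(addrs, dial)
	fmt.Println("pick_first chose:", chosen, "err:", err)

	rr := &roundRobin{ready: addrs}
	for i := 0; i < 4; i++ {
		fmt.Println("round_robin request", i, "->", rr.pick())
	}
}
```

In this sketch `pickFirst` settles on `10.0.0.2:443` after the first address fails, while `roundRobin` sends requests 0 through 3 to the first, second, third, and then first backend again, one backend per request.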
@itzmanish If you want to fix the issues you're seeing with
We're seeing an intermittent issue. It happens at random, always at app startup, and it prevents the app from starting up properly.
How should we interpret this error? Does it signal an issue with connectivity, gRPC server/client configuration, or something else entirely? We'd appreciate any feedback on this.