
received context error while waiting for new LB policy update: context deadline exceeded #7983

Open
mayurkale22 opened this issue Jan 7, 2025 · 10 comments · Fixed by #8035
Assignees
Labels
Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. Status: Requires Reporter Clarification Type: Question

Comments

@mayurkale22

We're seeing an intermittent issue. It always happens randomly at app startup time and prevents the app from starting up properly:

rpc error: code = DeadlineExceeded desc = received context error while waiting for new LB policy update: context deadline exceeded

How should we interpret this error? Does it signal an issue with connectivity, gRPC server/client configuration, or something else entirely? We'd appreciate any feedback on this.

@eshitachandwani
Member

Hey @mayurkale22, this error message alone doesn't provide much information about what could be causing it. The problem could stem from a number of issues, from a slow network to an error in the application to a problem with name resolution. To help identify the cause, you can enable debug logs using

$ export GRPC_GO_LOG_VERBOSITY_LEVEL=99
$ export GRPC_GO_LOG_SEVERITY_LEVEL=info

and share the output, which should give us a better idea of the root cause.

@itzmanish

Hi @mayurkale22, are you using the DNS resolver?

I'm also getting the same error on the latest gRPC client (grpc-go v1.67.3).

I can see the resolver resolving 3 endpoints that point to my NLB, and they are correct. The only difference I can see is that it's using the "pick_first" LB policy instead of round_robin.

@purnesh42H purnesh42H added Status: Requires Reporter Clarification Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. labels Jan 10, 2025
@itzmanish

I have more information on this issue. I got it working for my use case, though I'm not sure whether this really fixed the problem or just mitigated it for the time being.

In my setup I had a DNS endpoint resolving to multiple A records for an NLB, and my client was using the pick_first load balancer (possibly because it's the default). I changed the load balancer to round_robin and no longer see the context deadline issue. My guess is that one of the NLB endpoints was taking too long, or the gRPC library was not able to update the resolver state for that endpoint.

I am still poking around the library to understand the flow and how it creates a client connection. I will update here if I find anything else.

@purnesh42H
Contributor

purnesh42H commented Jan 15, 2025

@mayurkale22 could you provide more information in the following format https://github.com/grpc/grpc-go/issues/new?template=bug.md with debug logging enabled?

Please specify whether you are using a non-default load balancing policy or name resolver. Also mention the gRPC version you are on and whether you are using grpc.NewClient or grpc.Dial.

@purnesh42H
Contributor

purnesh42H commented Jan 15, 2025

As for your other question of how to interpret rpc error: code = DeadlineExceeded desc = received context error while waiting for new LB policy update: context deadline exceeded:

It can happen when either there is no picker to connect to a backend, or there was a valid picker that has since become invalid (because the balancer detected a change in backend availability). So it's more likely a connectivity issue, since you mentioned it happens intermittently. Do you have a single backend or multiple?

To give some background: the Picker is used by gRPC to pick a SubConn (backend) to send an RPC on. The Balancer is expected to generate a new picker from its snapshot every time its internal state changes. The Balancer takes input from gRPC, manages SubConns, and collects and aggregates the connectivity states. It also generates and updates the Picker that gRPC uses to pick SubConns (backends) for RPCs.
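For reference, the picker contract lives in grpc-go's balancer package and, abridged, looks like this:

```go
// Abridged from google.golang.org/grpc/balancer.
type Picker interface {
	// Pick selects the SubConn (backend connection) for the RPC described
	// by info. Returning balancer.ErrNoSubConnAvailable makes gRPC block
	// the RPC until the balancer produces a new picker — which is what
	// the RPC in this issue was waiting on when its context expired.
	Pick(info PickInfo) (PickResult, error)
}
```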

@dfawley
Member

dfawley commented Jan 16, 2025

Note that there was a recent change to output this message instead of a more generic "deadline exceeded" error.

If this is happening at startup, then it's almost always going to be that we are still waiting for connections to be established. Maybe the RPC has too short of a deadline?

If there were errors connecting, then those errors would be given to the RPC instead.

I wonder if we can further improve this error so that users don't feel confused by it and need to file issues to learn more.


This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added the stale label Jan 22, 2025
@itzmanish

> My guess is for some reason one of the NLB endpoints was taking longer or GRPC lib was not able to update the resolver state for that endpoint.

@purnesh42H is my guess correct? I think in pick_first only one connection is made, and if it gets a deadline exceeded the request fails, while in round_robin multiple connections are made and a request goes over all of them. That would explain why my service works correctly after switching to round_robin.

@purnesh42H
Contributor

> is my guess correct? I think in pick_first only one connection is made, and if it gets a deadline exceeded the request fails.

No. pick_first also handles failover. It will not give up after a single connection failure or a deadline exceeded on a single backend; it will attempt to connect to the next backend in the resolved list.

> While in round_robin multiple connections are made and a request goes over all of them. That would explain why my service works correctly after switching to round_robin.

Not really. round_robin sends each request to a single backend, chosen in round-robin order. It does maintain connections to multiple backends, but it doesn't send the same request to all of them.

@arjan-bal
Contributor

arjan-bal commented Jan 24, 2025

@itzmanish pick_first tries one address at a time until it finds a healthy backend, which minimizes the number of active transports. round_robin tries to create a transport to every backend at the same time and reports ready as soon as a single backend is connected. If there are unhealthy backends at the front of the address list produced by DNS, pick_first will take longer than round_robin to report ready. On the other hand, round_robin will create more transports than pick_first.

If you want to fix the issues you're seeing with pickfirst, consider the following:

  1. Increase the timeout for the context that is used to make RPCs.
  2. If the backends are unreachable, use a custom dialer and set a reasonable timeout for establishing connections, e.g. net.Dialer{Timeout: 2 * time.Second}. This should make pick_first move to the next address in the list faster.

@github-actions github-actions bot removed the stale label Jan 28, 2025