Agent GRPC configuration causes SRV DNS lookups which can fail #5563

Open
keeganwitt opened this issue Oct 9, 2024 · 4 comments
Labels
priority/backlog Issue is approved and in the backlog

Comments

@keeganwitt
Contributor

keeganwitt commented Oct 9, 2024

Using grpc.WithDefaultServiceConfig(roundRobinServiceConfig) results in SRV DNS lookups. I believe this is because gRPC's code here populates the set of addresses to load balance across from both SRV and A records when EnableSRVLookups is true, which it will be because the grpclb package is initialized.

However, when you use an external load balancer (such as aws-load-balancer) together with external DNS so that agents collocated in the same pod as your downstream server can reach the upstream server, the agent should not be querying SRV records. It should use A records instead, since those are the type AWS creates. The SRV queries fail, which puts excessive load on your DNS system. If you generate enough of these NXDOMAIN queries, the Route53 cost can be significant.
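To make the extra traffic concrete, here is a minimal sketch of the lookups I would expect per resolution. It is a model for illustration, not gRPC's resolver code: `dnsQuery` and `plannedQueries` are hypothetical names, the host is a placeholder, and the `_grpclb._tcp.` prefix assumes the SRV query uses service "grpclb" over "tcp" in the usual RFC 2782 `_service._proto.name` form.

```go
package main

// dnsQuery describes one DNS lookup a client would issue.
// Hypothetical type, used only for this illustration.
type dnsQuery struct {
	Type string // record type, e.g. "SRV" or "A/AAAA"
	Name string // fully composed query name
}

// plannedQueries lists the lookups expected for host. When srvLookups is
// true, an SRV query for service "grpclb" over "tcp" comes first (assumed
// naming: "_grpclb._tcp.<host>"), followed by the ordinary host lookup.
func plannedQueries(host string, srvLookups bool) []dnsQuery {
	var queries []dnsQuery
	if srvLookups {
		queries = append(queries, dnsQuery{Type: "SRV", Name: "_grpclb._tcp." + host})
	}
	return append(queries, dnsQuery{Type: "A/AAAA", Name: host})
}
```

With SRV lookups enabled, every resolution of a host that only has A records carries one additional query that cannot succeed, which is where the NXDOMAIN volume comes from.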

The usage of this load balancer was introduced in #1061.

@keeganwitt
Contributor Author

keeganwitt commented Oct 9, 2024

Actually, in the case of Kubernetes, even for the agents in a daemonset communicating with the downstream server, A/AAAA records will be typical rather than SRV records (see here).

I'm thinking the fix for this would be to add an option to the agent config to turn on/off the gRPC load balancing.
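A rough sketch of what such a toggle could look like, under stated assumptions: `defaultServiceConfig` and its `enableLoadBalancing` parameter are hypothetical names, and the JSON value shown for `roundRobinServiceConfig` is my assumption of a typical round-robin service config rather than a copy of the agent's constant. The caller would pass the returned string to grpc.WithDefaultServiceConfig only when `ok` is true.

```go
package main

import "encoding/json"

// roundRobinServiceConfig is an assumed value for illustration; the
// constant used by the agent may differ.
const roundRobinServiceConfig = `{"loadBalancingConfig": [{"round_robin": {}}]}`

// defaultServiceConfig returns the service config JSON to hand to
// grpc.WithDefaultServiceConfig. ok is false when load balancing is
// disabled, meaning the dial option should be omitted entirely.
func defaultServiceConfig(enableLoadBalancing bool) (cfg string, ok bool) {
	if !enableLoadBalancing {
		return "", false
	}
	return roundRobinServiceConfig, true
}

// validJSON reports whether cfg is syntactically valid JSON, a cheap
// sanity check to run before passing a config string to the dialer.
func validJSON(cfg string) bool {
	return json.Valid([]byte(cfg))
}
```

Whether omitting the service config is enough to stop the SRV queries would still need to be verified against the resolver behavior described above.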

@keeganwitt
Contributor Author

grpc.WithDefaultServiceConfig(roundRobinServiceConfig),
and
grpc.WithDefaultServiceConfig(roundRobinServiceConfig),
are the two places that cause this.

@sorindumitru
Contributor

@keeganwitt There's #4990, which would make that setting configurable. We've agreed to allow more options than the default; maybe another option could be no configuration.

@amartinezfayo amartinezfayo added the priority/backlog Issue is approved and in the backlog label Oct 10, 2024
@amartinezfayo
Member

Thank you @keeganwitt for opening this, and thank you @sorindumitru for pointing to the related issue.
Fixing this depends on #4990.
