[flb_network] DNS auto retry (Timeout while contacting DNS servers) 🌐 #4257
Comments
It appears the DNS timeout was introduced as a fairly high-level design decision (not part of the Unix API response) 2 months ago in this commit.
Actually it doesn't seem like anyone is setting the |
The STS and EKS providers do use DNS, since they have to find the STS service.
I see. The STS connect timeout, which is set to FLB_AWS_CREDENTIAL_NET_TIMEOUT, is also generous at 5 seconds, so it is not apparent that this would be a problem.
It seems most DNS problems are a byproduct of other Fluent Bit issues rather than real DNS resolution failures. I am seeing fewer DNS-related problems lately, so this can be closed.
Is your feature request related to a problem? Please describe.
Some Fluent Bit users are experiencing intermittent DNS timeout errors that lead to log loss. The current solution is to set retry_limit to a value above 2, as proposed by @ssplatt and @PettitWesley. This, however, may be too broad a solution for transient DNS errors. See aws/aws-for-fluent-bit#253 (comment)
Describe the solution you'd like
A cleaner solution than having DNS failure fail the entire request would be to add a DNS retry on DNS timeout directly to the DNS lookup method.
This could be implemented with a for loop around the ares_getaddrinfo() call and timer block here:
fluent-bit/src/flb_network.c, line 941 in 231ef4b
result_code == ARES_ETIMEOUT
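The retry loop proposed above could look roughly like the sketch below. It is not Fluent Bit's code and does not call c-ares: the status enum, the lookup_fn callback type, lookup_with_retry() and flaky_lookup() are all hypothetical names made up for illustration, with LOOKUP_TIMEOUT standing in for a result such as ARES_ETIMEOUT. The point is only that a timeout triggers another attempt, while any other result (success or a hard failure) is returned immediately.

```c
#include <stddef.h>

/* Hypothetical status codes standing in for the resolver's result codes
 * (for example, a timeout result such as ARES_ETIMEOUT). */
enum lookup_status {
    LOOKUP_OK = 0,
    LOOKUP_TIMEOUT,
    LOOKUP_FAILED
};

/* Hypothetical callback type: performs one complete lookup attempt
 * (query plus timer) and reports how it ended. */
typedef enum lookup_status (*lookup_fn)(const char *host, void *ctx);

/* Retry the lookup only when an attempt times out. Any other status is
 * returned right away. If attempts_used is not NULL, it receives the
 * number of attempts that were actually made. */
enum lookup_status lookup_with_retry(lookup_fn lookup, const char *host,
                                     void *ctx, int max_attempts,
                                     int *attempts_used)
{
    enum lookup_status status = LOOKUP_FAILED;
    int used = 0;

    while (used < max_attempts) {
        status = lookup(host, ctx);
        used++;
        if (status != LOOKUP_TIMEOUT) {
            break; /* success or a non-transient error: stop retrying */
        }
    }

    if (attempts_used != NULL) {
        *attempts_used = used;
    }
    return status;
}

/* Fake resolver for demonstration: ctx points to an int holding the
 * number of timeouts to simulate before the lookup succeeds. */
enum lookup_status flaky_lookup(const char *host, void *ctx)
{
    int *timeouts_left = ctx;

    (void) host; /* unused in this fake */
    if (*timeouts_left > 0) {
        (*timeouts_left)--;
        return LOOKUP_TIMEOUT;
    }
    return LOOKUP_OK;
}
```

With a counter of 2 simulated timeouts and max_attempts set to 3, lookup_with_retry(flaky_lookup, ...) succeeds on the third attempt. With more simulated timeouts than attempts, it gives up and returns LOOKUP_TIMEOUT, at which point the caller's existing retry_limit handling would still apply.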
From here. Corresponding to your error message here.
Describe alternatives you've considered
This issue could also be mitigated with:
Additional context