DNS query cancelled error on v1.8.3 when sending to http-intake.logs.datadoghq.eu (works in v1.8.2) #3944
Comments
Hey @ThomasHenckel thanks for filing
What would you suggest as the best way to rule that out? Is it possible to use a static IP to see whether a DNS difference between 1.8.3 and 1.8.2 could be causing the issue?
Hi @agup006, I just tried using the IP address as the host name, and it looks like logs are getting through. I have talked with other devs in my company, and it looks like they don't have the problem; I'm the only one using Ubuntu for my Docker images. I install using: Hope it helps
We also have the same problem, which started the minute we upgraded from 1.8.2 to 1.8.3, i.e. there is a very close time correlation (a few minutes) between the version change and the beginning of this issue.
Same problem here. =]
Same problem here. CentOS 7, connecting to Graylog.
In our case, DNS resolution from the command line for the hostname that was failing lookups worked just fine. As a diagnostic and stopgap fix, we tried putting an entry in the /etc/hosts file mapping the failing hostname to its proper IP, and that worked. With the hosts file entry in place, td-agent-bit starts up fine and does its job. So... what is going wrong with getaddrinfo doing its DNS lookups? Is something getting negatively cached somewhere it shouldn't? What's changed since v1.8.2?
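For anyone wanting to repeat the diagnostic described above, here is a minimal sketch in Python, not anything from the fluent-bit codebase. It asks the system resolver (via `socket.getaddrinfo`) for a hostname's addresses and formats a line that could go into /etc/hosts as a stopgap. The function names `resolve_addresses` and `hosts_line` are hypothetical helpers made up for this illustration, and port 443 is an assumed default.

```python
import socket


def resolve_addresses(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the unique IP addresses the resolver reports for hostname.

    `resolver` defaults to the system getaddrinfo; it is a parameter only
    so the function can be exercised without network access.
    """
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element of sockaddr is the address string
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def hosts_line(ip, hostname):
    """Format one /etc/hosts entry (hypothetical helper for this sketch)."""
    return f"{ip}\t{hostname}"
```

Calling `resolve_addresses("http-intake.logs.datadoghq.eu")` on the affected machine and wrapping it in `try/except socket.gaierror` shows whether the system resolver itself can look the name up. If it can, as it did for us, the lookup failure is specific to the agent and not to the host's DNS setup.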
I'm not too familiar with this codebase, so forgive me if this is the wrong direction, but I did a quick search for getaddrinfo and found this file, which appears to have been touched recently. I do see several recent changes in the file that look like they're related to name resolution, but I'm not sure which ones fall between releases 1.8.2 and 1.8.3. Could these be related?
Also seeing this on CentOS 7. We're using the official yum repo, which only seems to have the latest package, so we're unable to roll back to 1.8.2 easily.
1.8.5 added a new DNS network setting: https://github.com/fluent/fluent-bit/blob/master/src/flb_upstream.c#L38 So in your output you can set:
The other valid value is
I tried with version 1.8.6 and
but got the same errors as before.
@edsiper there are multiple reports from AWS and non-AWS users that DNS resolution is still broken in some cases. IMO, this is a critical issue that deserves to be a top priority.
I'm not sure why this stuff is played with, to be honest. It's not even unique to fluent-bit; everyone seems to like to toy with a working DNS solution and then it completely wrecks the product. I've been having these issues with Docker images all the way from 1.8.3 to 1.8.6. I was previously using v1.7.3 without any problems. I'm running in a fresh EKS cluster and CoreDNS is fine. The problem is fluent-bit and the toying with its DNS resolution settings. I agree @PettitWesley, critical stuff like this turns me off using a product. There are plenty of reports all around that 1.8.3+ has basically screwed up DNS in fluent-bit. My favorite part: how the app segfaults after it can't find a valid host. Like really... what?
FYI: it's not a DNS issue. DNS lookups were affected by the upstream handler closing the TCP connection prematurely, so it looked like a DNS issue, but it is not. A fix is being shipped in today's release.
@edsiper thanks for the input. Any idea how long this particular issue has been around? It must have been introduced in a version >1.7.3...
@farvour my testing suggested it works in 1.8.4 or lower: #4050 (comment)
@farvour/everyone this has been fixed in 1.8.7: https://fluentbit.io/announcements/v1.8.7/ I will close this issue. Please re-open if needed.
Bug Report
Describe the bug
After upgrading to v1.8.3, td-agent-bit no longer works when sending logs to http-intake.logs.datadoghq.eu
To Reproduce
RUN apt-get -y install td-agent-bit
/opt/td-agent-bit/bin/td-agent-bit -c /etc/td-agent-bit/td-agent-bit.conf --log_file=/tmp/fluentbit.log
Expected behavior
The logs in the logfile should be sent to Datadog
Your Environment
Additional context
This error appeared after I updated to version 1.8.3, and if I go back to 1.8.2 it works again
Might only be related to Datadog EU