[DEBIAN Base] Possibly Fix Random Down, DNS Timeout, EAGAIN, EAI_AGAIN #295

Closed
louislam opened this issue Aug 31, 2021 · 19 comments

Comments

@louislam
Owner

louislam commented Aug 31, 2021

If you are experiencing such issues, it is worth trying this Debian-based Docker image. Please let me know whether it solves the problem. Thank you!

Docker Image

uptime-kuma:1.5.3-debian

Story:

Some Uptime Kuma users have been facing weird connectivity problems that I cannot explain.

Recently, one of our contributors, @chakflying, found that they may be related to the Alpine Docker image that Uptime Kuma is based on.

@louislam louislam added the help wanted May need your help to test or answer label Aug 31, 2021
@louislam louislam pinned this issue Aug 31, 2021
@louislam louislam changed the title [Uptime Kuma Based on Debian] Possibly Fix Random Down, DNS Timeout, EAGAIN, EAI_AGAIN [DEBIAN Base] Possibly Fix Random Down, DNS Timeout, EAGAIN, EAI_AGAIN Aug 31, 2021
@CallMeTerdFerguson

CallMeTerdFerguson commented Aug 31, 2021

See my comment in #275 for slightly more detail, but TL;DR: switching from Alpine did nothing to resolve the issue for me. I did resolve it, though, by setting DNS on the container so that Kuma goes directly to my DNS server instead of sending requests through the Docker gateway. Does Kuma's container appropriately cache DNS records and respect their TTLs?
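To make the caching question concrete, here is a minimal sketch of what an in-process DNS cache could look like. It is an illustration only and says nothing about how Uptime Kuma actually resolves names. `CachingResolver`, `max_age`, and the injectable `resolve` and `clock` parameters are hypothetical names. Note that `socket.getaddrinfo` does not expose record TTLs, so this sketch uses a fixed maximum age rather than the real TTL.

```python
import socket
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Cache getaddrinfo answers for a fixed time (not the record's real TTL)."""

    def __init__(self, max_age: float = 30.0,
                 resolve: Callable = socket.getaddrinfo,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_age = max_age
        self._resolve = resolve
        self._clock = clock
        self._cache: Dict[Tuple[str, int], Tuple[float, list]] = {}

    def lookup(self, host: str, port: int) -> list:
        key = (host, port)
        now = self._clock()
        hit = self._cache.get(key)
        # Serve from the cache while the entry is younger than max_age.
        if hit is not None and now - hit[0] < self._max_age:
            return hit[1]
        # Otherwise ask the resolver; this may raise socket.gaierror.
        answer = self._resolve(host, port)
        self._cache[key] = (now, answer)
        return answer
```

With such a cache, repeated checks of the same host within `max_age` seconds send no DNS query at all, which lowers the load on the upstream resolver at the cost of reacting more slowly to record changes.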

@Oaktribe

Oaktribe commented Sep 1, 2021

Currently trying the uptime-kuma:1.5.0-debian image, will report back when it has run for a while.

@ovizii

ovizii commented Sep 1, 2021

> I did resolve it successfully though by ensuring kuma was going directly to my DNS server by setting DNS on the container instead of requests going through the docker gateway.

Just in case someone else wants to try this: I manually added these DNS servers to uptime-kuma's docker-compose.yml file, and my response times have also gone down.

    dns:
      - 1.1.1.1
      - 9.9.9.9
      - 8.8.8.8

@louislam louislam unpinned this issue Sep 1, 2021
@Oaktribe

Oaktribe commented Sep 3, 2021

So far there have been no timeout issues for me. So it looks like it has helped? I will let it run over the weekend too and report back.

@louislam louislam pinned this issue Sep 3, 2021
@gaby
Contributor

gaby commented Sep 5, 2021

From my understanding, the DNS issues affecting Alpine happen if the host is using a kernel older than 5.x. When the host is using 5.x or newer, the issue doesn't happen. Commit here: torvalds/linux@4e35c1c

Another thing that helps with DNS issues in alpine is installing the bind-tools package.

@louislam
Owner Author

louislam commented Sep 5, 2021

> Another thing that helps with DNS issues in alpine is installing the bind-tools package.

Interesting, but isn't it just a set of DNS CLI tools? Why would it affect that? 👀

@gaby
Contributor

gaby commented Sep 5, 2021

I think you are right, it may just be a CLI tool. We have used it since this comment in the Alpine repo (gliderlabs/docker-alpine#539 (comment)). It helps when doing healthchecks in Kubernetes.

Another suggestion would be to update the documentation and recommend running uptime-kuma Docker container with:

docker run --dns 8.8.8.8 --dns 1.1.1.1

@dbrennand

dbrennand commented Sep 5, 2021

Hi @louislam

I have also been experiencing these issues and just started using the debian based image. I will report back in a couple of days.

@dr460nf1r3

No more random downtimes due to mentioned errors on my end for days, seems to be fixed by this.

@gaby
Contributor

gaby commented Sep 5, 2021

Any test hostname that can be used to test this issue using the Alpine container?
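One way to look for the intermittent failure from inside a container, whatever the base image, is to resolve a name repeatedly and count how often `EAI_AGAIN` comes back. The following is a minimal sketch under assumptions: `probe_dns` and its parameters are hypothetical names, and it uses Python's `socket.getaddrinfo` rather than whatever resolver path Uptime Kuma itself takes, so it only exercises the container's libc and DNS configuration.

```python
import socket
import time
from collections import Counter
from typing import Callable


def probe_dns(hostname: str, attempts: int = 100, delay: float = 0.1,
              resolve: Callable = socket.getaddrinfo) -> Counter:
    """Resolve `hostname` repeatedly and tally the outcomes."""
    results: Counter = Counter()
    for _ in range(attempts):
        try:
            resolve(hostname, 443)
            results["ok"] += 1
        except socket.gaierror as exc:
            # EAI_AGAIN means "temporary failure in name resolution".
            if exc.errno == socket.EAI_AGAIN:
                results["EAI_AGAIN"] += 1
            else:
                results[f"gaierror({exc.errno})"] += 1
        time.sleep(delay)
    return results
```

Calling `probe_dns("example.com", attempts=50)` in both the Alpine and the Debian container (with a hostname you actually monitor in place of the placeholder) gives a rough side-by-side comparison. A non-zero `EAI_AGAIN` count points at the resolver path, not at the monitored site.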

@Oaktribe

Oaktribe commented Sep 6, 2021

No timeouts on my end either. It has been running for 5 days, looks like it works perfectly fine with the Debian image.

@louislam
Owner Author

louislam commented Sep 6, 2021

Thank you all for your reports. Combining the reports here with similar reports from around the Internet, I am pretty sure Alpine Linux has some kind of DNS problem that they are not going to fix.

Plus, with the armv7 error in Alpine Linux >= 3.13, 3.14 (#41), I am going to phase out Alpine Linux in the next release.

To developers of other projects, and to future me: my advice is to keep away from Alpine Linux. Imagine that your application needs to call a payment gateway API such as PayPal and it throws EAI_AGAIN randomly. That could be a disaster.

@dbrennand

Hmm... I'm still seeing EAI_AGAIN with the Debian image. 🤔 It definitely doesn't occur as often as before, though.

@louislam
Owner Author

louislam commented Sep 7, 2021

> Hmm... I'm still seeing EAI_AGAIN with the Debian image. 🤔 It definitely doesn't occur as often as before, though.

Then your local DNS server is actually busy. You can configure your container to use Google DNS / Cloudflare DNS directly.

#295 (comment)

#295 (comment)

@louislam louislam closed this as completed Sep 9, 2021
@louislam louislam unpinned this issue Sep 9, 2021
@dbrennand

Hi @louislam

Your mention that my DNS server is busy got me thinking: I remember seeing a post on /r/selfhosted about PiHole rate-limiting queries since FTL v5.7. This might be why I'm seeing EAI_AGAIN errors.

I have just added RATE_LIMIT=0/0 to my PiHole's /etc/pihole/pihole-FTL.conf and run sudo systemctl restart pihole-FTL. Hopefully this fixes my issues. Leaving this here in case it helps someone else who is using PiHole.
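To judge whether a rate limit is plausibly being hit before disabling it, a back-of-the-envelope estimate can help. The sketch below rests on several assumptions that should be checked against your own setup: a limit of 1000 queries per 60 seconds per client (treated here as the assumed default, not verified in this thread), two lookups per check (for example A and AAAA), and no DNS caching between checks. The function names are hypothetical.

```python
def queries_per_window(monitors: int, interval_s: float,
                       window_s: float = 60.0,
                       lookups_per_check: int = 2) -> float:
    """Estimate DNS queries sent by one client during one rate-limit window."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    checks_per_window = window_s / interval_s
    return monitors * lookups_per_check * checks_per_window


def exceeds_limit(monitors: int, interval_s: float,
                  limit: int = 1000, window_s: float = 60.0,
                  lookups_per_check: int = 2) -> bool:
    """True if the estimated query count is above the assumed limit."""
    return queries_per_window(monitors, interval_s,
                              window_s, lookups_per_check) > limit
```

Under these assumptions, 50 monitors checked every 20 seconds produce about 300 queries per minute, which stays below the assumed limit, while 400 monitors at the same interval produce about 2400 and would exceed it. Other containers sharing the same Docker host address may count toward the same client total.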

@louislam
Owner Author

louislam commented Sep 9, 2021

> Hi @louislam
>
> Your mention that my DNS server is busy got me thinking: I remember seeing a post on /r/selfhosted about PiHole rate-limiting queries since FTL v5.7. This might be why I'm seeing EAI_AGAIN errors.
>
> I have just added RATE_LIMIT=0/0 to my PiHole's /etc/pihole/pihole-FTL.conf and run sudo systemctl restart pihole-FTL. Hopefully this fixes my issues. Leaving this here in case it helps someone else who is using PiHole.

Oh, Pihole again. Some other users running Pihole also got this error.
I personally think that Pihole is an "anti-pattern", because there is no secondary DNS when you are using Pihole. If Pihole goes down, everything goes down.

My recommendation is not to put anything serious behind Pihole. Just use it for browsing and watching YouTube.

@thespad

thespad commented Sep 10, 2021

You can just run two instances on different hardware; it's no different to any other DNS resolver, you want redundancy.

As for Alpine, most of the Linuxserver fleet is using an Alpine base (mix of 3.14, 3.13, and a few older versions) and we've never encountered any widespread DNS issues - with the scale of use of our images I would have expected to get a lot of reports if it was something systemic with Alpine.

Edit: Also yikes, the image is nearly a gig with the Debian "slim" base.

@louislam
Owner Author

> You can just run two instances on different hardware; it's no different to any other DNS resolver, you want redundancy.
>
> As for Alpine, most of the Linuxserver fleet is using an Alpine base (mix of 3.14, 3.13, and a few older versions) and we've never encountered any widespread DNS issues - with the scale of use of our images I would have expected to get a lot of reports if it was something systemic with Alpine.

I actually can't reproduce the DNS problem myself, but the fact is that a few users keep reporting it, both in this repo and in Alpine's repo (gliderlabs/docker-alpine#255).

I still build the Alpine image; you can use 1.6.0-alpine, 1-alpine, or alpine.

@louislam louislam removed the help wanted May need your help to test or answer label Sep 10, 2021
@dirdante2

dirdante2 commented Jan 11, 2022

Hello,

I just want to add the following info. I am using the Uptime Kuma Docker image v1.11.1 and the Pihole Docker image, tag 2021.11.

I noticed a few random EAI_AGAIN errors, but the HTTP website status still said UP, so everything worked, except that I couldn't get SMTP notifications working (I didn't know why).

After updating Pihole to Docker tag 2022.01.1 and Uptime Kuma to 1.11.3, every monitor check that uses DNS stopped working completely, including a plain http://domain status check.

Adding

    dns:
      - 1.1.1.1
      - 9.9.9.9
      - 8.8.8.8

to the docker-compose file fixed all errors, including the SMTP notifications.

Only when I use the DNS monitor and specifically query Pihole do I get a DOWN status. It's weird, because everything else works perfectly with Pihole.

Hope this helps.

Edit: just updating Uptime Kuma from 1.11.1 to 1.11.3 while keeping the old Pihole container leads to the same result: Uptime Kuma can't use Pihole DNS (I don't mean that it is Uptime Kuma's fault).


9 participants