Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Service "pihole-FTL" keeps crashing/stopping; both as HW on RPi 2 Mod B+ & as VM #614

Closed
3 tasks done
kuerious opened this issue Jul 10, 2019 · 17 comments
Closed
3 tasks done

Comments

@kuerious
Copy link

kuerious commented Jul 10, 2019

In raising this issue, I confirm the following (please check boxes, eg [X]) Failure to fill the template will close your issue:

  • I have read and understood the contributors guide.
  • The issue I am reporting can be replicated
  • The issue I am reporting isn't a duplicate

How familiar are you with the codebase?:

6....ish


[BUG | ISSUE] Expected Behaviour:
Running latest pull of DietPi, installing "Pi-hole", runs 24hrs/day, 7 days per week, without crashing.

[BUG | ISSUE] Actual Behaviour:
A minimum of 1x/day, sometimes up to 3x/day, the service/process "pihole-FTL" crashes without (apparent) log entries as to why.

[BUG | ISSUE] Steps to reproduce:
Installed latest version of

  • DietPi (as of writing, see this link); ran all system & security updates
  • Pi-hole (first ran via DietPi "Software Optimized" installer; 2nd try using "Curl" from website.)
  • For DoH via Cloudflare, used steps first at original link, then used steps at this link
  • No further software or optimizations done

Log file output [if available]

Token File here

Device specifics
1st Attempt:
Hardware Type: rPi 2 Model B+, 8GB microSD
OS: DietPi (Stretch + security updates)

2nd Attempt:
Hardware Type: VM w/ 4x core, 4GB RAM, 8GB space allocated
OS: DietPi (Stretch + security updates)

This template was created based on the work of udemy-dl.

@kuerious
Copy link
Author

kuerious commented Jul 10, 2019

Extra details

Cross-posted at DietPi GitHub as Issue# 2976. HW and VM details above/beyond the above are listed.

@AzureMarker
Copy link
Contributor

AzureMarker commented Jul 10, 2019

Do you have a log of the crash, either from journalctl -u pihole-FTL or /var/log/pihole-FTL.log?

@kuerious
Copy link
Author

kuerious commented Jul 10, 2019

And that's another thing ... it never seems to leave an error message in the logs. See here:
https://pastebin.com/Qg0cejwR

This is the closest I've ever come to a log file that at least shows the entries leading-up to the event:
https://i.imgur.com/UdubFGH.png

@MichaIng
Copy link
Contributor

@kuerious @Mcat12
Lets keep discussion here, so no double posting required.

One idea is that it is due to DietPi-RAMlog, respectively DietPi-Logclear which removes non-today Pi-hole log entries when the files fill half of available tmpfs size.

After the logs are truncated, pihole-FTL is restarted in a special way that did work and was required some earlier times, but perhaps it does not anymore in some cases: https://github.com/MichaIng/DietPi/blob/master/dietpi/func/dietpi-logclear#L164-L177

  • The dnsmasq is relic from pre-FTLDNS times, which could be removed, but does not hurt.
  • I am not sure if this resolvconf restart is required? I remember it was.
  • Actually what would make sense IMO is to stop pihole-FTL before touching the log files and start if afterwards, instead of having this stop>stop>stop>start>start>start part. But you guys might know better how pihole-FTL handles/expects log files and how to have a safe restart procedure in combination with resolvconf.

I will also run some tests later and try to replicate the issue.

@DL6ER
Copy link
Member

DL6ER commented Jul 10, 2019

I am not sure if this resolvconf restart is required? I remember it was.

I don't think it is. I never did it myself.

But you guys might know better how pihole-FTL handles/expects log files and how to have a safe restart procedure in combination with resolvconf.

I don't think you need to restart pihole-FTL here.
We open and close the log file after each logging. I checked git blame and saw that the last changes to this function were made back in April 2017.
It should be (fairly) safe to leave FTL running while truncating the log file (depending on how you do it).
Furthermore, it might be worth injecting a line like

echo "DietPi: Truncated log file due to size." > /var/log/pihole-FTL.log

or something similar to avoid or at least reduce uncertainties if the log is really empty or was flushed at some point. Probably even with a time stamp.

it never seems to leave an error message in the logs

We have signal handlers that should catch and report every crash or termination request. The only thing we can not catch and log is an immediate termination via SIGKILL. This would have to be found using other means.

@MichaIng
Copy link
Contributor

MichaIng commented Jul 10, 2019

@DL6ER

We open and close the log file after each logging. I checked git blame and saw that the last changes to this function were made back in April 2017.

Okay I think stopping it before touching the files still makes sense to avoid the rare change that both touch the logs the same time? Although it was never an issue with other logs where it could basically be the same, if not concurrent accesses are handled kernel-wise. However I think there is a difference between "opening" and "closing" a file, so "occupying" it and doing a single write, AFAIK.

Furthermore, it might be worth injecting a line like

Very good idea, will add this.

@kuerious
There is a syntax in your journalctl check, please try: journalctl -u pihole-FTL


Ah now I see something:

  • We handle pihole.log the mentioned way but not pihole-FTL.log.
  • Makes sense to handle/truncate both in case, but I will check when running some tests later.

@DL6ER
Copy link
Member

DL6ER commented Jul 10, 2019

to avoid the rare change that both touch the logs the same time?

I do not yet see the real difference than just truncating the file. The worst thing that could happen would be a cropped log line. We do not write to the file when it cannot be opened and when we opened it, you should not be able to delete it: the file might disappear on the file system but our handle stays open and we can write to the file.
And if you just truncate the file by running an echo > command, then I don't see problems here.

In contrast, (re)starting pihole-FTL actually involves a lot of work. Assume a user has millions of domains in his blocking lists and possibly configured a number of regex filters as well, then a restart can need up to (several) minutes in the worst case to complete on really low-end devices (thinking of Pi 1B+ or similar).

We handle pihole.log the mentioned way but not pihole-FTL.log.

Yeah, it's different for pihole.log, however, I think we even disable logging by default on new installs? This is somewhat guessed, though. we really should disable logging by default on new installs!

@MichaIng
Copy link
Contributor

MichaIng commented Jul 10, 2019

pihole logging off if I see right.

Okay so pihole-FTL.log is for the service state and it makes sense then that we do no special handling for this in when RAMlog being enabled (thus hourly log clearing). pihole.log is the one that shows blocked requests and can be seen on web UI, right? Actually having this enabled by default makes quite sense to me to have an overview about how much and which ad requests are blocked. As well we (DietPi) receive reports about certain github.com subdomains being blocked (codeload.github.com and another one I currently don't remember), which then breaks any download from GitHub repos. I couldn't find out until now which blocklist is responsible, nor could I replicate the issue on a fresh Pi-hole install with all by default available blocklists enabled. So far I could not get the users to check the pihole.log which would contain the responsible block list? However adding the affected domain to the whitelist always solved the issue.

Anyway I have to spin up a test install to go through all of this, hopefully with a solution for clearing pihole.log without doing a pihole-FTL restart. One idea was to not have it in RAMlog, but instead in the userdata dir that most users move to external drive (hopefully), then leaving a symlink in /var/log/. Clearing logs can skip symlinks of course as long as their destination is outside the tmpfs.

And if you just truncate the file by running an echo > command, then I don't see problems here.

It's sed -i '/<todays_timestamp>/!d' /var/log/pihole.log so only logs from past days are removed. But that should not make a difference.

@AzureMarker
Copy link
Contributor

AzureMarker commented Jul 11, 2019

You can find what is blocking a domain by using pihole -q or the associated function on the web interface (under "Tools -> Query Lists").

The only current usages of pihole.log are the "Tail pihole.log" feature on the web interface, pihole -t (which does the same thing but with added colors), and the debugger (which just saves the last few lines). None of these usages are critical or so often used that they can't be disabled by default.

@MichaIng
Copy link
Contributor

MichaIng commented Jul 11, 2019

Okay I just checked things now:

  • pihole-FTL.log is just the service/process log, which makes sense to have since it gives much more details for debugging than what journald receives.
    • Clearing it on hourly basis as usual, if RAMlog enabled, is totally okay and does not break anything.
  • pihole.log contains detailed dnsmasq access log. Clearing it does not affect the query log from web UI and after some times testing does not break the service (nor any name resolving) as well. Perhaps this was an issue in the past but is not anymore?
  • The query logs from web UI are stored in database /etc/pihole/pihole-FTL.db (?)

So indeed pihole.log could/should be disabled without any disadvantage for regular users. I used:

pihole -l off
rm /var/log/pihole.log
  • Output is a bid confusing:
  [✓] Deleted  queries from database
  [i] Disabling logging...
  [✓] Restarting DNS service
  [✓] Logging has been disabled!
  • Logs from database (thus web UI) are removed as well, besides the most current four entries (in my case).
  • But help page states: off Disable and flush the Pi-hole log at /var/log/pihole.log and no word about query/database logs
  • Would be good to make things clear here and fully separate pihole.log file from database query logs.
  • Since it seems doubled to me, it could be instead made clear and grant some flexibility by allowing to choose between database logging (default) and file logging (or both).
  • Another bid confusing thing when having pihole.log disabled: On web UI settings page system tab, it allows to "Enable query logging", while when switching to privacy it shows full logging enabled, which is also proven by generating query log entries. So the switch on settings>system page toggles the pihole.log but not the actual query log from database that is shown on the web UI.

So basically a little cleanup and clear separation of these two logging systems would be helpful, and yeah, disabling pihole.log by default totally makes sense 👍.

For DietPi I will now implement the following after Pi-hole installer has finished:

pihole -l off
rm /var/log/pihole.log

@DL6ER
Could you run the above as well and check if your crashes are resolved then?

Additionally I will remove the special pihole.log handling from our logclear script. Even if enabled, I could not force any error by clearing the file. Furthermore the pihole-FTL restart, when file has grown over limit, seems to break in cases (in combination with resolvconf restart?) and as @Mcat12 mentioned is a large overhead as well.

MichaIng added a commit to MichaIng/DietPi that referenced this issue Jul 11, 2019
+ DietPi-Software | Pi-hole: Disable pihole.log by default: pi-hole/FTL#614 (comment)
+ DietPi-Software | Pi-hole: Add Pi-hole specific Lighttpd include config based on: https://github.com/pi-hole/pi-hole/blob/master/advanced/lighttpd.conf.debian
+ DietPi-Software | Lighttpd: Do not create lighttpd.conf backup, since we only edit webroot or fix a Buster upgrade bug, which both is simply required and expected by user to have the just chosen software title + Lighttpd up and running.
+ DietPi-Logclear | Remove special pihole.log handling, which is not required since clearing it does not break the service/process anymore and query logs are holt in database. Also pihole.log is disabled now by default on our Pi-hole installs.
+ DietPi-Logclear | Remove log file exclusion code fragments, which was never really implemented
+ DietPi-Logclear | Further coding enhancements
+ DietPi-Cleaner | Never remove Git, since it is required often for internal updaters, e.g. Pi-hole
+ DietPi-Cleaner | Fix /var/tmp/dietpi/logs/dietpi-ramlog_store removal and use new DietPi-Set_software apt-cache function to clean APT
+ DietPi-Cleaner | Minor coding
@kuerious
Copy link
Author

I, too, have done the above:

root@dns1:~# pihole -l off
[✓] Restarting DNS service
[✓] Flushed /var/log/pihole.log
[✓] Deleted 46466 queries from database
[i] Disabling logging...
[✓] Restarting DNS service
[✓] Logging has been disabled!
root@dns1:~# rm /var/log/pihole.log
root@dns1:~#

It seems to have cleared the queries logged up to this point (in the daily/24-hour period), which I find strange, see here:

I will keep an eye on things and report back anything of interest.

@MichaIng
Copy link
Contributor

@kuerious

It seems to have cleared the queries logged up

Jep exactly as I mentioned above, a bid strange it is. However this is just a single time and the query logs (in database) continue as before, as wanted.

@kuerious
Copy link
Author

Well, it's been a few hours, and the graphs all seem to be happy:

https://i.imgur.com/xEpTEdG.png

No need for a reboot in awhile. Now, mind you, this is the VM I used with the x86_64 DietPi image. I set the specs to be 4x cores, 4GB memory, and 8GB storage. This is geared more towards the new RPi 4, not the lower-end RPi devices.

Should I drop the memory to see if things happen? Or will it not matter?

@AzureMarker
Copy link
Contributor

The pihole.log file itself should not be deleted, as it is still used for general dnsmasq logging (just not query logging).

Previously, before FTL v4, we read from the dnsmasq log in order to get the query information. Now that we run dnsmasq as part of FTL, we read the queries directly from dnsmasq and no longer need the log.

@MichaIng
Copy link
Contributor

MichaIng commented Jul 12, 2019

The service recreates the log file on start anyway, so no point in removing it.

Btw also the dir /var/log/pihole is created but not in use with default setup?

@kuerious
Memory size should not play a role. The size of the /var/log tmpfs (50 MiB by default) is relevant. If pihole.log reaches 1/3 of this size, logclear kicks in. But the log file can hardly reach this size in reasonable time with this query logs disabled.

@AzureMarker
Copy link
Contributor

/var/log/pihole seems to have been planned as the location of the logs, but is not used by default. @DL6ER, can we remove this from the service script?

@DL6ER
Copy link
Member

DL6ER commented Mar 8, 2020

Solved by DietPi 6.26.0

@DL6ER DL6ER closed this as completed Mar 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants