Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store domainlist IDs for blocked/permitted queries #1409

Merged
merged 5 commits into from
Sep 11, 2022

Conversation

DL6ER
Copy link
Member

@DL6ER DL6ER commented Aug 19, 2022

By submitting this pull request, I confirm the following:

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.

How familiar are you with the codebase?:

10


Store domainlist IDs for blocked/permitted queries as requested on Discourse

@yubiuser
Copy link
Member

Do you also want to expose this in the web interface? Linking from query log / long term back to the domain table?

@DL6ER
Copy link
Member Author

DL6ER commented Aug 19, 2022

Yeah, that's the idea. It should be fairly simply as the code already exists for black-regex'ed domains.

@DL6ER DL6ER force-pushed the tweak/white_regex_id branch from 98f65e4 to ff30d4a Compare August 20, 2022 13:21
@jpgpi250
Copy link

been running this FTL branch "since Sat 2022-08-20 17:47:07 CEST; 1 weeks 3 days", no problems detected
Pi-hole version is v5.11.4 (Latest: v5.11.4)
AdminLTE version is v5.13 (Latest: v5.13)
FTL version is tweak/white_regex_id vDev-1ccfd9a (Latest: v5.16.3)

query results are remarkable, provide a lot of additional information.

@jpgpi250
Copy link

jpgpi250 commented Sep 1, 2022

DL6ER said, in reply to yubiuser: "Yeah, that's the idea. It should be fairly simply as the code already exists for black-regex'ed domains."

I have been looking at this (/var/www/html/admin/scripts/pi-hole/js/queries.js)

currently there are 3 lines of code for status 4 and status 10:

          if (data.length > 9 && data[9] > 0) {
            regexLink = true;
          }

There are 2 options:

  1. add these lines to the status entries where you want to display the underlined link, which opens a new tab, showing the matching regex.

  2. remove this code from status 4 and status 10, add the code directly after the variable definition, so that the evaluation happens for all entries:

     var fieldtext,
        buttontext = "",
        blocked = false,
        isCNAME = false,
        regexLink = false;
		
      if (data.length > 9 && data[9] > 0) {
        regexLink = true;
      }

      switch (data[4]) {

I tested option 2 (evaluate all), looked at some entries in my database, including cname entries (which is the only other status type (9) that would have a populated additional field entry), and found that the query page behaves as expected.

edit
these tests revealed another problem with CNAME entries, see here, NOT related to this branch (also tested with the master branch), for which I do not have an explanation.
/edit

I may be overlooking something, hence not a PR yet, just an informational note about my test.

@jpgpi250
Copy link

jpgpi250 commented Sep 5, 2022

  • now running combined tweak (CNAME and white_regex_id): tweak/cnamea_and_regexes
  • just upgraded the web component of pihole to AdminLTE version v5.14.2 an applied the above solution again (option 2 - evaluate all). This change still works, and lead to the following (examples) in the query log (web interface)

image
image

image
image

image
image

looking good so far...

@pralor-bot
Copy link

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/cname-cant-explain-this/57457/21

@DL6ER DL6ER marked this pull request as ready for review September 5, 2022 18:18
@DL6ER DL6ER requested a review from a team September 5, 2022 18:18
@DL6ER
Copy link
Member Author

DL6ER commented Sep 5, 2022

Documentation will be needed here: https://docs.pi-hole.net/database/ftl/#additional_info_regex

@jpgpi250
Copy link

jpgpi250 commented Sep 7, 2022

@yubiuser, informational only...

you might want to know why this change is important to me (and other users).
I've said in this topic: whitelist entries are added as response to specific problems, most users never look back, once the problem is solved. When a whitelist entry is no longer required, there is no easy way to detect this.

these sql queries (examples for whitelist - change the type to see other types) show the result of this PR, allowing users to obtain relevant info:

  • note the limitation, 10 days, however it looks like selecting a lot more days doesn't really affect execution time.
  • note this change needs to be active long enough to produce reliable results.
  • add these queries to the documentation?

used exact whitelist entries:

sqlite3 ".timeout = 5000" ".headers on" ".mode column" \
"ATTACH '/etc/pihole/pihole-FTL.db' AS pihole_FTL" \
"ATTACH '/etc/pihole/gravity.db' AS gravity" \
"SELECT count(pihole_FTL.queries.additional_info), gravity.domainlist.id, gravity.domainlist.domain, gravity.gravity.domain AS gravity_entry FROM pihole_FTL.queries \
INNER JOIN gravity.domainlist \
ON pihole_FTL.queries.additional_info = gravity.domainlist.id \
LEFT JOIN gravity.gravity \
ON gravity_entry = gravity.domainlist.domain \
WHERE typeof(additional_info) = 'integer' \
AND gravity.domainlist.type = 0 \
AND pihole_FTL.queries.timestamp > strftime('%s','now','-10 days') \
GROUP BY additional_info ORDER BY count(additional_info) DESC;"

unused exact whitelist entries:

sqlite3 ".timeout = 5000" ".headers on" ".mode column" \
"ATTACH '/etc/pihole/pihole-FTL.db' AS pihole_FTL" \
"ATTACH '/etc/pihole/gravity.db' AS gravity" \
"SELECT DISTINCT gravity.domainlist.id, gravity.domainlist.domain, gravity.gravity.domain AS gravity_entry FROM gravity.domainlist \
LEFT JOIN gravity.gravity \
ON gravity_entry = gravity.domainlist.domain \
WHERE gravity.domainlist.type = 0 \
AND gravity.domainlist.id NOT IN \
( \
SELECT gravity.domainlist.id FROM pihole_FTL.queries \
INNER JOIN gravity.domainlist \
ON pihole_FTL.queries.additional_info = gravity.domainlist.id \
WHERE typeof(additional_info) = 'integer' \
AND gravity.domainlist.type = 0 \
AND pihole_FTL.queries.timestamp > strftime('%s','now','-10 days') \
) \
ORDER BY gravity.domainlist.id ASC;"

@DL6ER
Copy link
Member Author

DL6ER commented Sep 8, 2022

Documentation change pi-hole/docs#774

@DL6ER DL6ER added PR: Approval Required Open Pull Request, needs approval and removed Documentation needed labels Sep 8, 2022
@jpgpi250
Copy link

jpgpi250 commented Sep 8, 2022

@DL6ER @yubiuser
I noticed the documentation PR has been updated, the PR appears to be ready for merge

Do you want me to make a PR (attempt) to modify the web interface (option 2) as described above? Of course, I will do this only if you approve this method...

@yubiuser
Copy link
Member

yubiuser commented Sep 8, 2022

This one is ready for review and I have no doubt we will merge this at some point. It would be nice if you could file a web interface PR. The details are up to you, I'd would prefere

so that the evaluation happens for all entries

@jpgpi250
Copy link

jpgpi250 commented Sep 8, 2022

second attempt: pi-hole/web#2341

Copy link
Member

@yubiuser yubiuser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a case that is not covered yet:
If I have a blacklisted domain that is the target of a CNAME, FTL will write the CNAME domain to the addinfo table. However, to in such a case two information needed to be stored: the domain and the responsible domain_id.

src/api/api.c Outdated Show resolved Hide resolved
src/database/gravity-db.c Outdated Show resolved Hide resolved
@jpgpi250
Copy link

jpgpi250 commented Sep 9, 2022

However, to in such a case two information needed to be stored: the domain and the responsible domain_id.

Do you mean that both the domain and the ID will be stored in the additional_field? In that case, the '!isNAN' function will never work, thus no link, and the displayed CNAME data will probably be corrupted (showing domain & ID)

@yubiuser
Copy link
Member

yubiuser commented Sep 9, 2022

Do you mean that both the domain and the ID will be stored in the additional_field? I

Yes. However, this will need a bigger database scheme change and we will see if DL6ER is going to implement this.

In that case, the '!isNAN' function will never work, thus no link, and the displayed CNAME data will probably be corrupted (showing domain & ID)

In case the database changes your PR will need adjustment. I'm sure we figure out a way to store both, domain and id, and have the link.

@jpgpi250
Copy link

jpgpi250 commented Sep 9, 2022

May I disagree on this (store both)?

This is what happens if there is a cname entry in gravity, e.g.

 pihole -q -exact g.ezoic.net
 Exact matches for g.ezoic.net found in:
  - file:///home/pi/blocklistproject/blocklistproject
  - http://localhost/mylist.txt
  - https://v.firebog.net/hosts/static/w3kbl.txt
  - https://www.github.developerdan.com/hosts/lists/ads-and-tracking-extended.txt

image
the status column indicates blocked (gravity, CNAME)

now adding g.ezoic.net as exact blacklist

 pihole -q -exact g.ezoic.net
 Exact match found in exact blacklist
   g.ezoic.net
 Exact matches for g.ezoic.net found in:
  - file:///home/pi/blocklistproject/blocklistproject
  - http://localhost/mylist.txt
  - https://v.firebog.net/hosts/static/w3kbl.txt
  - https://www.github.developerdan.com/hosts/lists/ads-and-tracking-extended.txt

image

In my opinion there is more than enough indication that it's a 'deep CNAME inspecton' problem, one caused by gravity, the next one caused by a exact blacklist.

I don't think you need to overcomplicate things, the available info is sufficient to solve a problem (undesired block).

@yubiuser
Copy link
Member

yubiuser commented Sep 9, 2022

But you can't tell which exact (could be regex as well) domain blocked g.ezoic.net nor will you have a link to the domain list for that entry.

@jpgpi250
Copy link

jpgpi250 commented Sep 9, 2022

The existing (current master) provides this functionality for regex blacklist only. This change is thus a great improvement, compared to master. There is a single case (deep CNAME inspection + domainlist entry) which doesn't provide a link.
No problem for me, if this can be fixed easily, but the code in queries.js will also be affected, no idea what this will require to make it work.

Anyway, it's up to DL6ER and you to decide which way to go, however, please don't drop this change, because things are getting to complex, the results I'm already able to get out of the database are valuable.

edit
additional concern: It's nice have all the info in the web interface, however, my initial goal was to be able to sqlite3 query (working examples in this conversation (above)). I have no idea how I would write the query, when the additional_info field contains both domain and ID. Crucial to the execution time of the query is "WHERE typeof(additional_info) = 'integer'", which would no longer be valid.
/edit

@DL6ER
Copy link
Member Author

DL6ER commented Sep 9, 2022

The only solution to this would be adding another column. We would actually add two new columns: domainlist_id and cname_domainID (both integer, both may be NULL) and stop using additional_info altogether. I think this is the only really useful solution here. The only other option (storing JSON encoded data) would be very slow.

However, I'm aware that the current solution is suboptimal (we can store only of of the two information bits), however, in the exact example you discussed here there would actually be no difference as the blocked domain came from gravity and, thus, doesn't even have a domainlist ID. Of course, it's easy to craft another example where this isn't the case, though.

@jpgpi250
Copy link

jpgpi250 commented Sep 9, 2022

I'm aware that the current solution is suboptimal

I don't agree (but then again, who am I)

The only time there isn't a link to the exact OR regex entry is the CNAME case.
In that case, the domain, causing the block is available in the web interface.
All the user needs to do is 'pihole -q <domain name>' to find the cause. This is the current solution in the master branch, everything else that comes out of this PR is a bonus...

@jpgpi250 jpgpi250 mentioned this pull request Sep 9, 2022
1 task
src/api/api.c Outdated Show resolved Hide resolved
Co-authored-by: yubiuser <[email protected]>
Signed-off-by: DL6ER <[email protected]>
@DL6ER DL6ER requested a review from yubiuser September 11, 2022 09:31
@DL6ER
Copy link
Member Author

DL6ER commented Sep 11, 2022

@yubiuser Sorry, I didn't even see your review suggestions in hectic everyday life. I applied them just now.

Copy link
Member

@yubiuser yubiuser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After internal discussion we agreed to keep the PR the way it is now. Resolving the remaining limitation (no domainlist_id is stored in case of a CNAME reply) would currently be to time-consuming as it requires a more extensive database re-design.

@DL6ER DL6ER merged commit 4cd6809 into development Sep 11, 2022
@DL6ER DL6ER deleted the tweak/white_regex_id branch September 11, 2022 20:12
@jpgpi250
Copy link

@DL6ER you need / can delete the branch tweak/cnamea_and_regexes (switched to development branch, since it now contains both changes)

@pralor-bot
Copy link

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/status-codes-in-ftl-database-for-whitelisted-entries/55099/5

@pralor-bot
Copy link

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/pi-hole-ftl-v5-18-web-v5-15-and-core-v5-12-1-released/57894/1

This was referenced Oct 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants