Subdomain matching / regex support #12

Closed
dtjm opened this issue Feb 14, 2020 · 10 comments · Fixed by #283
@dtjm

dtjm commented Feb 14, 2020

Hello,

It looks like the current implementation does exact matching on the domain name. I'm wondering if you would be open to a change that adds suffix matching, such that a blacklist entry of foo.com would also block *.foo.com, *.*.foo.com, etc.

I would probably implement it using a trie data structure to minimize the memory cost; lookups would probably be O(n), where n is the length of the search string.
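A minimal sketch of the suffix-matching idea, assuming a trie keyed by domain labels in reverse order (com, then foo, then www). The names `SuffixTrie`, `Add`, and `Blocked` are hypothetical and not taken from Blocky's code base; this only illustrates the proposal, not how it was eventually implemented.

```go
package main

import "strings"

// node is one domain label in the trie. Labels are stored in reverse
// order, so "www.foo.com" is reached via com -> foo -> www.
type node struct {
	children map[string]*node
	terminal bool // true if a blocklist entry ends at this label
}

// SuffixTrie is a hypothetical type for illustration only.
type SuffixTrie struct {
	root *node
}

func NewSuffixTrie() *SuffixTrie {
	return &SuffixTrie{root: &node{children: map[string]*node{}}}
}

// labels lowercases the name, drops a trailing dot, and splits on ".".
func labels(domain string) []string {
	domain = strings.ToLower(strings.TrimSuffix(domain, "."))
	if domain == "" {
		return nil
	}
	return strings.Split(domain, ".")
}

// Add inserts a blocklist entry such as "foo.com".
func (t *SuffixTrie) Add(domain string) {
	ls := labels(domain)
	if len(ls) == 0 {
		return
	}
	cur := t.root
	for i := len(ls) - 1; i >= 0; i-- {
		next, ok := cur.children[ls[i]]
		if !ok {
			next = &node{children: map[string]*node{}}
			cur.children[ls[i]] = next
		}
		cur = next
	}
	cur.terminal = true
}

// Blocked reports whether domain equals an entry or is a subdomain of one.
// The walk stops at the first terminal node, so the cost is bounded by the
// number of labels in the queried name.
func (t *SuffixTrie) Blocked(domain string) bool {
	ls := labels(domain)
	cur := t.root
	for i := len(ls) - 1; i >= 0; i-- {
		next, ok := cur.children[ls[i]]
		if !ok {
			return false
		}
		if next.terminal {
			return true
		}
		cur = next
	}
	return false
}
```

With "foo.com" added, `Blocked` returns true for "foo.com" and "a.b.foo.com", and false for "notfoo.com", because matching is done on whole labels rather than on raw string suffixes.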

This behavior could be configurable so that the default behavior remains the same unless this feature is enabled.

@0xERR0R
Owner

0xERR0R commented Feb 14, 2020

Blocky currently supports blacklists and whitelists in hosts format. This format does not allow wildcards etc., only whole domain names. There are a lot of lists of ad domains in this format. The disadvantage is that one record must exist for each subdomain.
There is also another common blacklist format in the wild: the adblock format. Lists in this format are more flexible: you can define subdomains and use wildcards.
Another option is a regex format, like the one pihole uses.
I think it would be nice to have an option to block a domain with a wildcard. Maybe a regex is more powerful than adblock or a plain wildcard.

@0xERR0R 0xERR0R added the 🔨 enhancement New feature or request label Feb 14, 2020
@0xERR0R 0xERR0R added this to the 0.15 milestone Apr 23, 2021
@0xERR0R 0xERR0R modified the milestones: 0.15, 0.16 Jul 27, 2021
@anaschaudhary33

I also want to see this feature in blocky.

@0xERR0R
Owner

0xERR0R commented Sep 16, 2021

I see the need for this feature, but I'm not sure what the best approach to implement it is:

  • Simple wildcard for subdomains: the user can define a custom list entry with a wildcard for the subdomain, e.g. *.youtube.com. This would block only subdomains like "www.youtube.com" and "m.youtube.com". The implementation is simple and the performance impact is very low. But it is not very flexible: for example, you can't define `youtube` to block both "m.youtube.com" and "youtube-otherdomain.de".
  • Regex approach: the user can define a regex entry like ^.*youtube.*$. Very flexible, but it needs more CPU resources for each request. It is necessary to introduce some "magic" character to distinguish a "normal" host name from a regex (e.g. ^, or / as AdguardHome does).
  • Some fancy lists (adblock format)
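To make the trade-off between the first two options concrete, here is a rough sketch. `matchWildcard` and `matchRegex` are hypothetical helpers written for this comparison; they are not Blocky functions, and the wildcard semantics (only strict subdomains match) follow the description in the first bullet.

```go
package main

import (
	"regexp"
	"strings"
)

// matchWildcard handles entries of the form "*.youtube.com".
// Only strict subdomains match; the bare domain itself does not.
// Entries without the "*." prefix fall back to a case-insensitive
// exact comparison.
func matchWildcard(pattern, domain string) bool {
	if !strings.HasPrefix(pattern, "*.") {
		return strings.EqualFold(pattern, domain)
	}
	suffix := strings.ToLower(pattern[1:]) // e.g. ".youtube.com"
	domain = strings.ToLower(domain)
	return len(domain) > len(suffix) && strings.HasSuffix(domain, suffix)
}

// matchRegex compiles the expression on every call to keep the sketch
// short; a real resolver would compile each entry once at load time.
func matchRegex(expr, domain string) (bool, error) {
	re, err := regexp.Compile(expr)
	if err != nil {
		return false, err
	}
	return re.MatchString(strings.ToLower(domain)), nil
}
```

Under these assumptions, `matchWildcard("*.youtube.com", ...)` accepts "m.youtube.com" but rejects "youtube-otherdomain.de", while `matchRegex("^.*youtube.*$", ...)` accepts both. That is the flexibility gap described above, paid for with a regex evaluation per request.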

Any ideas?

@anaschaudhary33

anaschaudhary33 commented Sep 16, 2021

I think the adblock format would be best, as it is very popular nowadays and gives more flexibility.

@0xERR0R
Owner

0xERR0R commented Sep 16, 2021

This is correct, but the adblock format was designed for client-side blocking (where you have the whole URL). Here we have only the domain name, so only a subset of the format applies and you can't use all the predefined lists.

@anaschaudhary33

Yes, but we can use it for the manually defined list.
And for predefined lists we can rely on the hosts format.

I was previously using AdGuard Home, and it also works with adblock rules.

@shahbazkhan777

I vote in favour of the regex format if you want manually defined blocklists. It would be more powerful than a simple wildcard or the Adblock Plus format.

@0xERR0R
Owner

0xERR0R commented Sep 16, 2021

I lean toward the regex solution too, and I hope it will not have much performance impact (every request must be checked against all defined regexes, but the number of regexes should be negligible). I also prefer the AdguardHome approach: each regex entry must be enclosed in "/", for example /^banners?[_.-]/
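A sketch of how such a list could be loaded and queried, assuming that any entry wrapped in "/" is a regex and everything else is an exact host name. `Blocklist`, `ParseBlocklist`, and `IsBlocked` are hypothetical names used for illustration; the actual implementation in the referenced commits may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Blocklist keeps exact host names in a set and regex entries in a slice.
type Blocklist struct {
	exact   map[string]struct{}
	regexes []*regexp.Regexp
}

// ParseBlocklist treats entries enclosed in "/" as regexes and compiles
// them once; all other non-empty entries are stored as exact names.
func ParseBlocklist(entries []string) (*Blocklist, error) {
	bl := &Blocklist{exact: map[string]struct{}{}}
	for _, e := range entries {
		e = strings.TrimSpace(e)
		if e == "" {
			continue
		}
		if len(e) > 2 && strings.HasPrefix(e, "/") && strings.HasSuffix(e, "/") {
			re, err := regexp.Compile(e[1 : len(e)-1])
			if err != nil {
				return nil, fmt.Errorf("invalid regex entry %q: %w", e, err)
			}
			bl.regexes = append(bl.regexes, re)
			continue
		}
		bl.exact[strings.ToLower(e)] = struct{}{}
	}
	return bl, nil
}

// IsBlocked does a constant-time exact lookup first and then checks the
// name against every compiled regex, so the per-request cost grows with
// the number of regex entries.
func (bl *Blocklist) IsBlocked(domain string) bool {
	domain = strings.ToLower(strings.TrimSuffix(domain, "."))
	if _, ok := bl.exact[domain]; ok {
		return true
	}
	for _, re := range bl.regexes {
		if re.MatchString(domain) {
			return true
		}
	}
	return false
}
```

With the entries "ads.example.com" and "/^banners?[_.-]/", this blocks "ads.example.com", "banner.example.com", and "banners-cdn.example.com", but not "mybanner.example.com", since the pattern is anchored at the start of the name.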

@0xERR0R 0xERR0R changed the title Subdomain matching Subdomain matching / regex support Sep 18, 2021
0xERR0R added a commit that referenced this issue Sep 18, 2021
0xERR0R added a commit that referenced this issue Sep 18, 2021
@LexterS999

So is it pihole regex style or adguard regex style?

@0xERR0R
Owner

0xERR0R commented Sep 21, 2021

Hey, not sure about pihole, but adguard format works. For example: https://github.com/mmotti/adguard-home-filters/blob/master/regex.txt

kwitsch added a commit that referenced this issue Sep 23, 2022
[pull] development from 0xERR0R:development