Subdomain matching / regex support #12

dtjm · 2020-02-14T18:09:30Z

Hello,

It looks like the current implementation does exact matching on the domain name. I'm wondering if you would be open to a change that adds suffix matching, such that a blacklist entry of foo.com would also block *.foo.com, *.*.foo.com, etc.

I would probably implement it with a trie to keep the memory cost low; lookups would be O(n), where n is the length of the queried name.
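
A minimal sketch of the suffix-matching idea, assuming a trie keyed on domain labels in reverse order (a per-character trie would be the variant that gives the O(n) bound in the length of the name). The `SuffixTrie` class and its method names are hypothetical and are not part of Blocky; Python is used only for illustration.

```python
class SuffixTrie:
    """Trie over reversed domain labels: 'foo.com' is stored as com -> foo."""

    def __init__(self):
        self.children = {}
        self.terminal = False  # True if a blocklist entry ends at this node

    def add(self, domain):
        node = self
        for label in reversed(domain.lower().rstrip(".").split(".")):
            node = node.children.setdefault(label, SuffixTrie())
        node.terminal = True

    def matches(self, name):
        """Return True if name equals an entry or is a subdomain of one."""
        node = self
        for label in reversed(name.lower().rstrip(".").split(".")):
            node = node.children.get(label)
            if node is None:
                return False
            if node.terminal:
                return True
        return False


blocklist = SuffixTrie()
blocklist.add("foo.com")
print(blocklist.matches("foo.com"))      # True: exact entry
print(blocklist.matches("a.b.foo.com"))  # True: subdomain of an entry
print(blocklist.matches("notfoo.com"))   # False: different label, not a subdomain
```

Walking labels from the TLD inward means the lookup can stop at the first blocked ancestor, so deeper subdomains cost no extra entries.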

This behavior could be configurable so that the default behavior remains the same unless this feature is enabled.

0xERR0R · 2020-02-14T21:26:18Z

Blocky currently supports blacklists and whitelists in hosts format. This format does not allow wildcards, only whole domain names. There are a lot of lists with ad domains in this format. The disadvantage is that one record must exist for each subdomain.
There is also another common blacklist format in the wild: the adblock format. Lists in this format are more flexible; you can define subdomains and use wildcards.
Another option is a regex format, like the one Pi-hole uses.
I think it would be nice to have an option to block a domain with a wildcard. Maybe regex is more powerful than adblock or a plain wildcard.
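
To illustrate the hosts-format limitation, here is a small sketch. It assumes each line has the form `<ip> <name> [<name>...]` with optional `#` comments; the `parse_hosts` helper and the sample entries are made up for this example and are not Blocky code.

```python
def parse_hosts(text):
    """Collect the hostnames from hosts-format lines ('<ip> <name> [<name>...]')."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        # First field is the address; the remaining fields are hostnames.
        names.update(f.lower() for f in fields[1:])
    return names


sample = """
# illustrative entries, not from a real list
0.0.0.0 ads.example.com
0.0.0.0 tracker.example.com  # one record per subdomain
"""
blocked = parse_hosts(sample)
print("ads.example.com" in blocked)      # True: listed explicitly
print("img.ads.example.com" in blocked)  # False: subdomain has no record of its own
```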

anaschaudhary33 · 2021-09-15T16:51:47Z

I also want to see this feature in blocky.

0xERR0R · 2021-09-16T08:44:04Z

I see the need for this feature, but I'm not sure what the best approach to implement it is:

Simple wildcard for subdomains: the user can define a custom list entry with a wildcard for the subdomain, e.g. *.youtube.com. This would block only subdomains such as "www.youtube.com" and "m.youtube.com". The implementation is simple and the performance impact is very low. But it is not very flexible: for example, you can't define `youtube` to block both "m.youtube.com" and "youtube-otherdomain.de".
Regex approach: the user can define a regex entry like ^.*youtube.*$. This is very flexible, but it needs more CPU for each request. It is also necessary to introduce some "magic" character to distinguish a "normal" host name from a regex (e.g. ^ or, like AdguardHome, /).
Some fancy lists (adblock format)
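
The difference between the first two options can be sketched as follows. This is an illustration only: `wildcard_matches` is a hypothetical helper, and Python's `re` module stands in for whatever regex engine an implementation would actually use.

```python
import re


def wildcard_matches(pattern, name):
    """Match a '*.example.com' pattern against subdomains only."""
    if not pattern.startswith("*."):
        return name == pattern
    suffix = pattern[1:]  # '.example.com'
    return name.endswith(suffix) and len(name) > len(suffix)


names = ["www.youtube.com", "m.youtube.com", "youtube.com", "youtube-otherdomain.de"]

wildcard_hits = [n for n in names if wildcard_matches("*.youtube.com", n)]
regex = re.compile(r"^.*youtube.*$")
regex_hits = [n for n in names if regex.search(n)]

print(wildcard_hits)  # ['www.youtube.com', 'm.youtube.com']
print(regex_hits)     # all four names
```

The wildcard check is a single suffix comparison per entry, while the regex covers names the wildcard cannot express, at the cost of running the regex engine on every request.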

Any ideas?

anaschaudhary33 · 2021-09-16T10:28:56Z

I think the adblock format would be best, as it is very popular nowadays and gives more flexibility.

0xERR0R · 2021-09-16T10:32:15Z

This is correct, but the adblock format was designed for client-side blocking (where you have the whole URL). Here we have only the domain name, so only a subset of the format applies and you can't use all predefined lists.

anaschaudhary33 · 2021-09-16T11:33:30Z

Yes, but we can use it for manually defined lists.
And for predefined lists we can rely on the hosts format.

I was previously using AdGuard Home, and it also works with adblock rules.

shahbazkhan777 · 2021-09-16T11:43:53Z

I will vote in favour of the regex format if you want manually defined blocklists. It would be more powerful than a simple wildcard or the Adblock Plus format.

0xERR0R · 2021-09-16T12:44:18Z

I lean toward the regex solution too, and I hope it will not have much performance impact (every request must be checked against all defined regexes, but the number of regexes should be small). I also prefer the AdguardHome approach: each regex entry must be enclosed in "/", for example /^banners?[_.-]/
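
A rough sketch of how slash-delimited entries could be separated from plain host names and checked per request. The function names and sample entries are assumptions for illustration, not the merged implementation, and Python's `re` may differ in syntax details from the engine a real implementation uses.

```python
import re


def load_entries(lines):
    """Split list entries into exact host names and compiled regexes."""
    exact, patterns = set(), []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue
        # Entries enclosed in '/' are treated as regexes, everything else as a host name.
        if len(entry) > 2 and entry.startswith("/") and entry.endswith("/"):
            patterns.append(re.compile(entry[1:-1]))
        else:
            exact.add(entry.lower())
    return exact, patterns


def is_blocked(name, exact, patterns):
    """Exact lookup first (cheap), then try every regex in turn."""
    name = name.lower()
    if name in exact:
        return True
    return any(p.search(name) for p in patterns)


exact, patterns = load_entries(["ads.example.com", "/^banners?[_.-]/"])
print(is_blocked("ads.example.com", exact, patterns))            # True: exact entry
print(is_blocked("banner-server.example.net", exact, patterns))  # True: regex match
print(is_blocked("www.example.com", exact, patterns))            # False
```

Compiling each regex once at load time keeps the per-request cost to one set lookup plus one match attempt per regex, which stays cheap as long as the regex list is short.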

LexterS999 · 2021-09-21T01:31:20Z

So is it Pi-hole regex style or AdGuard regex style?

0xERR0R · 2021-09-21T05:44:32Z

Hey, not sure about pihole, but adguard format works. For example: https://github.com/mmotti/adguard-home-filters/blob/master/regex.txt

[pull] development from 0xERR0R:development

0xERR0R added the 🔨 enhancement New feature or request label Feb 14, 2020

0xERR0R mentioned this issue Nov 8, 2020

Change cache data structure from slice to DAWG? #109

Closed

0xERR0R added this to the 0.15 milestone Apr 23, 2021

0xERR0R mentioned this issue Jul 12, 2021

wildcard blocklists #232

Closed

0xERR0R modified the milestones: 0.15, 0.16 Jul 27, 2021

0xERR0R changed the title ~~Subdomain matching~~ Subdomain matching / regex support Sep 18, 2021

0xERR0R added a commit that referenced this issue Sep 18, 2021

regex support for matching (#12)

0bf213f

0xERR0R mentioned this issue Sep 18, 2021

regex support for matching (#12) #283

Merged

0xERR0R closed this as completed in #283 Sep 18, 2021

0xERR0R added a commit that referenced this issue Sep 18, 2021

regex support for matching (#12)

e7ddab7

12425 mentioned this issue Jun 8, 2022

Automatic subdomain blocking #556

Closed

kwitsch added a commit that referenced this issue Sep 23, 2022

Merge pull request #12 from 0xERR0R/development

2bf0206
