re-evaluate how syntax/software support is modeled #496

Closed
collinbarrett opened this issue Sep 19, 2018 · 12 comments
Labels
directory-data (changes to basic FilterLists data), feedback wanted (provide your input)

Comments

@collinbarrett
Owner

collinbarrett commented Sep 19, 2018

This issue is to continue the discussion from #493 with @hawkeye116477, @DandelionSprout, and whoever else is interested.

FilterLists is currently designed so that each FilterList is linked to a single Syntax. Each Syntax is supported by multiple Software.

However, we might need to make that model somewhat more expressive. Some ideas are summarized below:

0. Keep on Keeping on, but Stricter:

Continue using the current model, but be very strict about saying that a Software supports a Syntax. The example that came up is that we think AdGuard supports nearly all of uBlock Origin Static; however, it does not support Scriptlet Injection. In this case, we would say that AdGuard does not support uBlock Origin Static. Only Software that supports every kind of rule defined in a Syntax should be linked to that Syntax.

1. Most Precise:

The most precise solution would be to drop the Syntax model altogether in favor of each FilterList having its own set of Software that is known to support it. That incurs a lot of data maintenance overhead, and I am really hoping to keep Syntax around in some form to lighten the load of having to maintain all those individual incompatibilities.

2. Boolean Partial Syntax Support:

I proposed adding a boolean flag to each SoftwareSyntax that represents whether the Syntax is fully supported by the Software (all rules except comments/metadata are applied by the Software) or only partially supported (some of the rules are ignored by the Software, while others are applied). We could then indicate in the UI whether the Software fully or partially supports the FilterList.
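A minimal Python sketch of option 2, assuming a `SoftwareSyntax` join record with a hypothetical `is_partial` flag. The class, field, and function names are illustrative only, not the actual FilterLists schema, and the sample rows simply mirror the AdGuard example from option 0 rather than a verified support matrix.

```python
from dataclasses import dataclass
from enum import Enum


class Support(Enum):
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"


@dataclass(frozen=True)
class SoftwareSyntax:
    """Join record linking one Software to one Syntax (hypothetical shape)."""
    software: str
    syntax: str
    is_partial: bool  # hypothetical flag: True if the Software ignores some rule types


def support_for_list(software, list_syntax, links):
    """Return how well `software` supports a FilterList written in `list_syntax`."""
    for link in links:
        if link.software == software and link.syntax == list_syntax:
            return Support.PARTIAL if link.is_partial else Support.FULL
    return Support.NONE  # no SoftwareSyntax row means unsupported


links = [
    SoftwareSyntax("uBlock Origin", "uBlock Origin Static", is_partial=False),
    SoftwareSyntax("AdGuard", "uBlock Origin Static", is_partial=True),
]
print(support_for_list("AdGuard", "uBlock Origin Static", links))  # Support.PARTIAL
```

The UI could then render `FULL` and `PARTIAL` differently, while the flag itself stays a single column on the existing join table.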

3. More Granular Syntaxes:

Another option could be to chop a Syntax like uBlock Origin Static into multiple Syntaxes. So, instead of having uBlock Origin Static, we would instead have: uBlock Origin Static Network Filtering, uBlock Origin Static Extended Filtering, uBlock Origin Scriptlet Injection, ... This solution would also require changing the relationship from a FilterList to a Syntax from one-to-many to many-to-many so that each FilterList can implement multiple Syntaxes.
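A minimal sketch of how option 3 might compute support, assuming each FilterList maps to a set of granular Syntaxes (the many-to-many link) and each Software maps to the set of Syntaxes it handles. The list names, the `support_level` helper, and the AdGuard entry are hypothetical sample data based on the scriptlet-injection example above. With granular Syntaxes, full versus partial support can be derived from set containment instead of being stored as a flag.

```python
# Hypothetical granular Syntax names, taken from the examples in option 3.
NETWORK = "uBlock Origin Static Network Filtering"
EXTENDED = "uBlock Origin Static Extended Filtering"
SCRIPTLET = "uBlock Origin Scriptlet Injection"

# Many-to-many: each FilterList implements a set of Syntaxes (sample data).
FILTERLIST_SYNTAXES = {
    "Example List A": {NETWORK, SCRIPTLET},
    "Example List B": {NETWORK},
}

# Each Software supports a set of Syntaxes (sample data, not verified).
SOFTWARE_SYNTAXES = {
    "uBlock Origin": {NETWORK, EXTENDED, SCRIPTLET},
    "AdGuard": {NETWORK, EXTENDED},
}


def support_level(software, filter_list):
    """Return "full", "partial", or "none" based on set containment."""
    needed = FILTERLIST_SYNTAXES[filter_list]
    supported = SOFTWARE_SYNTAXES.get(software, set())
    if needed <= supported:
        return "full"     # every Syntax the list uses is supported
    if needed & supported:
        return "partial"  # some, but not all, Syntaxes are supported
    return "none"


for name in FILTERLIST_SYNTAXES:
    print(name, support_level("AdGuard", name))
```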

Surfacing software support is one of the primary pieces of value that FilterLists provides, in my view, so getting this right is pretty key. Please provide your further thoughts/suggestions.

collinbarrett added the directory-data (changes to basic FilterLists data) and feedback wanted (provide your input) labels Sep 19, 2018
collinbarrett mentioned this issue Sep 19, 2018
@collinbarrett
Owner Author

Related to #492

@DandelionSprout
Contributor

I'm in favour of options №2 (Boolean) and №3 (Granularity), whichever one you two decide upon.

№1 (Most Precise) would in my eyes be a disaster in the making. Not only would I (and other pull request authors) have to keep track of dozens of software tools for each list, roughly doubling the time it takes me to put together a mass-addition pull (if not tripling it, due to concentration exhaustion), but it also wouldn't properly account for new software tools that are occasionally discovered (recent examples being FireHOL and Samsung Knox).

@collinbarrett
Owner Author

Thanks, @DandelionSprout. I tend to agree with you. I also just added another option, Keep on Keeping on, but Stricter, to the OP.

@DandelionSprout
Contributor

I'm not in favour of №0 (Keeping on but stricter), as it underestimates the syntax crossover potential that stems from uBO, Nano and AdGuard occasionally co-operating with each other to create industry standards. Sure, each of them has its own quirks, which can on rare occasions cause problems, but there are several discussion threads on record where they set out to agree on common standards (a notable instance being AdguardTeam/AdguardBrowserExtension#917).

It's better than №1, but my money remains on either №2 or №3.

@collinbarrett
Owner Author

collinbarrett commented Sep 19, 2018

Option 3 could be partially automated at some point, which would be cool. In the same vein as #202, we could have FilterLists' SnapshotService analyze the rules in each list, derive the set of granular Syntaxes the list currently uses based on string patterns, and then update the Software support automatically. It wouldn't be easy to implement, and there may be corner cases (a certain pattern meaning different things in two different Syntaxes, for example), but it could be a useful feature.
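A rough sketch of the pattern-based detection described above, assuming uBO-style rules where `##+js(` marks scriptlet injection and `##`, `#@#` or `#?#` marks cosmetic (extended) filtering. The Syntax names and the `detect_syntaxes` helper are hypothetical, and this is a deliberately naive heuristic: it hits exactly the corner cases mentioned, and it does not recognize other markers such as AdGuard's `#%#`, which it would misclassify as network rules.

```python
import re

# Hypothetical granular Syntax names, taken from option 3 in the OP.
NETWORK = "uBlock Origin Static Network Filtering"
EXTENDED = "uBlock Origin Static Extended Filtering"
SCRIPTLET = "uBlock Origin Scriptlet Injection"

# "##+js(" or "#@#+js(": uBO scriptlet injection (or its exception form).
SCRIPTLET_RE = re.compile(r"#@?#\+js\(")
# "##", "#@#" or "#?#": cosmetic / extended filtering separators.
COSMETIC_RE = re.compile(r"#[@?]?#")


def detect_syntaxes(lines):
    """Return the set of granular Syntaxes that a list's rules appear to use."""
    found = set()
    for raw in lines:
        rule = raw.strip()
        # Skip blank lines, "!" comments and "[Adblock Plus 2.0]"-style headers.
        if not rule or rule.startswith(("!", "[")):
            continue
        if SCRIPTLET_RE.search(rule):
            found.add(SCRIPTLET)
        elif COSMETIC_RE.search(rule):
            found.add(EXTENDED)
        else:
            found.add(NETWORK)  # everything else is treated as a network rule
    return found


sample = [
    "! Title: Example list",
    "||ads.example.com^",
    "example.com##.banner",
    "example.com##+js(hypothetical-scriptlet)",
]
print(sorted(detect_syntaxes(sample)))
```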

The biggest change Option 3 would entail now would be updating the model so that each list can support many Syntaxes (adding a FilterListSyntax many-to-many model). Then, as we go, we could expand the Syntax dataset with more granular Syntaxes.

@hawkeye116477
Contributor

hawkeye116477 commented Sep 19, 2018

I think option 3 is probably the best, but I don't think splitting into uBlock Origin Static Network Filtering, uBlock Origin Static Extended Filtering, and uBlock Origin Scriptlet Injection is needed. Two Syntaxes should suffice: uBlock Origin Static (compatible with AdGuard) and uBlock Origin Static (not compatible with AdGuard), because a list can use all three kinds of rules and still be compatible. However, there are some exceptions, for example when filters combined with uBO scriptlet injection cause problems for AdGuard, or when a list uses only scriptlet injection. Also, most filters listed on FilterLists are not created specifically for AdGuard; only uBO compatibility is certain. So maybe the "default" should be uBlock Origin Static (not compatible with AdGuard), and if a list is compatible with AdGuard, its author should make a PR to change that to uBlock Origin Static (compatible with AdGuard).

@collinbarrett
Owner Author

Would be curious if @gorhill or @ameshkov had any suggestions. If not, totally fine.

@ameshkov

I also think that options 2 and 3 are better than the other two.

Option 3, however, requires quite a bit of work and cannot be done quickly. It would be really handy for filter maintainers and for us developers, but it might be overly complicated and confusing for regular users.

Anyway, if you decide to proceed with option 3, please let me know; I'll find some time to help.

@collinbarrett
Owner Author

I am in the process of making some data model changes to allow support for option 3 as well as a few other additional bits of data. @DandelionSprout , please hold off for a few days at least before making any data PRs as I'm manipulating the json. I'll ping you back here when the json has stabilized.

I'm also really pushing hard to get a solution for #372 going "soon".

@collinbarrett
Owner Author

collinbarrett commented Aug 28, 2020

new data model is fairly well fleshed out here. each list can now have many syntaxes so we can be a bit more granular.

I still need to document the changes. I hope to deploy in the next week or so. I don't plan any major UI changes just yet other than to make sure nothing existing breaks with the new db/api launch.

it will also resolve:

  • support data model for multi-part lists #503: each entry in FilterListViewUrl.json can have a property called "SegmentNumber". if a list is split across several "segments", this allows multiple "viewUrls" to reflect that.
  • update mirror links for ABPindo #1231 / support DNS addresses #696: there will be no limit on the number of mirrors the database can support. a new property called "primariness" in FilterListViewUrl.json indicates how primary each viewUrl is for the list. a value of "1" is the primary source, a secondary mirror has a value of "2", etc.
  • add homeUrlMirror or similar #697: a new property called "OnionUrl" gives each list the opportunity to have a Tor URL. (do we need more than one? do we need support for an OnionUrl for "viewUrl"s? not sure any software supports subscribing to OnionUrls?)
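A small sketch of how a client might use these two properties, assuming each viewUrl record carries a segment number and a primariness value. The Python field names, the `primary_urls` helper, and the example URLs are hypothetical, and the exact key casing in FilterListViewUrl.json is not confirmed here. It picks the most primary URL (lowest primariness) for each segment and returns them in segment order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewUrl:
    """One FilterListViewUrl.json entry, as described above (field names assumed)."""
    filter_list_id: int
    segment_number: int  # which part of a multi-part list this URL serves
    primariness: int     # 1 = primary source, 2 = secondary mirror, ...
    url: str


def primary_urls(view_urls, filter_list_id):
    """Return the most primary URL for each segment of one list, in segment order."""
    best = {}
    for v in view_urls:
        if v.filter_list_id != filter_list_id:
            continue
        current = best.get(v.segment_number)
        # Lower primariness wins: 1 beats 2, 2 beats 3, and so on.
        if current is None or v.primariness < current.primariness:
            best[v.segment_number] = v
    return [best[segment].url for segment in sorted(best)]


urls = [
    ViewUrl(1, 1, 2, "https://mirror.example.org/part1.txt"),
    ViewUrl(1, 1, 1, "https://example.com/part1.txt"),
    ViewUrl(1, 2, 1, "https://example.com/part2.txt"),
    ViewUrl(2, 1, 1, "https://example.net/other-list.txt"),
]
print(primary_urls(urls, 1))
```

Falling back to mirrors would then just mean trying the remaining URLs for a segment in ascending primariness order.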

once this new data model is deployed, I plan to shift some real effort towards #372 . it's long overdue. I hate that people (mostly @DandelionSprout ) want to help maintain this project but they have to fiddle with json files. it's very sub-optimal.

@collinbarrett
Owner Author

@DandelionSprout PRs welcome again. the refactor is complete. more notes here

@collinbarrett
Owner Author

Closing as each FilterList can now be associated with multiple Syntaxes.

4 participants