Skip to content
This repository has been archived by the owner on Nov 6, 2023. It is now read-only.

Include rulesets with different platforms in duplicate checks #11440

Closed
Bisaloo opened this issue Jul 27, 2017 · 10 comments
Closed

Include rulesets with different platforms in duplicate checks #11440

Bisaloo opened this issue Jul 27, 2017 · 10 comments

Comments

@Bisaloo
Copy link
Collaborator

Bisaloo commented Jul 27, 2017

I think we should review the way tests are currently working to check duplicates.

At the moment, tests will pass even if a target is duplicated in more than one ruleset as long as it is not in more than one ruleset without platform. For example, the domain in this PR (#11438) was not in duplicate-whitelist.txt.

But this can cause problems. In this case, it was pretty obvious that there is mixed content so the ruleset without platform shouldn't have made it to the repo.

But we can think of more tricky cases, where the mixedcontent is not on the landing page but on deeplinks. In that case, User 1 creates a ruleset without platform="mixedcontent". User 2 sees that the website can be secured but doesn't realize there is already a rule for it (platform="mixedcontent" rules are disabled by default for most users). But User 2 misses the fact that this website has mixed content and creates a ruleset without platform. We are now breaking a website when we could have easily avoided it.

I propose that all duplicates require to be in duplicate-whitelist.txt, even if one of the rulesets they are in has a platform. I know that in some cases, some contributors will secure some resources in a ruleset with platform and the rest in platform="mixedcontent" but adding such domains in the whitelist is trivial and that's what the whitelist is for anyways (not to mention the fact that to my knowledge, no one is doing that anymore. MB used to do it but they were the only one).

@Bisaloo
Copy link
Collaborator Author

Bisaloo commented Jul 27, 2017

To answer @J0WI's comment in #11438 (comment):

Do you have any idea how many rules would be affected by this change?

As far as I can tell, not many. I don't know how the current duplicate-whitelist.txt was generated but it already includes some of the cases I am describing (1 ruleset without platform + 1 ruleset with platform="mixedcontent" such as https://github.com/EFForg/https-everywhere/pull/10910/files) but apparently not all of them.

egrep '<target\s+host=\s*"' *.xml | cut -d '"' -f 2 | sort | uniq -c | grep -v '1 ' | wc -l

returns 230 candidates (with some possible false positives (commented targets)).

The current duplicate-whitelist.txt in master has

wc -l utils/duplicate-whitelist.txt

153 targets.

@cschanaj
Copy link
Collaborator

cschanaj commented Jul 27, 2017

@Bisaloo there is 224 duplicated target in the code base, found using find-duplicate-targets.go.

See also duplicate-whitelist-cschanaj.txt

Updated: 2018.01.24

@Bisaloo
Copy link
Collaborator Author

Bisaloo commented Jul 27, 2017

Thanks @cschanaj! If we want to create the new whitelist using your script, it may be useful to group targets by domain.

@cschanaj
Copy link
Collaborator

cschanaj commented Jul 27, 2017

@Bisaloo updated the file accordingly, please check. thanks!

P.S. I guess utils/trivial-validate.py#L189 is the cause of this issue.

@Bisaloo Bisaloo changed the title Include rulesets with platforms in duplicate checks Include rulesets with different platforms in duplicate checks Jul 27, 2017
@Bisaloo
Copy link
Collaborator Author

Bisaloo commented Jul 27, 2017

P.S. I guess utils/trivial-validate.py#L189 is the cause of this issue.

Yes, and this is intended behaviour too. I am arguing for a change of policy concerning the whitelist. This is not a bug (if we forget the fact that some targets are in duplicate-whitelist.txt when they shouldn't because their rulesets have different platform).

@cschanaj
Copy link
Collaborator

cschanaj commented Jul 27, 2017

I agree with the proposed changes of policy since the platform are mutually exclusive to each other (we have get ride of all platform="cacert mixedcontent" recently). IMHO, there is little reason to allow duplicated target in different platform.

FYI, duplicate-target-platforms.csv shows the platform of the duplicated target. It seems that a number of them can be removed under the current policy...

See also utils/relaxng.xml#L15

@cschanaj
Copy link
Collaborator

See #11484

@Bisaloo
Copy link
Collaborator Author

Bisaloo commented Feb 6, 2018

Given there are no objections, I would like to go forward with this and propose a PR to implement this change.

@cschanaj, since you have done most of the work already, I think it's fair you submit this PR but if you prefer, I can do it myself.

@cschanaj
Copy link
Collaborator

cschanaj commented Feb 6, 2018

@Bisaloo I have limited availability at the moment, feel free to submit a PR yourself 😄

@Bisaloo
Copy link
Collaborator Author

Bisaloo commented Mar 17, 2018

Closed by #14861

@Bisaloo Bisaloo closed this as completed Mar 17, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants