Issue with the 2020 Privacy chapter #1760
Comments
@KenjiBaheux, whotracks.me clearly lists Google Fonts, currently as the 5th biggest "tracker" under the CDN category. I guess that because it collects anonymous usage data and is used across so many sites, it has the potential to be a tracker. A quick search on their GitHub repo confirms that theory. @ydimova, do you have anything further to add? I do think, however, @ydimova, that we could be more transparent about our sources of data. I know you state at the beginning of the paragraph that you use WhoTracksMe's list, but I think we can do a few things here to further increase transparency:
Could you open a pull request for these changes to the chapter and update the data sheet for this chapter?
Google Fonts requests are non-credentialed, so it cannot be used as a tracker. (anonymous use collection is, by definition, not tracking)
I think that's a little simplistic @yoavweiss but either way we present the data from WhoTracksMe, as stated in the chapter, so the issue would need to be raised there.
That sounds like poor reasoning to classify something as a "tracker", rather than a "potential tracker" or some other category that clarifies that it's not actually tracking users.
Alternatively, we could reconsider the use of this dubious source... A very quick glance shows a few other non-tracker "trackers" on their lists: They seem to generally consider any request that's not on the same host to be a "tracker", whether that's the case or not. As such, I'd like to question using that as "data".
I think there's an argument to remove the See the data here and it looks to cover all the services being questioned. @ydimova what are your thoughts?
This data source seems to be quite loose with their definition of "tracker". I agree that we should be more careful about painting hosts with "potential to track" as actual known trackers. Omitting the CDN category or manually checking each of the top hosts both sound like good resolutions. I'd like to hear @ydimova's thoughts as well.
📟 ping @ydimova
WhoTracksMe tries to limit the number of false-positive trackers. However, they do state in their academic paper (https://arxiv.org/abs/1804.08959) that while all of the domains they consider trackers have tracking capabilities, not all of them are actually used in a tracking context.
SGTM, thanks @ydimova. What are the next steps, and who are the owners, to get the chapter updated? I'm not sure if these are changes that require rerunning any queries or if we can filter out certain results in the sheets to regenerate charts. There may also be some rewriting of the chapter needed to account for the differences in the results.
URL
https://almanac.httparchive.org/en/2020/privacy#fig-2
Describe the issue
Google Fonts is shown on the graph (google_fonts) but as far as I know Google Fonts does not track users.
See this page for a more detailed answer.
Expected behavior
Google Fonts (possibly other innocuous things) not being painted as a tracker.
The article uses whotracks.me's data about trackers, which is based on the following criteria. These criteria don't seem to apply to Google Fonts.
Perhaps, there are other oversights like this one?