Issue with the 2020 Privacy chapter #1760

Closed
KenjiBaheux opened this issue Dec 14, 2020 · 10 comments · Fixed by #1935
Labels: question (Further information is requested), writing (Related to wording and content)

Comments

@KenjiBaheux

URL
https://almanac.httparchive.org/en/2020/privacy#fig-2

Describe the issue
Google Fonts is shown on the graph (google_fonts) but as far as I know Google Fonts does not track users.
See this page for a more detailed answer.

Expected behavior
Google Fonts (possibly other innocuous things) not being painted as a tracker.

The article uses whotracks.me's data about trackers, which is based on the following criteria. These criteria don't seem to apply to Google Fonts.

Perhaps, there are other oversights like this one?

@KenjiBaheux added the bug and writing labels Dec 14, 2020
@tunetheweb added the question label Dec 14, 2020
@tunetheweb added this to the 2020 Backlog milestone Dec 14, 2020
@tunetheweb
Member

@KenjiBaheux, whotracks.me clearly lists Google Fonts, currently as the 5th biggest "tracker" under the CDN category. I guess because it collects anonymous usage data and because it is used across so many sites it has the potential to be a tracker. A quick search on their GitHub repo confirms that theory.

@ydimova do you have anything further to add?

I do think, however, @ydimova, that we could be more transparent about our sources of data. I know you state that you use WhoTracksMe's list at the beginning of the paragraph, but I think we can do a few things here to further increase transparency:

  1. Directly link this source in the graph, like other chapters have done.
  2. Explain how the WhoTracksMe data was presented in this chapter: is it raw, or was it combined with HTTP Archive data in any way?
  3. Looks like the sql_file attribute is wrong for fig2 (it points to the cookies one) and should be removed.
  4. Add an explanation to the trackers results sheet, again explaining where this data came from, what manipulation (if any) was performed on it, and how others can get access to it (e.g. link to the WhoTracksMe GitHub repo).

Could you open a pull request for these changes to the chapter and update the data sheet for this chapter?

@yoavweiss
Collaborator

Google Fonts requests are non-credentialed, so they cannot be used for tracking. (Anonymous usage collection is, by definition, not tracking.)

@tunetheweb
Member

I think that's a little simplistic @yoavweiss but either way we present the data from WhoTracksMe, as stated in the chapter, so the issue would need to be raised there.

@yoavweiss
Collaborator

> because it is used across so many sites it has the potential to be a tracker. A quick search on their GitHub repo confirms that theory

That sounds like poor reasoning to classify something as a "tracker", rather than a "potential tracker" or some other category that clarifies that it's not actually tracking users.

@yoavweiss
Collaborator

> I think that's a little simplistic @yoavweiss but either way we present the data from WhoTracksMe, as stated in the chapter, so the issue would need to be raised there.

Alternatively, we could reconsider the use of this dubious source...

A very quick glance shows a few other non-tracker "trackers" on their lists:

They seem to generally consider any request that's not on the same host to be a "tracker", whether that's the case or not. As such, I'd like to question using that as "data".

@tunetheweb removed the bug label Dec 14, 2020
@tunetheweb
Member

I think there's an argument to remove the cdn category from this data, or at the very least to add an explanatory paragraph about it or treat it separately. I can understand WhoTracksMe's desire to include it for completeness, but I can also understand how the use of a CDN does not necessarily imply tracking.

See the data here; it looks to cover all the services being questioned.

@ydimova what are your thoughts?

@rviscomi
Member

This data source seems to be quite loose with their definition of "tracker". I agree that we should be more careful about painting hosts with "potential to track" as actual known trackers. Omitting the CDN category or manually checking each of the top hosts both sound like good resolutions. I'd like to hear @ydimova's thoughts as well.

@rviscomi
Member

rviscomi commented Jan 3, 2021

📟 ping @ydimova

@ydimova
Contributor

ydimova commented Jan 6, 2021

WhoTracksMe tries to limit the number of false-positive trackers. However, they do state in their academic paper that, while all of the domains they consider trackers have tracking capabilities, not all of them are actually used in a tracking context (https://arxiv.org/abs/1804.08959).
I agree that we could exclude CDNs, and maybe also the Essential category, as explained here: https://whotracks.me/blog/tracker_categories.html

@rviscomi
Member

rviscomi commented Jan 8, 2021

SGTM thanks @ydimova. What are the next steps and owners to get the chapter updated?

I'm not sure if these are changes that require rerunning any queries or if we can filter out certain results in the sheets to regenerate charts. There may also be some rewriting of the chapter needed to account for the differences in the results.
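
As a rough illustration of the filtering option discussed above, here is a minimal Python sketch that drops the CDN and Essential categories before re-aggregating. The row layout, the category identifiers (`cdn`, `essential`, `site_analytics`, `advertising`), the `example_*` tracker names, and all counts are assumptions made up for this example; they are not taken from the chapter's actual queries or results sheet.

```python
from collections import Counter

# Hypothetical rows: (tracker id, category, number of pages).
# All values are illustrative, not real measurements.
ROWS = [
    ("google_fonts", "cdn", 120),
    ("example_analytics", "site_analytics", 80),
    ("example_ads", "advertising", 60),
    ("example_consent", "essential", 10),
]

# Categories proposed for exclusion in this thread (identifiers assumed).
EXCLUDED_CATEGORIES = {"cdn", "essential"}


def filter_trackers(rows, excluded=EXCLUDED_CATEGORIES):
    """Return only the rows whose category is not excluded (case-insensitive)."""
    return [row for row in rows if row[1].lower() not in excluded]


def pages_by_category(rows):
    """Sum the page counts per category."""
    totals = Counter()
    for _tracker, category, pages in rows:
        totals[category] += pages
    return dict(totals)


kept = filter_trackers(ROWS)
summary = pages_by_category(kept)
```

If the results sheet already carries a category column, the same exclusion could presumably be applied there as a filter without rerunning any queries; whether that holds depends on how the sheet was built.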
