Update 2020 Privacy chapter for CDNs and Hosting categories #1935

Merged
merged 3 commits into from
Feb 5, 2021
34 changes: 22 additions & 12 deletions src/content/en/2020/privacy.md
@@ -31,11 +31,11 @@ We examine the prominence of the most common types of [third-party](./third-part

### Third-party trackers

We use [WhoTracksMe](https://whotracks.me/)'s tracker list to determine the percentage of websites that issue a request to a tracker. As shown in the following figure, we have found that at least one tracker is present on roughly 93% of websites.
We use [WhoTracksMe](https://whotracks.me/)'s tracker list to determine the percentage of websites that issue a request to a potential tracker. As shown in the following figure, we have found that at least one potential tracker is present on roughly 93% of websites.

{{ figure_markup(
image="privacy-websites-that-load-trackers.png",
caption="Websites including at least one tracker",
caption="Websites including at least one potential tracker",
description="Bar chart showing that 92.94% of desktop websites and 92.97% of mobile websites load trackers.",
chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vQJMtHv0Y1JcQJkcVyrqBk9dsujZeDtOZEv7uvE0xM2VrQSuTUDFya41TeRlTZDDe2rWmHwDghW3Dev/pubchart?oid=1325818112&format=interactive",
sheets_gid="1591448294"
@@ -45,29 +45,39 @@ We use [WhoTracksMe](https://whotracks.me/)'s tracker list to determine the perc
We examined the most widely used trackers and plot the prevalence of the 10 most popular ones.

{{ figure_markup(
image="privacy-biggest-third-party-trackers.png",
caption="Top 10 Trackers",
description="Bar chart showing the prevalence of the 10 most popular trackers used on mobile and desktop clients. There is little difference between desktop and mobile and mobile has 65.9% for google_analytics, 65.5% for googleapis.com, 63.3% for gstatic, 58.3% for google_fonts, 50.0% for doubleclick, 47.6% for google, 42.4% for google_tag_manage, 30.9% for facebook, 19.2% for google_adservices, and 13.1% for cloudflare.",
image="privacy-biggest-third-party-potential-trackers.png",
caption="Top 10 Potential Trackers",
description="Bar chart showing the prevalence of the 10 most popular potential trackers used on mobile and desktop clients. There is little difference between desktop and mobile and mobile has 65.5% for google_analytics, 65.9% for googleapis.com, 63.3% for gstatic, 58.3% for google_fonts, 50.0% for doubleclick, 47.6% for google, 42.4% for google_tag_manager, 30.9% for facebook, 19.2% for google_adservices, and 12.7% for cloudflare.",
chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vQJMtHv0Y1JcQJkcVyrqBk9dsujZeDtOZEv7uvE0xM2VrQSuTUDFya41TeRlTZDDe2rWmHwDghW3Dev/pubchart?oid=850649042&format=interactive",
sheets_gid="1677398038",
sql_file="top100_cookies_set_from_header.sql"
sheets_gid="1677398038"
)
}}

The largest player on the online tracking market is without doubt Google, with eight of its tracking domains present in the top 10 trackers and prevalent on at least 70% of websites. They are followed are Facebook and Cloudflare–though the latter is probably more reflective of the popularity of them as a hosting site.
The largest player on the online tracking market is without doubt Google, with eight of its domains present in the top 10 potential trackers and prevalent on at least 70% of websites. They are followed by Facebook and Cloudflare–though the latter is probably more reflective of the popularity of them as a hosting site.

WhoTracksMe's tracker list also defines categories that the trackers belong to. If we remove CDNs and Hosting sites from our statistics, under the assumption they may not track—or at least that that is not their primary function—then you get a slightly different view of the top 10.
Collaborator

"under the assumption that they may not track" sounds a bit weaker than what I'd consider ideal. We are talking about CDN domains that are often cookieless. Might be interesting to scan HA to see if those CDN domains have cookies set on them, and if they don't, clarify that they are not tracking today, but are "potential trackers" as they have the power to start tracking in the future (which IIRC is the reasoning).

Member Author

@tunetheweb tunetheweb Jan 29, 2021

Understand your concerns @yoavweiss but I would still prefer to err on the side of caution here given the chapter's topic and the power these entities could wield in this space. I think your suggested proposal to scan HA would be limited in nature, and many do set cookies for LoadBalancing or WAF reasons. Plus cookies are far from the only way of tracking (particularly for hosting providers with access to IP addresses and the like).

My initial thought was to include a link to the Google Fonts FAQ about this as an example, with an explicit comment like "and some of these providers have statements that they do not track". But on re-reading that, I'm not sure that's what it really says, so I find it a little weaker and thought it would be more confusing to include, hence I went with the above. If that FAQ or privacy policy were stronger in this regard, I think we could be stronger too.

I've tried to present an independent and balanced view here, and certainly think it's an improvement on just including them as trackers without comment - but it's gonna be difficult to make everyone happy!

@ydimova @KenjiBaheux what's your view here? Guessing you'll both be on either side of this argument! 🙂

Collaborator

@yoavweiss yoavweiss Jan 29, 2021

> I think your suggested proposal to scan HA would be limited in nature, and many do set cookies for LoadBalancing or WAF reason.

Sure. But if they don't set cookies, that's a strong indication.

> Plus cookies are far from the only way of tracking (particularly for hosting providers with access to IP addresses and the like)

That's fair. That would've been a different story if e.g. the relevant snippets included a referrerPolicy=no-referrer attribute, but that's not typically the case.

> I've tried to present an independent and balanced view here, and certainly think it's an improvement on just including them as trackers without comment

Agree that it's a significant improvement. Just think that it can be improved further... :)
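
For reference, the two ideas in this thread (dropping the CDN and hosting categories from the ranking, and checking whether the dropped domains ever set cookies) could be sketched roughly as below. This is only an illustration under stated assumptions: the row layout, the sample rows, and the function names `prevalence` and `cookieless` are hypothetical and do not reflect the actual HTTP Archive (HA) schema or the queries used for the chapter. As noted above, a tracker that never sets cookies is only an indication, since cookies are not the only way of tracking.

```python
from collections import defaultdict

# Hypothetical sample rows: (page, tracker_id, category, response_set_a_cookie).
# The ids and categories mimic WhoTracksMe naming, but the data is made up.
SAMPLE_REQUESTS = [
    ("a.example", "google_analytics", "site_analytics", True),
    ("a.example", "google_fonts", "cdn", False),
    ("b.example", "google_fonts", "cdn", False),
    ("b.example", "cloudflare", "cdn", True),
    ("c.example", "doubleclick", "advertising", True),
    ("c.example", "google_fonts", "cdn", False),
]

# Categories left out of the second "top 10" view discussed in this thread.
EXCLUDED_CATEGORIES = frozenset({"cdn", "hosting"})


def prevalence(requests, total_pages, excluded=frozenset()):
    """Map each tracker id to the share of pages that request it,
    skipping trackers whose category is in `excluded`."""
    pages_by_tracker = defaultdict(set)
    for page, tracker, category, _set_cookie in requests:
        if category not in excluded:
            pages_by_tracker[tracker].add(page)
    return {t: len(pages) / total_pages for t, pages in pages_by_tracker.items()}


def cookieless(requests, categories):
    """Return tracker ids in `categories` that never set a cookie in the sample."""
    seen, sets_cookie = set(), set()
    for _page, tracker, category, set_cookie in requests:
        if category in categories:
            seen.add(tracker)
            if set_cookie:
                sets_cookie.add(tracker)
    return seen - sets_cookie


if __name__ == "__main__":
    total = len({page for page, *_ in SAMPLE_REQUESTS})
    print("all potential trackers:", prevalence(SAMPLE_REQUESTS, total))
    print("without CDN/hosting:", prevalence(SAMPLE_REQUESTS, total, EXCLUDED_CATEGORIES))
    print("cookieless CDN/hosting:", cookieless(SAMPLE_REQUESTS, EXCLUDED_CATEGORIES))
```

Keeping the exclusion as a parameter means both views of the top 10 come from the same function, so the only difference between the two figures is the category filter.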


{{ figure_markup(
image="privacy-biggest-third-party-trackers.png",
caption="Top 10 Trackers",
description="Bar chart showing the prevalence of the 10 most popular trackers used on mobile and desktop clients. There is little difference between desktop and mobile and mobile has 65.5% for google_analytics, 50.0% for doubleclick, 47.6% for google, 42.4% for google_tag_manager, 30.9% for facebook, 19.2% for google_adservices, 12.7% for youtube, 19.2% for google_syndication, and 6.5% for both twitter and wordpress_stats.",
chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vQJMtHv0Y1JcQJkcVyrqBk9dsujZeDtOZEv7uvE0xM2VrQSuTUDFya41TeRlTZDDe2rWmHwDghW3Dev/pubchart?oid=1831606887&format=interactive",
sheets_gid="1677398038"
)
}}

WhoTracksMe's tracker list also defines categories that the trackers belong to. The following figure shows the distribution of the different categories for the 100 largest trackers.
Here Google still makes up seven out of the top 10 domains. The following figure shows the distribution of the different categories for the 100 largest potential trackers by category.

{{ figure_markup(
image="privacy-tracker-categories.png",
caption="Categories of the 100 most popular trackers",
description="Bar chart showing distribution of the top 100 trackers on the web with 56 for advertising, 11 for cdn, 9 for site_analytics, 6 for both social media and misc, 3 for both essential and customer_help, 2 for both audio and video and 1 for both comments and undefined.",
caption="Categories of the 100 most popular potential trackers",
description="Bar chart showing distribution of the top 100 potential trackers on the web with 56 for advertising, 11 for cdn, 9 for site_analytics, 6 for both social media and misc, 3 for both essential and customer_help, 2 for both audio and video and 1 for both comments and undefined.",
chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vQJMtHv0Y1JcQJkcVyrqBk9dsujZeDtOZEv7uvE0xM2VrQSuTUDFya41TeRlTZDDe2rWmHwDghW3Dev/pubchart?oid=1117413918&format=interactive",
sheets_gid="1431872451",
)
}}

Nearly 60% of the most popular trackers are advertising-related. This could be due to the profitability of the online advertising market.
Nearly 60% of the most popular trackers are advertising-related. This could be due to the profitability of the online advertising market being perceived to be related to the amount of tracking.

### Cookies

Binary file modified src/static/images/2020/privacy/privacy-tracker-categories.png