Query metrics: Chapter 5. Third Parties #86

rviscomi · 2019-07-23T18:46:34Z

Part	Chapter	Authors	Reviewers	Tracking Issue
I. Page Content	5. Third Parties	@patrickhulce	@simonhearne @flowlabs @jasti @zeman	#8

READ ME!

All of the metrics in the table below have been marked as Able To Query during the metrics triage. The analyst assigned to each metric is expected to write the corresponding query and submit a PR to have it reviewed and added to the repo.

In order to stay on schedule and have the data ready for authors, please have all metrics reviewed and merged by August 5.

Assignments

ID	Metric description	Analyst
05.01	Percentage of pages that include at least one third-party resource.	@patrickhulce
05.02	Percentage of pages that include at least one ad resource.	@patrickhulce
05.03	Percentage of requests that are third party requests broken down by third party category and resource type.	@patrickhulce
05.04	Percentage of total bytes that are from third party requests broken down by third party category and resource type.	@patrickhulce
05.05	Percentage of total script execution time that is from third party scripts broken down by third party category.	@patrickhulce
05.06	Top 100 third party domains by request volume	@patrickhulce
05.07	Top 100 third party domains by total byte weight	@patrickhulce
05.08	Top 100 third party domains by total script execution time	@patrickhulce
05.09	Top 100 third party requests by request volume	@patrickhulce
05.10	Top 100 third party requests by total script execution time	@patrickhulce
05.11	Percentile breakdown of the page-relative percentage of requests that are third party requests, broken down by third party category.	@patrickhulce
05.12	Percentile breakdown of the page-relative percentage of total bytes that are from third party requests, broken down by third party category.	@patrickhulce
05.13	Percentile breakdown of the page-relative percentage of total script execution time that is from third party scripts.	@patrickhulce

Checklist of metrics to be merged


rviscomi · 2019-11-08T06:06:49Z

Hey @patrickhulce, I'm going through your chapter to create the data viz you requested, but some of the query results don't match up with the values you're writing about. For example:

Categories

If the ubiquity of third-party content is unsurprising, perhaps more interesting is the breakdown of third-party content by provider type.

While advertising might be the most user-visible example of third-party presence on the web, analytics providers are the most common third-party category with 76% of sites including at least one analytics request. CDNs at 63%, ads at 57%, and developer utilities like Sentry, Stripe, and Google Maps SDK at 56% follow up as a close second, third, and fourth for appearing on the most web properties. The popularity of these categories forms the foundation of our web usage patterns identified later in the chapter.

<insert graphic of metric 05_11>

Looking at the results of 05_11, I'm not seeing analytics with 76%. The median percentAnalyticsRequestsQuantiles is 2.91% for desktop and 2.82% for mobile. Were you looking at a different metric? Did you modify the query in some way not reflected by the results? FWIW I tweaked your query to only show the 10/25/50/75/90 percentiles as opposed to all 100, but the results are the same.

In your text you're mentioning the percent of sites having analytics (as opposed to requests), which sounds more accurate. But still, I don't know where you got that number for reference.

patrickhulce · 2019-11-08T14:09:31Z

FWIW I tweaked your query to only show the 10/25/50/75/90 percentiles as opposed to all 100, but the results are the same.

It will be a little more difficult to see the results I'm talking about with this change. I'm saying that analytics is "the most common third-party category with 76% of sites including at least one analytics request", not that analytics requests make up 76% of requests. If you look at the 25th percentile for desktop, you'll see analytics requests make up 0.93% of requests, meaning pages at that percentile have at least one, meaning at least 75% of pages have at least one.
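The reading described above can be sketched in Python (not the BigQuery SQL used for the chapter's queries). This is a minimal illustration, assuming a list of 101 percentile values for one category, as a 100-bucket quantile query would return. The function name and the sample values are made up for the example and are not results from 05_11.

```python
def share_of_pages_with_at_least_one(percentiles):
    """Estimate the percent of pages with a non-zero value.

    percentiles[p] is assumed to be the value at the p-th percentile,
    for p = 0..100, sorted in ascending order.
    """
    for p, value in enumerate(percentiles):
        if value > 0:
            # Everything at or above the first non-zero percentile is non-zero.
            return 100 - p
    return 0


# Hypothetical data: zero through the 23rd percentile, non-zero from the 24th.
sample_percentiles = [0.0] * 24 + [0.005 * (i + 1) for i in range(77)]

estimated_share = share_of_pages_with_at_least_one(sample_percentiles)
print(estimated_share)
```

Because the inputs are percentile boundaries rather than raw page counts, the estimate is only accurate to within about one percentage point.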

rviscomi · 2019-11-08T20:15:45Z

Sorry, what's special about 0.93% to indicate that there is at least 1 request? Is 1% == 1 request? Is there a more straightforward way to query this metric? Even if there isn't time to rewrite the query, how can we visualize the current 05_11 results to show what you're referring to?

patrickhulce · 2019-11-08T22:51:16Z

Well, it's not possible to make fractional requests, so anything non-zero indicates that sites at that percentile made at least one request. My goal with the analysis was to point out interesting tidbits that weren't just regurgitating what could be obviously seen from a graph at first glance, but it sounds like this reached a little too far from obvious insights and I should have been a little more aligned with the pure results of the query?

Is there a more straightforward way to query this metric?

This is the query that was optimized to be the most flexible and can show the widest range of insights. I get that it makes it difficult to cajole the raw data into a visualization that matches what can be said about that data, though.

If we just want to match the analysis, then we can basically throw out the quantiles and repeat the line for pages with a third party for each named category (https://github.com/HTTPArchive/almanac.httparchive.org/pull/107/files#diff-561fc8f885c05295633879f79753feffR6).
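The more direct approach suggested here, counting pages that have at least one request in each category, might look roughly like the following Python sketch. It is not the query from the linked PR. The function name, the page identifiers, and the category labels are all hypothetical, chosen only to show the shape of the calculation.

```python
from collections import defaultdict


def percent_of_pages_by_category(all_pages, third_party_requests):
    """Percent of pages with at least one request in each category.

    all_pages: iterable of page identifiers (including pages with no
        third-party requests at all).
    third_party_requests: iterable of (page, category) pairs, one per request.
    """
    pages_by_category = defaultdict(set)
    for page, category in third_party_requests:
        # A set counts each page once, however many requests it made.
        pages_by_category[category].add(page)

    total_pages = len(set(all_pages))
    return {
        category: 100 * len(pages) / total_pages
        for category, pages in pages_by_category.items()
    }


# Hypothetical data: four pages, one of which has no third-party requests.
example_pages = ["page1", "page2", "page3", "page4"]
example_requests = [
    ("page1", "analytics"),
    ("page1", "analytics"),
    ("page1", "ad"),
    ("page2", "analytics"),
    ("page3", "analytics"),
]

example_result = percent_of_pages_by_category(example_pages, example_requests)
print(example_result)
```

A result computed this way reports the share of pages directly, so it can be charted without reading it off a percentile distribution.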

rviscomi · 2019-11-10T19:34:08Z

Reverted my query change. I'll close this out and open a new PR with any changes.

patrickhulce · 2019-11-10T23:12:28Z

OK, sounds good. Sorry for the trouble, @rviscomi, and thanks very much for tackling those! Let me know if there's something specific I can knock out :) (I'll be flying starting at ~6pm PST today though, see ya soon!)

rviscomi · 2019-11-11T02:26:20Z

@patrickhulce just want to make sure you didn't misinterpret your own data. Here's how one of the data points in 05_11 is queried:

APPROX_QUANTILES(numberOfThirdPartyRequests / numberOfRequests, 100)

Each percentile is a percent of requests (count / total), not the number of requests. That's why I was asking about 1% != 1 request.
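To make the distinction between a percent of requests and a number of requests concrete, here is a small Python sketch. The per-page numbers are invented for illustration and do not come from the HTTP Archive dataset; the median stands in for a single percentile of the distribution.

```python
import statistics

# Hypothetical (analytics_requests, total_requests) pairs, one per page.
sample_pages = [(0, 40), (1, 107), (2, 70), (3, 100), (5, 90)]

# What a ratio-based quantile query summarizes: count / total per page.
ratios = [analytics / total for analytics, total in sample_pages]
# What a count-based query would summarize instead.
counts = [analytics for analytics, _ in sample_pages]

median_ratio = statistics.median(ratios)
median_count = statistics.median(counts)

# One request out of 107 is under 1% of that page's requests.
single_request_percent = round(100 * ratios[1], 2)

print(median_ratio, median_count, single_request_percent)
```

In this made-up sample, a page with exactly one analytics request has a ratio below 1%, so a ratio cannot be converted to a request count without the page's total. Any non-zero ratio does still imply at least one request.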

rviscomi added the analysis Querying the dataset label Jul 23, 2019

rviscomi added this to the Content written milestone Jul 23, 2019

rviscomi assigned patrickhulce Jul 23, 2019

rviscomi mentioned this issue Jul 23, 2019

Assign analysts to chapters #71

Closed

patrickhulce mentioned this issue Jul 29, 2019

Chapter 5: Add All Byte and Request Count Queries #107

Merged

patrickhulce mentioned this issue Aug 12, 2019

Chapter 5: Add first third party scripting query #119

Merged

patrickhulce closed this as completed Aug 20, 2019

This was referenced Sep 8, 2019

Write content: Chapter 5. Third Parties #134

Closed

Write content: Chapter 6. Fonts #143

Closed

Write content: Chapter 12. Mobile Web #147

Closed

Write content: Chapter 13. Ecommerce #135

Closed

Write content: Chapter 15. Compression #145

Closed

rviscomi reopened this Nov 8, 2019

rviscomi mentioned this issue Nov 8, 2019

Data visualizations #367

Merged

rviscomi closed this as completed Nov 10, 2019
