Chapter 5: Add All Byte and Request Count Queries #107

Merged
merged 10 commits into from
Aug 15, 2019
23 changes: 23 additions & 0 deletions sql/2019/05_Third_Parties/05_06.sql
@@ -0,0 +1,23 @@
#standardSQL
# Top 100 third party domains by request volume
SELECT
thirdPartyDomain,
COUNT(*) AS totalRequests,
SUM(requestBytes) AS totalBytes
FROM (
SELECT
SAFE_CAST(REGEXP_EXTRACT(payload, r'_bytesIn":(\d+)') AS INT64) AS requestBytes,
NET.HOST(url) AS requestDomain,
DomainsOver50Table.requestDomain AS thirdPartyDomain
FROM
`httparchive.requests.2019_07_01_mobile`
Member

Do you also want to include desktop in your analysis?

Contributor Author

I was planning to run everything on mobile for consistency, because some metrics are only available in the Lighthouse dataset, which is mobile-only.

LEFT JOIN
`lighthouse-infrastructure.third_party_web.2019_07_01_all_observed_domains` AS DomainsOver50Table
ON NET.HOST(url) = DomainsOver50Table.requestDomain
)
GROUP BY
thirdPartyDomain
ORDER BY
totalRequests DESC
LIMIT 100

23 changes: 23 additions & 0 deletions sql/2019/05_Third_Parties/05_07.sql
@@ -0,0 +1,23 @@
#standardSQL
# Top 100 third party domains by total byte weight
SELECT
thirdPartyDomain,
COUNT(*) AS totalRequests,
SUM(requestBytes) AS totalBytes
FROM (
SELECT
SAFE_CAST(REGEXP_EXTRACT(payload, r'_bytesIn":(\d+)') AS INT64) AS requestBytes,
NET.HOST(url) AS requestDomain,
DomainsOver50Table.requestDomain AS thirdPartyDomain
FROM
`httparchive.requests.2019_07_01_mobile`
LEFT JOIN
`lighthouse-infrastructure.third_party_web.2019_07_01_all_observed_domains` AS DomainsOver50Table
ON NET.HOST(url) = DomainsOver50Table.requestDomain
)
GROUP BY
thirdPartyDomain
ORDER BY
totalBytes DESC
LIMIT 100
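
On the desktop question raised in review: the sketch below shows one way the request-count query could cover both clients, should that ever be wanted. It is not part of this PR. It assumes a `2019_07_01_desktop` table sits alongside the mobile one in `httparchive.requests`, and the `client` alias is purely illustrative. It uses a BigQuery wildcard table, with `_TABLE_SUFFIX` exposing which table each row came from.

```sql
#standardSQL
# Sketch only: top third party domains by request volume, split by client.
# Assumes `httparchive.requests.2019_07_01_desktop` exists next to the mobile table.
SELECT
  _TABLE_SUFFIX AS client,  # 'desktop' or 'mobile', taken from the matched table name
  DomainsOver50Table.requestDomain AS thirdPartyDomain,
  COUNT(*) AS totalRequests
FROM
  `httparchive.requests.2019_07_01_*`
LEFT JOIN
  `lighthouse-infrastructure.third_party_web.2019_07_01_all_observed_domains` AS DomainsOver50Table
ON NET.HOST(url) = DomainsOver50Table.requestDomain
GROUP BY
  client,
  thirdPartyDomain
ORDER BY
  totalRequests DESC
LIMIT 100
```

Scanning both tables roughly doubles the bytes processed, which may be another reason to stay mobile-only as discussed above.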