Caching 2020 queries #1318

Merged
merged 64 commits into from
Nov 12, 2020
Commits
d4acde6
Update README.md
raghuramakrishnan71 Sep 26, 2020
0efc322
Initial file
raghuramakrishnan71 Sep 27, 2020
8ef0da6
To pass linting checks start with heading.
raghuramakrishnan71 Sep 29, 2020
1bdafcf
Remove query numbers in filename.
raghuramakrishnan71 Sep 29, 2020
336abdd
Added file with query number.
raghuramakrishnan71 Sep 29, 2020
7302bf4
TTL by resource and third-party
raghuramakrishnan71 Sep 29, 2020
1b7facb
Resources served without cache option.
raghuramakrishnan71 Sep 29, 2020
6064aea
Third party resources served without using cache.
raghuramakrishnan71 Sep 29, 2020
ebb1dac
Seq. of GROUP BY and ORDER BY made similar.
raghuramakrishnan71 Oct 18, 2020
c2e5c50
Seq. of GROUP BY and ORDER BY made similar, date changed.
raghuramakrishnan71 Oct 18, 2020
dd00233
Seq. of GROUP BY and ORDER BY made similar.
raghuramakrishnan71 Oct 18, 2020
2f2e44c
Seq. of GROUP BY and ORDER BY made similar.
raghuramakrishnan71 Oct 18, 2020
836ab44
Initial version of query.
raghuramakrishnan71 Oct 18, 2020
695f01e
Initial version of query.
raghuramakrishnan71 Oct 18, 2020
f2799b3
Initial version of query.
raghuramakrishnan71 Oct 18, 2020
4c6c6e3
Initial version of query.
raghuramakrishnan71 Oct 18, 2020
8962e49
Seq. of fields in SELECT, GROUP, and ORDER made same.
raghuramakrishnan71 Oct 18, 2020
bd04bed
Seq. of fields in SELECT, GROUP, and ORDER made same.
raghuramakrishnan71 Oct 18, 2020
b089dd8
Date corrected.
raghuramakrishnan71 Oct 18, 2020
12a01ca
Removed blank line.
raghuramakrishnan71 Oct 18, 2020
b8c6d9f
Seq. of SELECT, GROUP, and ORDER BY made similar.
raghuramakrishnan71 Oct 18, 2020
112b11a
Mapped to writeup, fixed review comments.
raghuramakrishnan71 Oct 27, 2020
6e3316a
Mapped to writeup.
raghuramakrishnan71 Oct 27, 2020
63ea979
Fixed naming inconsistencies.
raghuramakrishnan71 Oct 27, 2020
bdd6f64
Fixed naming convention.
raghuramakrishnan71 Oct 28, 2020
07a903d
Removed * 100.
raghuramakrishnan71 Oct 30, 2020
de44e99
Removed * 100.
raghuramakrishnan71 Oct 30, 2020
275fd3f
Initial version, mapped to chapter.
raghuramakrishnan71 Oct 31, 2020
91a8e03
Initial version, mapped to chapter.
raghuramakrishnan71 Oct 31, 2020
65b3082
Initial version, mapped to chapter.
raghuramakrishnan71 Nov 1, 2020
53accaa
Fixed naming inconsistency.
raghuramakrishnan71 Nov 1, 2020
a6a4f1e
No longer used, as per chapter.
raghuramakrishnan71 Nov 1, 2020
2505ab7
Initial version, mapped to chapter.
raghuramakrishnan71 Nov 2, 2020
d58e26a
Updated to report invalid count and pct.
raghuramakrishnan71 Nov 2, 2020
53c124b
Removed, replaced by invalid_last_modified_and_expires_and_date.sql
raghuramakrishnan71 Nov 2, 2020
cd33565
Removed, not used.
raghuramakrishnan71 Nov 2, 2020
92e673d
Mapped query to chapter.
raghuramakrishnan71 Nov 4, 2020
99bab56
Mapped query to chapter.
raghuramakrishnan71 Nov 4, 2020
ab406eb
Initial query.
raghuramakrishnan71 Nov 4, 2020
18a373d
Duplicate query.
raghuramakrishnan71 Nov 4, 2020
9a1a2aa
Mapped query to chapter.
raghuramakrishnan71 Nov 4, 2020
9113f45
Not needed in context of the chapter.
raghuramakrishnan71 Nov 4, 2020
de74fcd
Avoid blank resp_cache_control.
raghuramakrishnan71 Nov 4, 2020
85d3975
Added no-store statistics.
raghuramakrishnan71 Nov 4, 2020
c4dd037
Added cacheable %.
raghuramakrishnan71 Nov 4, 2020
3f135ac
Initial version.
raghuramakrishnan71 Nov 6, 2020
f491f90
Will add a new version based on Chapter.
raghuramakrishnan71 Nov 6, 2020
425aaa3
Will add a new version based on Chapter.
raghuramakrishnan71 Nov 6, 2020
cc1b343
Will add a revised version as per Chapter metrics.
raghuramakrishnan71 Nov 6, 2020
88861c6
third party query not used.
raghuramakrishnan71 Nov 6, 2020
d9bf1f3
Initial version.
raghuramakrishnan71 Nov 6, 2020
e211fe9
Initial version.
raghuramakrishnan71 Nov 7, 2020
7f0809e
Denominator in percent corrected.
raghuramakrishnan71 Nov 7, 2020
ae835ef
Added more metrics.
raghuramakrishnan71 Nov 7, 2020
496612e
Initial version
raghuramakrishnan71 Nov 7, 2020
c88b448
Renamed.
raghuramakrishnan71 Nov 7, 2020
c0261db
Initial version.
raghuramakrishnan71 Nov 9, 2020
241982f
Renamed file.
raghuramakrishnan71 Nov 10, 2020
147e7a6
Renamed file.
raghuramakrishnan71 Nov 10, 2020
5037d14
Renamed file.
raghuramakrishnan71 Nov 10, 2020
420bf7f
Updated fetching of http_type.
raghuramakrishnan71 Nov 11, 2020
b0379ac
Corrected.
raghuramakrishnan71 Nov 11, 2020
aff4757
Modified ORDER BY.
raghuramakrishnan71 Nov 11, 2020
b12c978
Initial version.
raghuramakrishnan71 Nov 11, 2020
1 change: 1 addition & 0 deletions sql/2020/20_Caching/README.md
@@ -0,0 +1 @@
# Caching Queries
17 changes: 17 additions & 0 deletions sql/2020/20_Caching/appcache_and_serviceworkers.sql
@@ -0,0 +1,17 @@
#standardSQL
# Use of AppCache and ServiceWorkers
SELECT
IF(STARTS_WITH(url, 'https'), 'https', 'http') AS http_type,
JSON_EXTRACT_SCALAR(report, "$.audits.appcache-manifest.score") AS using_appcache,
JSON_EXTRACT_SCALAR(report, "$.audits.service-worker.score") AS using_serviceworkers,
COUNT(0) AS occurrences,
SUM(COUNT(0)) OVER () AS total,
COUNT(0) / SUM(COUNT(0)) OVER () AS pct
FROM
`httparchive.lighthouse.2020_08_01_mobile`
GROUP BY
http_type,
using_appcache,
using_serviceworkers
ORDER BY
pct DESC
32 changes: 32 additions & 0 deletions sql/2020/20_Caching/cache_control_and_max_age_and_expires.sql
@@ -0,0 +1,32 @@
#standardSQL
# Use of Cache-Control, max-age in Cache-Control, and Expires
SELECT
client,
COUNT(0) AS total_requests,
COUNTIF(uses_cache_control) AS total_using_cache_control,
COUNTIF(uses_max_age) AS total_using_max_age,
COUNTIF(uses_expires) AS total_using_expires,
COUNTIF(uses_max_age AND uses_expires) AS total_using_max_age_and_expires,
COUNTIF(uses_cache_control AND uses_expires) AS total_using_both,
COUNTIF(NOT uses_cache_control AND NOT uses_expires) AS total_using_neither,
COUNTIF(uses_cache_control AND NOT uses_expires) AS total_using_only_cache_control,
COUNTIF(NOT uses_cache_control AND uses_expires) AS total_using_only_expires,
COUNTIF(uses_cache_control) / COUNT(0) AS pct_cache_control,
COUNTIF(uses_max_age) / COUNT(0) AS pct_using_max_age,
COUNTIF(uses_expires) / COUNT(0) AS pct_using_expires,
COUNTIF(uses_max_age AND uses_expires) / COUNT(0) AS pct_using_max_age_and_expires,
COUNTIF(uses_cache_control AND uses_expires) / COUNT(0) AS pct_using_both,
COUNTIF(NOT uses_cache_control AND NOT uses_expires) / COUNT(0) AS pct_using_neither,
COUNTIF(uses_cache_control AND NOT uses_expires) / COUNT(0) AS pct_using_only_cache_control,
COUNTIF(NOT uses_cache_control AND uses_expires) / COUNT(0) AS pct_using_only_expires
FROM (
SELECT
_TABLE_SUFFIX AS client,
TRIM(resp_expires) != "" AS uses_expires,
TRIM(resp_cache_control) != "" AS uses_cache_control,
REGEXP_CONTAINS(resp_cache_control, r'(?i)max-age\s*=\s*[0-9]+') AS uses_max_age
FROM
`httparchive.summary_requests.2020_08_01_*`
)
GROUP BY
client
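Review note: to see how the three flags in the subquery above behave, the same expressions can be run against a few inline header values instead of the `summary_requests` tables. This is a minimal sketch; the sample rows are invented for illustration and are not taken from the HTTP Archive dataset.

```sql
#standardSQL
# Illustration only: the rows below are hypothetical sample headers.
SELECT
  resp_cache_control,
  resp_expires,
  TRIM(resp_expires) != "" AS uses_expires,
  TRIM(resp_cache_control) != "" AS uses_cache_control,
  REGEXP_CONTAINS(resp_cache_control, r'(?i)max-age\s*=\s*[0-9]+') AS uses_max_age
FROM
  UNNEST([
    STRUCT('public, max-age=3600' AS resp_cache_control, '' AS resp_expires),
    STRUCT('no-store', 'Wed, 21 Oct 2015 07:28:00 GMT'),
    STRUCT('Max-Age = 60', ''),  # case and spacing variants still match
    STRUCT('max-age=', ''),      # no digits after "=", so uses_max_age is FALSE
    STRUCT('', '')               # neither header present
  ])
```

The `max-age=` row shows that a request can count towards `uses_cache_control` without counting towards `uses_max_age`.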
72 changes: 72 additions & 0 deletions sql/2020/20_Caching/cache_control_directives.sql
@@ -0,0 +1,72 @@
#standardSQL
# Use of Cache-Control directives
SELECT
client,
COUNT(0) AS total_requests,
COUNTIF(uses_cache_control) AS total_using_cache_control,
COUNTIF(uses_max_age) AS total_using_max_age,
COUNTIF(uses_no_cache) AS total_using_no_cache,
COUNTIF(uses_public) AS total_using_public,
COUNTIF(uses_must_revalidate) AS total_using_must_revalidate,
COUNTIF(uses_no_store) AS total_using_no_store,
COUNTIF(uses_private) AS total_using_private,
COUNTIF(uses_proxy_revalidate) AS total_using_proxy_revalidate,
COUNTIF(uses_s_maxage) AS total_using_s_maxage,
COUNTIF(uses_no_transform) AS total_using_no_transform,
COUNTIF(uses_immutable) AS total_using_immutable,
COUNTIF(uses_stale_while_revalidate) AS total_using_stale_while_revalidate,
COUNTIF(uses_stale_if_error) AS total_using_stale_if_error,
COUNTIF(uses_no_store AND uses_no_cache AND uses_max_age_zero) AS total_using_no_store_and_no_cache_and_max_age_zero,
COUNTIF(uses_no_store AND uses_no_cache AND NOT uses_max_age_zero) AS total_using_no_store_and_no_cache_only,
COUNTIF(uses_no_store AND NOT uses_no_cache AND NOT uses_max_age_zero) AS total_using_no_store_only,
COUNTIF(uses_max_age_zero AND NOT uses_no_store) AS total_using_max_age_zero_without_no_store,
COUNTIF(uses_pre_check_zero AND uses_post_check_zero) AS total_using_pre_check_zero_and_post_check_zero,
COUNTIF(uses_pre_check_zero) AS total_using_pre_check_zero,
COUNTIF(uses_post_check_zero) AS total_using_post_check_zero,
COUNTIF(uses_cache_control AND NOT uses_max_age AND NOT uses_no_cache AND NOT uses_public AND NOT uses_must_revalidate AND NOT uses_no_store AND NOT uses_private AND NOT uses_proxy_revalidate AND NOT uses_s_maxage AND NOT uses_no_transform AND NOT uses_immutable AND NOT uses_stale_while_revalidate AND NOT uses_stale_if_error AND NOT uses_pre_check_zero AND NOT uses_post_check_zero) AS total_erroneous_directives,
COUNTIF(uses_cache_control) / COUNT(0) AS pct_using_cache_control,
COUNTIF(uses_max_age) / COUNT(0) AS pct_using_max_age,
COUNTIF(uses_no_cache) / COUNT(0) AS pct_using_no_cache,
COUNTIF(uses_public) / COUNT(0) AS pct_using_public,
COUNTIF(uses_must_revalidate) / COUNT(0) AS pct_using_must_revalidate,
COUNTIF(uses_no_store) / COUNT(0) AS pct_using_no_store,
COUNTIF(uses_private) / COUNT(0) AS pct_using_private,
COUNTIF(uses_proxy_revalidate) / COUNT(0) AS pct_using_proxy_revalidate,
COUNTIF(uses_s_maxage) / COUNT(0) AS pct_using_s_maxage,
COUNTIF(uses_no_transform) / COUNT(0) AS pct_using_no_transform,
COUNTIF(uses_immutable) / COUNT(0) AS pct_using_immutable,
COUNTIF(uses_stale_while_revalidate) / COUNT(0) AS pct_using_stale_while_revalidate,
COUNTIF(uses_stale_if_error) / COUNT(0) AS pct_using_stale_if_error,
COUNTIF(uses_no_store AND uses_no_cache AND uses_max_age_zero) / COUNT(0) AS pct_using_no_store_and_no_cache_and_max_age_zero,
COUNTIF(uses_no_store AND uses_no_cache AND NOT uses_max_age_zero) / COUNT(0) AS pct_using_no_store_and_no_cache_only,
COUNTIF(uses_no_store AND NOT uses_no_cache AND NOT uses_max_age_zero) / COUNT(0) AS pct_using_no_store_only,
COUNTIF(uses_max_age_zero AND NOT uses_no_store) / COUNT(0) AS pct_using_max_age_zero_without_no_store,
COUNTIF(uses_pre_check_zero AND uses_post_check_zero) / COUNT(0) AS pct_using_pre_check_zero_and_post_check_zero,
COUNTIF(uses_pre_check_zero) / COUNT(0) AS pct_using_pre_check_zero,
COUNTIF(uses_post_check_zero) / COUNT(0) AS pct_using_post_check_zero,
COUNTIF(uses_cache_control AND NOT uses_max_age AND NOT uses_no_cache AND NOT uses_public AND NOT uses_must_revalidate AND NOT uses_no_store AND NOT uses_private AND NOT uses_proxy_revalidate AND NOT uses_s_maxage AND NOT uses_no_transform AND NOT uses_immutable AND NOT uses_stale_while_revalidate AND NOT uses_stale_if_error AND NOT uses_pre_check_zero AND NOT uses_post_check_zero) / COUNT(0) AS pct_erroneous_directives
FROM (
SELECT
_TABLE_SUFFIX AS client,
TRIM(resp_cache_control) != "" AS uses_cache_control,
REGEXP_CONTAINS(resp_cache_control, r'(?i)max-age\s*=\s*[0-9]+') AS uses_max_age,
REGEXP_CONTAINS(resp_cache_control, r'(?i)max-age\s*=\s*0') AS uses_max_age_zero,
REGEXP_CONTAINS(resp_cache_control, r'(?i)public') AS uses_public,
REGEXP_CONTAINS(resp_cache_control, r'(?i)no-cache') AS uses_no_cache,
REGEXP_CONTAINS(resp_cache_control, r'(?i)must-revalidate') AS uses_must_revalidate,
REGEXP_CONTAINS(resp_cache_control, r'(?i)no-store') AS uses_no_store,
REGEXP_CONTAINS(resp_cache_control, r'(?i)private') AS uses_private,
REGEXP_CONTAINS(resp_cache_control, r'(?i)proxy-revalidate') AS uses_proxy_revalidate,
REGEXP_CONTAINS(resp_cache_control, r'(?i)s-maxage\s*=\s*[0-9]+') AS uses_s_maxage,
REGEXP_CONTAINS(resp_cache_control, r'(?i)no-transform') AS uses_no_transform,
REGEXP_CONTAINS(resp_cache_control, r'(?i)immutable') AS uses_immutable,
REGEXP_CONTAINS(resp_cache_control, r'(?i)stale-while-revalidate\s*=\s*[0-9]+') AS uses_stale_while_revalidate,
REGEXP_CONTAINS(resp_cache_control, r'(?i)stale-if-error\s*=\s*[0-9]+') AS uses_stale_if_error,
REGEXP_CONTAINS(resp_cache_control, r'(?i)pre-check\s*=\s*0') AS uses_pre_check_zero,
REGEXP_CONTAINS(resp_cache_control, r'(?i)post-check\s*=\s*0') AS uses_post_check_zero
FROM
`httparchive.summary_requests.2020_08_01_*`
)
GROUP BY
client
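Review note: the combined `no-store` / `no-cache` / `max-age=0` buckets can be checked on a handful of inline values. This is only a sketch with hypothetical headers; the column names are shortened versions of the aliases used in the query above.

```sql
#standardSQL
# Illustration only: hypothetical Cache-Control values.
SELECT
  resp_cache_control,
  uses_no_store AND uses_no_cache AND uses_max_age_zero AS no_store_and_no_cache_and_max_age_zero,
  uses_no_store AND uses_no_cache AND NOT uses_max_age_zero AS no_store_and_no_cache_only,
  uses_no_store AND NOT uses_no_cache AND NOT uses_max_age_zero AS no_store_only,
  uses_max_age_zero AND NOT uses_no_store AS max_age_zero_without_no_store
FROM (
  SELECT
    resp_cache_control,
    REGEXP_CONTAINS(resp_cache_control, r'(?i)no-store') AS uses_no_store,
    REGEXP_CONTAINS(resp_cache_control, r'(?i)no-cache') AS uses_no_cache,
    REGEXP_CONTAINS(resp_cache_control, r'(?i)max-age\s*=\s*0') AS uses_max_age_zero
  FROM
    UNNEST([
      'no-store, no-cache, max-age=0',
      'no-store, no-cache',
      'no-store',
      'no-cache, max-age=0',
      'public, max-age=0600'  # leading zero: also matches the max-age=0 pattern
    ]) AS resp_cache_control
)
```

The last row is included because the `max-age\s*=\s*0` pattern has no trailing anchor, so any value whose first digit is `0` matches as well. Whether that matters for the real data is not something this sketch can show.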

35 changes: 35 additions & 0 deletions sql/2020/20_Caching/cache_ttl_and_content_age_diff.sql
@@ -0,0 +1,35 @@
#standardSQL
# Difference between cache TTL and content age, in days, by percentile
CREATE TEMPORARY FUNCTION toTimestamp(date_string STRING)
RETURNS INT64 LANGUAGE js AS '''
try {
var timestamp = Math.round(new Date(date_string).getTime() / 1000);
return isNaN(timestamp) ? -1 : timestamp;
} catch (e) {
return -1;
}
''';

SELECT
client,
percentile,
APPROX_QUANTILES(diff_in_days, 1000 IGNORE NULLS)[OFFSET(percentile * 10)] AS diff_in_days
FROM
(
SELECT
_TABLE_SUFFIX AS client,
ROUND((expAge - (startedDateTime - toTimestamp(resp_last_modified))) / 86400, 2) AS diff_in_days
FROM
`httparchive.summary_requests.2020_08_01_*`
WHERE
resp_last_modified <> "" AND
expAge > 0
),
UNNEST([10, 25, 50, 75, 90]) AS percentile
GROUP BY
client,
percentile
ORDER BY
client,
percentile
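Review note: the percentile pattern above (1000 approximate quantile buckets, indexed at `percentile * 10`, with the percentile list cross joined via `UNNEST`) can be tried on generated numbers. This is a sketch; `value` and `value_at_percentile` are names chosen for the illustration.

```sql
#standardSQL
# Illustration only: the integers 1..100 stand in for diff_in_days.
SELECT
  percentile,
  # 1000 buckets means offset 100 is the 10th percentile, 500 the median, and so on.
  APPROX_QUANTILES(value, 1000)[OFFSET(percentile * 10)] AS value_at_percentile
FROM
  UNNEST(GENERATE_ARRAY(1, 100)) AS value,
  UNNEST([10, 25, 50, 75, 90]) AS percentile
GROUP BY
  percentile
ORDER BY
  percentile
```

Because `APPROX_QUANTILES` is an approximate aggregate, the returned values should be close to, but are not guaranteed to equal, the exact percentiles.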

30 changes: 30 additions & 0 deletions sql/2020/20_Caching/content_age_older_than_ttl.sql
@@ -0,0 +1,30 @@
#standardSQL
# Requests with a content age older than their TTL
CREATE TEMPORARY FUNCTION toTimestamp(date_string STRING)
RETURNS INT64 LANGUAGE js AS '''
try {
var timestamp = Math.round(new Date(date_string).getTime() / 1000);
return isNaN(timestamp) ? -1 : timestamp;
} catch (e) {
return -1;
}
''';

SELECT
client,
COUNT(0) AS total_req,
COUNTIF(diff < 0) AS req_too_short_cache,
COUNTIF(diff < 0) / COUNT(0) AS perc_req_too_short_cache
FROM
(
SELECT
_TABLE_SUFFIX AS client,
expAge - (startedDateTime - toTimestamp(resp_last_modified)) AS diff
FROM
`httparchive.summary_requests.2020_08_01_*`
WHERE
resp_last_modified <> "" AND
expAge > 0
)
GROUP BY
client
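Review note: the `toTimestamp` helper can be exercised on its own to see which `Last-Modified` values it accepts. The sketch below repeats the function from the query above and applies it to hypothetical inputs; `last_modified_ts` and `is_unparseable` are illustrative column names.

```sql
#standardSQL
# Illustration only: hypothetical Last-Modified values.
CREATE TEMPORARY FUNCTION toTimestamp(date_string STRING)
RETURNS INT64 LANGUAGE js AS '''
  try {
    var timestamp = Math.round(new Date(date_string).getTime() / 1000);
    return isNaN(timestamp) ? -1 : timestamp;
  } catch (e) {
    return -1;
  }
''';

SELECT
  resp_last_modified,
  toTimestamp(resp_last_modified) AS last_modified_ts,
  # -1 is the sentinel the function returns when the date cannot be parsed
  toTimestamp(resp_last_modified) = -1 AS is_unparseable
FROM
  UNNEST([
    'Wed, 21 Oct 2015 07:28:00 GMT',
    '2015-10-21T07:28:00Z',
    'not a date'
  ]) AS resp_last_modified
```

Note that the queries above filter only on a non-empty `resp_last_modified`, so a row with an unparseable value still enters the `diff` calculation with a timestamp of -1. Filtering on `is_unparseable` would be one way to exclude such rows, if that is the intent.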
54 changes: 54 additions & 0 deletions sql/2020/20_Caching/content_age_older_than_ttl_by_party.sql
@@ -0,0 +1,54 @@
#standardSQL
# Requests with a content age older than their TTL, by first and third party
CREATE TEMPORARY FUNCTION toTimestamp(date_string STRING)
RETURNS INT64 LANGUAGE js AS '''
try {
var timestamp = Math.round(new Date(date_string).getTime() / 1000);
return isNaN(timestamp) ? -1 : timestamp;
} catch (e) {
return -1;
}
''';

SELECT
client,
party,
COUNT(0) AS total_req,
COUNTIF(diff < 0) AS req_too_short_cache,
COUNTIF(diff < 0) / COUNT(0) AS perc_req_too_short_cache
FROM
(
SELECT
"desktop" AS client,
IF(STRPOS(NET.HOST(requests.url), REGEXP_EXTRACT(NET.HOST(pages.url), r'([\w-]+)'))>0, 1, 3) AS party,
requests.expAge - (requests.startedDateTime - toTimestamp(requests.resp_last_modified)) AS diff
FROM
`httparchive.summary_requests.2020_08_01_desktop` requests
JOIN
`httparchive.summary_pages.2020_08_01_desktop` pages
ON
pages.pageid = requests.pageid
WHERE
TRIM(requests.resp_last_modified) <> "" AND
expAge > 0
UNION ALL
SELECT
"mobile" AS client,
IF(STRPOS(NET.HOST(requests.url), REGEXP_EXTRACT(NET.HOST(pages.url), r'([\w-]+)'))>0, 1, 3) AS party,
requests.expAge - (requests.startedDateTime - toTimestamp(requests.resp_last_modified)) AS diff
FROM
`httparchive.summary_requests.2020_08_01_mobile` requests
JOIN
`httparchive.summary_pages.2020_08_01_mobile` pages
ON
pages.pageid = requests.pageid
WHERE
TRIM(requests.resp_last_modified) <> "" AND
expAge > 0
)
GROUP BY
client,
party
ORDER BY
client,
party
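Review note: the `party` expression treats a request as first party (1) when the request host contains the first label of the page host, and as third party (3) otherwise. A minimal sketch with hypothetical URLs; `page_host_first_label` is an extra column added here only to make the matched substring visible.

```sql
#standardSQL
# Illustration only: hypothetical page and request URLs.
SELECT
  page_url,
  request_url,
  REGEXP_EXTRACT(NET.HOST(page_url), r'([\w-]+)') AS page_host_first_label,
  IF(STRPOS(NET.HOST(request_url), REGEXP_EXTRACT(NET.HOST(page_url), r'([\w-]+)')) > 0, 1, 3) AS party
FROM
  UNNEST([
    STRUCT('https://example.com/' AS page_url, 'https://cdn.example.com/app.js' AS request_url),
    STRUCT('https://example.com/', 'https://fonts.gstatic.com/font.woff2'),
    STRUCT('https://www.example.com/', 'https://www.google-analytics.com/analytics.js')
  ])
```

`REGEXP_EXTRACT` returns the first run of word characters or hyphens in the page host, so for a page host that starts with `www` the substring being searched for is `www` rather than the site name. The third row is there to make that case easy to inspect.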
68 changes: 68 additions & 0 deletions sql/2020/20_Caching/invalid_cache_control_directives.sql
@@ -0,0 +1,68 @@
#standardSQL
# List of invalid Cache-Control directive names.
SELECT
client,
total_requests,
total_using_cache_control,
directive_name,
directive_occurrences,
pct_of_cache_control,
pct_of_total_requests
FROM
(
(
SELECT
"desktop" AS client,
total_requests,
total_using_cache_control,
directive_name,
COUNT(0) AS directive_occurrences,
COUNT(0) / total_using_cache_control AS pct_of_cache_control,
COUNT(0) / total_requests AS pct_of_total_requests
FROM
`httparchive.summary_requests.2020_08_01_desktop`,
UNNEST(REGEXP_EXTRACT_ALL(LOWER(resp_cache_control), r'([a-z][^,\s="\']*)')) AS directive_name
CROSS JOIN (
SELECT
COUNT(0) AS total_requests,
COUNTIF(TRIM(resp_cache_control) != "") AS total_using_cache_control
FROM
`httparchive.summary_requests.2020_08_01_desktop`
)
GROUP BY
client,
total_requests,
total_using_cache_control,
directive_name
)
UNION ALL
(
SELECT
"mobile" AS client,
total_requests,
total_using_cache_control,
directive_name,
COUNT(0) AS directive_occurrences,
COUNT(0) / total_using_cache_control AS pct_of_cache_control,
COUNT(0) / total_requests AS pct_of_total_requests
FROM
`httparchive.summary_requests.2020_08_01_mobile`,
UNNEST(REGEXP_EXTRACT_ALL(LOWER(resp_cache_control), r'([a-z][^,\s="\']*)')) AS directive_name
CROSS JOIN (
SELECT
COUNT(0) AS total_requests,
COUNTIF(TRIM(resp_cache_control) != "") AS total_using_cache_control
FROM
`httparchive.summary_requests.2020_08_01_mobile`
)
GROUP BY
client,
total_requests,
total_using_cache_control,
directive_name
)
)
WHERE
directive_name NOT IN ('max-age', 'public', 'no-cache', 'must-revalidate', 'no-store', 'private', 'proxy-revalidate', 's-maxage', 'no-transform', 'immutable', 'stale-while-revalidate', 'stale-if-error', 'pre-check', 'post-check')
ORDER BY
client, directive_occurrences DESC
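Review note: the tokenisation step (`REGEXP_EXTRACT_ALL` over the lowercased header) decides which strings are later compared against the list of known directives. The sketch below applies the same pattern to hypothetical values; `directive_names` is an illustrative alias.

```sql
#standardSQL
# Illustration only: hypothetical Cache-Control values.
SELECT
  resp_cache_control,
  # Each match starts at a letter and runs until a comma, whitespace, "=", or quote.
  REGEXP_EXTRACT_ALL(LOWER(resp_cache_control), r'([a-z][^,\s="\']*)') AS directive_names
FROM
  UNNEST([
    'Public, Max-Age=3600',
    'no-cache="set-cookie"',
    'max-age=60, cache'
  ]) AS resp_cache_control
```

Numeric arguments such as `3600` are skipped because a match must start with a letter, but a quoted argument such as `set-cookie` starts with a letter and is therefore extracted as if it were a directive name.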
31 changes: 31 additions & 0 deletions sql/2020/20_Caching/invalid_last_modified_and_expires_and_date.sql
@@ -0,0 +1,31 @@
#standardSQL
# Use of the Date, Last-Modified, and Expires headers, and how often their values are not valid HTTP dates
SELECT
client,
COUNT(0) AS total_requests,
COUNTIF(uses_date) AS total_using_date,
COUNTIF(uses_last_modified) AS total_using_last_modified,
COUNTIF(uses_expires) AS total_using_expires,
COUNTIF(uses_date AND NOT has_valid_date) AS total_using_invalid_date,
COUNTIF(uses_last_modified AND NOT has_valid_last_modified) AS total_using_invalid_last_modified,
COUNTIF(uses_expires AND NOT has_valid_expires) AS total_using_invalid_expires,
COUNTIF(uses_date) / COUNT(0) AS pct_using_date,
COUNTIF(uses_last_modified) / COUNT(0) AS pct_using_last_modified,
COUNTIF(uses_expires) / COUNT(0) AS pct_using_expires,
# Denominators count only the requests that actually send the header.
COUNTIF(uses_date AND NOT has_valid_date) / COUNTIF(uses_date) AS pct_using_invalid_date,
COUNTIF(uses_last_modified AND NOT has_valid_last_modified) / COUNTIF(uses_last_modified) AS pct_using_invalid_last_modified,
COUNTIF(uses_expires AND NOT has_valid_expires) / COUNTIF(uses_expires) AS pct_using_invalid_expires
FROM (
SELECT
_TABLE_SUFFIX AS client,
TRIM(resp_date) != "" AS uses_date,
TRIM(resp_last_modified) != "" AS uses_last_modified,
TRIM(resp_expires) != "" AS uses_expires,
REGEXP_CONTAINS(TRIM(resp_date), r'^(Mon|Tue|Wed|Thu|Fri|Sat|Sun), \d{1,2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4} \d{2}:\d{2}:\d{2} GMT$') AS has_valid_date,
REGEXP_CONTAINS(TRIM(resp_last_modified), r'^(Mon|Tue|Wed|Thu|Fri|Sat|Sun), \d{1,2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4} \d{2}:\d{2}:\d{2} GMT$') AS has_valid_last_modified,
REGEXP_CONTAINS(TRIM(resp_expires), r'^(Mon|Tue|Wed|Thu|Fri|Sat|Sun), \d{1,2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4} \d{2}:\d{2}:\d{2} GMT$') AS has_valid_expires
FROM
`httparchive.summary_requests.2020_08_01_*`
)
GROUP BY
client
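Review note: a percentage of the form "invalid among requests that send the header" needs the number of rows where the flag is TRUE as its denominator. In BigQuery, `COUNT(expr)` counts rows where the expression is not NULL, while `COUNTIF(expr)` counts rows where it is TRUE. A small sketch on hypothetical flags:

```sql
#standardSQL
# Illustration only: four hypothetical rows, two of which have the flag set.
SELECT
  COUNT(0) AS total_rows,
  COUNT(uses_date) AS count_non_null,  # counts FALSE rows too
  COUNTIF(uses_date) AS count_true     # counts only TRUE rows
FROM
  UNNEST([TRUE, TRUE, FALSE, FALSE]) AS uses_date
```

Here `count_non_null` equals `total_rows` (4), whereas `count_true` is 2, so the two aggregates give different denominators for a boolean column.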
34 changes: 34 additions & 0 deletions sql/2020/20_Caching/last_modified_and_etag.sql
@@ -0,0 +1,34 @@
#standardSQL
# Presence of Last-Modified and ETag header, statistics on weak, strong, and invalid ETag.
SELECT
client,
COUNT(0) AS total_requests,
COUNTIF(uses_no_etag) AS total_using_no_etag,
COUNTIF(uses_etag) AS total_using_etag,
COUNTIF(uses_weak_etag) AS total_using_weak_etag,
COUNTIF(uses_strong_etag) AS total_using_strong_etag,
COUNTIF(NOT uses_weak_etag AND NOT uses_strong_etag AND uses_etag) AS total_using_invalid_etag,
COUNTIF(uses_last_modified) AS total_using_last_modified,
COUNTIF(uses_etag AND uses_last_modified) AS total_using_both,
COUNTIF(NOT uses_etag AND NOT uses_last_modified) AS total_using_neither,
COUNTIF(uses_no_etag) / COUNT(0) AS pct_using_no_etag,
COUNTIF(uses_etag) / COUNT(0) AS pct_using_etag,
COUNTIF(uses_weak_etag) / COUNT(0) AS pct_using_weak_etag,
COUNTIF(uses_strong_etag) / COUNT(0) AS pct_using_strong_etag,
COUNTIF(NOT uses_weak_etag AND NOT uses_strong_etag AND uses_etag) / COUNT(0) AS pct_using_invalid_etag,
COUNTIF(uses_last_modified) / COUNT(0) AS pct_using_last_modified,
COUNTIF(uses_etag AND uses_last_modified) / COUNT(0) AS pct_using_both,
COUNTIF(NOT uses_etag AND NOT uses_last_modified) / COUNT(0) AS pct_using_neither
FROM (
SELECT
_TABLE_SUFFIX AS client,
TRIM(resp_etag) = "" AS uses_no_etag,
TRIM(resp_etag) != "" AS uses_etag,
TRIM(resp_last_modified) != "" AS uses_last_modified,
REGEXP_CONTAINS(TRIM(resp_etag), '^W/\".*\"') AS uses_weak_etag,
REGEXP_CONTAINS(TRIM(resp_etag), '^\".*\"') AS uses_strong_etag
FROM
`httparchive.summary_requests.2020_08_01_*`
)
GROUP BY
client
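Review note: the weak and strong ETag checks can be tried on a few inline values. This is a sketch with hypothetical ETags; the patterns are written as raw strings here, which should be equivalent to the escaped forms used in the query above.

```sql
#standardSQL
# Illustration only: hypothetical ETag values.
SELECT
  resp_etag,
  TRIM(resp_etag) != "" AS uses_etag,
  REGEXP_CONTAINS(TRIM(resp_etag), r'^W/".*"') AS uses_weak_etag,
  REGEXP_CONTAINS(TRIM(resp_etag), r'^".*"') AS uses_strong_etag
FROM
  UNNEST([
    'W/"0815"',  # weak validator
    '"xyzzy"',   # strong validator
    'xyzzy',     # present but unquoted: falls into the invalid bucket
    ''           # no ETag
  ]) AS resp_etag
```

An unquoted value sets `uses_etag` without matching either pattern, which is the case the `total_using_invalid_etag` column counts.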