Finalize assignments: Chapter 16. Caching #18

rviscomi · 2019-05-21T01:41:24Z

Section	Chapter	Author	Reviewers
IV. Content Distribution	16. Caching	@paulcalvano	@yoavweiss @colinbendell

Due date: To help us stay on schedule, please complete the action items in this issue by June 3.

To do:

Assign subject matter expert (author)
Finalize peer reviewers
Finalize metrics

Current list of metrics:

TTL by resource
Resources served without cache
Cache strategy?
Cache TTL vs Content Age
Availability of Last-Modified vs. ETag validators
Validity of Dates in Last-Modified and Date headers
Set-Cookie on cacheable responses?
Use of Cache-Control: max-age vs. Expires
Use of Vary (how many dimensions, what headers, etc.)
Use of other Cache-Control directives (e.g., public, private, immutable)
1st Party vs 3rd Party Caching
Public vs Private
Use of must-revalidate
Service Worker caching
AppCache
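As a rough illustration of what "TTL by resource", "Cache-Control: max-age vs. Expires", and "Cache TTL vs Content Age" could measure, here is a minimal sketch. It assumes headers arrive as a dict with lowercase names, and the function names (`parse_http_date`, `freshness_lifetime`, `content_age`) are hypothetical. It is not the query the analysts will run. It ignores `s-maxage` and heuristic freshness.

```python
import re
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Return an aware datetime, or None if the value cannot be parsed."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def freshness_lifetime(headers):
    """Return (seconds, source). max-age takes precedence over Expires."""
    cache_control = headers.get("cache-control", "")
    match = re.search(r'(?:^|[\s,])max-age="?(\d+)', cache_control, re.IGNORECASE)
    if match:
        return int(match.group(1)), "max-age"
    if "expires" in headers:
        expires = parse_http_date(headers["expires"])
        if expires is None:
            # An unparseable Expires (such as "0") counts as already expired.
            return 0, "expires"
        date = parse_http_date(headers.get("date"))
        if date is not None:
            return max(0, int((expires - date).total_seconds())), "expires"
    return None, "none"


def content_age(headers):
    """Seconds between Last-Modified and Date, or None if either is unusable."""
    date = parse_http_date(headers.get("date"))
    modified = parse_http_date(headers.get("last-modified"))
    if date is None or modified is None:
        return None
    return int((date - modified).total_seconds())
```

Comparing the two numbers per resource gives a sense of whether the TTL is short or long relative to how often the content actually changes.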

👉Optional AI (@paulcalvano): Peer reviewers are trusted experts who can support you when brainstorming metrics, interpreting results, and writing the report. Ideally this chapter will have multiple reviewers who can promote a diversity of perspectives. You currently have 1 peer reviewer.

👉 AI (@paulcalvano): Finalize which metrics you might like to include in an annual "state of caching" report powered by HTTP Archive. Community contributors have initially sketched out a few ideas to get the ball rolling, but it's up to you, the subject matter experts, to know exactly which metrics we should be looking at. You can use the brainstorming doc to explore ideas.

The metrics should paint a holistic, data-driven picture of the caching landscape. The HTTP Archive does have its limitations and blind spots, so if there are metrics out of scope it's still good to identify them now during the brainstorming phase. We can make a note of them in the final report so readers understand why they're not discussed and the HTTP Archive team can make an effort to improve our telemetry for next year's Almanac.

Next steps: Over the next couple of months analysts will write the queries and generate the results, then hand everything off to you to write up your interpretation of the data.

Additional resources:

mnot · 2019-05-28T07:26:12Z

Would be interesting to see metrics on:

Availability of Last-Modified vs. ETag validators
Use of Cache-Control: max-age vs. Expires
Use of Vary (how many dimensions, what headers, etc.)
Use of other Cache-Control directives (e.g., public, private, immutable)
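For the validator and Vary metrics, a per-response classification could look something like the sketch below. The function names are hypothetical and the headers are assumed to be a dict with lowercase names; it only illustrates the bucketing, not how the data would be pulled from HTTP Archive.

```python
def validators(headers):
    """Classify which validators a response offers."""
    has_etag = bool(headers.get("etag", "").strip())
    has_last_modified = bool(headers.get("last-modified", "").strip())
    if has_etag and has_last_modified:
        return "both"
    if has_etag:
        return "etag"
    if has_last_modified:
        return "last-modified"
    return "none"


def vary_dimensions(headers):
    """Return the sorted, lowercased, de-duplicated field names listed in Vary."""
    raw = headers.get("vary", "")
    return sorted({name.strip().lower() for name in raw.split(",") if name.strip()})
```

The length of the list returned by `vary_dimensions` answers "how many dimensions", and its contents answer "what headers".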

paulcalvano · 2019-06-03T16:29:44Z

A few more ideas:

1st Party vs 3rd Party Caching
Public vs Private
Use of must-revalidate
Service Worker caching
AppCache usage (hopefully low)

rviscomi · 2019-06-04T01:00:26Z

@paulcalvano @yoavweiss @colinbendell we're hoping to finalize the metrics for each chapter today. Could you edit #18 (comment) and update it with anything that's missing? I see a bunch of other metrics were discussed in the comments. When that's done please tick the last TODO checkbox item and close this issue. Thanks!

tunetheweb · 2019-06-04T21:54:33Z

Sorry I'm late, and I know this is closed, but any thoughts on measuring whether ETags actually work?

They don't work in Apache, for example, if gzip or br is used (as I would hope they would be!), and you won't ever get 304 responses. Try it at www.apache.org, for example: gzipped resources return 200 on refresh, but images (which are not gzipped) correctly return a 304. So ETags should be turned off and Last-Modified should be used instead. Apache is pretty popular, so I imagine this affects a non-trivial number of servers, since ETags are enabled by default and most people turn on compression for performance reasons. Other servers may have similar issues with ETags not actually working.

Also, in the past ETags were often based on the inode, which caused issues with load-balanced servers, but I'm not aware of anyone doing that anymore, so I'm not too worried about that. I'm more worried about other implementation issues like the one Apache has. Though if we can measure both together, then why not.

It would require hitting at least one resource twice, though (once with no cache, and then again with it cached), to see whether 200 or 304 is returned, so I'm not sure how doable that is.
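The two-request check described here can be sketched as follows. This is only an illustration using Python's standard `urllib`; the function names are hypothetical, and `check_etag` needs network access, so it is defined but not called. It asks for gzip explicitly because `urllib` does not do so by default, and the problem described above only shows up on compressed responses.

```python
import urllib.error
import urllib.request


def classify_revalidation(status):
    """Interpret the status code of the second, conditional request."""
    if status == 304:
        return "revalidated"
    if status == 200:
        return "full response"  # validator was ignored or did not match
    return "other"


def check_etag(url):
    """Fetch url once, then again with If-None-Match set to the returned ETag."""
    base_headers = {"Accept-Encoding": "gzip"}
    first_request = urllib.request.Request(url, headers=base_headers)
    with urllib.request.urlopen(first_request) as first:
        etag = first.headers.get("ETag")
    if not etag:
        return "no etag"
    second_request = urllib.request.Request(
        url, headers={**base_headers, "If-None-Match": etag}
    )
    try:
        with urllib.request.urlopen(second_request) as second:
            return classify_revalidation(second.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx statuses, including 304.
        return classify_revalidation(err.code)
```

A result of "full response" for a resource that did return an ETag would be the signal that validation is not working for it.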

rviscomi · 2019-06-04T22:18:03Z

Not too late to add a metric if @paulcalvano sees fit. Just update the first comment.

mnot · 2019-06-04T23:00:33Z

Investigating how well ETag validation is supported would be great. Just to note -- that Apache bug is specific to mod_deflate; if you use MultiViews for negotiating encoding, it works fine (e.g., see www.mnot.net). That said, it'd be interesting to see how widespread that is.

Looking over https://cache-tests.fyi for inspiration, a few other things come to mind:

How common is it for sites to use non-lowercase cache-control parameters?
How common is it for sites to use invalid dates?
How common is it for sites to use Cache-Control: public (even though it usually isn't required)?
How common is it for sites to serve a Date and Age that don't make sense (see this paper)?
How many sites still use Pragma in responses (even though it doesn't mean anything)?
Do any sites put Set-Cookie on cacheable responses?
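Several of these questions could be answered from response headers alone. The sketch below shows one possible per-response check. The names `parse_cache_control` and `lint_response` are hypothetical, the headers are assumed to be a dict with lowercase names, and "cacheable" is narrowed here (as an assumption) to a non-zero `max-age` without `no-store` or `private`. The comma split is naive and would mis-handle quoted arguments that contain commas.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument or None}, keeping case."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, argument = part.partition("=")
        directives[name.strip()] = argument.strip().strip('"') or None
    return directives


def lint_response(headers):
    """Return a list of findings for one response's headers."""
    findings = []
    directives = parse_cache_control(headers.get("cache-control", ""))
    lowered = {name.lower(): argument for name, argument in directives.items()}
    if any(name != name.lower() for name in directives):
        findings.append("non-lowercase directive")
    if "public" in lowered:
        findings.append("uses public")
    if "pragma" in headers:
        findings.append("pragma in response")
    # Assumption of this sketch: cacheable means a non-zero max-age
    # and neither no-store nor private.
    cacheable = (
        lowered.get("max-age") not in (None, "0")
        and {"no-store", "private"}.isdisjoint(lowered)
    )
    if cacheable and "set-cookie" in headers:
        findings.append("set-cookie on cacheable response")
    return findings
```

Counting each finding across all responses would give the "how common is it" numbers asked about above.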

colinbendell · 2019-06-05T04:38:40Z

One additional thought: it might be worth adding an experimental headers section that includes in-the-wild uses of Variance or Key (if any).

paulcalvano · 2019-06-05T16:02:16Z

I think ETag validation would be out of scope for this because we aren't making a repeat request. I agree it would definitely be interesting to explore whether servers are returning 304 status codes to requests with valid ETags.

@mnot - great idea to look at the cache tests. I'll add some of these to the list.

On the topic of valid dates - I ran into many invalid Date and Last-Modified headers in a recent analysis I did, so it would be interesting to explore what is going on there.
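One way to make "invalid" concrete is a strict check against the preferred HTTP date format (for example `Wed, 05 Jun 2019 16:02:16 GMT`). The sketch below is an assumption about how such a check might look, not the analysis mentioned above. The name `is_valid_http_date` is hypothetical. It deliberately rejects the obsolete date formats and leap seconds, and it also verifies that the day name matches the date.

```python
import re
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
FIXDATE = re.compile(
    r"([A-Z][a-z]{2}), (\d{2}) ([A-Z][a-z]{2}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT"
)


def is_valid_http_date(value):
    """Return True only for a well-formed, internally consistent fixed-format date."""
    match = FIXDATE.fullmatch(value)
    if not match:
        return False
    day_name, day, month_name, year, hour, minute, second = match.groups()
    if day_name not in DAYS or month_name not in MONTHS:
        return False
    try:
        parsed = datetime(int(year), MONTHS.index(month_name) + 1, int(day),
                          int(hour), int(minute), int(second))
    except ValueError:
        # Out-of-range fields, such as 31 Jun or hour 25.
        return False
    # datetime.weekday() returns 0 for Monday, matching the order of DAYS.
    return DAYS[parsed.weekday()] == day_name
```

Running this over both `Date` and `Last-Modified` values would separate malformed strings from well-formed ones, which is a first step toward seeing what is going on.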

paulcalvano · 2019-06-05T16:03:20Z

@colinbendell - do you have an example of Variance or Key headers? I'm not familiar with those.

rviscomi · 2019-06-06T17:49:55Z

Hoping we can resolve the open questions about metrics and close this issue ASAP.

rviscomi · 2019-06-07T15:43:37Z

Last call for metrics. @paulcalvano please update the final list and close this issue today. (sorry, couldn't think of a caching pun)

rviscomi assigned paulcalvano May 21, 2019

rviscomi transferred this issue from HTTPArchive/httparchive.org May 21, 2019

rviscomi added this to the Chapter planning complete milestone May 21, 2019

rviscomi changed the title ~~[Web Almanac] Finalize assignments: Chapter 16. Caching~~ Finalize assignments: Chapter 16. Caching May 21, 2019

rviscomi mentioned this issue May 23, 2019

Assign subject matter experts and peer reviewers to each chapter #2

Closed

paulcalvano closed this as completed Jun 4, 2019

paulcalvano reopened this Jun 5, 2019

rviscomi added the ASAP This issue is blocking progress label Jun 6, 2019

paulcalvano closed this as completed Jun 8, 2019

rviscomi mentioned this issue Jul 23, 2019

Query metrics: Chapter 16. Caching #97

Closed

14 tasks

rviscomi mentioned this issue Sep 25, 2019

Write content: Chapter 16: Caching #172

Closed

3 tasks

rviscomi removed the ASAP This issue is blocking progress label Sep 25, 2019
