Finalize assignments: Chapter 16. Caching #18

Closed
3 tasks done
rviscomi opened this issue May 21, 2019 · 11 comments

@rviscomi
Member

rviscomi commented May 21, 2019

Section: IV. Content Distribution
Chapter: 16. Caching
Author: @paulcalvano
Reviewers: @yoavweiss, @colinbendell

Due date: To help us stay on schedule, please complete the action items in this issue by June 3.

To do:

  • Assign subject matter expert (author)
  • Finalize peer reviewers
  • Finalize metrics

Current list of metrics:

  • TTL by resource
  • Resources served without cache
  • Cache strategy?
  • Cache TTL vs Content Age
  • Availability of Last-Modified vs. ETag validators
  • Validity of Dates in Last-Modified and Date headers
  • Set-Cookie on cacheable responses?
  • Use of Cache-Control: max-age vs. Expires
  • Use of Vary (how many dimensions, what headers, etc.)
  • Use of other Cache-Control directives (e.g., public, private, immutable)
  • 1st Party vs 3rd Party Caching
  • Public vs Private
  • Use of must-revalidate
  • Service Worker caching
  • AppCache
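A minimal sketch of how the "TTL by resource", "Cache TTL vs Content Age" and "max-age vs. Expires" metrics could be derived from a single response's headers. It assumes headers arrive as a dict keyed by lowercase names; the function names are hypothetical and this is not the HTTP Archive's actual query logic. Following RFC 7234, `max-age` takes precedence over `Expires`.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Parse an HTTP-date header value; return None if missing or malformed."""
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError, IndexError):
        return None  # e.g. "Expires: 0" or "Expires: -1"
    if parsed.tzinfo is None:  # a "-0000" zone yields a naive datetime; treat as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def cache_ttl_seconds(headers):
    """Explicit freshness lifetime: max-age takes precedence over Expires.

    s-maxage (shared caches only) is ignored in this sketch.
    Returns (ttl_seconds, source), or (None, None) when neither is usable.
    """
    for directive in headers.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            try:
                return int(value.strip().strip('"')), "max-age"
            except ValueError:
                pass  # malformed max-age: fall back to Expires
    expires = parse_http_date(headers.get("expires"))
    date = parse_http_date(headers.get("date"))
    if expires and date:
        return int((expires - date).total_seconds()), "expires"
    return None, None


def content_age_seconds(headers):
    """How old the content was when served: Date minus Last-Modified."""
    date = parse_http_date(headers.get("date"))
    last_modified = parse_http_date(headers.get("last-modified"))
    if date and last_modified:
        return int((date - last_modified).total_seconds())
    return None
```

Comparing the two values per resource shows whether a TTL is short relative to how rarely the content actually changes.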

👉 Optional AI (@paulcalvano): Peer reviewers are trusted experts who can support you when brainstorming metrics, interpreting results, and writing the report. Ideally this chapter will have multiple reviewers who can promote a diversity of perspectives. You currently have 1 peer reviewer.

👉 AI (@paulcalvano): Finalize which metrics you might like to include in an annual "state of caching" report powered by HTTP Archive. Community contributors have initially sketched out a few ideas to get the ball rolling, but it's up to you, the subject matter experts, to know exactly which metrics we should be looking at. You can use the brainstorming doc to explore ideas.

The metrics should paint a holistic, data-driven picture of the caching landscape. The HTTP Archive does have its limitations and blind spots, so if some metrics are out of scope, it's still good to identify them now during the brainstorming phase. We can note them in the final report so readers understand why they're not discussed, and the HTTP Archive team can make an effort to improve our telemetry for next year's Almanac.

Next steps: Over the next couple of months analysts will write the queries and generate the results, then hand everything off to you to write up your interpretation of the data.

Additional resources:

@rviscomi rviscomi transferred this issue from HTTPArchive/httparchive.org May 21, 2019
@rviscomi rviscomi added this to the Chapter planning complete milestone May 21, 2019
@rviscomi rviscomi changed the title [Web Almanac] Finalize assignments: Chapter 16. Caching Finalize assignments: Chapter 16. Caching May 21, 2019
@mnot

mnot commented May 28, 2019

Would be interesting to see metrics on:

  • Availability of Last-Modified vs. ETag validators
  • Use of Cache-Control: max-age vs. Expires
  • Use of Vary (how many dimensions, what headers, etc.)
  • Use of other Cache-Control directives (e.g., public, private, immutable)
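One way these four metrics could be tallied per response is sketched below, assuming a dict of lowercase header names to values. The helper names and category labels are illustrative, not an existing HTTP Archive schema. The directive set also covers "Public vs Private" and "must-revalidate" from the list above.

```python
def cache_control_directives(headers):
    """Lowercased Cache-Control directive names, e.g. {"public", "max-age"}."""
    value = headers.get("cache-control", "")
    return {
        part.strip().partition("=")[0].strip().lower()
        for part in value.split(",")
        if part.strip()
    }


def validator_type(headers):
    """Which validators a response carries: both, etag, last-modified, or none."""
    has_etag = "etag" in headers
    has_last_modified = "last-modified" in headers
    if has_etag and has_last_modified:
        return "both"
    if has_etag:
        return "etag"
    if has_last_modified:
        return "last-modified"
    return "none"


def expiry_mechanism(headers):
    """Whether freshness is set via max-age, Expires, both, or neither."""
    has_max_age = "max-age" in cache_control_directives(headers)
    has_expires = "expires" in headers
    labels = {
        (True, True): "both",
        (True, False): "max-age",
        (False, True): "expires",
        (False, False): "neither",
    }
    return labels[(has_max_age, has_expires)]


def vary_dimensions(headers):
    """Distinct, lowercased header names listed in Vary; len() gives the dimensions."""
    value = headers.get("vary", "")
    return sorted({name.strip().lower() for name in value.split(",") if name.strip()})
```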

@paulcalvano
Contributor

A few more ideas:

  • 1st Party vs 3rd Party Caching
  • Public vs Private
  • Use of must-revalidate
  • Service Worker caching
  • AppCache usage (hopefully low)
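Service Worker and AppCache usage are not visible in response headers, so they would have to be detected in page content. A rough heuristic sketch, assuming AppCache is declared via a `manifest` attribute on `<html>` and that Service Worker usage can be approximated by a `navigator.serviceWorker.register(...)` call; both regexes are assumptions and will miss registrations in minified or dynamically constructed code.

```python
import re

# Heuristic assumptions, not an HTTP Archive definition:
# - AppCache is declared with a `manifest` attribute on the <html> element.
# - Service Worker usage is approximated by a navigator.serviceWorker.register() call.
APPCACHE_RE = re.compile(r"<html\b[^>]*\smanifest\s*=", re.IGNORECASE)
SW_REGISTER_RE = re.compile(r"navigator\s*\.\s*serviceWorker\s*\.\s*register\s*\(")


def uses_appcache(html):
    """True if the document's <html> tag declares an AppCache manifest."""
    return APPCACHE_RE.search(html) is not None


def registers_service_worker(source):
    """True if HTML or JS text contains a Service Worker registration call."""
    return SW_REGISTER_RE.search(source) is not None
```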

@rviscomi
Member Author

rviscomi commented Jun 4, 2019

@paulcalvano @yoavweiss @colinbendell we're hoping to finalize the metrics for each chapter today. Could you edit #18 (comment) and update it with anything that's missing? I see a bunch of other metrics were discussed in the comments. When that's done please tick the last TODO checkbox item and close this issue. Thanks!

@tunetheweb
Member

Sorry I'm late, and I know this is closed, but any thoughts on measuring whether ETags actually work?

For example, they don't work in Apache if gzip or br compression is used (as I would hope it is!), and you will never get 304 responses. Try it at www.apache.org: gzipped resources return 200 on refresh, but images (which are not gzipped) correctly return 304. So ETags should be turned off there and Last-Modified used instead. Apache is pretty popular, so I imagine this affects a non-trivial number of servers, since ETags are enabled by default and most people turn on compression for performance reasons. Other servers may have similar issues with ETags not actually working.

Also, in the past ETags were often based on the inode, which caused issues with load-balanced servers, but I'm not aware of anyone doing that anymore, so I'm not too worried about it. I'm more worried about implementation issues like Apache's. Though if we can measure both together, then why not.

It would require requesting at least one resource twice, though (once with an empty cache and again with it cached), to see whether a 200 or a 304 is returned, so I'm not sure how doable that is.
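A sketch of the second half of that two-request check, using only the Python standard library. The helper names, the example URL and the `Accept-Encoding: gzip` default are assumptions for illustration. It sends only `If-None-Match` (no `If-Modified-Since`) so the result reflects ETag handling alone, and it sends `Accept-Encoding` so compression is negotiated, which is the case reported to break ETag matching.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def build_etag_revalidation_request(url, etag, accept_encoding="gzip"):
    """Conditional GET that tests the ETag alone (no If-Modified-Since).

    Use the ETag from a first response fetched with the same Accept-Encoding.
    """
    return Request(url, headers={
        "If-None-Match": etag,
        "Accept-Encoding": accept_encoding,
    })


def fetch_status(request):
    """Send the request and return the HTTP status code."""
    try:
        with urlopen(request) as response:
            return response.status
    except HTTPError as err:  # urllib raises HTTPError for non-2xx, including 304
        return err.code


def classify_revalidation(status):
    """304 means the ETag matched; 200 means it was ignored or did not match."""
    if status == 304:
        return "validated"
    if status == 200:
        return "full response"
    return "other"
```

For a single resource, `classify_revalidation(fetch_status(build_etag_revalidation_request(url, etag)))` would return `"full response"` where a server ignores the ETag on compressed responses.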

@rviscomi
Member Author

rviscomi commented Jun 4, 2019

Not too late to add a metric if @paulcalvano sees fit. Just update the first comment.

@mnot

mnot commented Jun 4, 2019

Investigating how well ETag validation is supported would be great. Just to note: that Apache bug is specific to mod_deflate; if you use MultiViews for negotiating encoding, it works fine (e.g., see www.mnot.net). That said, it'd be interesting to see how widespread it is.

Looking over https://cache-tests.fyi for inspiration, a few other things come to mind:

  • How common is it for sites to use non-lowercase cache-control parameters?
  • How common is it for sites to use invalid dates?
  • How common is it for sites to use Cache-Control: public (even though it usually isn't required)?
  • How common is it for sites to serve a Date and Age that don't make sense (see this paper)?
  • How many sites still use Pragma in responses (even though it doesn't mean anything)?
  • Do any sites put Set-Cookie on cacheable responses?
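Several of these questions can be answered from a single response's headers. A minimal lint sketch, assuming a dict keyed by lowercase header names; the finding labels are made up for illustration, and "cacheable" is simplified here to "has explicit freshness and no `no-store`/`private`", which is narrower than the full RFC 7234 definition. Date validity and Date/Age sanity need date parsing and are left out.

```python
def lint_caching_headers(headers):
    """Flag some of the cache-tests.fyi-inspired patterns from mnot's list.

    `headers` is assumed to map lowercase header names to values.
    """
    findings = []
    names = [
        part.strip().partition("=")[0].strip()
        for part in headers.get("cache-control", "").split(",")
        if part.strip()
    ]
    if any(name != name.lower() for name in names):
        findings.append("non-lowercase cache-control directive")
    directives = {name.lower() for name in names}
    if "public" in directives:
        findings.append("cache-control: public")
    if "pragma" in headers:
        findings.append("pragma in response")
    # Simplified notion of "cacheable" (an assumption, see lead-in).
    has_freshness = bool(directives & {"max-age", "s-maxage"}) or "expires" in headers
    cacheable = has_freshness and not (directives & {"no-store", "private"})
    if cacheable and "set-cookie" in headers:
        findings.append("set-cookie on cacheable response")
    return findings
```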

@colinbendell

One additional thought: it might be worth adding an experimental-headers section that includes in-the-wild uses of Variance or Key (if any).

@paulcalvano
Contributor

I think ETag validation would be out of scope for this because we aren't making a repeat request. I agree it would definitely be interesting to explore whether servers return 304 status codes for requests with valid ETags.

@mnot - great idea to look at the cache tests. I'll add some of these to the list.

On the topic of valid dates - I ran into many invalid Date and Last-Modified headers in a recent analysis I did, so it would be interesting to explore what is going on there.
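A sketch of how invalid or inconsistent date headers could be flagged per response, assuming lowercase header names. It treats anything Python's `email.utils.parsedate_to_datetime` cannot parse as invalid, which is an approximation of the HTTP-date grammar; the Age check only validates syntax (a non-negative integer), not whether Age is consistent with Date.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Parse an HTTP-date header value; return None if missing or malformed."""
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError, IndexError):
        return None
    if parsed.tzinfo is None:  # a "-0000" zone yields a naive datetime; treat as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def date_findings(headers):
    """List date-related problems in one response's headers."""
    findings = []
    for name in ("date", "last-modified", "expires"):
        if name in headers and parse_http_date(headers[name]) is None:
            findings.append(f"invalid {name}")
    date = parse_http_date(headers.get("date"))
    last_modified = parse_http_date(headers.get("last-modified"))
    if date and last_modified and last_modified > date:
        findings.append("last-modified later than date")
    age = headers.get("age")
    if age is not None and not age.strip().isdigit():
        findings.append("invalid age")
    return findings
```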

@paulcalvano paulcalvano reopened this Jun 5, 2019
@paulcalvano
Contributor

@colinbendell - do you have an example of Variance or Key headers? I'm not familiar with those.

@rviscomi rviscomi added the ASAP This issue is blocking progress label Jun 6, 2019
@rviscomi
Member Author

rviscomi commented Jun 6, 2019

Hoping we can resolve the open questions about metrics and close this issue ASAP.

@rviscomi
Member Author

rviscomi commented Jun 7, 2019

Last call for metrics. @paulcalvano please update the final list and close this issue today. (sorry, couldn't think of a caching pun)

@rviscomi rviscomi removed the ASAP This issue is blocking progress label Sep 25, 2019