Finalize assignments: Chapter 17. CDN #19

Closed
3 tasks done
rviscomi opened this issue May 21, 2019 · 23 comments
@rviscomi
Member

rviscomi commented May 21, 2019

Section Chapter Authors Reviewers
IV. Content Distribution 17. CDN @andydavies @colinbendell @yoavweiss @paulcalvano @pmeenan

Due date: To help us stay on schedule, please complete the action items in this issue by June 3.

To do:

  • Assign subject matter expert (author)
  • Assign peer reviewers
  • Finalize metrics

Current list of metrics:

  • What are the top CDNs (by number of sites using them rather than by requests)?

  • What % of sites use a CDN

  • % of sites that use a CDN for primary domain i.e. www

  • % of sites that use a CDN for secondary domain e.g. static. media.

  • Usage of 3rd-party public CDNs, e.g. jQuery, apis.google, etc.

  • CDN TTFB

  • HTTP things (not necessarily CDN-related): header volume, STS, Timing-Allow-Origin, Via, Keep-Alive, Server-Timing metrics/presence, Vary, Content-Disposition, etc.

  • TLS negotiation time

  • TLS Certificate size

  • OCSP stapling support

  • DNS vs. anycast IP use

  • Cwnd growth rate (not sure this will be measurable)

  • TLS connection coalescing with H2 connections

  • Number of CDNs used per page

  • H2 push?

  • HTTPS uses 1.1 or 2?

  • Use of CDN header directives (s-maxage, stale-while-revalidate, nopush, stale-if-error, pre-check and Surrogate-Control)

  • How have these patterns changed over the last year / two years
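As a rough illustration of how the header-directive metric might be computed, here is a minimal sketch that splits a Cache-Control value into directives and reports which CDN-oriented ones are present. It uses the standard spellings `s-maxage` and `stale-if-error`; the function names and the `CDN_DIRECTIVES` set are hypothetical, and the naive comma split does not handle quoted strings that contain commas.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; valueless directives map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


# Hypothetical watch list, taken from the Cache-Control directives named above.
CDN_DIRECTIVES = {"s-maxage", "stale-while-revalidate", "stale-if-error", "pre-check"}


def cdn_directives_used(value):
    """Return the sorted CDN-oriented directives found in one header value."""
    return sorted(CDN_DIRECTIVES.intersection(parse_cache_control(value)))
```

`nopush` and Surrogate-Control are left out because they live in the Link header and in a separate header respectively, so they would need their own checks.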

👉 AI (reviewers): Finalize which metrics you might like to include in an annual "state of CDNs" report powered by HTTP Archive. Community contributors have initially sketched out a few ideas to get the ball rolling, but it's up to the subject matter experts to know exactly which metrics we should be looking at. You can use the brainstorming doc to explore ideas.

The metrics should paint a holistic, data-driven picture of the CDN landscape. The HTTP Archive does have its limitations and blind spots, so if there are metrics out of scope it's still good to identify them now during the brainstorming phase. We can make a note of them in the final report so readers understand why they're not discussed and the HTTP Archive team can make an effort to improve our telemetry for next year's Almanac.

Next steps: Over the next couple of months analysts will write the queries and generate the results, then hand everything off to you to write up your interpretation of the data.

Additional resources:

@rviscomi transferred this issue from HTTPArchive/httparchive.org May 21, 2019
@rviscomi added this to the Chapter planning complete milestone May 21, 2019
@rviscomi changed the title from "[Web Almanac] Finalize assignments: Chapter 17. CDN" to "Finalize assignments: Chapter 17. CDN" May 21, 2019
@rviscomi
Member Author

Added Andy as an author and updated the list of metrics.

@mnot

mnot commented May 28, 2019

It would be cool if you could measure different kinds of CDN --

  • "offsite" CDNs for images, JS (e.g., jsDelivr)
  • DNS-based CDNs (e.g., Akamai)
  • Anycast-based CDNs (e.g., Edgecast, Fastly)

To differentiate the latter two, you'd need to make requests from two different points on the network; if the IP addresses are different, it's DNS-based; if they're the same but the latency is too low to be a single DC, it's anycast.
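A minimal sketch of that two-vantage-point heuristic, assuming the resolved IP and a round-trip time have already been collected from each location. The function name, the labels it returns, and the 50 ms threshold are all hypothetical choices for illustration, not values from this thread.

```python
def classify_cdn_routing(ip_by_vantage, rtt_ms_by_vantage, max_single_dc_rtt_ms=50.0):
    """Guess whether a hostname is served via DNS-based or anycast routing.

    ip_by_vantage:      {vantage point name: IP address resolved there}
    rtt_ms_by_vantage:  {vantage point name: measured round-trip time in ms}
    max_single_dc_rtt_ms is an assumed cutoff; it only makes sense when the
    vantage points are far apart geographically.
    """
    if len(ip_by_vantage) < 2:
        raise ValueError("need at least two vantage points")
    if len(set(ip_by_vantage.values())) > 1:
        # Different answers per location: the DNS layer is steering traffic.
        return "dns-based"
    if all(rtt <= max_single_dc_rtt_ms for rtt in rtt_ms_by_vantage.values()):
        # Same IP everywhere, yet close to every vantage point: likely anycast.
        return "anycast"
    return "unicast-or-unknown"
```

For example, the same IP seen from Sydney and London with single-digit RTTs at both would be labelled "anycast", while a low RTT at only one of them would fall through to "unicast-or-unknown".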

You might also be able to fingerprint CDNs based upon their response headers, etc.; would it be worth it to try to do this for the top N CDNs, to identify which sites are using them?

Cache hit rate is the other thing to measure; it's tricky to do so remotely, but some CDNs leak this in headers (e.g., X-Cache, or just Age).
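One way to read those leaked headers is sketched below. `X-Cache` is non-standard and its values differ between CDNs, so the substring matching here is an assumption rather than any vendor's documented format, and a non-zero `Age` only shows the response spent time in some cache. The function name and return labels are hypothetical.

```python
def cache_hint(headers):
    """Return 'hit', 'miss', 'likely-hit' or 'unknown' for one response."""
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}

    x_cache = h.get("x-cache", "").upper()
    if "HIT" in x_cache:
        return "hit"
    if "MISS" in x_cache:
        return "miss"

    # Age is the number of seconds the response has been held in a cache.
    age = h.get("age", "").strip()
    if age.isdigit() and int(age) > 0:
        return "likely-hit"
    return "unknown"
```

This only describes a single response; it says nothing about the overall hit rate of the cache.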

@andydavies
Collaborator

Some other metrics to consider

  • OCSP Support (is it universal amongst CDNs now?)
  • main site vs static CDN for other resources? (some serve all their content through CDN, some still split)

@ghedo
Member

ghedo commented Jun 1, 2019

To expand somewhat on the "main site vs static CDN" point, it might also be useful to see how many different CDNs sites use (when different resources are served by different CDNs).

@ghedo
Member

ghedo commented Jun 1, 2019

And on the "Cwnd growth rate" point, also having initial cwnd size might give a better picture.

In addition having some form of RTT measurement would probably help put the other timing metrics into perspective.

@bkardell
Contributor

bkardell commented Jun 1, 2019

Cache hit rate is the other thing to measure; it's tricky to do so remotely...

@mnot could we check the request header to see if there is a modified-since or etag or something?

@mnot

mnot commented Jun 2, 2019

Well, you can look at Age and Date to get a sense of whether it was cached at some point, but of course that only applies to this request; the overall hit rate is something you can really only measure from the cache's point of view.

I'm not sure whether that's a useful thing to measure or not; it certainly would be interesting, but there wouldn't be much signal, and people would likely misinterpret it...

@bkardell
Contributor

bkardell commented Jun 3, 2019

Oh wait.. none of these are coming from real use anyways..nm, what I was saying makes absolutely no sense.

@rviscomi
Member Author

rviscomi commented Jun 4, 2019

@andydavies @colinbendell @yoavweiss @paulcalvano @pmeenan we're hoping to finalize the metrics for each chapter today. Please take one last look at your official list of metrics in #19 (comment) and add anything that's missing. Once done, please tick the last TODO checkbox and close this issue. Thanks!

@andydavies
Collaborator

@colinbendell @yoavweiss @paulcalvano @pmeenan

I've updated but feel free to adjust what I've done

@pmeenan
Member

pmeenan commented Jun 4, 2019

I added a few more. Otherwise mostly looks good (not sure cwnd will be measurable though). Also have to be careful with ttfb (static vs dynamic resources and how busy the browser is at the time if it isn't the base page).

@paulcalvano
Contributor

This looks good. A few quick suggestions though:

  • I would remove CDN TTFB since this is largely dependent on each site's application performance. Also, since this is a single measurement per measured site, the performance numbers might be misleading.
  • I don't think we'll be able to look at OCSP stapling, Cwnd growth rate or H2 connection coalescing in HA data.
  • Vary header will be discussed in the Caching chapter.

@andydavies
Collaborator

I wasn’t sure about OCSP stapling but I’d like to try as it’s important, and DigiCert with their unstapled certs are a PITA

@pmeenan
Member

pmeenan commented Jun 4, 2019

Yeah, I just looked through the security info and netlog and can't see anything around stapling (just transparency logs). Might not be possible without decoding the TLS handshake itself (which isn't something that is going to happen in time).

@colinbendell

@paulcalvano will you be tracking the CC: smax, SWE and SWR and pre-check header usage?

@rviscomi added the "ASAP" label (This issue is blocking progress) Jun 6, 2019
@rviscomi
Member Author

rviscomi commented Jun 6, 2019

Ping to get the metrics finalized ASAP, there are still some open questions in the comments. You can close this issue when the metrics list is final.

@rviscomi
Member Author

rviscomi commented Jun 7, 2019

@andydavies @colinbendell is this issue ready to close?

@andydavies
Collaborator

With the exception of @colinbendell's cryptic acronyms, I think all the other comments are addressed in the original comment

@colinbendell

updated and closing.

@rviscomi removed the "ASAP" label (This issue is blocking progress) Jun 10, 2019
@raghuramakrishnan71
Contributor

raghuramakrishnan71 commented Jun 28, 2019

@paulcalvano @colinbendell I am not very clear about the metric "header volume" (Content Distribution/CDN). Does it refer to the size of the HTTP headers?
Is it reqHeadersSize or respHeadersSize (in summary_requests)?

@raghuramakrishnan71
Contributor

@colinbendell @andydavies Does the metric TLS Certificate size (Content Distribution/CDN) refer to the size of the public key in the SSL certificate? The certificate in the HAR can be parsed and BASE64 decoded, and then the required attribute extracted. But this approach will need the HAR files to be downloaded prior to that. An alternative may be to add a custom metric.
-----BEGIN CERTIFICATE-----
......
.....
-----END CERTIFICATE-----

@raghuramakrishnan71
Contributor

@rviscomi I saw that you were able to query TLS Certificate size and TLS negotiation time. I was curious to know:

  1. TLS negotiation time: did we use requests.YYYY_MM_DD_desktop.payload (ssl_ms, _ssl_start, _ssl_end, ssl) or some other metric?
  2. Was there an already available WPT metric for TLS Certificate size?

@rviscomi
Member Author

Negotiation time: this should be timings.ssl in the requests payload.
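As a small sketch of reading that field, assuming the payload is a HAR-style JSON string for a single request: in HAR 1.2 a `timings.ssl` of -1 means the timing does not apply (for example on a reused connection), so those values are dropped here. The function name is hypothetical.

```python
import json


def tls_negotiation_ms(payload_json):
    """Return timings.ssl in milliseconds, or None when it is absent or -1."""
    entry = json.loads(payload_json)
    ssl_ms = entry.get("timings", {}).get("ssl", -1)
    if ssl_ms is None or ssl_ms < 0:
        return None  # no TLS handshake was timed for this request
    return ssl_ms
```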

Certificate size, per @pmeenan:

The TLS certificate size should be available in the _securityInfo block (the raw certificates are, just need to sum the lengths probably)
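Summing the lengths could look something like the sketch below. The exact layout of `_securityInfo` is not shown in this thread, so `pem_certs` is a hypothetical, already-extracted list of PEM strings, and "size" is taken to mean the decoded (DER) byte length of each certificate.

```python
import base64
import re

PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def der_length(pem):
    """Return the size in bytes of one PEM certificate after base64 decoding."""
    match = PEM_RE.search(pem)
    if match is None:
        raise ValueError("no PEM certificate block found")
    # Remove line breaks and other whitespace inside the base64 body.
    body = "".join(match.group(1).split())
    return len(base64.b64decode(body))


def chain_size_bytes(pem_certs):
    """Sum the decoded sizes of every certificate in the chain."""
    return sum(der_length(pem) for pem in pem_certs)
```

Measuring the decoded bytes rather than the PEM text avoids counting the roughly one-third base64 overhead and the BEGIN/END marker lines.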


9 participants