Query metrics: Chapter 20. HTTP/2 #101
Comments
@paulcalvano any progress on this?
Working on some of these queries tonight. Quick question on the yearly trend: each yearly trend on these queries will process more than 100 TB of data. Is this OK, @rviscomi?
Eight of these queries were added in PR #127. Results from the H2 queries are here: https://docs.google.com/spreadsheets/d/1z1gdS3YVpe8J9K3g2UdrtdSPhRywVQRBz5kgBeqCnbw/edit?usp=sharing
100 TB per query is too expensive. Maybe just compare July 2019 vs July 2018?
Good idea. I think doing a one-year comparison is still very useful, and it will certainly keep the query cost down.
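For concreteness, here's a minimal sketch of what that two-crawl comparison might look like. Restricting the scan to the July 2018 and July 2019 tables bounds the cost at two crawls' worth of data instead of the full multi-year trend. The `YYYY_MM_DD_client` table naming and the `_protocol` field in the HAR payload are assumptions based on this discussion, to be verified against the actual schema.

```sql
-- Sketch: HTTP/2 share of requests, July 2018 vs July 2019 (desktop).
-- Scans only two crawls' worth of payload instead of a full trend.
SELECT
  crawl,
  JSON_EXTRACT_SCALAR(payload, '$._protocol') AS protocol,
  COUNT(0) AS requests,
  ROUND(COUNT(0) * 100 / SUM(COUNT(0)) OVER (PARTITION BY crawl), 2) AS pct
FROM (
  SELECT '2018_07' AS crawl, payload FROM `httparchive.requests.2018_07_01_desktop`
  UNION ALL
  SELECT '2019_07' AS crawl, payload FROM `httparchive.requests.2019_07_01_desktop`
)
GROUP BY crawl, protocol
ORDER BY crawl, requests DESC
```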
Some of the stats are already available as a yearly trend. For example:
Happy to use those. Are they cheaper because they only use summary tables, or are they generated in a different way? Either way, happy to use that data where we have it. Will the State Of The Web reports still hang around for future years, or is the idea that the Web Almanac replaces them? For other data that isn't as easily and cheaply queried, a year-on-year comparison sounds fine.
20.15 is easily trendable since it uses the summary tables. Most of the H2 analysis requires the requests table, which is currently the only place where we can query the protocol. Since it's part of the monthly pipeline, the trend data has accumulated over time instead of being run all at once. I think working with the curated report results should be fine for your needs on 20.02 (H2 requests over time). But 20.01 is not covered by that report, since we need to look at H2 base page requests.
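To illustrate why the summary tables trend so cheaply, here's a sketch of a wildcard query over them. Only a small scalar column is scanned per crawl, unlike the protocol queries, which have to parse the JSON payload in the requests tables. The `_connections` column name is an assumption to check against the schema.

```sql
-- Sketch: a cheap trend straight from the summary tables - average
-- TCP connections per page for every desktop crawl, via a wildcard.
SELECT
  _TABLE_SUFFIX AS crawl,
  ROUND(AVG(_connections), 1) AS avg_connections
FROM `httparchive.summary_pages.*`
WHERE _TABLE_SUFFIX LIKE '%_desktop'
GROUP BY crawl
ORDER BY crawl
```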
OK, that's what I thought. And to be clear, you meant 20.01 part 2 (total HTTP/2 requests) is fine, but 20.01 part 1 (total sites, aka home pages only) is not? You said 20.02 in your comment, but 20.02 was dropped. I guess I should really have separated 20.01 into two asks to avoid this confusion :-)
Both will coexist indefinitely. The distinction is that httparchive.org is the live, historical view of the state of the web, while almanac.httparchive.org is a companion report that elaborates on what that year's results actually mean.
@paulcalvano I've checked off the metrics that have been covered by #127, although there are still 7 remaining. Once those are finalized, we can pass this over to @bazzadp to start writing.
@bazzadp - I should have these done soon. A few questions on the remaining queries:

20.07 - I'm not clear on how to detect H2 prioritization issues within the HTTP Archive data. One way we could estimate it is by categorizing sites as using web servers that pass/fail, as well as CDNs that pass/fail. If we did that, we could consider a fail to be:

20.13 - Are you just interested in HTTP headers with preload, or are HTTP response bodies containing preload also of interest here?

20.15 - Do you want the average number of TCP connections, or a breakdown of how many sites have 1 connection, 2 connections, n connections?
Hey @paulcalvano,

20.07 - I think we should limit this to CDNs, as those are the only ones we have a definitive list for, and they should be reasonably consistent compared to server setups, which vary a lot in installed version, config, and OS TCP stacks, etc. So if we could report like this based on Andy's list (with additional "Not using CDN" and "Other CDN" lines at the top):

Not sure how easy it is to import Andy's table, or if we have to do this lookup in Google Sheets afterwards?

20.13 - Just HTTP headers. Preload in HTML is not a signal for HTTP/2 push. Plus, for this stat I'm actually explicitly looking at usage of "nopush" in the HTTP header to prevent HTTP/2 push.

20.15 - Interesting question. I guess what I'm trying to show is: is adoption of HTTP/2 resulting in 1) fewer connections (as it uses 1 connection rather than 6) and 2) less usage of sharding (e.g. static.example.com-type domains)? It is probably most simply measured with the current stat on TCP connections per page, though split by HTTP/1 and HTTP/2 home pages (where an HTTP/2 site is based on whether the main index.html page, or equivalent, is served over HTTP/2 or not). I think your suggestion of a breakdown by number of connections would be too influenced by marketing stuff (e.g. if example.com stops using 6 connections but still loads 100 ad tech things, we only go down from 106 connections to 100, which is not really that noticeable).

BTW, did you see my comment #22 (comment) on some of the stats you dropped off? Is it possible to look at adding 20.02 and 20.17 back in based on those comments?
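For what it's worth, here are rough sketches of how 20.13 and 20.15 might be queried along those lines. The table names, the `respOtherHeaders` and `_connections` columns, the url = page base-page match (which ignores redirects), and the 'HTTP/2' protocol label are all assumptions to verify against the actual data, not confirmed query code.

```sql
-- 20.13 (sketch): pages with at least one response opting out of push
-- via `nopush` on a preload Link header. Assumes miscellaneous response
-- headers are stored as a single string; the regex is approximate.
SELECT
  COUNT(DISTINCT pageid) AS pages_using_nopush
FROM `httparchive.summary_requests.2019_07_01_desktop`
WHERE REGEXP_CONTAINS(LOWER(respOtherHeaders), r'link\s*=[^,]*preload[^,]*nopush');

-- 20.15 (sketch): median TCP connections per page, split by whether
-- the base page was served over HTTP/2.
SELECT
  IF(JSON_EXTRACT_SCALAR(r.payload, '$._protocol') = 'HTTP/2',
     'HTTP/2', 'HTTP/1.x') AS base_page_protocol,
  APPROX_QUANTILES(p._connections, 100)[OFFSET(50)] AS median_connections,
  COUNT(0) AS pages
FROM `httparchive.summary_pages.2019_07_01_desktop` p
JOIN `httparchive.requests.2019_07_01_desktop` r
  ON r.page = p.url AND r.url = r.page
GROUP BY base_page_protocol;
```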
Thanks. I added
Just submitted a PR with more H2 queries. Some notes:
I'll review comment #22 in the morning and see if we can add 20.02 and 20.17 back in.
All of the H2 query results are here: https://docs.google.com/spreadsheets/d/1z1gdS3YVpe8J9K3g2UdrtdSPhRywVQRBz5kgBeqCnbw/edit?usp=sharing
Added 20.02. Will take a look at 20.17 now... 20.02 - Measure of all HTTP versions (0.9, 1.0, 1.1, 2, QUIC) for the main page of all sites, and for HTTPS sites. Table for the last crawl.
Thanks @paulcalvano! I've marked the results with a few questions. The big one: I'm not seeing QUIC anywhere - or is the blank protocol QUIC?
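One way to sanity-check the blank-protocol question, sketched with the same caveats about table and field names as above: surface empty and missing protocol values as their own bucket when tabulating base-page HTTP versions, so the size of the possibly-QUIC group is visible rather than hidden in an unlabeled row.

```sql
-- 20.02 (sketch): HTTP version of the main page across all sites,
-- with blank/missing protocol values counted explicitly.
SELECT
  COALESCE(NULLIF(TRIM(JSON_EXTRACT_SCALAR(payload, '$._protocol')), ''),
           '(blank)') AS protocol,
  COUNT(0) AS pages,
  ROUND(COUNT(0) * 100 / SUM(COUNT(0)) OVER (), 2) AS pct
FROM `httparchive.requests.2019_07_01_desktop`
WHERE url = page  -- approximate base-page match; ignores redirects
GROUP BY protocol
ORDER BY pages DESC
```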
READ ME!
All of the metrics in the table below have been marked as Able To Query during the metrics triage. The analyst assigned to each metric is expected to write the corresponding query and submit a PR to have it reviewed and added to the repo. In order to stay on schedule and have the data ready for authors, please have all metrics reviewed and merged by August 5.
Assignments
Checklist of metrics to be merged