Query metrics: Chapter 5. Third Parties #86
Comments
Hey @patrickhulce, I'm going through your chapter to create the data viz you requested, but some of the query results don't match up with the values you're writing about. For example:
Looking at the results of 05_11, I'm not seeing analytics with 76%. The median In your text you're mentioning the percent of sites having analytics (as opposed to requests), which sounds more accurate. But still, I don't know where you got that number for reference.
It will be a little more difficult to see the results I'm talking about with this change. I'm saying that analytics is "the most common third-party category with 76% of sites including at least one analytics request", not that analytics requests make up 76% of requests. If you look at the 25th percentile for desktop, you'll see analytics requests make up 0.93% of requests, meaning pages at that percentile have at least 1, meaning at least 75% of pages have at least 1.
Sorry, what's special about 0.93% to indicate that there is at least 1 request? Is 1% == 1 request? Is there a more straightforward way to query this metric? Even if there isn't time to rewrite the query, how can we visualize the current 05_11 results to show what you're referring to?
Well it's not possible to make fractional requests, so anything non-zero indicates that sites at that percentile made at least one request. My goal with the analysis was to point out interesting tidbits that weren't just regurgitating what could be obviously seen from a graph on first glance, but it sounds like this reached a little too far from obvious insights and I should have been a little more aligned with the pure results of the query?
This is the query that was optimized to be the most flexible and can show the widest range of insights. I get that it makes it difficult to cajole the raw data into a visualization that matches what can be said about that data, though. If we just want to match the analysis, then we can basically throw out the quantiles and repeat the line for pages with a third party for each named category (https://github.com/HTTPArchive/almanac.httparchive.org/pull/107/files#diff-561fc8f885c05295633879f79753feffR6)
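The reasoning above (a non-zero ratio at a given percentile implies at least one request on every page at or above that percentile) can be checked with a minimal sketch. The page data below is hypothetical, and `nearest_rank_percentile` is an illustrative helper using the exact nearest-rank method; it is not how BigQuery's `APPROX_QUANTILES` computes its approximate results.

```python
import math

def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical pages: (analytics requests, total requests). Not real HTTP Archive data.
pages = [(0, 40), (1, 107), (2, 80), (3, 60), (5, 90), (4, 50), (10, 100), (6, 75)]

# Per-page share of requests that are analytics requests (count / total).
ratios = [analytics / total for analytics, total in pages]

p25 = nearest_rank_percentile(ratios, 25)

# Share of pages with at least one analytics request, counted directly.
share_with_any = sum(1 for analytics, _ in pages if analytics > 0) / len(pages)

print(f"25th percentile ratio: {p25:.4f}")
print(f"share of pages with >= 1 analytics request: {share_with_any:.3f}")
```

Because a ratio is non-zero only when the count is non-zero, a non-zero 25th percentile means fewer than 25% of pages can have zero analytics requests. It gives a lower bound on the share of pages, not the exact share.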
Reverted my query change. I'll close this out and open a new PR with any changes.
Ok, sounds good. Sorry for the trouble @rviscomi, and thanks very much for tackling those! Let me know if there's something specific I can knock out :) (will be flying starting at ~6pm PST today though, see ya soon!)
@patrickhulce just want to make sure you didn't misinterpret your own data. Here's how one of the data points in 05_11 is queried: APPROX_QUANTILES(numberOfThirdPartyRequests / numberOfRequests, 100). Each percentile is a percent of requests (count / total), not the number of requests. That's why I was asking about 1% != 1 request.
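To make the distinction concrete, here is a small sketch contrasting the two quantities discussed in this thread: the per-page percent of requests (count / total) and the percent of pages with at least one analytics request. The page data and both function names are hypothetical and only illustrate the arithmetic; they are not taken from the 05_11 query.

```python
# Hypothetical pages; field names are illustrative, not HTTP Archive columns.
pages = [
    {"analytics": 0, "total": 40},
    {"analytics": 1, "total": 200},
    {"analytics": 1, "total": 50},
    {"analytics": 6, "total": 60},
]

def pct_of_requests(page):
    """Percent of a single page's requests that are analytics requests."""
    return 100 * page["analytics"] / page["total"]

def pct_pages_with_analytics(all_pages):
    """Percent of pages that made at least one analytics request."""
    with_any = sum(1 for page in all_pages if page["analytics"] > 0)
    return 100 * with_any / len(all_pages)

per_page = [pct_of_requests(page) for page in pages]
share = pct_pages_with_analytics(pages)

print(per_page)  # one analytics request is 0.5% of 200 requests but 2% of 50
print(share)
```

The same single request yields a different percentage depending on the page's total, so 1% does not correspond to 1 request. Counting pages directly answers the "percent of sites" question without reading it off the quantiles.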
READ ME!
All of the metrics in the table below have been marked as Able To Query during the metrics triage. The analyst assigned to each metric is expected to write the corresponding query and submit a PR to have it reviewed and added to the repo. In order to stay on schedule and have the data ready for authors, please have all metrics reviewed and merged by August 5.
Assignments
Checklist of metrics to be merged