Markup 2020 TODOs #1614
Changes from 22 commits
d582cf8
ee4c97e
b7846b4
50cd23b
51f03d6
87041d0
8c9e781
69ca37b
66e28ea
750c94b
4af6840
980e21f
079d2ec
0c708bf
0da6a62
c7d99ee
37659b4
abaa49c
2b0dba1
9fa288e
5514c08
5ba7822
39d61e6
79d36b9
c93c67b
b157073
dbf3a26
f261f63
13d6c58
bca5e67
7475b53
f6e171c
88c936a
2def826
ccf22c0
570a0c3
d3f2d19
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -80,8 +80,7 @@ A page's document size refers to the amount of HTML bytes transferred over the n | |
* The largest document by far weighs 64.16 _MB_, almost deserving its own analysis and chapter in the Web Almanac. | ||
|
||
{# TODO(analysts): Should 25,237 bytes be divided by 1000 or 1024 to convert to KB? 1000 seems to be used here but most chapters use 1024. Are the stats above also off? #} | ||
Tiggerito marked this conversation as resolved.
|
||
{# TODO(authors): What's the implication and your interpretation of this value? Is this a surprisingly big number? Or does it align with your expectations? #} | ||
How is this situation in general, then? The median document weighs 25.24 KB: | ||
How is this situation in general, then? The median document weighs 25.24 KB, which comes [without surprises](https://httparchive.org/reports/page-weight): | ||
|
||
{{ figure_markup( | ||
image="document-size.png", | ||
|
@@ -121,8 +120,6 @@ Here are the 10 most popular (normalized) languages in our sample. At first we c | |
<figcaption>{{ figure_link(caption="Top 10 <code>lang</code> attribute values.", sheets_gid="2047285366", sql_file="pages_almanac_by_device_and_html_lang.sql") }}</figcaption> | ||
</figure> | ||
|
||
{# TODO(authors): Add an interpretation of the lang results. #} | ||
Background for removal: I’d argue there’s little to interpret, or to defer to methodology as anything we see here may have been introduced on that end.
What conclusions would you like to see readers draw from this figure? If there's nothing to say about it, do we need it at all?
This may raise a very good point here! If the data set suggests that this data may not be representative for the wider Web then we should probably take this out, because what is there to see?
The data set is crawled from a US crawler with Additionally, the set of URLs is based on the CrUX data based off of Chrome data - is that swayed towards western users (therefore explaining the low numbers of Asia sites other than Japanese)? Therefore I think it raises a very good question as to whether we can rely on this data. At the very least we should add a caveat explaining these influences.
I remember the point of this section was to help the readers better understand the data set they're looking at: e.g. where do most of the pages come from and what is the main language used/detected on them. In my opinion, I think a pie chart with
Doesn't look like I can make suggestions on a deleted line, but I would suggest something like this:
Updated the page (with just minor edits).
Suggestion to remove SGTM.
Remove? Or keep the update? (We had updated this section.)
I was referring to the removal of the table. Either way, LGTM. |
||
|
||
### Comments | ||
|
||
Adding comments to code is generally a good practice and HTML comments are there to add notes to HTML documents, without having them rendered by user agents. | ||
|
@@ -131,9 +128,7 @@ Adding comments to code is generally a good practice and HTML comments are there | |
<!-- This is a comment in HTML --> | ||
``` | ||
|
||
Although many pages will have been stripped of comments for production, we found that index pages in the 90th percentile are using about 73 comments on mobile, respectively 79 comments on desktop, while in the 10th percentile the number of the comments is about 2. | ||
|
||
{# TODO(authors): How about the median number for a typical website? #} | ||
Although many pages will have been stripped of comments for production, we found that index pages in the 90th percentile are using about 73 comments on mobile, respectively 79 comments on desktop, while in the 10th percentile the number of the comments is about 2. The median page uses 16 (mobile) or 17 comments (desktop). | ||
|
||
Around 89% of pages contain at least one HTML comment, while about 46% of them contain a conditional comment. | ||
|
||
|
@@ -151,7 +146,7 @@ Still, on the above percentile extremes, we found that web pages are using about | |
|
||
For production, HTML comments are usually stripped by build tools. Considering all the above counts and percentages, and referring to the use of comments in general, we suppose that lots of pages are served without involving an HTML minifier. | ||
|
||
### Script use | ||
### Script use | ||
|
||
As shown in the [Top elements](#top-elements) section below, the `script` element is the 6th most frequently used HTML element. For the purposes of this chapter, we were interested in the ways the `script` element is used across these millions of pages from the data set. | ||
|
||
|
@@ -173,7 +168,7 @@ At the opposite end of the spectrum, the numbers show that about 97% of pages co | |
) | ||
}} | ||
|
||
When scripting is unsupported or turned off in the browser, the `noscript` element helps to add an HTML section within a page. Considering the above script numbers, we were curious about the `noscript` element as well. | ||
When scripting is unsupported or turned off in the browser, the `noscript` element helps to add an HTML section within a page. Considering the above script numbers, we were curious about the `noscript` element as well. | ||
|
||
Following the analysis, we found that about 49% of pages are using a `noscript` element. At the same time, about 16% of `noscript` elements were containing an `iframe` with a `src` value referring to "googletagmanager.com". | ||
|
||
|
@@ -183,13 +178,11 @@ This seems to confirm the theory that the total number of `noscript` elements in | |
|
||
What `type` attribute values are used with `script` elements? | ||
|
||
{# TODO(authors, analysts): Should this be a figure? #} | ||
rviscomi marked this conversation as resolved.
|
||
{# TODO(authors): Explain the significance of the "!" in text. #} | ||
- `text/javascript`: 60.03% | ||
- `application/ld+json`: 1.68% | ||
- `application/json`: 0.41% | ||
- `text/template`: 0.41% | ||
- `text/html` (!) 0.27% | ||
- `text/html` 0.27% | ||
|
||
When it comes to loading [JavaScript module scripts](https://jakearchibald.com/2017/es-modules-in-browsers/) using `type="module"`, we found that 0.13% of `script` elements currently specify this attribute-value combination. `nomodule` is used by 0.95% of all tested pages. (Note that one metric relates to elements, the other to pages.) | ||
|
||
|
@@ -352,13 +345,11 @@ Standard elements are those that are or were part of the HTML specification. Whi | |
<figcaption>{{ figure_link(caption="Low probabilities of finding a given element in pages of the sample.", sheets_gid="184700688", sql_file="pages_element_count_by_device_and_element_type_present.sql") }}</figcaption> | ||
</figure> | ||
|
||
{# TODO(authors): Interpet results. #} | ||
Background: Suggesting to skip this, or ask whether @catalinred or @iandevlin like to draft something.
Ditto
Again, I’d look at @catalinred or @iandevlin whether they like to expand on this. Other than that, as opposed to the lang section here I don’t agree. Unless you’re coming from an Almanac convention that requires every data point to be interpreted, I wouldn’t be convinced that data can’t stand by itself. I’d suggest that not only are our readers capable of drawing conclusions, but that this can even be refreshing from an editorial perspective, if not to suggest—“wait a minute; what’s this, what does this mean?” If I misunderstand you, please let me know, otherwise I’d appreciate a bit of leeway here for us to decide on what also not to interpret.
My opinion is we should be selective to what we put in the chapter. We have a lot of data - some of it will be interesting and some not. I think we need to pick and choose what to include to keep the chapter interesting. The point of the almanac is "The Web Almanac is a comprehensive report on the state of the web, backed by real data and trusted web experts" rather than "a list of stats, with experts interpreting them". Saying that, I think these particular stats ARE interesting. Why are they not used? Are they old elements which are not useful? Have they been replaced by better alternatives? Or are they new elements that haven't taken off yet? Looking at MDN two of them are obsoleted ( However, I do think this is the authors' work, so if you still feel this is not necessary explanation and the stats stand on their own, then I can accept this. But think it's right for @rviscomi to at least ask these questions in case you hadn't considered that interpretation or were making assumptions as to what readers might know because you are HTML experts (and other readers might not be).
Suggestion to skip SGTM. Could you update the md to remove the content?
I actually felt like I want to add info here. Added a paragraph. What do you think? |
||
|
||
### Custom elements | ||
|
||
The 2019 edition of the Web Almanac handled [custom elements](../2019/markup#custom-elements) by discussing several non-standard elements. This year, we found it valuable to have a closer look at custom elements. How did we determine these? Roughly by looking at [their definition](https://html.spec.whatwg.org/multipage/custom-elements.html#custom-elements-core-concepts), notably their use of a hyphen. Let's focus on the top elements, in this case elements used on ≥1% of all URLs in the sample: | ||
|
||
{# TODO(authors, analysts): Clarify occurrences and percentages _of what_. Pages? Elements? And for desktop or mobile? #} | ||
{# TODO(authors, analysts): Clarify occurrences and percentages _of what_. Pages? Elements? #} | ||
rviscomi marked this conversation as resolved.
|
||
|
||
<figure markdown> | ||
| Element | Occurrences | Percentage | | ||
|
@@ -697,7 +688,7 @@ Using `target="_blank"` has been known to be a [security vulnerability](https:// | |
<figcaption>{{ figure_link(caption="Blank relationships.", sheets_gid="1876528165", sql_file="pages_wpt_bodies_by_device.sql") }}</figcaption> | ||
</figure> | ||
|
||
As a rule of thumb and for [usability reasons](https://www.nngroup.com/articles/new-browser-windows-and-tabs/), prefer not to use `target="_blank"` in the first place. | ||
As a rule of thumb and for [usability reasons](https://www.nngroup.com/articles/new-browser-windows-and-tabs/), prefer not to use `target="_blank"` in the first place. | ||
|
||
<p class="note">Within the latest Safari and Firefox versions, setting <code>target="_blank"</code> on <code>a</code> elements implicitly provides the same <code>rel</code> behavior as setting <code>rel="noopener"</code>. This is already <a href="https://chromium-review.googlesource.com/c/chromium/src/+/1630010">implemented in Chromium</a> as well and will land in Chrome 88.</p> | ||
|
||
|
@@ -713,7 +704,6 @@ We've touched on some observations throughout the chapter, but as a reflection o | |
sql_file="summary_pages_by_device_and_doctype.sql" | ||
) }} | ||
|
||
{# TODO(authors): Changed Simon's quote to a paraphrase, since it's not clear which part is verbatim. If there's a quote, let's wrap it in quotes. #} | ||
Fewer pages land in quirks mode. In 2016, that number was at [around 7.4%](https://discuss.httparchive.org/t/how-many-and-which-pages-are-in-quirks-mode/777). At the end of 2019, we observed [4.85%](https://twitter.com/zcorpan/status/1205242913908838400). And now, we're at about 3.97%. This trend, to paraphrase [Simon Pieters](./contributors#zcorpan) in his review of this chapter, seems clear and encouraging. | ||
|
||
Although we lack historic data to draw the full development picture, "meaningless" `div`, `span`, and `i` markup has pretty much [replaced](#top-elements) the `table` markup we've observed in the 1990s and early 2000s. While one may question whether `div` and `span` elements are always used without there being a semantically more appropriate alternative, these elements are still preferable to `table` markup, though, as during the heyday of the old web, these were seemingly used for everything but tabular data. | ||
|
@Tiggerito?
I agree, 1024 should be used. I'm not confident on the above stats. I'd like to use an example we can prove. I think the problem was that bytesHtmlDoc, which I think is more true to what we would call the document, got cut off at 16,777,215.
I used Screaming Frog to crawl all the ones in the list, and double checked the big ones in Chrome. Drum roll...
https://www.linkshops.com/ contains a js bundle that is 28.1MB.
https://www.boonterm.com/web/index1.php has 22.8MB of html (embedded images) and references a 34.7MB mp4 file.
https://www.aci.edu.sg/ has 33.7MB of html and took me over a minute to load. Also embedded images.
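For the 1000-versus-1024 question in this thread, here is a minimal Python sketch of the two conversions. It assumes the 25,237-byte median quoted in the TODO; the helper name `to_kb` and the constant names are hypothetical and only there for illustration. It also checks that the 16,777,215 cutoff mentioned for bytesHtmlDoc is one less than 2^24.

```python
# Median document size in bytes, taken from the TODO under discussion (assumed input).
MEDIAN_BYTES = 25_237

# Cutoff value mentioned in this thread for bytesHtmlDoc.
CUTOFF_BYTES = 16_777_215


def to_kb(size_bytes: int, base: int = 1024) -> float:
    """Convert a byte count to kilobytes with the given base, rounded to 2 decimals.

    `to_kb` is a hypothetical helper, not part of the Almanac tooling.
    """
    return round(size_bytes / base, 2)


decimal_kb = to_kb(MEDIAN_BYTES, base=1000)  # base-1000 conversion
binary_kb = to_kb(MEDIAN_BYTES, base=1024)   # base-1024 conversion

# True if the cutoff is the largest value that fits in 24 bits.
cutoff_is_24_bit_max = CUTOFF_BYTES == 2**24 - 1

print(decimal_kb, binary_kb, cutoff_is_24_bit_max)
```

With base 1000 the median reads 25.24 KB, which matches the figure in the chapter text; with base 1024 it would read 24.65 KB.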