Markup 2020 TODOs #1614
chore: sync
Signed-off-by: Jens Oliver Meiert <[email protected]>
Co-authored-by: Barry Pollard <[email protected]>
@@ -121,8 +120,6 @@ Here are the 10 most popular (normalized) languages in our sample. At first we c
<figcaption>{{ figure_link(caption="Top 10 <code>lang</code> attribute values.", sheets_gid="2047285366", sql_file="pages_almanac_by_device_and_html_lang.sql") }}</figcaption>
</figure>

{# TODO(authors): Add an interpretation of the lang results. #}
Background for removal: I’d argue there’s little to interpret, or to defer to methodology as anything we see here may have been introduced on that end.
What conclusions would you like to see readers draw from this figure? If there's nothing to say about it, do we need it at all?
This may raise a very good point here! If the data set suggests that this data may not be representative of the wider web, then we should probably take this out, because what is there to see?
The data set is crawled from a US crawler with `en-US` set as the preferred language. Not sure how common it is to redirect to a home page based on that locale?
Additionally, the set of URLs is based on the CrUX data, which is based on Chrome data - is that swayed towards western users (therefore explaining the low numbers of Asian sites other than Japanese)?
Therefore I think it raises a very good question as to whether we can rely on this data. At the very least we should add a caveat explaining these influences.
I remember the point of this section was to help the readers better understand the data set they're looking at: e.g. where do most of the pages come from and what is the main language used/detected on them.
In my opinion, I think a pie chart with `en`, `en-us`, `ja`, `es`, etc. would do it here. I'd also add in the chart the 22.36% of all documents that specify no `lang` attribute.
Doesn't look like I can make suggestions on a deleted line, but I would suggest something like this:
Here are the 10 most popular (normalized on case) languages in our sample. It is important to note that the HTTP Archive crawls from US data centres with English language settings, so looking at the language pages are written in will be skewed towards English. Nevertheless we present the lang attributes seen to give some context to the sites analysed.
{{ figure_markup(
image="top-html-lang.png",
alt="The top HTML lang attributes.",
caption="The top HTML `lang` attributes.",
description="Bar chart showing the top 10 `lang` attributes used in our crawl, with 22.82% of desktop and 22.36% of mobile not setting this, `en` being used on 20.09% and 18.08% respectively, `ja` on 15.17% and 13.27%, `es` on 4.86% and 4.09%, `pt-br` on 2.65% and 2.84%, `ru` on 2.21% and 2.53%, `en-gb` on 2.35% and 2.19%, `de` on 1.50% and 1.92%, and finally `fr` being used on 1.55% and 1.43% respectively.",
sheets_gid="2047285366",
chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vQPKzFb574UnGTcfw5mcD1qR7RYHyGjQTc2hiMuYix0QoTH1DPe54Q2JucXL8bfZ6kjRoAfhk3ckudc/pubchart?oid=1873310240&format=interactive",
width=600,
height=371,
sql_file="pages_almanac_by_device_and_html_lang.sql"
)
}}
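For context on what "normalized on case" could mean in practice, here is a minimal Python sketch. It is an illustration only and not the query in `pages_almanac_by_device_and_html_lang.sql`; the function name `lang_shares`, the `"(not set)"` label, and the sample values are all hypothetical.

```python
from collections import Counter


def lang_shares(lang_values):
    """Return the share (in percent) of each lang value after lowercasing.

    Missing or empty values are grouped under the hypothetical
    label "(not set)".
    """
    normalized = [
        value.strip().lower() if value and value.strip() else "(not set)"
        for value in lang_values
    ]
    total = len(normalized)
    if total == 0:
        return {}
    counts = Counter(normalized)
    return {lang: round(100 * n / total, 2) for lang, n in counts.most_common()}


# Hypothetical sample values, not Almanac data.
sample = ["en", "EN", "en-US", "ja", None, "", "es", "en-us"]
shares = lang_shares(sample)
```

With this approach `en` and `EN` are counted together, as are `en-US` and `en-us`, while pages without a `lang` attribute remain visible as their own group.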
Updated the page (with just minor edits).
Suggestion to remove SGTM.
Remove? Or keep the update? (We had updated this section.)
I was referring to the removal of the table. Either way, LGTM.
@@ -352,13 +345,11 @@ Standard elements are those that are or were part of the HTML specification. Whi
<figcaption>{{ figure_link(caption="Low probabilities of finding a given element in pages of the sample.", sheets_gid="184700688", sql_file="pages_element_count_by_device_and_element_type_present.sql") }}</figcaption>
</figure>

{# TODO(authors): Interpet results. #}
Background: Suggesting to skip this, or to ask whether @catalinred or @iandevlin would like to draft something.
Ditto
Again, I’d ask @catalinred or @iandevlin whether they would like to expand on this.
Other than that, as opposed to the lang section here I don’t agree. Unless you’re coming from an Almanac convention that requires every data point to be interpreted, I wouldn’t be convinced that data can’t stand by itself. I’d suggest that not only are our readers capable of drawing conclusions, but that this can even be refreshing from an editorial perspective, if not to suggest—“wait a minute; what’s this, what does this mean?”
If I misunderstand you, please let me know, otherwise I’d appreciate a bit of leeway here for us to decide on what also not to interpret.
My opinion is we should be selective about what we put in the chapter. We have a lot of data - some of it will be interesting and some not. I think we need to pick and choose what to include to keep the chapter interesting.
The point of the almanac is "The Web Almanac is a comprehensive report on the state of the web, backed by real data and trusted web experts" rather than "a list of stats, with experts interpreting them".
Saying that, I think these particular stats ARE interesting. Why are they not used? Are they old elements which are not useful? Have they been replaced by better alternatives? Or are they new elements that haven't taken off yet?
Looking at MDN, two of them are obsolete (`dir` and `basefont`) and one (`rb`) is Ruby specific - to me there's a question if that should really be a standard HTML element since it's Ruby specific? Ultimately, I'd never heard of these particular elements and had to look them up to figure this information out, so I think a summary explaining this would be useful here to save other readers doing the same.
However, I do think this is the authors' work, so if you still feel this explanation is not necessary and the stats stand on their own, then I can accept this. But I think it's right for @rviscomi to at least ask these questions in case you hadn't considered that interpretation or were making assumptions as to what readers might know because you are HTML experts (and other readers might not be).
Suggestion to skip SGTM. Could you update the md to remove the content?
I actually felt like I want to add info here. Added a paragraph. What do you think?
src/content/en/2020/markup.md
Outdated
@@ -80,8 +80,7 @@ A page's document size refers to the amount of HTML bytes transferred over the n
* The largest document by far weighs 64.16 _MB_, almost deserving its own analysis and chapter in the Web Almanac.

{# TODO(analysts): Should 25,237 bytes be divided by 1000 or 1024 to convert to KB? 1000 seems to be used here but most chapters use 1024. Are the stats above also off? #}
I agree, 1024 should be used. I'm not confident about the above stats. I'd like to use an example we can prove. I think the problem was that bytesHtmlDoc, which I think is closer to what we would call the document, got cut off at 16,777,215.
I used Screaming Frog to crawl all the ones in the list, and double checked the big ones in Chrome. Drum roll...
https://www.linkshops.com/ contains a js bundle that is 28.1MB.
https://www.boonterm.com/web/index1.php has 22.8MB of html (embedded images) and references a 34.7MB mp4 file.
https://www.aci.edu.sg/ has 33.7MB of html and took me over a minute to load. Also embedded images.
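To make the 1000 versus 1024 question concrete, here is a small Python sketch of the arithmetic using figures quoted in this thread. It assumes the 64.16 MB value was obtained by dividing a byte count by 1000 twice, so the reconstructed byte count is approximate; the helper name `convert` is hypothetical.

```python
def convert(num_bytes, base, power=1):
    """Convert a byte count to a larger unit: base 1000 or 1024, power 1 (K) or 2 (M)."""
    return num_bytes / base ** power


# 25,237 bytes, the figure from the TODO above.
doc_bytes = 25_237
decimal_kb = round(convert(doc_bytes, 1000), 2)  # 25.24
binary_kb = round(convert(doc_bytes, 1024), 2)   # 24.65

# Assumption: 64.16 MB was computed with base 1000, i.e. about 64,160,000 bytes.
largest_bytes = 64_160_000
largest_binary_mb = round(convert(largest_bytes, 1024, power=2), 2)  # 61.19

# The cutoff value mentioned above is the largest 24-bit unsigned integer.
cutoff = 16_777_215
is_24_bit_max = cutoff == 2**24 - 1  # True
```

The choice of base changes the first figure by roughly 0.6 KB and the second by about 3 MB, and the cutoff of 16,777,215 is exactly one byte short of 16 × 1024 × 1024, which is consistent with a fixed-width limit rather than a real document size.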
@j9t this is looking great and ready for the `unedited: true` label to be removed. Can you do the honors?
Included another update. Checked the other TODOs but these may be okay right now. @catalinred, @iandevlin, @Tiggerito, do you have a chance to review those maybe later? Removed the “unedited” flag with this PR.
I still think it should be 61.19MB, but what's a few bytes between friends.
😅 Maybe I missed that. Updated.
I think this is good to go! 🚀
Thanks @j9t and everyone for your help!
Progress on #899 #1432