Finalize assignments: Chapter 3. Markup #5
Comments
@bkardell can you think of anyone who might be interested in reviewing this chapter?
I'm not sure if this is the best place to post a suggestion, but I think it would be very interesting to chronicle the advent of non-semantic HTML that's essentially auto-generated by build tools (think of the project Twitter is using to take React Native code and build it for the web, resulting in div/span tag soup; example here: https://twitter.com/jaredcwhite/status/1090283063320276992). This isn't new, of course. I remember tool-generated "tag soup" HTML being a thing since the 90s, but it felt like that mostly went away in the HTML5 era, and now it's rearing its (IMHO) ugly head again.
@zcorpan would you be interested in reviewing this chapter? @jaredcwhite +1, I think that's a great idea. Is that something that would require looking back at older datasets, or do you think it would be sufficient to look at the current dataset and do something like measure the proportion of div/span against all tags? We're adding instrumentation in HTTPArchive/legacy.httparchive.org#159 to extract tags from the document, so this will only be something we can get easily going forward. Also, we'd love to have you as a reviewer if you're up for it!
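A minimal sketch of that proportion measurement, using Python's standard-library HTML parser. Note this is illustrative only: `html.parser` is not a spec-compliant HTML5 parser, so counts on real-world pages are approximate, and the names here are made up rather than taken from the Almanac codebase.

```python
from collections import Counter
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Counts occurrences of each element name in a document."""
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1

def div_span_proportion(html: str) -> float:
    """Fraction of all start tags that are <div> or <span>."""
    counter = TagCounter()
    counter.feed(html)
    total = sum(counter.counts.values())
    if total == 0:
        return 0.0
    return (counter.counts["div"] + counter.counts["span"]) / total

# 2 of the 4 elements below are div/span
doc = "<html><body><div><span>hi</span></div></body></html>"
print(div_span_proportion(doc))  # 0.5
```

In practice the Almanac computed this kind of statistic with SQL over the custom-metric data in BigQuery rather than by reparsing response bodies, but the idea is the same: tally element names, then divide.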
Looking up context, I checked the README of this repo and also found https://discuss.httparchive.org/t/planning-the-web-almanac-2019/1553. Exciting! I'm happy to review. Is the thing that needs review written yet? If so, where? When does the review need to be done?
Great! Glad to have you on board. The current status is that we're planning which metrics to include in each chapter. If you have any ideas or feedback we'd love to hear them. Hoping to lock the metrics down by June 3. The writing phase will start in ~August.
Ideas:
Nice! I see that I never suggested @zcorpan here apparently, but I know I did somewhere! Glad this worked out, as I think he'd have been a better author even 😁
Another idea for a metric, though I don't know if it would be difficult to implement:
One way would be to add use counters for each parse error in Chromium's HTML parser. Another way (likely simpler) would be to run the response body through an HTML parser that can log parse errors. |
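The second option, running response bodies through an error-logging parser, can be sketched with the standard library. The caveat is that Python's `html.parser` does not report the HTML Standard's defined parse errors at all, so the sketch below only approximates one class of them (stray or misnested end tags, plus unclosed elements) by tracking a tag stack; a real analysis would want a spec-compliant parser such as html5lib, whose parser object exposes an `errors` list.

```python
from html.parser import HTMLParser

# Void elements never take an end tag, so keep them off the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}

class MismatchLogger(HTMLParser):
    """Logs end tags that don't match the most recently opened element.

    A rough approximation of one class of parse error, not the full
    set defined by the HTML Standard.
    """
    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            line, col = self.getpos()
            self.errors.append(f"{line}:{col}: unexpected </{tag}>")

def log_errors(body: str) -> list:
    parser = MismatchLogger()
    parser.feed(body)
    parser.close()
    # Anything still open at end-of-input is also worth flagging.
    for tag in parser.stack:
        parser.errors.append(f"unclosed <{tag}>")
    return parser.errors

print(log_errors("<div><b>text</div>"))
```

Running this over HTTP Archive response bodies would be the "secondary data pass" mentioned below, which is exactly the complexity the team was hoping to avoid.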
Yeah, that sounds good. Use counters would actually be the easiest thing from the analysis perspective. Any kind of secondary data pass would be adding a lot of complexity.
OK, though it might not be trivial to add error reporting to Chromium's HTML parser, since it currently doesn't care about any errors (I believe). For example, it could require adding new branches to the state machine to differentiate between inputs that are errors and inputs that are not but otherwise have the same effect. So there's a risk of regressing performance and a risk of introducing new bugs to the parser, on top of just implementing the error reporting. To make it more worthwhile, maybe we could check whether browser devtools would also want to make use of HTML parse errors? (Firefox highlights parse errors in View Source, but doesn't show them in devtools, AFAIK.)
@bkardell could you sign off on this chapter? If you think we have enough metrics and reviewers you can close this issue.
I filed https://bugs.chromium.org/p/chromium/issues/detail?id=971851 about implementing HTML parse error reporting.
Thanks @zcorpan!
lgtm.
@zcorpan @bkardell Does the metric "Attribute usage (stretch goal)" (Page Content/Markup) refer to the usage of HTML attributes? In that case we may be able to find out the distribution (https://discuss.httparchive.org/t/usage-of-aria-attributes/778).
@rviscomi @zcorpan @bkardell going ahead (as the July crawl is scheduled to start) with the "list of valid HTML attributes" interpretation for the metric "Attribute usage (stretch goal)" (Page Content/Markup). In that case, we should be able to extract it from response_bodies.
The thing is, the new custom metrics for markup collect data from the parsed tree, so you wind up with far less data to deal with, and it's far more accurate than running a regexp across the HTML. We had discussed whether it would somehow make sense to do the same for attributes, but given all the questions people are interested in asking about them and their relationship to tags, or URLs, or parent elements, or... whatever... even what to collect was unclear. I had proposed a few potential approaches, I think, but I believe we were worried about this exploding the size of the data, defeating the purpose, or just not being that useful.
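To make the size concern concrete, here is a hypothetical sketch of what an attribute-usage collector would have to do. Names are illustrative, not from the Almanac's actual custom metrics (which are JavaScript run against the live DOM); the point is that every element contributes several attribute names, and keeping their relationships to tags or values multiplies the payload further.

```python
from collections import Counter
from html.parser import HTMLParser

class AttributeCounter(HTMLParser):
    """Tallies attribute names seen on start tags in a document.

    Only names are kept; recording (tag, attribute) pairs or values
    would grow the collected data substantially, which is the
    size-explosion concern discussed above.
    """
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, _value in attrs:
            self.counts[name] += 1

doc = '<div class="a" data-x="1"><span class="b" aria-label="hi"></span></div>'
counter = AttributeCounter()
counter.feed(doc)
# 'class' is counted twice, the other attribute names once each
print(counter.counts.most_common())
```

Even this minimal name-only tally produces an open-ended vocabulary (data-* and custom attributes are unbounded), which helps explain why the metric was hard to scope.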
@rviscomi it was a bit of a struggle to understand adding a custom metric, but I was finally able to test a sample/almanac.js using the instructions in #33.
Some more context here: https://discuss.httparchive.org/t/use-of-custom-elements-with-attributes/1592 and here: HTTPArchive/httparchive.org#138. I'd recommend marking this one as Not Feasible due to the complexity.
Due date: To help us stay on schedule, please complete the action items in this issue by June 3.
To do:
Current list of metrics:
👉 AI (@bkardell): Assign peer reviewers. These are trusted experts who can support you when brainstorming metrics, interpreting results, and writing the report. Ideally this chapter will have 2 or more reviewers who can promote a diversity of perspectives.
👉 AI (@bkardell): Finalize which metrics you might like to include in an annual "state of markup" report powered by HTTP Archive. Community contributors have initially sketched out a few ideas to get the ball rolling, but it's up to you, the subject matter experts, to know exactly which metrics we should be looking at. You can use the brainstorming doc to explore ideas.
The metrics should paint a holistic, data-driven picture of the markup landscape. The HTTP Archive does have its limitations and blind spots, so if there are metrics out of scope it's still good to identify them now during the brainstorming phase. We can make a note of them in the final report so readers understand why they're not discussed and the HTTP Archive team can make an effort to improve our telemetry for next year's Almanac.
Next steps: Over the next couple of months analysts will write the queries and generate the results, then hand everything off to you to write up your interpretation of the data.
Additional resources: