Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Put phrasing content into paragraphs #447

Merged
merged 2 commits into from
May 15, 2018
Merged

Put phrasing content into paragraphs #447

merged 2 commits into from
May 15, 2018

Conversation

davidar
Copy link
Contributor

@davidar davidar commented May 12, 2018

This removes the need for p.readability-styled elements.

@@ -1,5 +1,6 @@
<div id="readability-page-1" class="page">
<div id="textArea">
<p>Reviewed by <a onclick="return sl(this,'','prog-lnk');" href="http://www.webmd.com/hansa-bhargava">Hansa D. Bhargava, MD</a> </p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These types of small divs etc. are now getting "promoted" to P tags, which means they no longer get cleaned conditionally. This has unfortunate effects e.g. in the wikipedia case, where all the "see also" text etc. now ends up as part of the main body text, and even gets used for the excerpt.

Is this just because we're dropping the classes immediately or something, or because the change from div to p means we no longer critically check the node? It would be nice not to regress this, because otherwise this looks like a positive change. :-)

Readability.js Outdated
@@ -134,8 +134,17 @@ Readability.prototype = {

DEPRECATED_SIZE_ATTRIBUTE_ELEMS: [ "TABLE", "TH", "TD", "HR", "PRE" ],

PHRASING_ELEMS: [
// "CANVAS", "IFRAME", "SVG", "VIDEO",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking they qualify as phrasing content, but readability tends to remove them when they're put into paragraphs (which was causing several test regressions). Should I add a comment to explain, or just remove that line?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just adding a comment would be fine. Thanks!

@gijsk
Copy link
Contributor

gijsk commented May 14, 2018

This generally looks OK, but I'd like to avoid the re-inclusion of the "note" type <div>s in e.g. the webmd and wikipedia cases, if that's possible.

@davidar davidar force-pushed the p branch 2 times, most recently from c0144d8 to 8d43511 Compare May 15, 2018 08:20
@@ -787,29 +811,38 @@ Readability.prototype = {

// Turn all divs that don't have children block level elements into p's
if (node.tagName === "DIV") {
// Put phrasing content into paragraphs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think this is basically finished now, besides the comment for the commented-out contents - the only thing I did notice is that we're putting <img> tags into <p> tags, even if there's nothing else in the <p> (or at least, nothing besides whitespace). Is that intentional? I guess I could be convinced either way, but I wonder if we could avoid it - it feels like it clutters up the markup without really achieving anything...

FWIW, I don't really want to block this from merging, so we could deal with that in a follow-up if you prefer. On the other hand, it might make the diff cleaner if we fix it here. Up to you and what you would prefer! :-)

Copy link
Contributor Author

@davidar davidar May 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument in favour of putting standalone images in their own paragraph is that it makes it clear that it's not inline with the surrounding elements, but I guess that's largely a matter of personal preference. It should be easy to avoid if it's problematic though (happy to deal with it in a follow-up in that case).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. Looking at it again, it seems we have styling that deals specifically with these cases anyway, https://searchfox.org/mozilla-central/rev/2b9779c59390ecc47be7a70d99753653d8eb5afc/toolkit/themes/shared/aboutReader.css#689-694 . So let's do this, then. Thanks again for the PR! :-)

davidar added 2 commits May 15, 2018 21:26
@gijsk gijsk merged commit 5ae9093 into mozilla:master May 15, 2018
@davidar davidar deleted the p branch May 16, 2018 03:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants