-
Notifications
You must be signed in to change notification settings - Fork 615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Put phrasing content into paragraphs #447
Conversation
@@ -1,5 +1,6 @@ | |||
<div id="readability-page-1" class="page"> | |||
<div id="textArea"> | |||
<p>Reviewed by <a onclick="return sl(this,'','prog-lnk');" href="http://www.webmd.com/hansa-bhargava">Hansa D. Bhargava, MD</a> </p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These types of small divs etc. are now getting "promoted" to P
tags, which means they no longer get cleaned conditionally. This has unfortunate effects e.g. in the wikipedia case, where all the "see also" text etc. now ends up as part of the main body text, and even gets used for the excerpt.
Is this just because we're dropping the classes immediately or something, or because the change from div
to p
means we no longer critically check the node? It would be nice not to regress this, because otherwise this looks like a positive change. :-)
Readability.js
Outdated
@@ -134,8 +134,17 @@ Readability.prototype = { | |||
|
|||
DEPRECATED_SIZE_ATTRIBUTE_ELEMS: [ "TABLE", "TH", "TD", "HR", "PRE" ], | |||
|
|||
PHRASING_ELEMS: [ | |||
// "CANVAS", "IFRAME", "SVG", "VIDEO", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking they qualify as phrasing content, but readability tends to remove them when they're put into paragraphs (which was causing several test regressions). Should I add a comment to explain, or just remove that line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just adding a comment would be fine. Thanks!
This generally looks OK, but I'd like to avoid the re-inclusion of the "note" type |
c0144d8
to
8d43511
Compare
@@ -787,29 +811,38 @@ Readability.prototype = { | |||
|
|||
// Turn all divs that don't have children block level elements into p's | |||
if (node.tagName === "DIV") { | |||
// Put phrasing content into paragraphs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think this is basically finished now, besides the comment for the commented-out contents - the only thing I did notice is that we're putting <img>
tags into <p>
tags, even if there's nothing else in the <p>
(or at least, nothing besides whitespace). Is that intentional? I guess I could be convinced either way, but I wonder if we could avoid it - it feels like it clutters up the markup without really achieving anything...
FWIW, I don't really want to block this from merging, so we could deal with that in a follow-up if you prefer. On the other hand, it might make the diff cleaner if we fix it here. Up to you and what you would prefer! :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argument in favour of putting standalone images in their own paragraph is that it makes it clear that it's not inline with the surrounding elements, but I guess that's largely a matter of personal preference. It should be easy to avoid if it's problematic though (happy to deal with it in a follow-up in that case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. Looking at it again, it seems we have styling that deals specifically with these cases anyway, https://searchfox.org/mozilla-central/rev/2b9779c59390ecc47be7a70d99753653d8eb5afc/toolkit/themes/shared/aboutReader.css#689-694 . So let's do this, then. Thanks again for the PR! :-)
This removes the need for `p.readability-styled` elements.
This removes the need for
p.readability-styled
elements.