Don't convert DIVs with more links than text to Ps
davidar committed May 15, 2018
1 parent f4ee380 commit c0144d8
Showing 13 changed files with 85 additions and 46 deletions.
2 changes: 1 addition & 1 deletion Readability.js
@@ -835,7 +835,7 @@ Readability.prototype = {
// element. DIVs with only a P element inside and no text content can be
// safely converted into plain P elements to avoid confusing the scoring
// algorithm with DIVs with are, in practice, paragraphs.
-if (this._hasSinglePInsideElement(node)) {
+if (this._hasSinglePInsideElement(node) && this._getLinkDensity(node) < 0.5) {
var newNode = node.children[0];
node.parentNode.replaceChild(newNode, node);
node = newNode;
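
The new condition can be illustrated with a minimal sketch. It assumes that link density means the share of an element's text that sits inside `<a>` elements, which is what the commit title suggests; the actual `_getLinkDensity` and `_hasSinglePInsideElement` implementations are not shown in this diff. The node shape and the names `textLength`, `linkTextLength`, `linkDensity`, and `shouldUnwrapSingleP` are hypothetical stand-ins, not Readability.js APIs.

```javascript
// Hypothetical node shape, for illustration only (not the DOM API):
//   element:   { tag: "div", children: [...] }
//   text node: { text: "..." }

// Total number of text characters below a node.
function textLength(node) {
  if (typeof node.text === "string") {
    return node.text.length;
  }
  return (node.children || []).reduce((sum, child) => sum + textLength(child), 0);
}

// Number of text characters that sit inside <a> elements.
function linkTextLength(node) {
  if (node.tag === "a") {
    return textLength(node);
  }
  return (node.children || []).reduce((sum, child) => sum + linkTextLength(child), 0);
}

// Assumed definition: link text length divided by total text length.
function linkDensity(node) {
  const total = textLength(node);
  return total === 0 ? 0 : linkTextLength(node) / total;
}

// Stand-in for the patched condition: a DIV whose only element child is a P
// (and which has no text of its own) is unwrapped only if it is not mostly links.
function shouldUnwrapSingleP(div) {
  const children = div.children || [];
  const elements = children.filter((child) => typeof child.tag === "string");
  const hasOwnText = children.some(
    (child) => typeof child.text === "string" && child.text.trim().length > 0
  );
  const hasSingleP = elements.length === 1 && elements[0].tag === "p" && !hasOwnText;
  return hasSingleP && linkDensity(div) < 0.5;
}

// A wrapper whose only content is a link, like the "Kevin" caption in ars-1.
const captionDiv = {
  tag: "div",
  children: [{ tag: "p", children: [{ tag: "a", children: [{ text: "Kevin" }] }] }],
};

// A wrapper around an ordinary paragraph that contains one short link.
const articleDiv = {
  tag: "div",
  children: [
    {
      tag: "p",
      children: [
        { text: "A flaw in " },
        { tag: "a", children: [{ text: "Minecraft" }] },
        { text: " crashes servers." },
      ],
    },
  ],
};

console.log(linkDensity(captionDiv), shouldUnwrapSingleP(captionDiv)); // 1 false
console.log(linkDensity(articleDiv), shouldUnwrapSingleP(articleDiv)); // 0.25 true
```

Under these assumptions, a link-only wrapper stays a DIV, so later scoring can still treat it as non-paragraph content, while a wrapper around ordinary prose is still collapsed to its P. This is consistent with the expected-output changes below, where link-only blocks such as the ars-1 caption credit and the herald-sun "Read more" link no longer appear as paragraphs.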
4 changes: 1 addition & 3 deletions test/test-pages/ars-1/expected.html
@@ -1,9 +1,7 @@
<div id="readability-page-1" class="page">
<div itemprop="articleBody">
<figure> <img src="http://cdn.arstechnica.net/wp-content/uploads/2015/04/server-crash-640x426.jpg" width="640" height="331" />
-<figcaption class="caption">
-<p><a rel="nofollow" href="https://en.wikipedia.org/wiki/Kernel_panic#/media/File:Kernel-panic.jpg">Kevin</a></p>
-</figcaption>
+<figcaption class="caption"> </figcaption>
</figure>
<p>A flaw in the wildly popular online game <em>Minecraft</em> makes it easy for just about anyone to crash the server hosting the game, according to a computer programmer who has released proof-of-concept code that exploits the vulnerability.</p>
<p>"I thought a lot before writing this post," Pakistan-based developer Ammar Askar wrote in a <a href="http://blog.ammaraskar.com/minecraft-vulnerability-advisory">blog post published Thursday</a>, 21 months, he said, after privately reporting the bug to <em>Minecraft</em> developer Mojang. "On the one hand I don't want to expose thousands of servers to a major vulnerability, yet on the other hand Mojang has failed to act on it."</p>
14 changes: 13 additions & 1 deletion test/test-pages/cnn/expected.html
@@ -18,6 +18,18 @@
<h2>The U.S. has long been heralded as a land of opportunity -- a place where anyone can succeed regardless of the economic class they were born into.</h2>
<p> But a new report released on Monday by <a href="http://web.stanford.edu/group/scspi-dev/cgi-bin/" target="_blank">Stanford University's Center on Poverty and Inequality</a> calls that into question. </p>
<p> The report assessed poverty levels, income and wealth inequality, economic mobility and unemployment levels among 10 wealthy countries with social welfare programs. </p>
+<div id="smartassetcontainer">
+<div>
+<div>
+<div id="smartasset-article">
+<div>
+<p> Powered by SmartAsset.com </p>
+<p><img src="https://smrt.as/ck" /> </p>
+</div>
+</div>
+</div>
+</div>
+</div>
<p> Among its key findings: the class you're born into matters much more in the U.S. than many of the other countries. </p>
<p> As the <a href="http://web.stanford.edu/group/scspi-dev/cgi-bin/publications/state-union-report" target="_blank">report states</a>: "[T]he birth lottery matters more in the U.S. than in most well-off countries." </p>
<p> But this wasn't the only finding that suggests the U.S. isn't quite living up to its reputation as a country where everyone has an equal chance to get ahead through sheer will and hard work. </p>
@@ -37,6 +49,6 @@ <h2>The U.S. has long been heralded as a land of opportunity -- a place where an
<p> The low ranking the U.S. received was due to its extreme levels of wealth and income inequality and the ineffectiveness of its "safety net" -- social programs aimed at reducing poverty. </p>
<p> <a href="http://money.cnn.com/2016/01/05/news/economy/chicago-segregated/index.html?iid=EL"><span>Related: Chicago is America's most segregated city</span></a> </p>
<p> The report concluded that the American safety net was ineffective because it provides only half the financial help people need. Additionally, the levels of assistance in the U.S. are generally lower than in other countries. </p>
-<p> <span> CNNMoney (New York) </span> <span>First published February 1, 2016: 1:28 AM ET</span> </p>
+<p><span> CNNMoney (New York) </span> <span>First published February 1, 2016: 1:28 AM ET</span> </p>
</div>
</div>
54 changes: 36 additions & 18 deletions test/test-pages/ehow-2/expected.html
@@ -3,15 +3,25 @@
<header>
<div data-type="AuthorProfile">
<div>
-<p> <a id="img-follow-tip" href="http://fakehost/contributor/gina_robertsgrey/" target="_top">
+<p><a id="img-follow-tip" href="http://fakehost/contributor/gina_robertsgrey/" target="_top">
<img src="http://img-aws.ehowcdn.com/60x60/cme/cme_public_images/www_demandstudios_com/sitelife.studiod.com/ver1.0/Content/images/store/9/2/d9dd6f61-b183-4893-927f-5b540e45be91.Small.jpg" data-failover="//img-aws.ehowcdn.com/60x60/ehow-cdn-assets/test15/media/images/authors/missing-author-image.png" onerror="var failover = this.getAttribute('data-failover');
if (failover) failover = failover.replace(/^https?:/,'');
var src = this.src ? this.src.replace(/^https?:/,'') : '';
if (src != failover){
this.src = failover;
}"/> </a> </p>
</div>
-<p> <time datetime="2016-09-14T07:07:00-04:00" itemprop="dateModified">Last updated September 14, 2016</time> </p>
+<div id="author_powertip" data-author-url="/contributor/gina_robertsgrey/">
+<p><a href="http://fakehost/contributor/gina_robertsgrey/" target="_top">
+<img src="http://img-aws.ehowcdn.com/60x60/cme/cme_public_images/www_demandstudios_com/sitelife.studiod.com/ver1.0/Content/images/store/9/2/d9dd6f61-b183-4893-927f-5b540e45be91.Small.jpg" data-failover="//img-aws.ehowcdn.com/60x60/ehow-cdn-assets/test15/media/images/authors/missing-author-image.png" onerror="var failover = this.getAttribute('data-failover');
+if (failover) failover = failover.replace(/^https?:/,'');
+var src = this.src ? this.src.replace(/^https?:/,'') : '';
+if (src != failover){
+this.src = failover;
+}"/> </a> </p>
+<p>Follow</p>
+</div>
+<p><time datetime="2016-09-14T07:07:00-04:00" itemprop="dateModified">Last updated September 14, 2016</time> </p>
</div>
</header>
<div>
@@ -28,17 +38,19 @@
</div>
</div> <span>
<span>
-<div><div><p>
-<span><p>Parties hosted at restaurants, clubhouses and country clubs eliminate the need to spend hours cleaning up once party guests have gone home. But that convenience comes with a price tag. A country club may charge as much as $2,000 for room rental and restaurant food and beverage will almost always cost more than food prepped and served at home.</p></span> </p>
+<div>
+<div>
+<p><span><p>Parties hosted at restaurants, clubhouses and country clubs eliminate the need to spend hours cleaning up once party guests have gone home. But that convenience comes with a price tag. A country club may charge as much as $2,000 for room rental and restaurant food and beverage will almost always cost more than food prepped and served at home.</p></span> </p>
<figure> <img src="http://img-aws.ehowcdn.com/640/cme/cme_public_images/www_ehow_com/cdn-write.demandstudios.com/upload/image/FE/CB/121569D2-6984-4B2F-83C4-9D2D9A27CBFE/121569D2-6984-4B2F-83C4-9D2D9A27CBFE.jpg" alt="Save money hosting the party at home." data-credit="Thomas Jackson/Digital Vision/Getty Images" data-pin-ehow-hover="true" data-pin-no-hover="true" /> </figure>
<figcaption class="caption"> Thomas Jackson/Digital Vision/Getty Images </figcaption>
</div>
</div>
</span>
</span> <span>
<span>
-<div><div><p>
-<span><p>Instead of hiring a DJ, use your iPod or Smartphone to spin the tunes. Both easily hook up to most speakers or mp3 compatible docks to play music from your music library. Or download Pandora, the free online radio app, and play hours of music for free.</p>
+<div>
+<div>
+<p><span><p>Instead of hiring a DJ, use your iPod or Smartphone to spin the tunes. Both easily hook up to most speakers or mp3 compatible docks to play music from your music library. Or download Pandora, the free online radio app, and play hours of music for free.</p>
<p>Personalize the music with a playlist of the grad’s favorite songs or songs that were big hits during his or her years in school.</p></span> </p>
<figure> <img src="http://img-aws.ehowcdn.com/640/cme/cme_public_images/www_ehow_com/cdn-write.demandstudios.com/upload/image/DF/FC/A05B0252-BD73-4BC7-A09A-96F0A504FCDF/A05B0252-BD73-4BC7-A09A-96F0A504FCDF.jpg" alt="Online radio can take the place of a hired DJ." data-credit="Spencer Platt/Getty Images News/Getty Images" data-pin-ehow-hover="true" data-pin-no-hover="true" /> </figure>
<figcaption class="caption"> Spencer Platt/Getty Images News/Getty Images </figcaption>
@@ -47,35 +59,39 @@
</span>
</span> <span>
<span>
-<div><div><p>
-<span><p>Avoid canned drinks, which guests often open, but don't finish. Serve pitchers of tap water with lemon and cucumber slices or sliced strawberries for an interesting and refreshing flavor. Opt for punches and non-alcoholic drinks for high school graduates that allow guests to dole out the exact amount they want to drink.</p></span> </p>
+<div>
+<div>
+<p><span><p>Avoid canned drinks, which guests often open, but don't finish. Serve pitchers of tap water with lemon and cucumber slices or sliced strawberries for an interesting and refreshing flavor. Opt for punches and non-alcoholic drinks for high school graduates that allow guests to dole out the exact amount they want to drink.</p></span> </p>
<figure> <img src="http://img-aws.ehowcdn.com/640/cme/cme_public_images/www_ehow_com/cdn-write.demandstudios.com/upload/image/EB/DB/8A04CCA7-3255-4225-B59A-C41441F8DBEB/8A04CCA7-3255-4225-B59A-C41441F8DBEB.jpg" alt="Serve drinks in pitchers, not in cans." data-credit="evgenyb/iStock/Getty Images" data-pin-ehow-hover="true" data-pin-no-hover="true" /> </figure>
<figcaption class="caption"> evgenyb/iStock/Getty Images </figcaption>
</div>
</div>
</span>
</span> <span>
<span>
-<div><div><p>
-<span><p>Instead of inviting everyone you – and the graduate – know or ever knew, scale back the guest list. Forgo inviting guests that you or your grad haven't seen for eons. There is no reason to provide provisions for people who are essentially out of your lives. Sticking to a small, but personal, guest list allows more time to mingle with loved ones during the party, too.</p></span> </p>
+<div>
+<div>
+<p><span><p>Instead of inviting everyone you – and the graduate – know or ever knew, scale back the guest list. Forgo inviting guests that you or your grad haven't seen for eons. There is no reason to provide provisions for people who are essentially out of your lives. Sticking to a small, but personal, guest list allows more time to mingle with loved ones during the party, too.</p></span> </p>
<figure> <img src="http://img-aws.ehowcdn.com/640/cme/cme_public_images/www_ehow_com/cdn-write.demandstudios.com/upload/image/94/10/08035476-0167-4A03-AADC-13A7E7AA1094/08035476-0167-4A03-AADC-13A7E7AA1094.jpg" alt="Limit guests to those close to the graduate." data-credit="Kane Skennar/Photodisc/Getty Images" data-pin-ehow-hover="true" data-pin-no-hover="true" /> </figure>
<figcaption class="caption"> Kane Skennar/Photodisc/Getty Images </figcaption>
</div>
</div>
</span>
</span> <span>
<span>
-<div><div><p>
-<span><p>See if your grad and his best friend, girlfriend or close family member would consider hosting a joint party. You can split some of the expenses, especially when the two graduates share mutual friends. You'll also have another parent to bounce ideas off of and to help you stick to your budget when you're tempted to splurge.</p></span> </p>
+<div>
+<div>
+<p><span><p>See if your grad and his best friend, girlfriend or close family member would consider hosting a joint party. You can split some of the expenses, especially when the two graduates share mutual friends. You'll also have another parent to bounce ideas off of and to help you stick to your budget when you're tempted to splurge.</p></span> </p>
<figure> <img src="http://img-aws.ehowcdn.com/640/cme/cme_public_images/www_ehow_com/cdn-write.demandstudios.com/upload/image/06/49/4AD62696-FC95-4DA2-8351-42740C7B4906/4AD62696-FC95-4DA2-8351-42740C7B4906.jpg" alt="Throw a joint bash for big savings." data-credit="Mike Watson Images/Moodboard/Getty" data-pin-ehow-hover="true" data-pin-no-hover="true" /> </figure>
<figcaption class="caption"> Mike Watson Images/Moodboard/Getty </figcaption>
</div>
</div>
</span>
</span> <span>
<span>
-<div><div><p>
-<span><p>Skip carving stations of prime rib and jumbo shrimp as appetizers, especially for high school graduation parties. Instead, serve some of the graduate's favorite side dishes that are cost effective, like a big pot of spaghetti with breadsticks. Opt for easy and simple food such as pizza, finger food and mini appetizers. </p>
+<div>
+<div>
+<p><span><p>Skip carving stations of prime rib and jumbo shrimp as appetizers, especially for high school graduation parties. Instead, serve some of the graduate's favorite side dishes that are cost effective, like a big pot of spaghetti with breadsticks. Opt for easy and simple food such as pizza, finger food and mini appetizers. </p>
<p>Avoid pre-packaged foods and pre-made deli platters. These can be quite costly. Instead, make your own cheese and deli platters for less than half the cost of pre-made.</p></span> </p>
<figure> <img src="http://img-aws.ehowcdn.com/640/cme/cme_public_images/www_ehow_com/cdn-write.demandstudios.com/upload/image/D0/51/B6AED06C-5E19-4A26-9AAD-0E175F6251D0/B6AED06C-5E19-4A26-9AAD-0E175F6251D0.jpg" alt="Cost effective appetizers are just as satisfying as pre-made deli platters." data-credit="Mark Stout/iStock/Getty Images" data-pin-ehow-hover="true" data-pin-no-hover="true" /> </figure>
<figcaption class="caption"> Mark Stout/iStock/Getty Images </figcaption>
@@ -84,17 +100,19 @@
</span>
</span> <span>
<span>
-<div><div><p>
-<span><p>Instead of an evening dinner party, host a grad lunch or all appetizers party. Brunch and lunch fare or finger food costs less than dinner. Guests also tend to consume less alcohol in the middle of the day, which keeps cost down.</p></span> </p>
+<div>
+<div>
+<p><span><p>Instead of an evening dinner party, host a grad lunch or all appetizers party. Brunch and lunch fare or finger food costs less than dinner. Guests also tend to consume less alcohol in the middle of the day, which keeps cost down.</p></span> </p>
<figure> <img src="http://img-aws.ehowcdn.com/640/cme/cme_public_images/www_ehow_com/cdn-write.demandstudios.com/upload/image/35/B4/DD5FD05A-B631-4AFE-BC8F-FDACAD1EB435/DD5FD05A-B631-4AFE-BC8F-FDACAD1EB435.jpg" alt="A brunch gathering will cost less than a dinner party." data-credit="Mark Stout/iStock/Getty Images" data-pin-ehow-hover="true" data-pin-no-hover="true" /> </figure>
<figcaption class="caption"> Mark Stout/iStock/Getty Images </figcaption>
</div>
</div>
</span>
</span> <span>
<span>
-<div><div><p>
-<span><p>Decorate your party in the graduate's current school colors or the colors of the school he or she will be headed to next. Décor that is not specifically graduation-themed may cost a bit less, and any leftovers can be re-used for future parties, picnics and events.</p></span> </p>
+<div>
+<div>
+<p><span><p>Decorate your party in the graduate's current school colors or the colors of the school he or she will be headed to next. Décor that is not specifically graduation-themed may cost a bit less, and any leftovers can be re-used for future parties, picnics and events.</p></span> </p>
<figure> <img src="http://img-aws.ehowcdn.com/640/cme/cme_public_images/www_ehow_com/cdn-write.demandstudios.com/upload/image/A1/FA/2C368B34-8F6A-45F6-9DFC-0B0C4E33FAA1/2C368B34-8F6A-45F6-9DFC-0B0C4E33FAA1.jpg" alt="Theme the party by color without graduation-specific decor." data-credit="jethuynh/iStock/Getty Images" data-pin-ehow-hover="true" data-pin-no-hover="true" /> </figure>
<figcaption class="caption"> jethuynh/iStock/Getty Images </figcaption>
</div>
12 changes: 12 additions & 0 deletions test/test-pages/engadget/expected.html
@@ -6,6 +6,18 @@
<p>
<h2> But only hardcore gamers will appreciate it. </h2>
</p>
+<div>
+<div>
+<div>
+<div>
+<p><a href="http://fakehost/about/editors/devindra-hardawar/">
+<img src="https://o.aolcdn.com/images/dims?thumbnail=45%2C45&amp;quality=80&amp;image_uri=http%3A%2F%2Fwww.blogcdn.com%2Fwww.engadget.com%2Fmedia%2F2016%2F03%2Fdevindra-engadget-headshot-small.jpg&amp;client=cbc79c14efcebee57402&amp;signature=e6ffba7468c380581b6589a70ce5d7c1ec40cd1d"/>
+</a></p>
+</div>
+</div>
+<p><span>2192</span> <span>Shares</span></p>
+</div>
+</div>
</header>
<div data-behavior="BreakoutsHandler">
<div>
1 change: 0 additions & 1 deletion test/test-pages/herald-sun-1/expected.html
@@ -14,7 +14,6 @@
<p>They held meetings with executives from News Corporation and Fairfax, representatives of the TV networks, the ABC top brass and a group from the media union and the Walkley journalism foundation. I was involved as a member of the Walkley board.</p>
<p>The initiative, from Tony Abbott’s office, is evidence that the Government has been alarmed by the strength of criticism from media of the Data Retention Bill it wants passed before Parliament rises in a fortnight. Bosses, journalists, even the Press Council, are up in arms, not only over this measure, but also over aspects of two earlier pieces of national security legislation that interfere with the ability of the media to hold government to account.</p>
<div id="read-more">
-<p><a href="">Read more</a> </p>
<div id="read-more-content">
<p>The Bill would require telecommunications service providers to store so-called “metadata” — the who, where, when and how of a communication, but not its content — for two years so security and law enforcement agencies can access it without warrant. Few would argue against the use of such material to catch criminals or terrorists. But, as Parliament’s Joint Committee on Intelligence and Security has pointed out, it would also be used “for the purpose of determining the identity of a journalist’s sources”.</p>
<p>And that should ring warning bells for anyone genuinely concerned with the health of our democracy. Without the ability to protect the identity of sources, journalists would be greatly handicapped in exposing corruption, dishonesty, waste, incompetence and misbehaviour by public officials.</p>