From 8aa70ceccdb32369057de3cd94e488e9e0ef478b Mon Sep 17 00:00:00 2001 From: karinashin Date: Tue, 12 Apr 2022 17:23:46 -0500 Subject: [PATCH] new files --- stopWords.txt | 635 ++++++++++++++++++++++++++++++++++++++ subset/blogs_0000002.json | 1 + subset/blogs_0000010.json | 1 + subset/blogs_0000048.json | 1 + 4 files changed, 638 insertions(+) create mode 100644 stopWords.txt create mode 100644 subset/blogs_0000002.json create mode 100644 subset/blogs_0000010.json create mode 100644 subset/blogs_0000048.json diff --git a/stopWords.txt b/stopWords.txt new file mode 100644 index 0000000..ffd6135 --- /dev/null +++ b/stopWords.txt @@ -0,0 +1,635 @@ +able +about +above +abroad +according +accordingly +across +actually +adj +after +afterwards +again +against +ago +ahead +ain't +all +allow +allows +almost +alone +along +alongside +already +also +although +always +am +amid +amidst +among +amongst +an +and +another +any +anybody +anyhow +anyone +anything +anyway +anyways +anywhere +apart +appear +appreciate +appropriate +are +aren't +around +as +a's +aside +ask +asking +associated +at +available +away +awfully +back +backward +backwards +be +became +because +become +becomes +becoming +been +before +beforehand +begin +behind +being +believe +below +beside +besides +best +better +between +beyond +both +brief +but +by +came +can +cannot +cant +can't +caption +cause +causes +certain +certainly +changes +clearly +c'mon +co +co. +com +come +comes +concerning +consequently +consider +considering +contain +containing +contains +corresponding +could +couldn't +course +c's +currently +dare +daren't +definitely +described +despite +did +didn't +different +directly +do +does +doesn't +doing +done +don't +down +downwards +during +each +edu +eg +eight +eighty +either +else +elsewhere +end +ending +enough +entirely +especially +et +etc +even +ever +evermore +every +everybody +everyone +everything +everywhere +ex +exactly +example +except +fairly +far +farther +few +fewer +fifth +first +five +followed +following +follows +for +forever +former +formerly +forth +forward +found +four +from +further +furthermore +get +gets +getting +given +gives +go +goes +going +gone +got +gotten +greetings +had +hadn't +half +happens +hardly +has +hasn't +have +haven't +having +he +he'd +he'll +hello +help +hence +her +here +hereafter +hereby +herein +here's +hereupon +hers +herself +he's +hi +him +himself +his +hither +hopefully +how +howbeit +however +hundred +i'd +ie +if +ignored +i'll +i'm +immediate +in +inasmuch +inc +inc. +indeed +indicate +indicated +indicates +inner +inside +insofar +instead +into +inward +is +isn't +it +it'd +it'll +its +it's +itself +i've +just +k +keep +keeps +kept +know +known +knows +last +lately +later +latter +latterly +least +less +lest +let +let's +like +liked +likely +likewise +little +look +looking +looks +low +lower +ltd +made +mainly +make +makes +many +may +maybe +mayn't +me +mean +meantime +meanwhile +merely +might +mightn't +mine +minus +miss +more +moreover +most +mostly +mr +mrs +much +must +mustn't +my +myself +name +namely +nd +near +nearly +necessary +need +needn't +needs +neither +never +neverf +neverless +nevertheless +new +next +nine +ninety +no +nobody +non +none +nonetheless +noone +no-one +nor +normally +not +nothing +notwithstanding +novel +now +nowhere +obviously +of +off +often +oh +ok +okay +old +on +once +one +ones +one's +only +onto +opposite +or +other +others +otherwise +ought +oughtn't +our +ours +ourselves +out +outside +over +overall +own +particular +particularly +past +per +perhaps +placed +please +plus +possible +presumably +probably +provided +provides +que +quite +qv +rather +rd +re +really +reasonably +recent +recently +regarding +regardless +regards +relatively +respectively +right +round +said +same +saw +say +saying +says +second +secondly +see +seeing +seem +seemed +seeming +seems +seen +self +selves +sensible +sent +serious +seriously +seven +several +shall +shan't +she +she'd +she'll +she's +should +shouldn't +since +six +so +some +somebody +someday +somehow +someone +something +sometime +sometimes +somewhat +somewhere +soon +sorry +specified +specify +specifying +still +sub +such +sup +sure +take +taken +taking +tell +tends +th +than +thank +thanks +thanx +that +that'll +thats +that's +that've +the +their +theirs +them +themselves +then +thence +there +thereafter +thereby +there'd +therefore +therein +there'll +there're +theres +there's +thereupon +there've +these +they +they'd +they'll +they're +they've +thing +things +think +third +thirty +this +thorough +thoroughly +those +though +three +through +throughout +thru +thus +till +to +together +too +took +toward +towards +tried +tries +truly +try +trying +t's +twice +two +un +under +underneath +undoing +unfortunately +unless +unlike +unlikely +until +unto +up +upon +upwards +us +use +used +useful +uses +using +usually +v +value +various +versus +very +via +viz +vs +want +wants +was +wasn't +way +we +we'd +welcome +well +we'll +went +were +we're +weren't +we've +what +whatever +what'll +what's +what've +when +whence +whenever +where +whereafter +whereas +whereby +wherein +where's +whereupon +wherever +whether +which +whichever +while +whilst +whither +who +who'd +whoever +whole +who'll +whom +whomever +who's +whose +why +will +willing +wish +with +within +without +wonder +won't +would +wouldn't +yes +yet +you +you'd +you'll +your +you're +yours +yourself +yourselves +you've +zero \ No newline at end of file diff --git a/subset/blogs_0000002.json b/subset/blogs_0000002.json new file mode 100644 index 0000000..eb3f94d --- /dev/null +++ b/subset/blogs_0000002.json @@ -0,0 +1 @@ +{"organizations": [], "uuid": "c87033fea6042ddc5d9289deebfaa97d4b745e80", "thread": {"social": {"gplus": {"shares": 0}, "pinterest": {"shares": 0}, "vk": {"shares": 0}, "linkedin": {"shares": 0}, "facebook": {"likes": 1, "shares": 1, "comments": 0}, "stumbledupon": {"shares": 0}}, "site_full": "www.cnbc.com", "main_image": "https://fm.cnbc.com/applications/cnbc.com/resources/img/editorial/2018/01/02/104924846-3ED2-MM-Block-A-010218.600x400.jpg", "site_section": "http://www.cnbc.com/id/10001135/device/rss/rss.html", "section_title": "Stock Picks", "url": "https://www.cnbc.com/video/2018/01/02/cramer-reflects-on-how-trumps-actions-fuel-the-beast-market-rally.html", "country": "US", "domain_rank": 767, "title": "Cramer reflects on how Trump's actions are fueling the 'beast' market rally", "performance_score": 0, "site": "cnbc.com", "participants_count": 0, "title_full": "", "spam_score": 0.0, "site_type": "blogs", "published": "2018-01-03T01:34:00.000+02:00", "replies_count": 0, "uuid": "c87033fea6042ddc5d9289deebfaa97d4b745e80"}, "author": "", "url": "https://www.cnbc.com/video/2018/01/02/cramer-reflects-on-how-trumps-actions-fuel-the-beast-market-rally.html", "ord_in_thread": 0, "title": "Cramer reflects on how Trump's actions are fueling the 'beast' market rally", "locations": [], "entities": {"persons": [{"name": "trump", "sentiment": "negative"}, {"name": "cramer", "sentiment": "negative"}, {"name": "jim cramer", "sentiment": "negative"}], "locations": [], "organizations": []}, "highlightText": "", "language": "english", "persons": [], "text": "Cramer reflects on how Trump's actions are fueling the 'beast' market rally 1 Hour Ago Jim Cramer examined the notion that investors are \"bored\" with the market rally and explained how the president is driving stocks higher.", "external_links": [], "published": "2018-01-03T01:34:00.000+02:00", "crawled": "2018-01-03T01:56:41.007+02:00", "highlightTitle": ""} \ No newline at end of file diff --git a/subset/blogs_0000010.json b/subset/blogs_0000010.json new file mode 100644 index 0000000..4b4da26 --- /dev/null +++ b/subset/blogs_0000010.json @@ -0,0 +1 @@ +{"organizations": [], "uuid": "8d09d12004eea0019e1d35647a6f0389f8a2a2b5", "thread": {"social": {"gplus": {"shares": 0}, "pinterest": {"shares": 0}, "vk": {"shares": 0}, "linkedin": {"shares": 1}, "facebook": {"likes": 3571, "shares": 3571, "comments": 0}, "stumbledupon": {"shares": 0}}, "site_full": "www.wsj.com", "main_image": "http://s.marketwatch.com/public/resources/MWimages/MW-EN848_thiel_ZG_20160526082215.jpg", "site_section": "http://feeds.marketwatch.com/marketwatch/financial/", "section_title": "MarketWatch.com - Financial Services Industry News", "url": "https://www.wsj.com/articles/peter-thiels-founders-fund-makes-big-bet-on-bitcoin-1514917433?mg=prod/accounts-wsj", "country": "US", "domain_rank": 387, "title": "The Wall Street Journal: Peter Thiel’s VC firm has made a monster bet on bitcoin", "performance_score": 10, "site": "wsj.com", "participants_count": 0, "title_full": "", "spam_score": 0.0, "site_type": "blogs", "published": "2018-01-03T00:59:00.000+02:00", "replies_count": 0, "uuid": "8d09d12004eea0019e1d35647a6f0389f8a2a2b5"}, "author": "", "url": "https://www.wsj.com/articles/peter-thiels-founders-fund-makes-big-bet-on-bitcoin-1514917433?mg=prod/accounts-wsj", "ord_in_thread": 0, "title": "The Wall Street Journal: Peter Thiel’s VC firm has made a monster bet on bitcoin", "locations": [], "entities": {"persons": [{"name": "peter thiel", "sentiment": "negative"}, {"name": "rob copeland", "sentiment": "none"}, {"name": "thiel", "sentiment": "none"}], "locations": [{"name": "silicon valley", "sentiment": "none"}], "organizations": [{"name": "wall street journal", "sentiment": "negative"}, {"name": "facebook inc", "sentiment": "none"}]}, "highlightText": "", "language": "english", "persons": [], "text": "Published: Jan 2, 2018 5:59 p.m. ET Share \nFew mainstream investors have bought large sums of bitcoin, scared off by concerns about cybersecurity and liquidity Getty Images \nBy Rob Copeland \nOne of the biggest names in Silicon Valley is placing a moonshot bet on bitcoin BTCUSD, +0.72% . \nFounders Fund, the venture-capital firm co-founded by Peter Thiel, has amassed hundreds of millions of dollars of the volatile cryptocurrency, people familiar with the matter said. The bet has been spread across several of the firm’s most recent funds, the people said, including one that began investing in mid-2017 and made bitcoin one of its first investments. \nFounders and Thiel, 50 years old, are well-known for early investments in companies like Facebook Inc. FB, +2.81% that sometimes take years to come to fruition. The bitcoin bet is quickly showing promise. Founders bought around $15 million to $20 million in bitcoin, and it has told investors the firm’s haul is now worth hundreds of millions of dollars after the digital currency’s ripping rise in the past year. \nIt isn’t clear if Founders has sold any of its holdings yet. The bet hasn’t been previously reported.", "external_links": [], "published": "2018-01-03T00:59:00.000+02:00", "crawled": "2018-01-03T01:02:54.010+02:00", "highlightTitle": ""} \ No newline at end of file diff --git a/subset/blogs_0000048.json b/subset/blogs_0000048.json new file mode 100644 index 0000000..f93157b --- /dev/null +++ b/subset/blogs_0000048.json @@ -0,0 +1 @@ +{"organizations": [], "uuid": "73c9950b97e666b2505c8563a483e4dd34b89378", "thread": {"social": {"gplus": {"shares": 0}, "pinterest": {"shares": 0}, "vk": {"shares": 0}, "linkedin": {"shares": 0}, "facebook": {"likes": 0, "shares": 0, "comments": 0}, "stumbledupon": {"shares": 0}}, "site_full": "fortune.com", "main_image": "", "site_section": "http://fortune.com", "section_title": "Hoda Kotb Will Replace Matt Lauer on NBC’s ‘Today’ Show – Fortune", "url": "http://fortune.com/2018/01/02/nbc-today-show-hoda-kotb-matt-lauer/", "country": "US", "domain_rank": 1196, "title": "Hoda Kotb Will Replace Matt Lauer on NBC’s ‘Today’ Show", "performance_score": 0, "site": "fortune.com", "participants_count": 1, "title_full": "", "spam_score": 0.0, "site_type": "blogs", "published": "2018-01-02T15:23:00.000+02:00", "replies_count": 0, "uuid": "73c9950b97e666b2505c8563a483e4dd34b89378"}, "author": "Reuters", "url": "http://fortune.com/2018/01/02/nbc-today-show-hoda-kotb-matt-lauer/", "ord_in_thread": 0, "title": "Hoda Kotb Will Replace Matt Lauer on NBC’s ‘Today’ Show", "locations": [], "entities": {"persons": [{"name": "hoda kotb", "sentiment": "negative"}, {"name": "matt lauer", "sentiment": "negative"}, {"name": "kotb", "sentiment": "none"}, {"name": "lauer", "sentiment": "none"}, {"name": "al roker", "sentiment": "none"}, {"name": "hoda", "sentiment": "none"}, {"name": "carson daly", "sentiment": "none"}, {"name": "savannah guthrie", "sentiment": "none"}, {"name": "guthrie", "sentiment": "none"}, {"name": "kathie lee gifford", "sentiment": "none"}], "locations": [{"name": "sochi", "sentiment": "none"}, {"name": "russia", "sentiment": "none"}], "organizations": [{"name": "nbc news today", "sentiment": "negative"}, {"name": "nbc", "sentiment": "negative"}, {"name": "orange room", "sentiment": "none"}]}, "highlightText": "", "language": "english", "persons": [], "text": "By Reuters 8:23 AM EST \nTelevision host Hoda Kotb was named the new co-anchor of the NBC News Today show on Tuesday, replacing former co-host Matt Lauer several weeks after the longtime anchor was fired for inappropriate sexual behavior , according to a network statement. \nKotb will join Savannah Guthrie during the first two hours of the popular program, starting at 7 a.m. EST (noon GMT), and become the first pair of women to host the show along with weatherman Al Roker and Orange Room host Carson Daly. Kotb will continue co-hosting the 10 a.m. hour of Today with Kathie Lee Gifford. \n“It’s 2018 and we are kicking off the year right because Hoda is officially the co-anchor of Today,” said Guthrie, 46, who announced the news during the program’s opening moments. \nKotb, 53, quickly filled in as co-host when Lauer was fired on Nov. 28 after a female colleague complained to NBC officials about a pattern of inappropriate sexual behavior that began while they were on assignment at the 2014 Sochi Winter Olympics in Russia , according to the network. \nAt least two other women went to NBC with similar complaints against Lauer following the first allegation. None of the women has been publicly identified. \n“Repairing the damage will take a lot of time and soul searching and I’m committed to beginning that effort,” Lauer said after he was fired . \nReuters has not independently verified the accusations. \nKotb joined NBC News in 1998 as a correspondent and has co-hosted the fourth hour of Today with Gifford since 2008. She also hosts a program on Sirius XM satellite radio. \n“Over the past several weeks, Hoda has seamlessly stepped into the co-anchor role alongside Savannah, and the two have quickly hit the ground running,” NBC News Chairman Andrew Lack said in an email to staff, according to Today.com. SPONSORED FINANCIAL CONTENT ", "external_links": [], "published": "2018-01-02T15:23:00.000+02:00", "crawled": "2018-01-02T15:23:50.003+02:00", "highlightTitle": ""} \ No newline at end of file