dvinesett/article_mine

Web Article Mining

This program takes a set of URLs from most news sites and returns a data matrix of the total word counts across all articles. Supported news sites are included in this list. The data matrix is formatted as comma-separated values (CSV).
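As a rough illustration of the idea, here is a minimal sketch of counting words across several article texts and rendering the totals as CSV. It is not the script shipped in this repository. The function names (`fetch_article_text`, `count_words`, `to_csv`), the two-column `word,count` layout, and the tokenization rule are all assumptions made for this example. Only the `Article(url)`, `download()`, `parse()` and `.text` calls come from newspaper3k.

```python
import csv
import io
import re
from collections import Counter


def fetch_article_text(url):
    """Download and parse one article with newspaper3k and return its body text."""
    # Imported lazily so the counting helpers below also work without the package.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return article.text


def count_words(texts):
    """Return a Counter of lowercase word totals summed over all texts."""
    totals = Counter()
    for text in texts:
        # Assumed tokenization: runs of ASCII letters and apostrophes.
        totals.update(re.findall(r"[a-z']+", text.lower()))
    return totals


def to_csv(totals):
    """Render the totals as CSV text: a header row, then one row per word."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word", "count"])  # assumed column layout
    for word, count in sorted(totals.items()):
        writer.writerow([word, count])
    return buffer.getvalue()
```

Keeping the download step separate from the counting step means the counting and CSV code can be tried on plain strings without network access.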

Requirements

$ pip install newspaper3k

How to use

The program requires one argument: the path to a file. The file should contain URLs separated by newlines.

 $ cnn_parser.py ~/urls.txt 
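For readers curious how a script might consume such a file, the following is a minimal sketch under stated assumptions, not the repository's actual code. The names `read_urls` and `main` are hypothetical, and skipping blank lines is an assumption about how the input should be handled.

```python
import sys


def read_urls(path):
    """Return the non-empty lines of the file at path, stripped of whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def main(argv):
    """Print each URL found in the file named by the single argument."""
    if len(argv) != 2:
        print("usage: pass the path to a file of newline-separated URLs",
              file=sys.stderr)
        return 1
    for url in read_urls(argv[1]):
        print(url)
    return 0
```

A script would typically finish with `sys.exit(main(sys.argv))` so that a missing argument produces a non-zero exit status.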

If you want to save the output of the program to a file, this will work in most unix shells:

 $ cnn_scrape.py ~/urls.txt > ~/output.csv
