News Clustering

Text mining, and text classification in particular, is an essential skill for a data scientist. In this project I develop an algorithm that groups news articles from digital newspapers in two languages, English and Spanish, and assigns each article the correct label based on its topic.

The pipeline for this project is: scrape the articles, clean them and extract the features relevant to the task, and perform agglomerative clustering. The main tool is Python, with the NLTK, Newspaper3k, Beautiful Soup and scikit-learn packages.
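The last two stages of the pipeline (feature extraction and agglomerative clustering) can be sketched with scikit-learn as follows. This is only a minimal illustration, not the code used in this repository: the TF-IDF representation, the Ward linkage, the helper name `cluster_articles` and the four sample sentences are all assumptions made for the example, and the scraping and cleaning steps are left out.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_articles(texts, n_clusters):
    """Hypothetical helper: TF-IDF features followed by agglomerative clustering."""
    # Turn each article into a TF-IDF weighted bag-of-words vector (assumed features).
    vectorizer = TfidfVectorizer(lowercase=True)
    # AgglomerativeClustering needs a dense array, so convert the sparse matrix.
    features = vectorizer.fit_transform(texts).toarray()
    # Ward linkage merges the pair of clusters that least increases within-cluster variance.
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    return model.fit_predict(features)


# Invented sample texts standing in for scraped and cleaned articles.
sample_articles = [
    "The team won the football match after scoring two late goals",
    "The striker scored a goal and the team won the football league",
    "The central bank raised interest rates to curb inflation",
    "Inflation fears pushed the bank to raise interest rates again",
]

labels = cluster_articles(sample_articles, n_clusters=2)
print(labels)
```

The cluster ids themselves are arbitrary; what matters is which articles share an id. In a real run the input texts would come from the scraping and cleaning stages, and the number of clusters would have to be chosen for the corpus.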


jcrespoortega/Text-Clustering
