Onion or Not?

A dataset of article titles from The Onion and from real "Onion-like" news articles posted to the subreddit r/NotTheOnion. The Onion articles are labeled 1 and the r/NotTheOnion articles are labeled 0.
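A minimal sketch of reading the labeled data with Python's standard csv module. It assumes the CSV has a `text` column holding the title and a `label` column holding 1 (The Onion) or 0 (r/NotTheOnion). These column names are an assumption, so check the header of OnionOrNot.csv before relying on them. The inline sample rows are placeholders, not real data.

```python
import csv
import io
from collections import Counter

# Placeholder rows in the ASSUMED "text,label" layout; not taken from the real dataset.
SAMPLE = """text,label
placeholder satirical title,1
placeholder real title,0
"""

def load_titles(fileobj):
    """Return (titles, labels); label 1 = The Onion, 0 = r/NotTheOnion."""
    titles, labels = [], []
    for row in csv.DictReader(fileobj):
        titles.append(row["text"])        # assumed column name
        labels.append(int(row["label"]))  # assumed column name
    return titles, labels

titles, labels = load_titles(io.StringIO(SAMPLE))
print(Counter(labels))  # class balance of the loaded rows
```

To read the real file, replace `io.StringIO(SAMPLE)` with `open("OnionOrNot.csv", newline="", encoding="utf-8")`.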

I decided to do this project as my first foray into NLP because I am a fan of both The Onion and r/NotTheOnion.

The dataset was extracted using the Pushshift API, and the article titles were cleaned and one-hot encoded before being used to train several models.
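The exact cleaning rules used in the notebook are not described here, so the following is only a sketch under assumed rules: lowercase each title, replace everything except letters, digits and apostrophes with spaces, map each word to an integer id (0 for padding, 1 for unknown words), pad or truncate to a fixed length, and expand the ids into one-hot rows. An embedding lookup on an integer id gives the same result as multiplying its one-hot row by the embedding matrix, which is why embedding layers usually take the ids directly. All function names are hypothetical.

```python
import re

def clean_title(title):
    """Lowercase a title and replace everything except letters, digits and apostrophes with spaces."""
    title = re.sub(r"[^a-z0-9' ]+", " ", title.lower())
    return " ".join(title.split())  # collapse repeated whitespace

def build_vocab(titles):
    """Assign each distinct word an integer id; 0 is reserved for padding and 1 for unknown words."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for title in titles:
        for word in title.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(title, vocab, max_len):
    """Convert a cleaned title to a fixed-length list of word ids (truncate, then pad with 0)."""
    ids = [vocab.get(word, 1) for word in title.split()][:max_len]
    return ids + [0] * (max_len - len(ids))

def one_hot(ids, vocab_size):
    """Expand word ids into one-hot rows of length vocab_size."""
    return [[1 if i == idx else 0 for i in range(vocab_size)] for idx in ids]

cleaned = [clean_title(t) for t in ["Man Bites Dog!", "Dog bites man."]]
vocab = build_vocab(cleaned)
print(encode(cleaned[0], vocab, max_len=5))  # [2, 3, 4, 0, 0]
print(one_hot(encode(cleaned[1], vocab, max_len=3), len(vocab)))
```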

Models Used:

Global Pooling
LSTM
Bidirectional LSTM
CNN
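The "Global Pooling" model is not described in detail. A common design of this kind embeds each word, averages the embeddings over the whole title (global average pooling), and feeds the pooled vector to a single sigmoid output. The sketch below shows only that forward pass in plain Python, with made-up, untrained weights; it is an assumption about the architecture, not code from Onion.ipynb.

```python
import math

def global_average_pool(vectors):
    """Average a sequence of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_onion_probability(ids, embedding, weights, bias):
    """Embed word ids, pool them, and return P(label == 1), i.e. P(The Onion)."""
    vectors = [embedding[i] for i in ids]  # embedding lookup: one row per word
    pooled = global_average_pool(vectors)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return sigmoid(logit)

# Hypothetical 2-dimensional embeddings for a 4-word vocabulary (row 0 = padding).
embedding = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 4.0]]
weights = [0.5, -0.25]  # hypothetical output-layer weights
bias = 0.0
print(predict_onion_probability([2, 3], embedding, weights, bias))  # 0.5
```

Note that padded positions would be averaged in as zero vectors unless they are masked out, which shrinks the pooled vector for short titles.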

After the models were trained, the training and validation accuracy and loss were plotted, and accuracy on the test set was calculated to evaluate the results.
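As a rough illustration of the two quantities being tracked, the sketch below computes accuracy (predicted probability thresholded at 0.5) and mean binary cross-entropy loss from 0/1 labels and predicted probabilities. Binary cross-entropy is assumed here because the task has two classes; the loss actually configured in the notebook is not stated. The example values are made up.

```python
import math

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of titles whose thresholded prediction matches the 0/1 label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean log loss; probabilities are clipped away from 0 and 1 to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 0]          # made-up labels
y_prob = [0.9, 0.2, 0.4, 0.6]  # made-up model outputs
print(accuracy(y_true, y_prob))  # 0.5
print(binary_cross_entropy(y_true, y_prob))
```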

Currently, the best-performing models are the Bidirectional LSTM (0.855 accuracy on the test set) and the CNN (0.831 accuracy on the test set).

Next Steps:

I would like to try using Word2Vec to create the word vectors instead of one-hot encoding, to see whether this improves model accuracy.
I would like to set up a website where users can classify whether articles are from The Onion or not, and compare their performance to that of the trained models.
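For the Word2Vec idea in the first next step, one common approach is to copy pretrained word vectors into an embedding matrix indexed by the existing vocabulary ids, so that row i holds the vector for the word with id i. The sketch assumes a vocabulary dict with contiguous ids starting at 0 and uses a plain dict of hypothetical vectors as a stand-in for the output of a trained Word2Vec model; words without a vector keep an all-zero row.

```python
def build_embedding_matrix(vocab, word_vectors, dim):
    """Return a len(vocab) x dim matrix; row i holds the vector for the word with id i.

    Assumes vocab ids are contiguous from 0. Words without a pretrained vector of
    the right size (including the padding and unknown tokens) keep all-zero rows.
    """
    matrix = [[0.0] * dim for _ in range(len(vocab))]
    for word, idx in vocab.items():
        if word in word_vectors and len(word_vectors[word]) == dim:
            matrix[idx] = [float(x) for x in word_vectors[word]]
    return matrix

vocab = {"<pad>": 0, "<unk>": 1, "onion": 2, "news": 3}
word_vectors = {"onion": [0.5, -0.5]}  # hypothetical stand-in for Word2Vec output
matrix = build_embedding_matrix(vocab, word_vectors, dim=2)
print(matrix)
```

The resulting matrix could then initialize the models' embedding layer, either frozen or fine-tuned during training, in place of embeddings learned from scratch.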

Files:

Onion.ipynb
OnionOrNot.csv
OnionOrNotClean.csv
README.md

Repository: Cesium-Ice/onion
