Email Classifier

Random forest classifier for spam or ham emails (deployed on a Linode server)

This classifier was created as part of a home assignment at the 'Israeli Tech Challenge' Bootcamp.
Its main purpose is to determine whether an email is spam or ham.

The model is trained on the 'Enron' dataset provided by the NLP group at the Athens University of Economics and Business (AUEB).
I used a processed version of this dataset, which includes labels for "ham" (non-spam) and spam emails, to train a spam filter.
In this case I used the AUEB labels as the true labels of the data and trained my own model to classify emails as ham or spam.

First, I used 'CountVectorizer' from 'Sklearn' to vectorize the words in the dataset into 500 features, each built from one or two words (unigrams and bigrams).
After trying different prediction models, the one that produced the best score, with 97% precision, was the 'Random Forest Classifier'.
To perfect the classifier, I used 'GridSearchCV' from 'Sklearn' to find the best parameters on the training dataset.
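
The vectorizing and tuning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training script: the toy corpus, the pipeline layout, the parameter grid, and the choice of precision as the scoring metric are all assumptions made for the example.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy stand-in corpus (assumption); the real project trains on the Enron emails.
emails = [
    "win free money now", "claim your free prize today",
    "cheap pills limited offer", "free money offer click now",
    "meeting agenda for monday", "please review the attached report",
    "lunch with the team tomorrow", "project schedule and budget update",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = spam, 0 = ham

pipeline = Pipeline([
    # At most 500 features built from unigrams and bigrams.
    ("vectorizer", CountVectorizer(max_features=500, ngram_range=(1, 2))),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Hypothetical grid: the values actually searched are not stated in this README.
param_grid = {
    "forest__n_estimators": [10, 50],
    "forest__max_depth": [None, 5],
}

# Cross-validated search over the grid, scored by precision (assumption).
search = GridSearchCV(pipeline, param_grid, scoring="precision", cv=2)
search.fit(emails, labels)

best_model = search.best_estimator_  # refit on the full training data
print(search.best_params_)
```

Wrapping the vectorizer and the forest in one `Pipeline` keeps the vocabulary fitted only on the training folds during cross-validation, which is one reasonable way to avoid leaking information between folds.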
Then, to deploy the classifier to an online server, I used the 'Pickle' package to dump ('zip') the trained models.
When the application starts, the models are loaded and can produce a prediction in less than 1 second!
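
A rough sketch of the dump-and-load cycle described above is shown here. The file names, the temporary directory, the tiny training set, and the `classify` helper are hypothetical and only serve the example.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Tiny stand-in training set (assumption).
emails = ["win free money now", "free prize click now",
          "meeting agenda for monday", "please review the report"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

vectorizer = CountVectorizer(max_features=500, ngram_range=(1, 2))
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(vectorizer.fit_transform(emails), labels)

# Training side: dump both fitted objects to disk (hypothetical file names).
model_dir = Path(tempfile.mkdtemp())
for name, obj in [("vectorizer.pkl", vectorizer), ("model.pkl", model)]:
    with open(model_dir / name, "wb") as f:
        pickle.dump(obj, f)

# Application side: load the objects once at startup.
with open(model_dir / "vectorizer.pkl", "rb") as f:
    loaded_vectorizer = pickle.load(f)
with open(model_dir / "model.pkl", "rb") as f:
    loaded_model = pickle.load(f)


def classify(text):
    """Return 'spam' or 'ham' for a single email body."""
    features = loaded_vectorizer.transform([text])
    return "spam" if loaded_model.predict(features)[0] == 1 else "ham"


print(classify("free money now"))
```

Loading once at startup, rather than per request, is what keeps each prediction fast. Pickle files should only be loaded from a trusted source, since unpickling can execute arbitrary code.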
One of the latest features added to the application is an API. It can be used for a single request with a parameter, or for multiple requests using a JSON file.
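
To illustrate the two request styles, here is a minimal Flask sketch. The route `/api/classify`, the `text` parameter, the `emails` JSON key, and the keyword-based `classify` placeholder are all assumptions for the example; the application's real endpoints and payload format may differ.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify(text):
    # Placeholder rule standing in for the trained model (assumption).
    return "spam" if "free" in text.lower() else "ham"


@app.route("/api/classify", methods=["GET"])
def classify_single():
    # Single request: the email text arrives as a query parameter.
    text = request.args.get("text")
    if not text:
        return jsonify(error="missing 'text' parameter"), 400
    return jsonify(text=text, label=classify(text))


@app.route("/api/classify", methods=["POST"])
def classify_many():
    # Multi request: a JSON body of the form {"emails": ["...", "..."]}.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict) or not isinstance(payload.get("emails"), list):
        return jsonify(error="expected a JSON object with an 'emails' list"), 400
    results = [{"text": t, "label": classify(str(t))} for t in payload["emails"]]
    return jsonify(results=results)
```

With this sketch, a single email could be sent as `GET /api/classify?text=...`, and a batch as a `POST` whose body is the contents of a JSON file.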

Moreover, I created an SQLite database for user accounts, classified email archives, and API statistics.
For that, I mainly used 'Flask' extensions.
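
As a rough picture of what such a database might hold, the sketch below uses Python's built-in `sqlite3` module instead of the Flask extensions used in the project. The table and column names are hypothetical, not the application's actual schema.

```python
import sqlite3

# Hypothetical schema covering the three kinds of data mentioned above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS classified_emails (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    body TEXT NOT NULL,
    label TEXT NOT NULL CHECK (label IN ('ham', 'spam')),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS api_stats (
    id INTEGER PRIMARY KEY,
    endpoint TEXT NOT NULL,
    called_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.executescript(SCHEMA)

# Register a user, archive one classified email, and log one API call.
conn.execute(
    "INSERT INTO users (username, password_hash) VALUES (?, ?)",
    ("demo", "not-a-real-hash"),
)
user_id = conn.execute(
    "SELECT id FROM users WHERE username = ?", ("demo",)
).fetchone()[0]
conn.execute(
    "INSERT INTO classified_emails (user_id, body, label) VALUES (?, ?, ?)",
    (user_id, "win free money now", "spam"),
)
conn.execute("INSERT INTO api_stats (endpoint) VALUES (?)", ("/api/classify",))
conn.commit()

# Fetch the user's archive of classified emails.
archive = conn.execute(
    "SELECT body, label FROM classified_emails WHERE user_id = ?", (user_id,)
).fetchall()
print(archive)
```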

I deployed the model to a Linux server provided by 'Linode'.
To do so, I used 'Nginx', 'Gunicorn', 'Flask' extensions and Bash scripting.

Hope you enjoy my application and wish you good luck,

yours, Nir Barazida

Application Screenshots

Homepage for visitors:

Homepage for users:

Classifier:
