This Classifier was created as part of a home assignment at the 'Israeli Tech Challenge' Bootcamp.
The main purpose of this classifier is to determine if an email is spam or ham.
The model's predictions are based on the 'Enron' dataset provided by the NLP group at the Athens University of Economics and Business (AUEB).
I used this data to train a spam filter, working with a processed version of the Enron dataset that includes labels for "ham" (non-spam) and spam emails.
In this case, I used the AUEB labels as the ground truth and trained my own classifier to label each email as ham or spam.
First, I used 'CountVectorizer' from 'Sklearn' to vectorize the words in the dataset into 500 features, built from single words and two-word phrases (unigrams and bigrams).
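A minimal sketch of that vectorization step, assuming scikit-learn's `CountVectorizer` with `max_features=500` and `ngram_range=(1, 2)`. The three sample emails are made up for illustration and stand in for the Enron data:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample emails; the real project used the processed Enron dataset.
emails = [
    "Win a free prize now",
    "Meeting moved to Monday morning",
    "Free prize claim now",
]

# Keep at most the 500 most frequent features, built from
# unigrams and bigrams (1-2 word n-grams).
vectorizer = CountVectorizer(max_features=500, ngram_range=(1, 2))

# Learn the vocabulary and build a sparse document-term count matrix.
X = vectorizer.fit_transform(emails)

print(X.shape)
print(sorted(vectorizer.vocabulary_)[:5])
```

`X` has one row per email and at most 500 columns, one per retained word or two-word phrase; `vectorizer.vocabulary_` maps each n-gram to its column index.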
After trying several prediction models, the one that produced the best score, with 97% precision, was the 'Random Forest Classifier'.
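The sketch below shows how a 'Random Forest Classifier' could be chained to the vectorizer and evaluated on precision, i.e. the share of emails flagged as spam that really are spam. The toy emails, the labels (1 = spam, 0 = ham) and `n_estimators=100` are assumptions for illustration, not the project's actual data or settings, so the number it produces says nothing about the reported 97%:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import precision_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 1 = spam, 0 = ham.
train_texts = [
    "Win a free prize now",
    "Cheap meds, limited offer",
    "Claim your free vacation today",
    "Meeting moved to Monday",
    "Please review the attached report",
    "Lunch with the team tomorrow",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["Free prize waiting for you", "Report for Monday meeting"]
test_labels = [1, 0]

# Vectorizer and classifier chained so raw text goes in and labels come out.
model = make_pipeline(
    CountVectorizer(max_features=500, ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
model.fit(train_texts, train_labels)
predictions = model.predict(test_texts)

# Precision = TP / (TP + FP); zero_division=0 avoids a warning
# if no email is predicted as spam.
precision = precision_score(test_labels, predictions, zero_division=0)
print(f"precision: {precision:.2f}")
```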
To fine-tune the classifier, I used 'GridSearchCV' from 'Sklearn' to find the best hyperparameters on the training set.
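'GridSearchCV' fits the model for every combination in a parameter grid using cross-validation and keeps the best-scoring one. A minimal sketch, assuming a pipeline with the step names `vect` and `clf`, a small hypothetical grid over `n_estimators` and `max_depth`, and precision as the selection metric (the parameters the author actually searched are not stated):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy training set: 6 spam (1) and 6 ham (0) emails.
texts = [
    "Win a free prize now",
    "Cheap meds, limited offer",
    "Claim your free vacation today",
    "Exclusive deal, click here",
    "You have won a cash reward",
    "Free entry, reply now",
    "Meeting moved to Monday",
    "Please review the attached report",
    "Lunch with the team tomorrow",
    "Budget numbers for Q3",
    "Can you call me this afternoon",
    "Notes from today's standup",
]
labels = [1] * 6 + [0] * 6

pipeline = Pipeline([
    ("vect", CountVectorizer(max_features=500, ngram_range=(1, 2))),
    ("clf", RandomForestClassifier(random_state=42)),
])

# Parameters of a pipeline step are addressed as "<step>__<param>".
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [None, 10],
}

# 3-fold cross-validation (stratified for classifiers), ranked by precision.
search = GridSearchCV(pipeline, param_grid, scoring="precision", cv=3)
search.fit(texts, labels)

print(search.best_params_)
```

After fitting, `search.best_estimator_` is the pipeline refit on all the training data with the winning parameters, ready to be serialized.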
Then, to deploy the classifier to an online server, I used the 'Pickle' package to serialize ('dump') the trained models to files.
When the application starts, the models are loaded once and can produce a prediction in less than 1 second!
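A minimal sketch of the dump-then-load pattern with the standard `pickle` module. For brevity it bundles the vectorizer and the classifier into one scikit-learn pipeline (the original may pickle them as separate files), and the toy data and the file name `spam_model.pkl` are hypothetical:

```python
import os
import pickle
import tempfile
import time

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data: 1 = spam, 0 = ham.
texts = [
    "Win a free prize now",
    "Claim your free vacation",
    "Meeting moved to Monday",
    "Lunch tomorrow?",
]
labels = [1, 1, 0, 0]

model = make_pipeline(
    CountVectorizer(max_features=500, ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
model.fit(texts, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "spam_model.pkl")

    # Serialize the fitted pipeline once, before deployment.
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # At application start-up: load it once and reuse it for every request.
    with open(path, "rb") as f:
        loaded_model = pickle.load(f)

# Time a single prediction with the already-loaded model.
start = time.perf_counter()
label = loaded_model.predict(["Claim your free prize now"])[0]
elapsed = time.perf_counter() - start
print(label, f"{elapsed * 1000:.1f} ms")
```

Loading happens once per process, which is what keeps individual predictions fast. Note that `pickle.load` can execute arbitrary code, so only load files you created yourself.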
One of the latest features added to the application is an API. It can be used either as a single request with a parameter or as a batch request using a JSON file.
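A minimal Flask sketch of the two request styles. The routes `/api/predict` and `/api/predict_batch`, the `text` query parameter, the `{"emails": [...]}` body, and the keyword-based `classify` stand-in for the pickled model are all hypothetical; the real application's endpoints are not documented here, and it may accept an uploaded JSON file rather than a JSON request body:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify(text: str) -> str:
    """Hypothetical stand-in for the pickled model: a naive keyword rule."""
    return "spam" if "free" in text.lower() else "ham"


@app.route("/api/predict", methods=["GET"])
def predict_single():
    # Single request: the email text arrives as a query parameter.
    text = request.args.get("text", "")
    if not text:
        return jsonify(error="missing 'text' parameter"), 400
    return jsonify(text=text, label=classify(text))


@app.route("/api/predict_batch", methods=["POST"])
def predict_batch():
    # Batch request: a JSON body holding a list of emails.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict) or not isinstance(payload.get("emails"), list):
        return jsonify(error='expected a JSON body like {"emails": [...]}'), 400
    results = [{"text": t, "label": classify(str(t))} for t in payload["emails"]]
    return jsonify(results=results)


if __name__ == "__main__":
    app.run(debug=True)
```

A single request could then look like `GET /api/predict?text=Claim%20your%20prize`, and a batch request like a `POST` to `/api/predict_batch` with the body `{"emails": ["free money", "see you at lunch"]}`.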
I also created an SQLite database for user accounts, classified-email archives, and API statistics.
For that, I mainly used 'flask' extensions.
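The three kinds of records could be laid out roughly as below. This sketch uses Python's built-in `sqlite3` module rather than the Flask extensions mentioned above, and every table and column name is a hypothetical choice, not the application's actual schema:

```python
import sqlite3

# Hypothetical schema: users, their classified emails, and API usage stats.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS emails (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    body TEXT NOT NULL,
    label TEXT NOT NULL CHECK (label IN ('spam', 'ham')),
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS api_stats (
    id INTEGER PRIMARY KEY,
    endpoint TEXT NOT NULL,
    n_emails INTEGER NOT NULL,
    requested_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure all tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = init_db()

# Parameterized queries (?) keep user input out of the SQL string.
conn.execute(
    "INSERT INTO users (username, password_hash) VALUES (?, ?)",
    ("demo", "not-a-real-hash"),
)
conn.execute(
    "INSERT INTO emails (user_id, body, label) VALUES (?, ?, ?)",
    (1, "Claim your free prize", "spam"),
)
conn.execute(
    "INSERT INTO api_stats (endpoint, n_emails) VALUES (?, ?)",
    ("/api/predict", 1),
)
conn.commit()

spam_count = conn.execute(
    "SELECT COUNT(*) FROM emails WHERE label = 'spam'"
).fetchone()[0]
print(spam_count)
```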
I deployed the model to a Linux server provided by 'Linode'.
To do so, I used 'Nginx', 'Gunicorn', 'flask' extensions, and bash scripting.
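In this kind of setup, Nginx typically acts as a reverse proxy in front of Gunicorn, which runs the Flask app. Gunicorn can read its settings from a plain Python file; the values below are assumptions for illustration, not the server's actual configuration:

```python
# gunicorn.conf.py: hypothetical settings; Gunicorn reads this file as Python.
import multiprocessing

# Listen only on localhost; Nginx proxies public traffic to this address.
bind = "127.0.0.1:8000"

# Gunicorn's docs suggest (2 x CPU cores) + 1 workers as a starting point.
workers = multiprocessing.cpu_count() * 2 + 1

# Restart a worker that is silent for more than 30 seconds.
timeout = 30

# Write access logs to stdout ("-").
accesslog = "-"
```

Gunicorn could then be started with `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a hypothetical `module:variable` pointing at the Flask object, and Nginx would forward requests to `127.0.0.1:8000`.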
I hope you enjoy my application, and I wish you good luck.
Yours, Nir Barazida
- Homepage for visitors:
- Homepage for users:
- Classifier: