This project is a Streamlit web application that classifies SMS or email messages as Spam or Not Spam using a machine learning model. The application provides a user-friendly interface for real-time classification.
- Objective: Build a robust machine learning model to classify text messages as spam or not spam.
- Approach:
  - Preprocess the input text data.
  - Train a classifier using a labeled dataset.
  - Deploy the model as a web application using Streamlit.
- Categories:
  - Spam: Messages that are unsolicited or promotional in nature.
  - Not Spam: Legitimate and personal messages.
- Source:
  - Public datasets such as the UCI SMS Spam Collection.
- Structure:
  - Text: The SMS/email content.
  - Label: Classification (spam or not spam).
- Text Preprocessing:
  - Lowercasing text.
  - Tokenization.
  - Stopword removal.
  - Stemming using the Porter Stemmer.
- Vectorization:
  - Text is converted into numerical format using TfidfVectorizer.
- Machine Learning Model:
  - Trained using Naive Bayes or another classifier for text classification.
  - Saved as a serialized model (model.pkl).
- Interactive Web Application:
  - Built using Streamlit for real-time message classification.
  - Users can input any SMS/email text to check if it's spam or not.
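The preprocessing steps above can be sketched as a single function. This is a minimal sketch and not the project's actual transform_text. It lowercases, tokenizes, removes stopwords and Porter-stems, as the list describes. To avoid NLTK data downloads, it swaps in two assumed substitutes: a regex tokenizer in place of NLTK's tokenizer, and scikit-learn's built-in ENGLISH_STOP_WORDS in place of NLTK's stopword list. Its output may therefore differ slightly from app.py.

```python
import re

from nltk.stem.porter import PorterStemmer  # pure Python, needs no NLTK data download
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

_stemmer = PorterStemmer()


def transform_text(text: str) -> str:
    """Lowercase, tokenize, drop stopwords, and Porter-stem a message.

    Assumption: a regex tokenizer and scikit-learn's ENGLISH_STOP_WORDS stand in
    for NLTK's tokenizer and stopword corpus used by the real app.
    """
    # 1. Lowercase so "FREE" and "free" become the same token.
    text = text.lower()
    # 2. Tokenize: keep runs of letters/digits, discarding punctuation.
    tokens = re.findall(r"[a-z0-9]+", text)
    # 3. Remove common English stopwords.
    tokens = [t for t in tokens if t not in ENGLISH_STOP_WORDS]
    # 4. Reduce each remaining token to its Porter stem and rejoin.
    return " ".join(_stemmer.stem(t) for t in tokens)


print(transform_text("The dogs are RUNNING!"))  # prints "dog run"
```

Returning a single space-joined string (rather than a token list) keeps the output in the form TfidfVectorizer expects as input.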
Ensure the following are available:
- Python version: Python 3.7+
- Python libraries (listed in requirements.txt): streamlit, scikit-learn, nltk, pandas, numpy, matplotlib
- Pretrained model files: vectorizer.pkl and model.pkl.
Clone the repository and move into it:
git clone https://github.com/rahulpoojith/SMS-spam-classifier.git
cd SMS-spam-classifier
Install the required Python libraries using:
pip install -r requirements.txt
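Download NLTK data for text preprocessing:
python -m nltk.downloader punkt stopwords
On newer NLTK releases (3.9 and later), nltk.word_tokenize loads the punkt_tab resource instead of punkt. If tokenization raises a LookupError, also run python -m nltk.downloader punkt_tab.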
Ensure the following files are in the project directory:
- vectorizer.pkl: Contains the trained TfidfVectorizer.
- model.pkl: Contains the trained classification model.
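The vectorizer and the model are saved separately, so a prediction has to run in three steps. First the message goes through the same preprocessing used at training time, then through vectorizer.transform, and finally through model.predict. The sketch below rests on two assumptions: both files were written with Python's pickle module, and the model uses label 1 for spam and 0 for not spam. load_artifacts and classify are hypothetical helper names. The preprocessing function is passed in; in app.py it would be transform_text.

```python
import pickle
from pathlib import Path
from typing import Callable, Tuple


def load_artifacts(directory: str = ".") -> Tuple[object, object]:
    """Load the fitted vectorizer and classifier (assumed to be pickled)."""
    base = Path(directory)
    with open(base / "vectorizer.pkl", "rb") as f:
        vectorizer = pickle.load(f)
    with open(base / "model.pkl", "rb") as f:
        model = pickle.load(f)
    return vectorizer, model


def classify(text: str, vectorizer, model,
             preprocess: Callable[[str], str]) -> str:
    """Return "Spam" or "Not Spam" for one message.

    Assumption: the model was trained with label 1 = spam, 0 = not spam.
    """
    # Apply the same preprocessing that was used on the training data.
    cleaned = preprocess(text)
    # transform (not fit_transform): reuse the vocabulary learned in training.
    features = vectorizer.transform([cleaned])
    prediction = model.predict(features)[0]
    return "Spam" if prediction == 1 else "Not Spam"
```

Calling vectorizer.transform rather than fit_transform matters here: refitting on a single message would build a new vocabulary that no longer matches the features the model was trained on. Note that pickle.load can execute arbitrary code, so only load .pkl files from a trusted source.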
Run the application using:
streamlit run app.py
- Open your browser and navigate to the URL provided in the terminal (usually http://localhost:8501).
- Enter the SMS or email text in the input box.
- Click the "Predict" button to classify the message.
- The application will display the result as Spam or Not Spam.
Project structure:
SMS-spam-classifier/
├── app.py             # Main Streamlit application
├── model.pkl          # Trained classification model
├── vectorizer.pkl     # TfidfVectorizer for text vectorization
├── requirements.txt   # Python dependencies
└── README.md          # Project documentation
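- Model: Replace model.pkl with any new model for experimentation. If the new model was trained on different features, replace vectorizer.pkl with its matching vectorizer as well.
- Text Preprocessing: Modify the transform_text function in app.py for additional preprocessing steps. Because this changes the features the model sees, retrain and re-save both vectorizer.pkl and model.pkl afterwards.
- Dataset: Use a different labeled dataset for retraining the model.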
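Retraining on a new dataset, or swapping in a different classifier, can be sketched as follows. This is a minimal sketch and not the project's training script. train_and_save is a hypothetical helper, and it makes several assumptions: texts are already preprocessed (for example with transform_text), labels use 1 for spam and 0 for not spam, and scikit-learn's MultinomialNB is the default "Naive Bayes" variant. Any scikit-learn classifier with fit/predict can be passed in instead.

```python
import pickle
from pathlib import Path
from typing import Optional, Sequence

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB


def train_and_save(texts: Sequence[str], labels: Sequence[int],
                   out_dir: str = ".", model=None,
                   max_features: Optional[int] = None):
    """Fit a TfidfVectorizer plus a classifier, then pickle both.

    Assumptions: texts are already preprocessed; labels are 1 = spam, 0 = not spam.
    """
    # Default to Multinomial Naive Bayes unless another estimator is supplied.
    model = model if model is not None else MultinomialNB()
    vectorizer = TfidfVectorizer(max_features=max_features)
    # Learn the vocabulary and IDF weights from the training texts.
    X = vectorizer.fit_transform(texts)
    model.fit(X, labels)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Save both artifacts together so the vectorizer and model stay in sync.
    with open(out / "vectorizer.pkl", "wb") as f:
        pickle.dump(vectorizer, f)
    with open(out / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    return vectorizer, model
```

If a dataset stores its labels as strings, map them to integers before calling the function (spam to 1, everything else to 0). To experiment with another model, pass it in, for example model=LogisticRegression() from sklearn.linear_model. For a fair comparison, hold out a test split rather than scoring on the training texts.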
- UCI SMS Spam Collection Dataset
- Libraries: Streamlit, scikit-learn (including TfidfVectorizer), NLTK
This project is licensed under the MIT License. Feel free to use and modify it as needed.