rahulpoojith/SMS-spam-classifier

SMS Spam Classifier

This project is a Streamlit web application that classifies SMS or email messages as Spam or Not Spam using a machine learning model. The application provides a user-friendly interface for real-time classification.


Project Overview

  • Objective: Build a robust machine learning model to classify text messages as spam or not spam.
  • Approach:
    1. Preprocess the input text data.
    2. Train a classifier using a labeled dataset.
    3. Deploy the model as a web application using Streamlit.
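The train step of this approach can be sketched as follows. This is a minimal illustration, not the project's training script: the six messages are a made-up toy corpus, the 1 = spam / 0 = not spam label encoding is an assumption, and MultinomialNB is picked as one common Naive Bayes variant for text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus for illustration only; this is NOT the project's dataset.
messages = [
    "win a free prize now",
    "free cash claim now",
    "urgent claim your free reward",
    "are we meeting for lunch today",
    "see you at home tonight",
    "can you call me later",
]
labels = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = spam, 0 = not spam

# Convert the text to TF-IDF features, then fit a Naive Bayes classifier.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(messages)
model = MultinomialNB()
model.fit(features, labels)


def predict(text):
    """Return 1 (spam) or 0 (not spam) for a single message."""
    return int(model.predict(vectorizer.transform([text]))[0])
```

A real run would replace the toy lists with the labeled dataset and hold out part of it to measure accuracy and precision before deploying.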

Dataset Details

  • Categories:
    • Spam: Messages that are unsolicited or promotional in nature.
    • Not Spam: Legitimate and personal messages.
  • Source:
  • Structure:
    • Text: The SMS/email content.
    • Label: Classification (spam or not spam).

Key Features

  1. Text Preprocessing:

    • Lowercasing text.
    • Tokenization.
    • Stopword removal.
    • Stemming using the Porter Stemmer.
  2. Vectorization:

    • Text is converted into numerical format using TfidfVectorizer.
  3. Machine Learning Model:

    • Trained using Naive Bayes or another classifier for text classification.
    • Saved as a serialized model (model.pkl).
  4. Interactive Web Application:

    • Built using Streamlit for real-time message classification.
    • Users can input any SMS/email text to check if it's spam or not.
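The preprocessing steps listed above can be sketched as a single function. The name transform_text matches the function mentioned under Customization, but the body here is a guess at its shape, not the code in app.py. To keep the sketch free of NLTK data downloads, a regular expression stands in for NLTK tokenization and a small hand-written set stands in for the NLTK stopword list; only the Porter stemmer comes from NLTK.

```python
import re

from nltk.stem import PorterStemmer

# Stand-in for nltk.corpus.stopwords.words("english"); deliberately tiny.
STOPWORDS = {"a", "an", "the", "is", "are", "you", "your", "to", "of", "and"}

stemmer = PorterStemmer()


def transform_text(text):
    """Lowercase, tokenize, drop stopwords, and stem a message."""
    text = text.lower()
    # Stand-in tokenizer: keep runs of letters and digits, drop punctuation.
    tokens = re.findall(r"[a-z0-9]+", text)
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Reduce each remaining token to its Porter stem.
    return " ".join(stemmer.stem(t) for t in tokens)
```

Whatever the function does, the same steps have to be applied at training time and at prediction time; otherwise the vectorizer sees tokens it was never fitted on.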

Requirements

Ensure the following are installed:

  • Python Version: Python 3.7+
  • Python Libraries (listed in requirements.txt):
    streamlit
    scikit-learn
    nltk
    pandas
    numpy
    matplotlib
    
  • Pretrained model files: vectorizer.pkl and model.pkl.

Setup and Installation

1. Clone the Repository

git clone https://github.com/rahulpoojith/SMS-spam-classifier.git
cd SMS-spam-classifier

2. Install Dependencies

Install the required Python libraries using:

pip install -r requirements.txt

3. Download NLTK Data

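Download the NLTK data used for tokenization and stopword removal:

python -m nltk.downloader punkt stopwords

Recent NLTK releases load the tokenizer from the punkt_tab resource instead; if tokenization raises a LookupError, download it as well with python -m nltk.downloader punkt_tab.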

4. Add Model Files

Ensure the following files are in the project directory:

  • vectorizer.pkl: Contains the trained TfidfVectorizer.
  • model.pkl: Contains the trained classification model.
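A quick way to confirm both files are in place before starting the app is a small check like the one below. The helper name missing_model_files is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from this README.
REQUIRED_FILES = ("vectorizer.pkl", "model.pkl")


def missing_model_files(project_dir="."):
    """Return the required model files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means both files were found; anything else names the files that still need to be added.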

How to Run the Application

Start the Streamlit App

Run the application using:

streamlit run app.py

Interact with the Application

  1. Open your browser and navigate to the URL provided in the terminal (usually http://localhost:8501).
  2. Enter the SMS or email text in the input box.
  3. Click the "Predict" button to classify the message.
  4. The application will display the result as Spam or Not Spam.
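The interaction above could be wired up roughly as follows. This is a sketch under assumptions, not the contents of app.py: it assumes the model returns 1 for spam, and it uses str.lower as a placeholder where the app's own preprocessing function would go. The classification logic is kept in a plain function so it can be exercised without Streamlit.

```python
import pickle


def classify(text, vectorizer, model, preprocess=str.lower):
    """Return "Spam" or "Not Spam" for one message."""
    features = vectorizer.transform([preprocess(text)])
    label = model.predict(features)[0]
    return "Spam" if label == 1 else "Not Spam"  # assumed encoding: 1 = spam


def main():
    # Imported here so classify() stays usable without Streamlit installed.
    import streamlit as st

    # Only unpickle files you trust; pickle can execute code on load.
    with open("vectorizer.pkl", "rb") as f:
        vectorizer = pickle.load(f)
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    st.title("SMS Spam Classifier")
    message = st.text_area("Enter the message")
    if st.button("Predict"):
        if message.strip():
            st.header(classify(message, vectorizer, model))
        else:
            st.warning("Please enter a message first.")
```

Streamlit executes a script from top to bottom, so a file built this way would end with a call to main() and be launched with streamlit run.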

Folder Structure

SMS-spam-classifier/
├── app.py              # Main Streamlit application
├── model.pkl           # Trained classification model
├── vectorizer.pkl      # TfidfVectorizer for text vectorization
├── requirements.txt    # Python dependencies
└── README.md           # Project documentation

Customization

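  • Model: Replace model.pkl with another trained model for experimentation. The model must accept the features produced by vectorizer.pkl, so replace both files together if the vectorizer changes.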
  • Text Preprocessing: Modify the transform_text function in app.py for additional preprocessing steps.
  • Dataset: Use a different labeled dataset for retraining the model.
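After retraining, the new vectorizer and model need to be written back under the file names the app expects. A minimal sketch using the standard pickle module is shown below; save_artifacts and load_artifacts are hypothetical helper names, and the project may serialize its files differently.

```python
import pickle
from pathlib import Path


def save_artifacts(vectorizer, model, out_dir="."):
    """Write the fitted vectorizer and model as vectorizer.pkl and model.pkl."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, obj in (("vectorizer.pkl", vectorizer), ("model.pkl", model)):
        with open(out / name, "wb") as f:
            pickle.dump(obj, f)


def load_artifacts(in_dir="."):
    """Load (vectorizer, model) back from in_dir."""
    root = Path(in_dir)
    with open(root / "vectorizer.pkl", "rb") as f:
        vectorizer = pickle.load(f)
    with open(root / "model.pkl", "rb") as f:
        model = pickle.load(f)
    return vectorizer, model
```

Saving both objects from the same training run keeps the model's expected feature columns in step with the vectorizer's vocabulary.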

Acknowledgements


License

This project is licensed under the MIT License. Feel free to use and modify it as needed.

