Disaster Response project completed as part of Udacity's Data Scientist Nanodegree, covering data engineering aspects such as ETL, NLP, and machine learning pipelines

prasannakr/Disaster_Response_Pipeline


Disaster Response Pipeline Project

Table of Contents:

  1. Project Introduction
  2. File Description
  3. Instructions
  4. Libraries Used
  5. Results
  6. Licensing, Acknowledgements

Project Introduction

In this project, I analyze data provided by Figure Eight.
The data contains pre-labeled tweets and text messages received during real-life disasters.
The objective is to prepare the data with an ETL (Extract, Transform, Load) pipeline and then use an ML (Machine Learning) pipeline to build a supervised learning model that categorizes the events and surfaces any trends.
This will help emergency workers classify incoming messages and route them to the appropriate disaster relief agency.

File Description

There are two Jupyter notebook files containing code that executed successfully:

  1. ETL Pipeline preparation > Contains code for extracting, cleaning, and wrangling the data, then loading the final dataset into an SQLite database.
  2. ML Pipeline preparation > Contains code for modeling using a pipeline and grid search; several models were run and the one with the better F1 score was chosen.
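The ETL steps above can be sketched as follows. This is a minimal illustration using toy in-memory data, not the code from the notebook; the column layout (`id`, `message`, a semicolon-separated `categories` string) follows the Figure Eight dataset convention and is an assumption here.

```python
import pandas as pd
from sqlalchemy import create_engine

# Toy stand-in data (assumed layout; the real project reads
# disaster_messages.csv and disaster_categories.csv).
messages = pd.DataFrame({
    "id": [1, 2],
    "message": ["we need water", "storm is coming"],
})
categories = pd.DataFrame({
    "id": [1, 2],
    "categories": ["related-1;water-1;food-0", "related-1;water-0;food-0"],
})

# Extract: merge the two sources on the common id.
df = messages.merge(categories, on="id")

# Transform: split the single 'categories' string into one column
# per category, keeping only the trailing 0/1 flag.
cats = df["categories"].str.split(";", expand=True)
cats.columns = [c.split("-")[0] for c in cats.iloc[0]]
for col in cats.columns:
    cats[col] = cats[col].str[-1].astype(int)
df = pd.concat([df.drop(columns="categories"), cats], axis=1)
df = df.drop_duplicates()

# Load: write the cleaned table to a SQLite database.
engine = create_engine("sqlite:///DisasterResponse.db")
df.to_sql("messages", engine, index=False, if_exists="replace")
```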

There are three Python scripts used to deploy on the workspace:
3) process_data.py > Contains functions with the executed code from the ETL pipeline.
4) train_classifier.py > Contains functions with the executed code from the ML pipeline.
5) run.py > Loads the model pickle file and contains the visualization code to run the web app.
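The modeling approach described above (a pipeline tuned with grid search, with LinearSVC as the chosen estimator) can be sketched roughly like this. The toy data and the parameter grid are illustrative assumptions, not the project's actual training code.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data; the real project loads messages and category
# labels from the SQLite database produced by process_data.py.
X = ["need water", "water please", "send food", "food needed",
     "clean water now", "thirsty water", "hungry food", "food supplies"]
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1],
              [1, 0], [1, 0], [0, 1], [0, 1]])  # columns: water, food

# Pipeline: TF-IDF text features -> one LinearSVC per output category.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(LinearSVC())),
])

# Small illustrative grid search over the SVC regularization strength.
params = {"clf__estimator__C": [0.1, 1.0]}
model = GridSearchCV(pipeline, params, cv=2)
model.fit(X, Y)

# Predictions come back as one row per message, one column per category.
preds = model.predict(["please send water"])
```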

Instructions

  1. To run the ETL pipeline that cleans the data and stores it in a database:
    python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
  2. To run the ML pipeline that trains the classifier and saves the model:
    python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
  3. To run the web app, in the app's directory:
    python run.py
  4. URL to see the visualizations:
    http://localhost:3001/
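The web app served by run.py at step 3 is a Flask application. A minimal sketch of how such an app binds to port 3001 is shown below; the route and response text are placeholders, not the project's actual routes or templates.

```python
from flask import Flask

app = Flask(__name__)

# Placeholder route; the real run.py renders templates with Plotly
# visualizations and a message classification form.
@app.route("/")
def index():
    return "Disaster Response Pipeline"

# Uncomment to serve at http://localhost:3001/
# app.run(host="0.0.0.0", port=3001)
```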

Libraries Used

The following libraries were used:
Plotly
joblib
Pandas
Numpy
nltk
flask
sqlalchemy
sys
scikit-learn

Results

The end result is a web app powered by a supervised machine learning model (LinearSVC) which:

A) Contains visualizations.
B) Classifies an entered message into different groups. Example: type "we are more than 50 people on the street. Please help us find tent and food" and click the "Classify Message" button.


Licensing, Acknowledgements

Thanks to Figure Eight for providing the real-life disaster messages data.
Thanks to Udacity for providing knowledge on Data Engineering (ETL/NLP/ML Pipelines) and a platform to work on this project.
