- Project Introduction
- File Description
- Instructions
- Libraries used
- Results
- Licensing, Acknowledgements
## Project Introduction
In this project, I will analyze data provided by Figure Eight.
The data contains pre-labeled tweets and text messages received during real-life disasters.
The objective is to prepare the data with an ETL (Extract, Transform, Load) pipeline and then use an ML (Machine Learning) pipeline to build a supervised learning model that categorizes the messages and surfaces any trends.
This will help emergency workers classify/categorize incoming messages and route them to the appropriate disaster relief agency.
## File Description
There are two Jupyter notebooks containing code that has been executed successfully:
- ETL Pipeline Preparation > contains code for extracting, cleaning, and wrangling the data, then loading the final dataset into a SQLite database.
- ML Pipeline Preparation > contains code for modeling with a pipeline and grid search; a few models were run and the one with the better F1 score was chosen.
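As an illustration of the cleaning step prepared in the ETL notebook, the sketch below splits the semicolon-delimited categories string into one binary column per category and loads the result into SQLite. The sample rows and column names are made up for the example, and an in-memory database stands in for DisasterResponse.db.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical sample rows mimicking the Figure Eight categories format
df = pd.DataFrame({
    "message": ["we need water", "storm is coming"],
    "categories": ["related-1;request-1;water-1", "related-1;request-0;water-0"],
})

# Split the single 'categories' string into one column per category
cats = df["categories"].str.split(";", expand=True)
cats.columns = [c.split("-")[0] for c in cats.iloc[0]]

# Keep only the trailing 0/1 flag as an integer
cats = cats.apply(lambda col: col.str[-1].astype(int))
clean = pd.concat([df[["message"]], cats], axis=1)

# Load the cleaned table into a SQLite database
engine = create_engine("sqlite:///:memory:")
clean.to_sql("messages", engine, index=False, if_exists="replace")
```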
There are three Python scripts used to deploy on the workspace:
1) process_data.py > contains functions with the executed code from the ETL pipeline
2) train_classifier.py > contains functions with the executed code from the ML pipeline
3) run.py > loads the model pkl file and contains the visualization code to run the web app
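The training step in train_classifier can be sketched roughly as follows. This is a single-label simplification (the real script classifies many categories at once, typically by wrapping the estimator in a multi-output classifier), and the sample messages, parameter grid, and output file name are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
import joblib

# Tiny illustrative training set (the real script loads from DisasterResponse.db)
X = ["we need water", "please send water", "water supply is gone",
     "sunny day today", "nice weather outside", "clear skies tonight"]
y = [1, 1, 1, 0, 0, 0]  # 1 = water-related

# Text features followed by a linear SVM, as in the final model choice
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC()),
])

# Grid search over a small, hypothetical parameter grid
grid = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0]}, cv=2)
grid.fit(X, y)

# Persist the best model, as train_classifier does with classifier.pkl
joblib.dump(grid.best_estimator_, "classifier.pkl")
```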
## Instructions
- To run the ETL pipeline that cleans the data and stores it in the database:
  `python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db`
- To run the ML pipeline that trains the classifier and saves it:
  `python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl`
- To run the web app (from the app's directory):
  `python run.py`
- URL to see the visualizations: http://localhost:3001/
## Libraries Used
The following libraries were used:
Plotly
joblib
Pandas
Numpy
nltk
flask
sqlalchemy
sys
scikit-learn
## Results
The end result is a web app powered by a supervised machine learning model (LinearSVC) which:
A) contains visualizations
B) classifies an entered message into different groups.
Example: type "we are more than 50 people on the street. Please help us find tent and food" and click the "Classify Message" button.
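To illustrate how run.py can serve a classification for an entered message, here is a minimal Flask sketch. The `/go` route, the `query` parameter, and the keyword-rule `DummyModel` are all stand-ins for this example; the deployed app loads the real pickled pipeline (classifier.pkl) with joblib instead.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Stand-in for the real pickled model; the deployed app would use
# something like: model = joblib.load("models/classifier.pkl")
class DummyModel:
    def predict(self, messages):
        # Naive keyword rule, purely for illustration
        return [["request", "food", "shelter"] if ("food" in m or "tent" in m) else []
                for m in messages]

model = DummyModel()

@app.route("/go")
def go():
    # Read the message typed into the web form and classify it
    query = request.args.get("query", "")
    return jsonify(query=query, categories=model.predict([query])[0])
```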
## Licensing, Acknowledgements
Thanks to Figure Eight for the real-life disaster messages data.
Thanks to Udacity for providing knowledge on Data Engineering (ETL/NLP/ML pipelines) and a platform to work on this project.