c. Instructions
d. Licensing, Authors, and Acknowledgements
The motivation behind this project is to put data engineering skills into practice, i.e. to implement an ETL pipeline followed by a machine learning pipeline that classifies disaster messages. The messages were provided by Figure 8, which has since been acquired by Appen.
There are three main folders:
- data
  - disaster_categories.csv: dataset containing all the categories
  - disaster_messages.csv: dataset containing all the messages in their original language and translated to English
  - process_data.py: ETL pipeline script to read, clean, and save the data into a database
  - DisasterResponse.db: output of the ETL pipeline, i.e. a SQLite database containing the messages and categories data
- models
  - train_classifier.py: machine learning pipeline script to train and export a classifier
  - classifier.pkl: output of the machine learning pipeline, i.e. a trained classifier
- app
  - run.py: Flask file to run the web application
  - templates: folder containing the HTML for the web application
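
The read, clean, and save steps of an ETL pipeline like the one listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the contents of process_data.py: the column names (`id`, `message`, `categories`), the sample rows, and the `messages` table name are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical stand-ins for the two CSV files; the column names
# ("id", "message", "categories") and the rows are assumptions for illustration.
MESSAGES_CSV = """id,message
1,We need water
2,Road is blocked
2,Road is blocked
"""
CATEGORIES_CSV = """id,categories
1,water
2,infrastructure
2,infrastructure
"""


def extract(text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(messages, categories):
    """Join the two tables on the assumed "id" column and drop duplicate rows."""
    category_by_id = {row["id"]: row["categories"] for row in categories}
    seen = set()
    merged = []
    for row in messages:
        record = (int(row["id"]), row["message"], category_by_id.get(row["id"]))
        if record not in seen:
            seen.add(record)
            merged.append(record)
    return merged


def load(records, conn):
    """Write the cleaned rows to a SQLite table (the table name is hypothetical)."""
    conn.execute("DROP TABLE IF EXISTS messages")
    conn.execute("CREATE TABLE messages (id INTEGER, message TEXT, categories TEXT)")
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(messages_text, categories_text, conn):
    """Run extract, transform, and load in order; return the number of rows saved."""
    records = transform(extract(messages_text), extract(categories_text))
    load(records, conn)
    return len(records)


if __name__ == "__main__":
    # An in-memory database keeps the sketch side-effect free;
    # a file path could be passed to sqlite3.connect instead.
    connection = sqlite3.connect(":memory:")
    print(run_etl(MESSAGES_CSV, CATEGORIES_CSV, connection))
```

Keeping the three stages as separate functions makes each one easy to test on its own, which is one common way to structure such a script.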
Run the following commands in the project's root directory to set up your database and model.
- To run the ETL pipeline, which cleans the data and stores it in a database:
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline, which trains the classifier and saves it:
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
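
Both commands pass file paths as positional arguments. How the scripts actually read them is not shown in this README; the sketch below is one possible approach using `argparse`, assuming three positional paths in the same order as the ETL command. The argument names (`messages_filepath`, `categories_filepath`, `database_filepath`) are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for three positional paths, mirroring the ETL command's order."""
    parser = argparse.ArgumentParser(
        description="Run the ETL pipeline (illustrative sketch)."
    )
    # Argument names are hypothetical; only their order follows the command.
    parser.add_argument("messages_filepath", help="CSV file with the messages")
    parser.add_argument("categories_filepath", help="CSV file with the categories")
    parser.add_argument("database_filepath", help="SQLite database to write")
    return parser


if __name__ == "__main__":
    # Passing the list explicitly keeps the example runnable without a real
    # command line; a script would normally call parse_args() with no arguments.
    example = build_parser().parse_args([
        "data/disaster_messages.csv",
        "data/disaster_categories.csv",
        "data/DisasterResponse.db",
    ])
    print(example.database_filepath)
```

With positional arguments, a missing path makes `argparse` exit with a usage message, so a mistyped command fails before any data is touched.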
Run the following command in the app's directory to run your web app.
python run.py
Go to
As mentioned, the data was obtained from Figure 8. I would also like to thank the Udacity Data Scientist Nanodegree instructors for their efforts in helping students understand the steps of an ETL and ML pipeline.