Author: Sajid Al Sanai
License: MIT License
The motivation for this particular project is to construct a web-based application and API that is able to classify social media messages into critical disaster response-related categories. There are 36 predefined natural disasters and disaster response-related categories under which social media messages may be filed including earthquakes, floods, shelter, food, medical aid, search & rescue et cetera.
By classifying these messages, we can determine critical information on specific locations, users, and aid resources required for disaster relief efforts and forward the contents of these identified messages to the appropriate disaster relief agencies or governmental authorities for disaster response.
The utilisation of machine learning and natural language processing ensures faster alert and detection of social media content for disasters and identifying where, for whom, and what sort of relief response will be essential and effective in those circumstances.
The web-based application and API uses an ETL pipeline that processes social media disaster related messages from Figure Eight and subsequently uses the cleaned dataset in a NLP pipeline for a multi-class classification model.
The directory structure for this repository is detailed below:
/
├── Documentation/
│ ├── process_data.html
│ └── train_classifier.html
├── app/
│ ├── templates/
│ │ ├── go.html
│ │ └── master.html
│ └── run.py
├── data/
│ ├── process_data.py
│ ├── categories.csv
│ ├── messages.py
│ └── DisasterResponse.db
├── model/
│ ├── train_classifier.py
│ └── classifier.pkl
├── .gitignore
├── requirements.txt
├── LICENSE
└── README.md
-
Ensure that you have initialised your local Python environment and installed all relevant Python packages required by this project.
git clone https://github.com/sajidsarker/disaster-response.git cd disaster-response pip install virtualenv virtualenv env source env/bin/activate pip install -r requirements.txt
-
Run the following commands in the project's root directory to set up your database and model.
-
To run ETL pipeline that cleans data and stores in database
python data/process_data.py data/messages.csv data/categories.csv data/DisasterResponse.db
-
To run ML pipeline that trains classifier and saves it in a pickle file
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
-
-
Run the following command in the
./app
directory to run your web app.python app/run.py
-
Navigate to
http://0.0.0.0:3001/
in your web browser.
Navigate to ./Documentation
to find formatted documentation for the relevant Python scripts in this project.
The classification model was a Multiple Output classifier wrapper for individual Ada Boost classifiers trained on the message data using a NLP pipeline. Training occurred additionally through Grid Search Cross Validation over a set of pre-defined tuning parameters to derive the best final model parameters.
Prior to this trained model, a Random Forest classifier was used. This yielded model parameters totalling 1.0 GB
in file size. In contrast, Ada Boost, which is also an ensemble learning model, yielded similar results with model parameters totalling 2.0 MB
in file size.
Model diagnostics are listed below.
Model Accuracy: 94.846%
Target Variable 0: related
Class Labels: [0 1]
precision recall f1-score support
0 0.68 0.26 0.37 1253
1 0.80 0.96 0.88 3983
accuracy 0.79 5236
macro avg 0.74 0.61 0.62 5236
weighted avg 0.77 0.79 0.76 5236
Target Variable 1: request
Class Labels: [0 1]
precision recall f1-score support
0 0.91 0.96 0.94 4358
1 0.73 0.53 0.62 878
accuracy 0.89 5236
macro avg 0.82 0.75 0.78 5236
weighted avg 0.88 0.89 0.88 5236
Target Variable 2: offer
Class Labels: [0 1]
precision recall f1-score support
0 1.00 1.00 1.00 5222
1 0.00 0.00 0.00 14
accuracy 1.00 5236
macro avg 0.50 0.50 0.50 5236
weighted avg 0.99 1.00 0.99 5236
Target Variable 3: aid_related
Class Labels: [0 1]
precision recall f1-score support
0 0.77 0.86 0.81 3104
1 0.75 0.62 0.68 2132
accuracy 0.76 5236
macro avg 0.76 0.74 0.75 5236
weighted avg 0.76 0.76 0.76 5236
Target Variable 4: medical_help
Class Labels: [0 1]
precision recall f1-score support
0 0.93 0.98 0.96 4788
1 0.58 0.26 0.36 448
accuracy 0.92 5236
macro avg 0.76 0.62 0.66 5236
weighted avg 0.90 0.92 0.91 5236
Target Variable 5: medical_products
Class Labels: [0 1]
precision recall f1-score support
0 0.96 0.99 0.98 4956
1 0.67 0.33 0.44 280
accuracy 0.96 5236
macro avg 0.82 0.66 0.71 5236
weighted avg 0.95 0.96 0.95 5236
Target Variable 6: search_and_rescue
Class Labels: [0 1]
precision recall f1-score support
0 0.98 1.00 0.99 5099
1 0.54 0.15 0.24 137
accuracy 0.97 5236
macro avg 0.76 0.57 0.61 5236
weighted avg 0.97 0.97 0.97 5236
Target Variable 7: security
Class Labels: [0 1]
precision recall f1-score support
0 0.98 1.00 0.99 5151
1 0.36 0.06 0.10 85
accuracy 0.98 5236
macro avg 0.67 0.53 0.55 5236
weighted avg 0.97 0.98 0.98 5236
Target Variable 8: military
Class Labels: [0 1]
precision recall f1-score support
0 0.98 0.99 0.99 5071
1 0.58 0.39 0.47 165
accuracy 0.97 5236
macro avg 0.78 0.69 0.73 5236
weighted avg 0.97 0.97 0.97 5236
Target Variable 9: child_alone
Class Labels: [0]
precision recall f1-score support
0 1.00 1.00 1.00 5236
accuracy 1.00 5236
macro avg 1.00 1.00 1.00 5236
weighted avg 1.00 1.00 1.00 5236
Target Variable 10: water
Class Labels: [0 1]
precision recall f1-score support
0 0.97 0.98 0.98 4896
1 0.74 0.62 0.68 340
accuracy 0.96 5236
macro avg 0.86 0.80 0.83 5236
weighted avg 0.96 0.96 0.96 5236
Target Variable 11: food
Class Labels: [0 1]
precision recall f1-score support
0 0.96 0.98 0.97 4655
1 0.81 0.69 0.75 581
accuracy 0.95 5236
macro avg 0.89 0.83 0.86 5236
weighted avg 0.95 0.95 0.95 5236
Target Variable 12: shelter
Class Labels: [0 1]
precision recall f1-score support
0 0.96 0.98 0.97 4771
1 0.77 0.53 0.63 465
accuracy 0.94 5236
macro avg 0.86 0.76 0.80 5236
weighted avg 0.94 0.94 0.94 5236
Target Variable 13: clothing
Class Labels: [0 1]
precision recall f1-score support
0 0.99 1.00 0.99 5157
1 0.76 0.47 0.58 79
accuracy 0.99 5236
macro avg 0.87 0.73 0.79 5236
weighted avg 0.99 0.99 0.99 5236
Target Variable 14: money
Class Labels: [0 1]
precision recall f1-score support
0 0.98 1.00 0.99 5118
1 0.59 0.31 0.40 118
accuracy 0.98 5236
macro avg 0.79 0.65 0.70 5236
weighted avg 0.98 0.98 0.98 5236
Target Variable 15: missing_people
Class Labels: [0 1]
precision recall f1-score support
0 0.99 1.00 0.99 5173
1 0.18 0.03 0.05 63
accuracy 0.99 5236
macro avg 0.59 0.52 0.52 5236
weighted avg 0.98 0.99 0.98 5236
Target Variable 16: refugees
Class Labels: [0 1]
precision recall f1-score support
0 0.97 0.99 0.98 5046
1 0.62 0.25 0.35 190
accuracy 0.97 5236
macro avg 0.80 0.62 0.67 5236
weighted avg 0.96 0.97 0.96 5236
Target Variable 17: death
Class Labels: [0 1]
precision recall f1-score support
0 0.97 0.99 0.98 4997
1 0.71 0.44 0.55 239
accuracy 0.97 5236
macro avg 0.84 0.72 0.76 5236
weighted avg 0.96 0.97 0.96 5236
Target Variable 18: other_aid
Class Labels: [0 1]
precision recall f1-score support
0 0.89 0.98 0.93 4576
1 0.51 0.15 0.23 660
accuracy 0.87 5236
macro avg 0.70 0.56 0.58 5236
weighted avg 0.84 0.87 0.84 5236
Target Variable 19: infrastructure_related
Class Labels: [0 1]
precision recall f1-score support
0 0.95 0.99 0.97 4927
1 0.39 0.12 0.19 309
accuracy 0.94 5236
macro avg 0.67 0.56 0.58 5236
weighted avg 0.91 0.94 0.92 5236
Target Variable 20: transport
Class Labels: [0 1]
precision recall f1-score support
0 0.96 1.00 0.98 5001
1 0.71 0.22 0.33 235
accuracy 0.96 5236
macro avg 0.84 0.61 0.66 5236
weighted avg 0.95 0.96 0.95 5236
Target Variable 21: buildings
Class Labels: [0 1]
precision recall f1-score support
0 0.97 0.99 0.98 4974
1 0.68 0.39 0.50 262
accuracy 0.96 5236
macro avg 0.82 0.69 0.74 5236
weighted avg 0.95 0.96 0.96 5236
Target Variable 22: electricity
Class Labels: [0 1]
precision recall f1-score support
0 0.99 1.00 0.99 5136
1 0.54 0.25 0.34 100
accuracy 0.98 5236
macro avg 0.76 0.62 0.67 5236
weighted avg 0.98 0.98 0.98 5236
Target Variable 23: tools
Class Labels: [0 1]
precision recall f1-score support
0 0.99 1.00 1.00 5205
1 0.33 0.03 0.06 31
accuracy 0.99 5236
macro avg 0.66 0.52 0.53 5236
weighted avg 0.99 0.99 0.99 5236
Target Variable 24: hospitals
Class Labels: [0 1]
precision recall f1-score support
0 0.99 1.00 0.99 5177
1 0.23 0.10 0.14 59
accuracy 0.99 5236
macro avg 0.61 0.55 0.57 5236
weighted avg 0.98 0.99 0.98 5236
Target Variable 25: shops
Class Labels: [0 1]
precision recall f1-score support
0 1.00 1.00 1.00 5221
1 0.40 0.13 0.20 15
accuracy 1.00 5236
macro avg 0.70 0.57 0.60 5236
weighted avg 1.00 1.00 1.00 5236
Target Variable 26: aid_centers
Class Labels: [0 1]
precision recall f1-score support
0 0.99 1.00 0.99 5187
1 0.35 0.16 0.22 49
accuracy 0.99 5236
macro avg 0.67 0.58 0.61 5236
weighted avg 0.99 0.99 0.99 5236
Target Variable 27: other_infrastructure
Class Labels: [0 1]
precision recall f1-score support
0 0.96 0.99 0.98 5019
1 0.31 0.09 0.14 217
accuracy 0.95 5236
macro avg 0.64 0.54 0.56 5236
weighted avg 0.93 0.95 0.94 5236
Target Variable 28: weather_related
Class Labels: [0 1]
precision recall f1-score support
0 0.89 0.95 0.92 3799
1 0.85 0.70 0.77 1437
accuracy 0.88 5236
macro avg 0.87 0.83 0.85 5236
weighted avg 0.88 0.88 0.88 5236
Target Variable 29: floods
Class Labels: [0 1]
precision recall f1-score support
0 0.96 0.99 0.98 4814
1 0.84 0.55 0.66 422
accuracy 0.96 5236
macro avg 0.90 0.77 0.82 5236
weighted avg 0.95 0.96 0.95 5236
Target Variable 30: storm
Class Labels: [0 1]
precision recall f1-score support
0 0.95 0.98 0.97 4750
1 0.75 0.52 0.61 486
accuracy 0.94 5236
macro avg 0.85 0.75 0.79 5236
weighted avg 0.93 0.94 0.93 5236
Target Variable 31: fire
Class Labels: [0 1]
precision recall f1-score support
0 0.99 1.00 1.00 5183
1 0.67 0.26 0.38 53
accuracy 0.99 5236
macro avg 0.83 0.63 0.69 5236
weighted avg 0.99 0.99 0.99 5236
Target Variable 32: earthquake
Class Labels: [0 1]
precision recall f1-score support
0 0.98 0.99 0.98 4749
1 0.89 0.78 0.83 487
accuracy 0.97 5236
macro avg 0.93 0.88 0.91 5236
weighted avg 0.97 0.97 0.97 5236
Target Variable 33: cold
Class Labels: [0 1]
precision recall f1-score support
0 0.99 1.00 0.99 5140
1 0.53 0.24 0.33 96
accuracy 0.98 5236
macro avg 0.76 0.62 0.66 5236
weighted avg 0.98 0.98 0.98 5236
Target Variable 34: other_weather
Class Labels: [0 1]
precision recall f1-score support
0 0.95 0.99 0.97 4951
1 0.54 0.18 0.27 285
accuracy 0.95 5236
macro avg 0.75 0.59 0.62 5236
weighted avg 0.93 0.95 0.93 5236
Target Variable 35: direct_report
Class Labels: [0 1]
precision recall f1-score support
0 0.87 0.96 0.91 4246
1 0.68 0.41 0.51 990
accuracy 0.85 5236
macro avg 0.78 0.68 0.71 5236
weighted avg 0.84 0.85 0.84 5236