📦Restaurant_Segmentation_Analysis
 ┣ 📂data
 ┃ ┣ 📂Smoothie King: Contains raw data
 ┃ ┣ 📂Smoothie_King_Preprocessed: Contains processed data
 ┃ ┣ 📂Subway CAN
 ┃ ┣ 📂Subway USA: Contains raw data
 ┃ ┗ 📂Subway_USA_Preprocessed: Contains processed data
 ┣ 📂doc
 ┃ ┣ 📜EDA.ipynb
 ┃ ┣ 📜Final_Presentation.pptx
 ┃ ┣ 📜Final_Report.Rmd
 ┃ ┣ 📜Final_Report.pdf
 ┃ ┣ 📜Proposal_Presentation.pptx
 ┃ ┣ 📜Proposal_Report.Rmd
 ┃ ┗ 📜Proposal_Report.pdf
 ┣ 📂img
 ┃ ┣ 📂eda
 ┃ ┣ 📂info
 ┃ ┣ 📂smoothie_king: Generated image for Smoothie King model feature interpretation
 ┃ ┃ ┣ 📂l1_reg_random_forest
 ┃ ┃ ┣ 📂l1_reg_random_forest_ovr
 ┃ ┃ ┣ 📂random_forest
 ┃ ┗ 📂subway_usa: Generated image for Subway US model feature interpretation
 ┣ 📂src
 ┃ ┣ 📜helper_create_eda_figure.py
 ┃ ┣ 📜helper_evaluation.py
 ┃ ┣ 📜helper_plotting_functions.py
 ┃ ┣ 📜smoothie_king_build_model.py
 ┃ ┣ 📜smoothie_king_model_interpret.py
 ┃ ┣ 📜smoothie_king_preprocess_data.py
 ┃ ┣ 📜subway_usa_build_model.py
 ┃ ┣ 📜subway_usa_cluster_verify.html
 ┃ ┣ 📜subway_usa_cluster_verify.ipynb
 ┃ ┣ 📜subway_usa_cluster_verify.py
 ┃ ┣ 📜subway_usa_model_interpret.py
 ┃ ┗ 📜subway_usa_preprocess_data.py
 ┣ 📂test
 ┃ ┣ 📜test_smoothie_king.py
 ┃ ┗ 📜test_subway_usa.py
 ┣ 📜.gitignore
 ┣ 📜LICENSE
 ┣ 📜Makefile
 ┣ 📜README.md
 ┣ 📜sitewise_python38_UBC2023.yaml
 ┗ 📜sitewise_python38_UBC2023_mac.yaml
- Chen Lin
- Eric Tsai
- Morris Zhao
- Xinru Lu
The Restaurant Segmentation Analysis project is a collaboration between Sitewise Analytics and MDS students Chen Lin, Eric Tsai, Morris Zhao, and Xinru Lu. This project aims to use machine learning methods to determine factors that drive traffic to a particular location and identify clusters of similar store locations.
Restaurants seeking to open new stores in a region need to build marketing plans around the major customer group. Restaurant franchise owners therefore need to know the factors that drive traffic to a location, such as the surrounding population demographics and consumer behavior in the region, as well as trade area and nearby competitor/sister store information. With a strong grasp of these factors, owners can plan future expansions and market a new location strategically based on the demand of the region. The Restaurant Segmentation Analysis project addresses this problem by using data from Smoothie King and Subway locations in the United States to build machine learning data pipelines for Sitewise Analytics to incorporate into their consulting service. At the end of the project, we expect to have human-interpretable machine learning models that cluster similar store locations, which will help Sitewise Analytics clients identify the factors that drive traffic in those similar locations.
Project Final Report (Sitewise)
Given that Smoothie King and Subway US are different clients of Sitewise, it is necessary to build a separate model for each client. The factors that drive traffic to each of the two restaurants may be different as well. Thus, the two machine learning pipelines are as follows:
- A supervised machine learning pipeline using data from Smoothie King locations to classify each store into one of five pre-labeled categories:
  - Home
  - Shopping
  - Work
  - Travel
  - Other

  The prediction will be human-interpretable in that users can identify the features that determine the predicted category of a store location.
- An unsupervised machine learning pipeline based on data from US Subway locations that clusters locations by similar features.
  The unsupervised machine learning pipeline will also have human-interpretable results, including ways to identify the shared features that caused different locations to be clustered together.
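
One way to make cluster assignments interpretable is to compare each cluster's feature averages with the averages over all stores. The following is a minimal sketch of that idea, not the project's actual method: the feature names, the toy values, and the `cluster_profile` helper are all hypothetical, and the cluster labels are assumed to come from whichever clustering model is used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cluster_profile(rows, labels, top_n=2):
    """Rank, for each cluster, the features whose cluster mean deviates most
    from the overall mean, measured in overall standard deviations."""
    features = list(rows[0].keys())
    overall_mean = {f: mean(r[f] for r in rows) for f in features}
    overall_std = {f: pstdev(r[f] for r in rows) for f in features}

    # Group store rows by their assigned cluster label.
    by_cluster = defaultdict(list)
    for row, label in zip(rows, labels):
        by_cluster[label].append(row)

    profile = {}
    for label, members in by_cluster.items():
        scores = {}
        for f in features:
            if overall_std[f] == 0:
                continue  # a constant feature cannot distinguish clusters
            cluster_mean = mean(m[f] for m in members)
            scores[f] = (cluster_mean - overall_mean[f]) / overall_std[f]
        ranked = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
        profile[label] = ranked[:top_n]
    return profile


# Toy data: hypothetical features for six stores, already assigned to 2 clusters.
stores = [
    {"median_income": 40, "daytime_population": 900, "competitors_nearby": 3},
    {"median_income": 42, "daytime_population": 950, "competitors_nearby": 3},
    {"median_income": 41, "daytime_population": 920, "competitors_nearby": 3},
    {"median_income": 90, "daytime_population": 200, "competitors_nearby": 3},
    {"median_income": 95, "daytime_population": 180, "competitors_nearby": 3},
    {"median_income": 92, "daytime_population": 210, "competitors_nearby": 3},
]
labels = [0, 0, 0, 1, 1, 1]

profile = cluster_profile(stores, labels)
for label, ranked in sorted(profile.items()):
    print(label, [(name, round(score, 2)) for name, score in ranked])
```

A positive score means the cluster sits above the overall average for that feature and a negative score means it sits below, so the top-ranked features give a short, readable description of what sets each cluster apart.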
We received five datasets for each of the two chain restaurants, Smoothie King and Subway US. The datasets are CSV files covering demographics, points of interest, store-specific data, competitor/sister store data, and trade area, where each row represents a single store location and each column represents a variable/feature of that store. All features in the demographic, point of interest, competitor/sister store, and trade area files are numeric, whereas the store-specific data files contain categorical features such as state and market size.
- For Smoothie King, there are over 1000 features combined for 796 stores.
- For Subway US, there are over 1000 features combined for approximately 14,000 stores.
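
Because every file has one row per store, the five tables can be combined into a single feature table by joining on a store identifier. The sketch below illustrates this with the Python standard library and in-memory CSV text. The `store_id` column, the file contents, and the feature names are assumptions for illustration, not the actual schema of the Sitewise data.

```python
import csv
import io

# Hypothetical stand-ins for two of the per-store CSV files (toy values).
demographic_csv = "store_id,population,median_age\n101,52000,34.5\n102,18000,41.2\n"
store_csv = "store_id,state,market_size\n101,TX,Large\n102,LA,Small\n"


def read_by_store(text, key="store_id"):
    """Parse CSV text into {store_id: {column: value}}, dropping the key column."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        store = row.pop(key)
        table[store] = row
    return table


def merge_tables(tables):
    """Inner-join several per-store tables on their shared store IDs."""
    shared_ids = set.intersection(*(set(t) for t in tables))
    merged = {}
    for store in sorted(shared_ids):
        combined = {}
        for table in tables:
            combined.update(table[store])
        merged[store] = combined
    return merged


features = merge_tables([read_by_store(demographic_csv), read_by_store(store_csv)])
for store, row in features.items():
    print(store, row)
```

All values are still strings at this point. Numeric columns would need to be converted, and categorical columns such as state and market size would need to be encoded before modelling.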
To replicate the analysis, clone this GitHub repository and install the dependencies using the environment file for your operating system (Mac or Windows).
# For Mac
conda env create -n <ENVNAME> --file sitewise_python38_UBC2023_mac.yaml
# For Windows
conda env create -n <ENVNAME> --file sitewise_python38_UBC2023.yaml
conda activate <ENVNAME>
To run the entire analysis, run the following command at the command line/terminal in the project root directory:
make all
To reset the project to a clean state, removing all intermediate plot images and results .csv files, run the following command at the command line/terminal in the project root directory:
make clean
To train the Smoothie King classification model and get the interpretation outputs, run the following command:
make smoothie_king
To reset the Smoothie King model outputs, run the following command:
make clean_sk
To train the Subway USA clustering model and get the interpretation outputs, run the following command:
make subway_usa
To reset the Subway USA model outputs, run the following command:
make clean_sb
NOTE:
To make the cluster verification script work, users need to install Selenium to interact with the Chrome browser.
pip install selenium
Users also need to download chromedriver from here. Ensure the driver version matches the installed Chrome browser version. Mac users should save it under this path:
'/usr/local/bin/chromedriver'
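
Since a mismatch between chromedriver and Chrome is a common cause of Selenium failures, it can help to compare the two major versions before running the cluster verification script. The following is a minimal sketch, assuming both binaries print a version string when called with `--version`. The helper names and the Chrome application path are assumptions and are not part of this repository.

```python
import re
import subprocess


def major_version(version_text):
    """Extract the major version number from text such as 'ChromeDriver 114.0.5735.90'."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def versions_match(driver_text, chrome_text):
    """Return True when the driver and browser share the same major version."""
    return major_version(driver_text) == major_version(chrome_text)


def check_installed(
    driver_path="/usr/local/bin/chromedriver",
    # Assumed default install location of Chrome on macOS.
    chrome_path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
):
    """Run both binaries with --version and compare their major versions."""
    outputs = []
    for path in (driver_path, chrome_path):
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, check=True
        )
        outputs.append(result.stdout)
    return versions_match(*outputs)
```

On a Mac with both programs installed at these paths, calling `check_installed()` should return `True` when the major versions agree. Adjust the paths if your installation differs.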
The Restaurant Segmentation Analysis project is licensed under the MIT License. Please provide attribution and a link to this webpage if re-using/re-mixing any of these materials.