Restaurant_Segmentation_Analysis

Repo Structure

📦Restaurant_Segmentation_Analysis
 ┣ 📂data
 ┃ ┣ 📂Smoothie King: Contains raw data
 ┃ ┣ 📂Smoothie_King_Preprocessed: Contains processed data
 ┃ ┣ 📂Subway CAN
 ┃ ┣ 📂Subway USA: Contains raw data
 ┃ ┗ 📂Subway_USA_Preprocessed: Contains processed data
 ┣ 📂doc
 ┃ ┣ 📜EDA.ipynb
 ┃ ┣ 📜Final_Presentation.pptx
 ┃ ┣ 📜Final_Report.Rmd
 ┃ ┣ 📜Final_Report.pdf
 ┃ ┣ 📜Proposal_Presentation.pptx
 ┃ ┣ 📜Proposal_Report.Rmd
 ┃ ┗ 📜Proposal_Report.pdf
 ┣ 📂img
 ┃ ┣ 📂eda
 ┃ ┣ 📂info
 ┃ ┣ 📂smoothie_king: Generated images for Smoothie King model feature interpretation
 ┃ ┃ ┣ 📂l1_reg_random_forest
 ┃ ┃ ┣ 📂l1_reg_random_forest_ovr
 ┃ ┃ ┗ 📂random_forest
 ┃ ┗ 📂subway_usa: Generated images for Subway US model feature interpretation
 ┣ 📂src
 ┃ ┣ 📜helper_create_eda_figure.py
 ┃ ┣ 📜helper_evaluation.py
 ┃ ┣ 📜helper_plotting_functions.py
 ┃ ┣ 📜smoothie_king_build_model.py
 ┃ ┣ 📜smoothie_king_model_interpret.py
 ┃ ┣ 📜smoothie_king_preprocess_data.py
 ┃ ┣ 📜subway_usa_build_model.py
 ┃ ┣ 📜subway_usa_cluster_verify.html
 ┃ ┣ 📜subway_usa_cluster_verify.ipynb
 ┃ ┣ 📜subway_usa_cluster_verify.py
 ┃ ┣ 📜subway_usa_model_interpret.py
 ┃ ┗ 📜subway_usa_preprocess_data.py
 ┣ 📂test
 ┃ ┣ 📜test_smoothie_king.py
 ┃ ┗ 📜test_subway_usa.py
 ┣ 📜.gitignore
 ┣ 📜LICENSE
 ┣ 📜Makefile
 ┣ 📜README.md
 ┣ 📜sitewise_python38_UBC2023.yaml
 ┗ 📜sitewise_python38_UBC2023_mac.yaml

Contributors and Maintainers

Chen Lin
Eric Tsai
Morris Zhao
Xinru Lu

Motivation

The Restaurant Segmentation Analysis project is a collaboration between Sitewise Analytics and MDS students Chen Lin, Eric Tsai, Morris Zhao, and Xinru Lu. This project aims to use machine learning methods to determine factors that drive traffic to a particular location and identify clusters of similar store locations.

Restaurants seeking to open new stores in a region need to make marketing plans according to the major customer group. Therefore, restaurant franchise owners need to know the factors that drive traffic to a location, such as the surrounding population demographic and consumer behavior in the region, as well as trade area and nearby competitor/sister store information. By having a strong grasp of these factors, owners can plan future expansions and market the new location strategically based on the demand of the region. The Restaurant Segmentation Analysis project will address this problem by using data from Smoothie King locations in the United States and Subway locations in the United States to build machine learning data pipelines for Sitewise Analytics to incorporate into their consulting service. At the end of the project, we expect to have human-interpretable machine learning models that cluster similar store locations, which will be helpful for Sitewise Analytics clients to identify factors that drive traffic in those similar locations.

Project Proposal (Sitewise)

Project Final Report (Sitewise)

Objectives

Because Smoothie King and Subway US are different clients of Sitewise, each client needs its own model, and the factors that drive traffic to each restaurant may differ as well. The two machine learning pipelines are as follows:

A supervised machine learning pipeline using data from Smoothie King locations to predict a store's category from one of five pre-labeled categories:
- Home
- Shopping
- Work
- Travel
- Other

The prediction will be human-interpretable in that users can identify features that determine the prediction of a store location's category.
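The folder names under img/smoothie_king (l1_reg_random_forest, random_forest) suggest that L1-regularized feature selection and a random forest are involved, but the README does not spell out the pipeline. The following is only a minimal sketch of that general idea using scikit-learn, not the code in src/smoothie_king_build_model.py. The synthetic data, the `feature_i` names, the choice of `LinearSVC` as the L1 selector, and all hyperparameters are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

CATEGORIES = np.array(["Home", "Shopping", "Work", "Travel", "Other"])

# Synthetic stand-in for the store feature table (not the real data).
X, y_codes = make_classification(
    n_samples=500, n_features=40, n_informative=8,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
y = CATEGORIES[y_codes]
feature_names = np.array([f"feature_{i}" for i in range(X.shape[1])])

model = Pipeline([
    ("scale", StandardScaler()),
    # L1-penalised linear model: features whose coefficients are (near) zero
    # for every class are dropped before the forest sees them.
    ("select", SelectFromModel(
        LinearSVC(penalty="l1", dual=False, C=0.05, max_iter=5000))),
    ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
])
model.fit(X, y)

kept = feature_names[model.named_steps["select"].get_support()]
importances = model.named_steps["forest"].feature_importances_

# Rank the surviving features by how much the forest relies on them.
ranking = sorted(zip(kept, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranking[:5]:
    print(f"{name}: {score:.3f}")
```

Ranking the forest's feature importances is one simple way to show which features drive a predicted category; the project's own interpretation scripts may use a different technique.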

An unsupervised machine learning pipeline based on data from US Subway locations that clusters locations by similar features.

The unsupervised machine learning pipeline will also have human-interpretable results, including ways to identify similar features that caused different locations to be clustered together.
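The README does not say which clustering algorithm the Subway pipeline uses. As a rough illustration of how clusters can be made interpretable, the sketch below swaps in KMeans from scikit-learn and lists, for each cluster, the features on which its centroid differs most from the overall average. The synthetic data, the `feature_i` names, and the choice of three clusters are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Subway store features (not the real data).
X, _ = make_blobs(n_samples=300, n_features=6, centers=3, random_state=0)
feature_names = np.array([f"feature_{i}" for i in range(X.shape[1])])

X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

# Centroids are in standardized units, so a large absolute value means the
# cluster sits far from the overall average on that feature.
top_features = {}
for cluster_id, centroid in enumerate(kmeans.cluster_centers_):
    order = np.argsort(-np.abs(centroid))[:3]
    top_features[cluster_id] = [
        (str(feature_names[i]), round(float(centroid[i]), 2)) for i in order
    ]
    print(cluster_id, top_features[cluster_id])
```

Reading the standardized centroids this way gives a quick, human-readable summary of what the stores in a cluster have in common.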

Data Summary

We received five datasets for each of the two popular chain restaurants: Smoothie King and Subway US. The datasets consist of CSV files for demographic, point of interest, store-specific, competitor/sister store, and trade area data, where each row represents a single store location and the columns represent the variables/features of that store. All features in the demographic, point of interest, competitor/sister store, and trade area files are numeric, whereas the store-specific data files contain categorical features such as state and market size.

For Smoothie King, there are over 1000 features combined for 796 stores.
For Subway US, there are over 1000 features combined for approximately 14,000 stores.
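Because every file has one row per store, the five tables can be combined into a single feature table by joining on a shared store identifier. The sketch below shows such a join with Python's standard library only. The `store_id` key, the `population` and `drive_time_min` columns, and the tiny inline tables are made up for illustration and are not taken from the actual datasets.

```python
import csv
import io


def read_rows(csv_text, key="store_id"):
    """Parse CSV text into {store id: {column: value}}; the key name is an assumption."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: {k: v for k, v in row.items() if k != key} for row in reader}


def join_tables(tables):
    """Inner-join per-store tables, keeping only stores present in all of them."""
    common_ids = set.intersection(*(set(table) for table in tables))
    merged = {}
    for store_id in sorted(common_ids):
        features = {}
        for table in tables:
            features.update(table[store_id])
        merged[store_id] = features
    return merged


# Toy stand-ins for two of the five files (made-up values, kept as strings).
demographic = read_rows("store_id,population\n1,12000\n2,8500\n3,4300\n")
trade_area = read_rows("store_id,drive_time_min\n1,10\n2,15\n")

combined = join_tables([demographic, trade_area])
for store_id, features in combined.items():
    print(store_id, features)
```

An inner join drops any store that is missing from one of the files; whether that is the right choice depends on how complete the real files are.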

Usage

To replicate the analysis, clone this GitHub repository and install the dependencies using the environment file for Mac or the environment file for Windows.

Create Conda environment

# For Mac
conda env create -n <ENVNAME> --file sitewise_python38_UBC2023_mac.yaml

# For Windows
conda env create -n <ENVNAME> --file sitewise_python38_UBC2023.yaml

Activate Conda environment

conda activate <ENVNAME>

1. Using Makefile to generate the final report

Run the following command at the command line/terminal in the project root directory:

make all

To reset the project by removing all intermediate plot images and results .csv files, run the following command at the command line/terminal in the project root directory:

make clean

2. Using Makefile to train the Smoothie King classification model

To train the Smoothie King classification model and get the interpretation outputs, run the following command:

make smoothie_king

To reset the Smoothie King model outputs, run the following command:

make clean_sk

3. Using Makefile to train the Subway USA clustering model

To train the Subway USA clustering model and get the interpretation outputs, run the following command:

make subway_usa

To reset the Subway USA model outputs, run the following command:

make clean_sb

NOTE:
To make the cluster verification script work, users need to install Selenium to interact with the Chrome browser.

pip install selenium

Users also need to download chromedriver from here. Ensure the driver version matches the Chrome browser version. Mac users should save the driver under this path:

'/usr/local/bin/chromedriver'
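Before running the cluster verification script, it can help to confirm that chromedriver is where the script expects it and that its version lines up with Chrome. The helper below is a hypothetical sketch using only the standard library and is not part of this repository. Comparing only the major version number is a simplifying assumption, and the version strings would come from running `chromedriver --version` and the Chrome binary's `--version` yourself.

```python
import os
import re
import shutil

# Path given above for Mac users; other platforms may differ.
DEFAULT_DRIVER_PATH = "/usr/local/bin/chromedriver"


def find_chromedriver(default_path=DEFAULT_DRIVER_PATH):
    """Return the chromedriver path if it is at the default location or on PATH, else None."""
    if os.path.isfile(default_path) and os.access(default_path, os.X_OK):
        return default_path
    return shutil.which("chromedriver")


def major_version(version_output):
    """Extract the major version from text such as 'ChromeDriver 114.0.5735.90'."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def versions_match(chrome_output, driver_output):
    """True when both strings parse and share the same major version."""
    chrome_major = major_version(chrome_output)
    return chrome_major is not None and chrome_major == major_version(driver_output)


print("chromedriver found at:", find_chromedriver())
```

For example, `versions_match("Google Chrome 114.0.5735.198", "ChromeDriver 114.0.5735.90")` compares the two major versions (114 and 114) and reports a match.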

Licenses

The Restaurant Segmentation Analysis project here is licensed under the MIT License. Please provide attribution and a link to this webpage if re-using/re-mixing any of these materials.
