Predicting the Grade of Restaurants in New York City

Authors (ordered alphabetically by last name) from Group 18:
- Nikita Susan Easow
- Sneha Sunil
- Edward (Yukun) Zhang
- Lauren Zung

A data analysis project for DSCI 522 (Data Science Workflows), a course in the Master of Data Science program at the University of British Columbia.

URL of the project repo: https://github.com/UBC-MDS/newyork_restaurant_grading.git

Project Summary

In this project, we build classification models using logistic regression and support vector machines on health inspection data to predict whether a restaurant will be graded A (i.e., the restaurant is clean, up to code, and free of violations) or F (i.e., the restaurant has some issues that must be fixed or is a public risk on the verge of closure).

Our best model was a balanced logistic regression classifier with a C value of 0.024947, using 1 numeric feature, 130 text features and 47 categorical features. On a test set of 10000 samples, it achieved an F1 score of 0.975 and precision and recall scores of 0.999 and 0.999 respectively, indicating that the model is highly effective at classifying both grade A and grade F restaurants. The area under the receiver operating characteristic curve was 1.00, the maximum possible value, which further indicates that the model's predictions are close to 100% correct.
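This README does not define "balanced". Assuming it refers to weighting each class inversely to its frequency (the heuristic scikit-learn applies when `class_weight="balanced"` is set), the weights can be sketched in plain Python as follows. The function name and the toy labels are hypothetical and are not taken from the project.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} where weight = n_samples / (n_classes * class_count).

    This mirrors the "balanced" class-weight heuristic; it is an illustrative
    helper, not code from this repository.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy labels invented for illustration only (not the project's data):
# eight grade A restaurants and two grade F restaurants.
toy_labels = ["A"] * 8 + ["F"] * 2
weights = balanced_class_weights(toy_labels)
print(weights)
```

Under this scheme the rarer class receives the larger weight, so errors on it count for more during training.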

We use the DOHMH New York City Restaurant Inspection Results data set, sourced from the NYC OpenData Portal. It was retrieved from the tidytuesday repository by Thomas Mock; both that copy and the original data set are listed under References. The data set contains the violation citations from every inspection conducted for restaurants in New York City from 2012 to 2018. Each row represents a restaurant that has been assessed by a health inspector and includes information about the business, such as the restaurant name, phone number, location and type of cuisine, as well as the details of the inspection. A restaurant can be assigned an official grade of A, B, or C; otherwise it is assigned Z or P, indicating that its grade is pending review.
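The official grades (A, B, C, Z, P) differ from the A/F target described in the summary. One way to bridge the two is sketched below. It assumes that B and C are collapsed into F and that pending grades (Z, P) and missing grades are dropped. This README does not confirm that mapping, and all names here are hypothetical.

```python
# Hypothetical mapping: an assumption for illustration, not confirmed by this README.
GRADE_TO_TARGET = {"A": "A", "B": "F", "C": "F"}
PENDING_GRADES = {"Z", "P"}


def to_binary_grade(official_grade):
    """Map an official grade to the binary target, or None if the row is dropped."""
    if official_grade is None:
        return None
    grade = official_grade.strip().upper()
    if grade in PENDING_GRADES:
        return None
    # Grades outside the mapping also yield None.
    return GRADE_TO_TARGET.get(grade)


# Toy grades invented for illustration only.
toy_grades = ["A", "B", "C", "Z", "P", None]
targets = [to_binary_grade(g) for g in toy_grades]
labelled = [t for t in targets if t is not None]
print(labelled)
```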

Report

Here is the link to the Project Report.

Usage

To replicate this analysis, clone the GitHub repository and install the necessary dependencies from the environment file (environment.yaml).

Clone the repository

git clone https://github.com/UBC-MDS/newyork_restaurant_grading.git

Navigate to the repository
```
cd newyork_restaurant_grading
```

Run the analysis using Make

To run the analysis using the Makefile, follow the instructions below:

Create the environment
```
conda env create -f environment.yaml
```
Assuming that the environment was created successfully, you can activate the environment as follows:
```
conda activate nyc_rest
```
To run the entire data analysis, use the command below at the command line/terminal from the root directory of this project:
```
make all
```
To reset the repository to a clean state, with no intermediate or results files, run the following command:
```
make clean
```

Run the analysis using Docker

To run the analysis using a Docker image, follow the instructions below:

Install Docker

Pull the Docker image

docker pull lzung/newyork_restaurant_grading

After pulling the image, run the commands below that match your operating system:

For Windows:

Run the following command to reset the repository to a clean state with no intermediate or results files:

docker run --rm -v "/$(pwd)://home//newyork_restaurant_grading" lzung/newyork_restaurant_grading:latest make -C //home//newyork_restaurant_grading clean

Then, run the following command to replicate the analysis (note: this may take some time)**:

docker run --rm -v "/$(pwd)://home//newyork_restaurant_grading" lzung/newyork_restaurant_grading:latest make -C //home//newyork_restaurant_grading all

For macOS:

Run the following command to reset the repository to a clean state with no intermediate or results files:

docker run --rm -it --platform linux/amd64 -v /$(pwd):/home/newyork_restaurant_grading lzung/newyork_restaurant_grading make -C /home/newyork_restaurant_grading clean

Then, run the following command to replicate the analysis (note: this may take some time)**:

docker run --rm -it --platform linux/amd64 -v /$(pwd):/home/newyork_restaurant_grading lzung/newyork_restaurant_grading make -C /home/newyork_restaurant_grading all

**Running the analysis currently produces the warning "findfont: Font family ‘Helvetica’ not found.", which affects the tables. dataframe_image v0.1.3 is not currently available for linux/amd64, so the tables are exported without headers.

Makefile Dependency Diagram

The diagram is available as Makefile.png in the repository.

Dependencies

The environment file with all dependencies required for this project is environment.yaml:

python==3.9.*
ipykernel
ipython>=7.15
vega_datasets=0.9.0
altair_saver
selenium<4.3.0
scikit-learn>=1.0
pandas>=1.3.*
requests>=2.24.0
dataframe_image=0.1.1
scipy=1.9.3
matplotlib=3.6.2
matplotlib-base=3.6.2
matplotlib-inline=0.1.6
pip:
- docopt-ng==0.8.*
- joblib==1.1.0
- mglearn
- psutil>=5.7.2
- vl-convert-python==0.5.0
- lxml

License

The New York City restaurant data set was adapted from the tidytuesday data set, which is made available under the Creative Commons Zero v1.0 Universal license. The data were originally taken from the Department of Health and Mental Hygiene (DOHMH) data set owned by NYC OpenData.

For more details about the License of this project, please click here.

References

Mock, T. (2022). Tidy Tuesday: A weekly data project aimed at the R ecosystem. https://github.com/rfordatascience/tidytuesday

NYC Open Data Portal (2022). DOHMH New York City Restaurant Inspection Results. https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j/data

SmartSense. (2018, August 17). Restaurant letter grading: What does a B really mean? Connected Insights Blog. Retrieved November 19, 2022, from https://blog.smartsense.co/restaurant-letter-grading#:~:text=GRADE%20A%3A%20The%20restaurant%20is,and%20on%20verge%20of%20closure.

Staff, E. (2020, June 10). A running list of new restaurants that opened during the pandemic. Eater NY. Retrieved November 19, 2022, from https://ny.eater.com/2020/6/10/21270665/nyc-new-restaurant-openings-coronavirus
