Implementation of collaborative filtering techniques incorporating ALS, LSH, and MinHash, using the Jaccard similarity measure.
Python: 3.6
Spark: 2.3.2
Scala: 2.11
spark.driver.memory: 4g
spark.executor.memory: 4g
Model: Lenovo Legion Y530 15
OS: Windows 10, 64-bit
Processor: Intel® Core™ i7-8750H CPU @ 2.20GHz x 6
Memory: 16 GB
Generated the following two datasets from the original Yelp review dataset, filtered with conditions such as “state” == “CA”. We randomly took 60% of the data as the training dataset and 20% of the data as the validation dataset.
a. yelp_train.csv: the training data, which only includes the columns user_id, business_id, and stars.
b. yelp_val.csv: the validation data, which is in the same format as the training data.
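A minimal PySpark sketch of how such a split could be produced is shown below. The input file names (business.json, review.json), the random seed, and the exact filtering are assumptions for illustration, not the actual preprocessing script.

```python
# Hypothetical sketch of the dataset generation (assumed file names and schema).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("yelp_dataset_prep").getOrCreate()

# Assumed inputs: the original Yelp business and review JSON dumps.
business = spark.read.json("business.json").filter("state = 'CA'")
reviews = spark.read.json("review.json")

# Keep only reviews of Californian businesses and the three needed columns.
ca_reviews = (reviews.join(business.select("business_id"), on="business_id")
                     .select("user_id", "business_id", "stars"))

# Random 60/20/20 split; 60% -> training, 20% -> validation.
train, val, _ = ca_reviews.randomSplit([0.6, 0.2, 0.2], seed=42)
train.toPandas().to_csv("yelp_train.csv", index=False)
val.toPandas().to_csv("yelp_val.csv", index=False)
```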
spark-submit Mansi_Ganatra_task1.py <input_file_path> <output_file_path>
Implemented the Locality Sensitive Hashing algorithm with Jaccard similarity on yelp_train.csv, focusing on “0 or 1” ratings rather than the actual ratings/stars given by the users. Specifically, if a user has rated a business, the user’s entry in the characteristic matrix is 1; if the user hasn’t rated the business, the entry is 0. Identified pairs of similar businesses whose similarity is >= 0.5. The generated results are compared to the ground-truth file pure_jaccard_similarity.csv.
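The following is a minimal, non-Spark sketch of the MinHash + LSH idea described above. The number of hash functions, the banding scheme, and the helper names are illustrative assumptions, not the actual contents of Mansi_Ganatra_task1.py.

```python
# Illustrative MinHash + LSH sketch for finding business pairs with Jaccard >= 0.5.
import csv
import random
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 60          # number of MinHash functions (assumed)
BANDS, ROWS = 30, 2      # b * r must equal NUM_HASHES

def load_characteristic_sets(path):
    """business_id -> set of user indices that rated it (0/1 ratings only)."""
    user_index, business_users = {}, defaultdict(set)
    with open(path) as f:
        reader = csv.reader(f)
        next(reader)                                   # skip the CSV header
        for user_id, business_id, _stars in reader:
            idx = user_index.setdefault(user_id, len(user_index))
            business_users[business_id].add(idx)
    return business_users, len(user_index)

def minhash_signature(users, hash_params, n_users):
    """One minimum per hash function h(x) = (a*x + b) % n_users."""
    return [min((a * u + b) % n_users for u in users) for a, b in hash_params]

def jaccard(s1, s2):
    return len(s1 & s2) / len(s1 | s2)

def lsh_similar_pairs(path, threshold=0.5):
    business_users, n_users = load_characteristic_sets(path)
    random.seed(0)
    hash_params = [(random.randint(1, n_users), random.randint(0, n_users))
                   for _ in range(NUM_HASHES)]
    signatures = {b: minhash_signature(users, hash_params, n_users)
                  for b, users in business_users.items()}

    # Band the signatures: businesses colliding in any band become candidates.
    buckets = defaultdict(set)
    for b, sig in signatures.items():
        for band in range(BANDS):
            key = (band, tuple(sig[band * ROWS:(band + 1) * ROWS]))
            buckets[key].add(b)

    # Verify candidates with the true Jaccard similarity on the characteristic sets.
    results = {}
    for bucket in buckets.values():
        for b1, b2 in combinations(sorted(bucket), 2):
            if (b1, b2) not in results:
                sim = jaccard(business_users[b1], business_users[b2])
                if sim >= threshold:
                    results[(b1, b2)] = sim
    return results
```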
spark-submit Mansi_Ganatra_task2.py <train_filename> <test_filename> <case_id> <output_filename>
This implementation generates recommendations using:
Case 1: Model-based CF recommendation system with Spark MLlib (using the ALS implementation)
Case 2: User-based CF recommendation system
Case 3: Item-based CF recommendation system
Case 4: Item-based CF recommendation system using the Jaccard-based LSH results generated in Task 1
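As a reference for Case 1, here is a hedged sketch of model-based CF with the Spark MLlib ALS implementation. The hyperparameters (rank, iterations, lambda) and helper names are illustrative guesses, not necessarily those used in Mansi_Ganatra_task2.py.

```python
# Illustrative sketch of Case 1: ALS via the RDD-based Spark MLlib API.
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext(appName="als_case1")

def parse(line):
    user, biz, stars = line.split(",")
    return user, biz, float(stars)

header = "user_id"
train = (sc.textFile("yelp_train.csv")
           .filter(lambda l: not l.startswith(header))
           .map(parse))

# MLlib ALS needs integer ids, so build string -> int lookup tables.
user_ids = train.map(lambda r: r[0]).distinct().zipWithIndex().collectAsMap()
biz_ids = train.map(lambda r: r[1]).distinct().zipWithIndex().collectAsMap()

ratings = train.map(lambda r: Rating(user_ids[r[0]], biz_ids[r[1]], r[2]))
model = ALS.train(ratings, rank=3, iterations=15, lambda_=0.1)

# Predict for validation pairs whose user and business were seen in training
# (cold-start handling is omitted in this sketch).
val = (sc.textFile("yelp_val.csv")
         .filter(lambda l: not l.startswith(header))
         .map(parse)
         .filter(lambda r: r[0] in user_ids and r[1] in biz_ids))
preds = model.predictAll(val.map(lambda r: (user_ids[r[0]], biz_ids[r[1]])))
```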
| Case   | RMSE | Run Time (sec) |
|--------|------|----------------|
| Case 1 | 1.24 | 39             |
| Case 2 | 1.15 | 95             |
| Case 3 | 1.15 | 109            |
| Case 4 | 1.15 | 150            |
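For context on the RMSE column: it is the root mean squared error between the predicted and actual star ratings on the validation set, as in the small self-contained helper below (illustrative only).

```python
# RMSE between predicted and actual star ratings on the validation set.
from math import sqrt

def rmse(predicted, actual):
    """predicted and actual are equal-length lists of star ratings."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Example: rmse([3.8, 4.2, 2.9], [4.0, 5.0, 3.0]) ≈ 0.48
```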
My experiments using the fast.ai library to build a recommender engine for Yelp are available at: Recommender system using fast.ai