Skip to content

Latest commit

 

History

History
191 lines (155 loc) · 4.91 KB

README.md

File metadata and controls

191 lines (155 loc) · 4.91 KB

DJANGO RECOMMENDER SYSTEMS PROJECT

project image

This project's purpose is to showcase recommendation system using machine learning by making an interactive website.

  • Database used: AWS RDS Postgresql
  • Recommender system: Content-based filter (collab filter will not work well because the data used only have user ratings up to year 2020 and this project shows updated movies)
  • Currently deployed on: aws ec2

Access to the url is available upon request to prevent my recommender system api currently deployed on aws lambda to get abused since I am trying to prevent getting past free tier limitations.

TABLE OF CONTENTS

  1. Django Setup
    1. Virtual Environment Setup
    2. Simplify setup (windows)
    3. Create Django App
    4. Run Django Server
  2. AWS Database Setup

DJANGO SETUP

Virtual Environment Setup

conda create --no-default-packages -n <env_name>
conda activate <env_name>
conda install python=3.9

Create Django Project

django-admin startproject simple_django

Simplify setup (windows)

move simple_django\manage.py .\
move simple_django\simple_django\* simple_django
rmdir /s /q simple_django\simple_django\

Create Django App

python manage.py startapp simple_app

Run Django Server

python manage.py runserver

AWS Database Setup

1. Create AWS RDS PostgreSQL Database

  • Create necessary tables
CREATE TABLE temp_storage (
            user_id INTEGER NOT NULL,
            movie_id INTEGER NOT NULL,
            rating NUMERIC(2,1) NOT NULL,
            timestamp BIGINT NOT NULL
        );

CREATE TABLE temp_storage2 (
            movie_id INTEGER PRIMARY KEY,
            imdb_id INTEGER,
            tmdb_id INTEGER
        );

CREATE TABLE movie_info (
		tmdb_id INTEGER PRIMARY KEY,
		title varchar(255)
);

  • Load CSVs to ratings and movie_info table using psql
\copy temp_storage(user_id, movie_id, rating, timestamp) FROM <path_to_csv> WITH (FORMAT CSV, HEADER);
\copy temp_storage2(movie_id, imdb_id, tmdb_id) FROM <path_to_csv> WITH (FORMAT CSV, HEADER);
\copy movie_info(tmdb_id, title) FROM /home/rjomega/github_new/E2E_PROJECTS/recommender_system/ML/data/movies_metadata_mini.csv WITH (FORMAT CSV, HEADER);

NOTE: This is done so we can load easily the data to our tables. We can drop this later on to save space on AWS RDS.


- Check if there are any null
-- COMBINE TWO TABLES TO GET tmdb_id for movie_id (CHECK IF THERE ARE ANY NULL)
SELECT temp1.user_id, temp1.movie_id, temp2.tmdb_id, temp1.rating
FROM temp_storage temp1
LEFT JOIN temp_storage2 temp2
ON temp1.movie_id = temp2.movie_id
WHERE temp2.tmdb_id IS NULL;

-- COUNT NULL tmdb_id (NULL COUNT = 13503)
SELECT COUNT(*) AS null_tmdb_count
FROM temp_storage temp1
LEFT JOIN temp_storage2 temp2
ON temp1.movie_id = temp2.movie_id
WHERE temp2.tmdb_id IS NULL;

  • Load data to our two tables named user_rating and user_info from ratings table
-- CREATE USER_INFO TABLE
CREATE TABLE user_info (
            user_id INTEGER PRIMARY KEY,
			user_password varchar(20)
        );

-- INSERT TO USER_INFO TABLE user_id with tmdb
INSERT INTO user_info (user_id)
SELECT DISTINCT temp1.user_id
FROM temp_storage temp1
LEFT JOIN temp_storage2 temp2
ON temp1.movie_id = temp2.movie_id
WHERE temp2.tmdb_id IS NOT NULL;

-- CREATE USER_RATING TABLE
CREATE TABLE user_rating (
   	rating_id SERIAL,
        	user_id INTEGER REFERENCES user_info (user_id),
        	tmdb_id INTEGER,
        	rating NUMERIC(2,1),
        	PRIMARY KEY (user_id, tmdb_id)
    	);

-- ADD DATA WITHOUT tmdb_id == NULL
INSERT INTO user_rating (user_id,tmdb_id,rating)
SELECT DISTINCT ON (temp1.user_id, temp2.tmdb_id) temp1.user_id, temp2.tmdb_id, temp1.rating
FROM temp_storage temp1
LEFT JOIN temp_storage2 temp2
ON temp1.movie_id = temp2.movie_id
WHERE temp2.tmdb_id IS NOT NULL;

2. Create AWS RDS PostgreSQL Database

  • Change settings.py in our django project

FROM

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    }
}

TO

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': <DATABASE_NAME>,
        'USER': <USER>,
        'PASSWORD': <PASSWORD>,
        'HOST': <HOST>,
        'PORT': <PORT>,
    }
}

NOTE: Necessary information can be obtained after creation of database in AWS RDS


  • generate model.py
python manage.py inspectdb > <path to models directory>\models.py

NOTE: Make some changes in models.py


  • migrate
python manage.py migrate