The MovieLens 100K dataset is used for this exercise. It can be browsed at https://grouplens.org/datasets/movielens/100k/ for a quick look.
The dataset itself is available at https://files.grouplens.org/datasets/movielens/ml-100k.zip
All three questions are implemented as three separate functions in a single main.py file. Running the main file writes the generated outputs to the target folder. Unit tests check for data integrity issues, and a file-based logger writes messages to a log file.
- Extract the contents of the zip file to a location of your choice.
- Docker must be installed as a prerequisite.
- Execute the commands below:
cd Take2Project
docker build --rm --pull -t "take2projectfeaturedev:latest" .
- At this point the Docker image is built and tagged take2projectfeaturedev:latest. To verify, run
docker images
- You should see an image named take2projectfeaturedev. Note the IMAGE ID of this image.
- To run a container from the image:
docker run -it <IMAGE ID>
- This spins up a container that runs the ETL job and prints the output files to the terminal.
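The repository's actual Dockerfile is not reproduced here; a minimal sketch of what such an image build typically contains is shown below. The base image tag and the script name in CMD are assumptions.

```dockerfile
# Illustrative Dockerfile sketch (assumed, not the project's actual file)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN mkdir -p target
CMD ["python", "main_etl.py"]
```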
- First, create a virtual environment and activate it:
python3 -m venv venv
venv\Scripts\activate        (Windows)
source venv/bin/activate     (Linux/macOS)
- Next, install the required dependencies:
pip install -r requirements.txt
- Create a directory for storing the output files.
mkdir target
- Download the ml-100k dataset from https://files.grouplens.org/datasets/movielens/ml-100k.zip, unzip it, and place the ml-100k folder in the current directory. If all is good, the current directory structure so far should look like this:
Take2Project
├───ml-100k
├───target
└───venv
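Inside ml-100k, the ratings file u.data stores the 100,000 ratings as tab-separated `user_id  item_id  rating  timestamp` rows, as described in the dataset's own README. A minimal sketch of parsing one such row (the sample line is illustrative, in the documented format):

```python
# Parse a MovieLens 100K u.data row: user_id \t item_id \t rating \t timestamp
def parse_rating(line):
    user_id, item_id, rating, timestamp = line.rstrip("\n").split("\t")
    return {
        "user_id": int(user_id),
        "item_id": int(item_id),
        "rating": int(rating),
        "timestamp": int(timestamp),
    }

sample = "196\t242\t3\t881250949"  # illustrative row in the u.data format
row = parse_rating(sample)
```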
- Run the tests:
python -m unittest test_etl.py -v
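The project's actual tests live in test_etl.py; as an illustration only, a data-integrity check of the kind mentioned above could look like this. The helper function and test names are assumptions for the sketch.

```python
import unittest

def ratings_are_valid(ratings):
    """MovieLens 100K ratings are integers from 1 to 5."""
    return all(isinstance(r, int) and 1 <= r <= 5 for r in ratings)

class TestDataIntegrity(unittest.TestCase):
    # hypothetical integrity checks, not the project's real test cases
    def test_valid_ratings_pass(self):
        self.assertTrue(ratings_are_valid([1, 3, 5]))

    def test_out_of_range_rating_fails(self):
        self.assertFalse(ratings_are_valid([0, 6]))
```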
- Run the main script:
python main_etl.py
- The target folder should now contain 3 generated files, one per question.
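To sanity-check the run, you can count the files in the target folder. The snippet below demonstrates the check against a temporary directory with three placeholder files (the file names are assumptions; only the count of three comes from the steps above). In the real project you would point it at ./target after running main_etl.py.

```python
import os
import tempfile

def count_output_files(target_dir):
    # count regular files directly inside the target directory
    return len([
        f for f in os.listdir(target_dir)
        if os.path.isfile(os.path.join(target_dir, f))
    ])

# demo against a temp dir with three placeholder files (assumed names)
with tempfile.TemporaryDirectory() as target:
    for name in ("question_1.csv", "question_2.csv", "question_3.csv"):
        open(os.path.join(target, name), "w").close()
    n = count_output_files(target)
```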