Skip to content

marklit/recommend

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 

Repository files navigation

Requirements

On Ubuntu 14.04.2:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install -yq oracle-java7-installer scala git python-virtualenv python-dev unzip
curl -O http://apache.cs.utah.edu/spark/spark-1.3.0/spark-1.3.0.tgz
tar xvf spark-1.3.0.tgz
cd spark-1.3.0/
build/sbt assembly
virtualenv spark_venv
source spark_venv/bin/activate
git clone https://github.com/marklit/recommend.git
cd recommend
pip install -r requirements.txt

Film ratings data

curl -O http://files.grouplens.org/papers/ml-1m.zip
unzip -j ml-1m.zip "*.dat"

Example outputs

Training

$ ../bin/spark-submit recommend.py train ratings.dat
Ratings:      1,000,209
Users:            6,040
Movies:           3,706

Training:       602,241
Validation:     198,919
Test:           199,049

The best model was trained with:
    Rank:                     12
    Lambda:             0.100000
    Iterations:               20
    RMSE on test set:   0.869235
$ ../bin/spark-submit recommend.py train ratings.dat \
    --ranks=8,9,10 --lambdas=0.31,0.32,0.33 --iterations=3
The best model was trained with:
    Rank:                     10
    Lambda:             0.320000
    Iterations:                3
    RMSE on test set:   0.931992
$ ../bin/spark-submit recommend.py train ratings.dat \
    --ranks=5,10,15,20 --lambdas=0.33,0.5,0.8,0.9 --iterations=3,6,9
The best model was trained with:
    Rank:                     15
    Lambda:             0.330000
    Iterations:                3
    RMSE on test set:   0.939317

Recommending

$ ../bin/spark-submit recommend.py recommend ratings.dat movies.dat
His Girl Friday (1940)
New Jersey Drive (1995)
Breakfast at Tiffany's (1961)
Halloween 5: The Revenge of Michael Myers (1989)
Just the Ticket (1999)
I'll Be Home For Christmas (1998)
Goya in Bordeaux (Goya en Bodeos) (1999)
For the Moment (1994)
Thomas and the Magic Railroad (2000)
Message in a Bottle (1999)
...
$ ../bin/spark-submit recommend.py recommend ratings.dat movies.dat \
    --rank=15 --lambda=0.33 --iteration=3
Goya in Bordeaux (Goya en Bodeos) (1999)
Slums of Beverly Hills, The (1998)
New Jersey Drive (1995)
Bottle Rocket (1996)
I'll Be Home For Christmas (1998)
Big Daddy (1999)
Kurt & Courtney (1998)
Kika (1993)
Omega Man, The (1971)
Boogie Nights (1997)
...

About

Film recommendations with Apache Spark and Python

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages