Implicit

Fast Python Collaborative Filtering for Implicit Datasets.

This project provides fast Python implementations of the algorithms described in the papers "Collaborative Filtering for Implicit Feedback Datasets" and "Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering".

To install:

pip install implicit

Basic usage:

import implicit

# initialize a model
model = implicit.als.AlternatingLeastSquares(factors=50)

# train the model on a sparse matrix of item/user/confidence weights
model.fit(item_user_data)

# recommend items for a user
user_items = item_user_data.T.tocsr()
recommendations = model.recommend(userid, user_items)

# find related items
related = model.similar_items(itemid)
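
The snippet above assumes that item_user_data already exists as a SciPy sparse matrix with one row per item and one column per user. As a minimal sketch of how such a matrix might be built, the example below uses a small made-up interaction log; the arrays user_ids, item_ids and counts are hypothetical, and treating raw counts as confidence weights is an assumption made for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical interaction log: one entry per (user, item) pair.
# The values stand in for confidence weights, e.g. play counts.
user_ids = np.array([0, 0, 1, 2, 2, 2])
item_ids = np.array([0, 1, 1, 0, 2, 3])
counts = np.array([3.0, 1.0, 5.0, 2.0, 4.0, 1.0])

n_items = int(item_ids.max()) + 1
n_users = int(user_ids.max()) + 1

# Rows are items and columns are users, matching the name item_user_data.
item_user_data = coo_matrix(
    (counts, (item_ids, user_ids)), shape=(n_items, n_users)
).tocsr()

# The transposed user-by-item view, as built in the usage snippet.
user_items = item_user_data.T.tocsr()
```

Building the matrix in COO format and converting to CSR afterwards is a common SciPy pattern, since COO is convenient to construct from index arrays and CSR is efficient for row access.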

The examples folder has a program showing how to use this to compute similar artists on the last.fm dataset.

For more information see the documentation.

Articles about Implicit

Several posts have been written talking about using Implicit to build recommendation systems:

There are also a couple posts talking about the algorithms that power this library:

Requirements

This library requires SciPy version 0.16 or later. Running on OS X requires a compiler with OpenMP support, which can be installed with Homebrew: brew install gcc.

Why Use This?

This library came about because I was looking for an efficient Python implementation of this algorithm for a blog post on matrix factorization. The other Python packages were too slow, and integrating with a different language or framework was too cumbersome.

The core of this package is written in Cython, leveraging OpenMP to parallelize computation. Linear algebra is done using the BLAS and LAPACK libraries distributed with SciPy. This leads to extremely fast matrix factorization.

On a simple benchmark, this library is about 1.8 times faster than the multithreaded C++ implementation provided by Quora's QMF Library and at least 60,000 times faster than implicit-mf.

A follow-up post describes further performance improvements based on the Conjugate Gradient method, which boosts performance by a further 3x to over 19x depending on the number of factors used.

This library has been tested with Python 2.7 and 3.5. Running 'tox' runs the unit tests on both versions and verifies that all Python files pass flake8.

Optimal Configuration

I'd recommend configuring SciPy to use Intel's MKL matrix libraries. One easy way of doing this is by installing the Anaconda Python distribution.

For systems using OpenBLAS, I highly recommend setting 'export OPENBLAS_NUM_THREADS=1'. This disables its internal multithreading ability, which leads to substantial speedups for this package.
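
When exporting the variable in the shell is not convenient, the same setting can be applied from inside a Python script. This is only a sketch: it assumes OpenBLAS reads the variable when the library is first loaded, so the assignment has to run before NumPy, SciPy or implicit is imported in that process.

```python
import os

# Set this before any library that loads OpenBLAS is imported;
# changing it after the import is assumed to have no effect.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

# Import the numeric libraries only after this point, for example:
# import implicit
```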

Released under the MIT License
