Realtime recommendations with ALS #126

Open · dmitriyshashkin opened this issue Jun 22, 2018 · 3 comments


@dmitriyshashkin commented Jun 22, 2018

So I've found a way to efficiently generate real-time recommendations with an ALS model:

  1. Train the model & serialize the item factors.
  2. Spawn workers that will generate the recommendations. In each one, create an empty model and fill it with the deserialized item factors. The user factors can be left empty, as they won't be used. With pyarrow's plasma store, it's possible to share a single in-memory copy of the item factors across all workers.
  3. For each request, populate user items with the items belonging to the user requesting recommendations, then call the recommend method with recalculate_user=True (see the sketch after this list).
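
For concreteness, here's a minimal sketch of the three steps using the implicit library. The sizes, file name, and the recommend_for helper are illustrative, and the recommend call matches the 2018-era API (the signature has changed in later versions):

```python
import numpy as np
import scipy.sparse as sparse
from implicit.als import AlternatingLeastSquares

N_FACTORS, N_ITEMS, N_USERS = 64, 10_000, 5_000   # illustrative sizes

# Step 1: train once offline, then persist only the item factors.
item_users = sparse.random(N_ITEMS, N_USERS, density=0.001, format="csr")
model = AlternatingLeastSquares(factors=N_FACTORS)
model.fit(item_users)                   # items x users matrix (2018-era API)
np.save("item_factors.npy", model.item_factors)

# Step 2: in each worker, an "empty" model that only carries item factors.
# (With pyarrow's plasma store, all workers could share one copy of this array.)
serving_model = AlternatingLeastSquares(factors=N_FACTORS)
serving_model.item_factors = np.load("item_factors.npy")

# Step 3: per request, build a one-row user-items matrix and recommend.
def recommend_for(item_ids, N=10):
    user_items = sparse.csr_matrix(
        (np.ones(len(item_ids)), ([0] * len(item_ids), item_ids)),
        shape=(1, N_ITEMS),
    )
    # recalculate_user=True solves for the user factor on the fly,
    # so user_factors never has to be loaded at all.
    return serving_model.recommend(0, user_items, N=N, recalculate_user=True)
```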

It works pretty well (0.2 seconds per request on my dataset of 100M interactions and 500K unique items), but the main bottleneck is calculating the dot product between the user factors and the item factors; almost all of the time is spent on this operation. Since it parallelizes easily, I managed to get an enormous speedup by using CUDA for the dot product.
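
Roughly, the GPU scoring step looks like this with cupy (the names, shapes, and file here are illustrative, not my exact code):

```python
import cupy as cp
import numpy as np

# Keep the item factors resident on the GPU once per worker.
item_factors_gpu = cp.asarray(np.load("item_factors.npy"))  # (n_items, n_factors)

def gpu_scores(user_factor):
    # A single matrix-vector product on the GPU scores every item at once.
    scores = item_factors_gpu.dot(cp.asarray(user_factor))   # (n_items,)
    return cp.asnumpy(scores)  # copy back to host for the top-N selection
```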

So my questions are:

  1. Does my approach to real-time recommendations make sense? Perhaps I'm missing something?
  2. Using CUDA for the dot product in the recommend method could improve performance a great deal. Would you consider implementing it?
  3. Should I describe my approach in a blog post? Do you think it would be useful to other users?
@benfred (Owner) commented Jun 26, 2018

Awesome! If you write a blog post I will link to it from the readme.

Your approach makes sense, I've done similar things for serving requests in production before (for news recommendations).

I tried out using CUDA before via cupy, but found it didn't noticeably speed things up. The dot product does take a fair amount of time, but I think the argpartition to get the top N results probably dominates the time. The dot product can easily be ported to CUDA, but getting an efficient selection algorithm is a little tricky (cupy, for instance, just sorts the whole input: cupy/cupy#294).
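
For reference, here's what that selection looks like on the CPU with numpy's argpartition, which finds the N largest scores in linear time without a full sort (scores and N are stand-ins):

```python
import numpy as np

scores = np.random.rand(500_000).astype("float32")  # stand-in for the dot products
N = 10

# argpartition is O(n): it partitions so the N largest scores land at the end,
# then only those N need sorting to put them in order.
top_n = np.argpartition(scores, -N)[-N:]
top_n = top_n[np.argsort(-scores[top_n])]
```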

@dmitriyshashkin (Author) commented Jun 30, 2018

So I did some benchmarking on my actual data & requests from the access logs: https://gist.github.com/dmitriyshashkin/7a85e6fd9a270d999bc79ebe1e398084. It confirms my earlier conclusion that the dot product takes far more time (30x) than argpartition. Perhaps it's due to some peculiarity of my data; I guess I should try a similar benchmark on the lastfm dataset.
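
The gist has the details, but in spirit the benchmark just times the two operations separately, something like this (sizes here are illustrative, not my actual data):

```python
import timeit
import numpy as np

user_factor = np.random.rand(64).astype("float32")
item_factors = np.random.rand(500_000, 64).astype("float32")
scores = item_factors.dot(user_factor)

# Time the full scoring pass vs. the top-10 selection over the same scores.
dot_t = timeit.timeit(lambda: item_factors.dot(user_factor), number=100)
top_t = timeit.timeit(lambda: np.argpartition(-scores, 10)[:10], number=100)
print(f"dot product: {dot_t:.3f}s  argpartition: {top_t:.3f}s")
```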

I ran the same benchmark with GPU acceleration (on Google Colab): https://gist.github.com/dmitriyshashkin/8f5d7eb3a36096bf5a6bb304163e0f36. And you're absolutely right: argpartition in cupy is incredibly slow; it kills all the performance gains from the fast dot product.

@benfred (Owner) commented Jul 16, 2018

I think your data is probably pretty normal - it makes sense that the dot product on the CPU will take more time. I was just noticing that the cupy code for argpartition didn't seem all that efficient =(.

For speeding up the generation of results, I think the best approach right now is to use something like NMSLIB or FAISS to generate approximate results. In your case this should speed up recommendations by several orders of magnitude, with only a slight loss in precision (rough sketch below).
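
As one plausible shape for that with FAISS (the index type and parameters here are just an assumption, not something prescribed in this thread): build an IVF index over the item factors with an inner-product metric and query it with the user factor.

```python
import faiss
import numpy as np

item_factors = np.load("item_factors.npy").astype("float32")
d = item_factors.shape[1]

# IVF index with an inner-product metric; nlist/nprobe trade accuracy for speed.
quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFFlat(quantizer, d, 1024, faiss.METRIC_INNER_PRODUCT)
index.train(item_factors)
index.add(item_factors)
index.nprobe = 16

# Per request: approximate top-10 items for a (recalculated) user factor.
user_factor = np.random.rand(d).astype("float32")  # stand-in
scores, ids = index.search(user_factor.reshape(1, d), 10)
```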
