Realtime recommendations with ALS #126
Awesome! If you write a blog post I will link to it from the readme. Your approach makes sense; I've done similar things for serving requests in production before (for news recommendations). I tried out using CUDA via cupy before, but found it didn't noticeably speed things up. The dot product does take a fair amount of time, but I think the argpartition to get the top-N results probably dominates. The dot product can easily be ported to CUDA, but getting an efficient selection algorithm is a little tricky (cupy, for instance, just sorts the whole input: cupy/cupy#294).
So I did some benchmarking on my actual data and requests from the access logs: https://gist.github.com/dmitriyshashkin/7a85e6fd9a270d999bc79ebe1e398084. It confirms my earlier conclusion that the dot product takes far more time (about 30x) than argpartition. Perhaps that's due to some peculiarities of my data; I guess I should try a similar benchmark on the lastfm dataset. I also ran the same benchmark with GPU acceleration (on Google Colab): https://gist.github.com/dmitriyshashkin/8f5d7eb3a36096bf5a6bb304163e0f36, and you're absolutely right. Argpartition in cupy is incredibly slow; it kills all the performance gains from the fast dot product.
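A minimal sketch of the kind of micro-benchmark being discussed, timing the two stages separately (the factor shapes here are illustrative assumptions; the linked gists benchmark real request data):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
n_items, n_factors, top_n = 500_000, 64, 10
item_factors = rng.standard_normal((n_items, n_factors)).astype(np.float32)
user_factors = rng.standard_normal(n_factors).astype(np.float32)

t0 = time.perf_counter()
scores = item_factors @ user_factors             # score every item for one user
t1 = time.perf_counter()
top = np.argpartition(scores, -top_n)[-top_n:]   # unordered top-N selection
t2 = time.perf_counter()

print(f"dot: {t1 - t0:.4f}s  argpartition: {t2 - t1:.4f}s")
```

Which stage dominates depends on the number of items and factors, which is presumably why the two benchmarks disagree.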
I think your data is probably pretty normal; it makes sense that the dot product on the CPU takes more time. I was just noticing that the cupy code for argpartition didn't seem all that efficient =(. For speeding up result generation, I think the best option right now is to use something like NMSLIB or FAISS to generate approximate results. In your case this should speed up recommendations by several orders of magnitude, with only a slight loss in precision.
So I've found a way to efficiently generate real-time recommendations with an ALS model.
It works pretty well (0.2 seconds per request on my dataset of 100M records with 500K unique items), but apparently the main bottleneck is calculating the dot product between the user factors and the item factors; almost all of the time is spent on this operation. Since it parallelizes easily, I managed to get an enormous speedup by using CUDA to calculate the dot product.
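The serving step described above can be sketched in plain numpy; the `recommend` helper, its parameters, and the data shapes are hypothetical, but the dot-product-then-argpartition structure is the one under discussion:

```python
import numpy as np

def recommend(user_factors, item_factors, seen_items, top_n=10):
    """Score every item for one user and return the top-N unseen items."""
    scores = item_factors @ user_factors          # the dominant cost at scale
    scores[list(seen_items)] = -np.inf            # mask already-seen items
    top = np.argpartition(scores, -top_n)[-top_n:]
    return top[np.argsort(scores[top])[::-1]]     # sort only the N winners

rng = np.random.default_rng(0)
item_factors = rng.standard_normal((1000, 32)).astype(np.float32)
user = rng.standard_normal(32).astype(np.float32)
recs = recommend(user, item_factors, seen_items={3, 7})
```

Only the matrix-vector product grows with the catalog size, which is why it is the natural piece to move onto the GPU.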
So my questions are: