
curious what you would recommend for real-time training + prediction models? #491

Open
victusfate opened this issue Nov 9, 2021 · 6 comments

Comments

@victusfate

victusfate commented Nov 9, 2021

I admire the API, efficiency, and results of implicit.

I'm finding a need for real-time training + prediction in some of my company's systems, and have started searching around for ideas/implementations. Has anyone had experience working with this?

I realize this is off-topic for implicit (I'd totally understand if it's closed).
I'm starting to look for ideas here:

@victusfate
Author

victusfate commented Nov 23, 2021

After spending some time looking at HRNN and its implementations, I switched gears to something simpler that supports continuous learning: https://github.com/online-ml/river

@victusfate
Author

If anyone's curious, I'm building an open-source version here: https://github.com/victusfate/concierge
I just hooked up Redis pub/sub events to update the model today.

TODO: on server startup, fetch all events since the last model training and update each model.

@benfred
Owner

benfred commented Jan 25, 2022

There are two things you can do with implicit to get near-real-time updates with the ALS model:

  1. You can set the recalculate_user flag on model.recommend calls to automatically regenerate the user representation. This lets your recommendations react at inference time to changes in what the user has interacted with.

  2. I've just added support for incremental retraining of ALS models in PR #527 (Add incremental retraining support for ALS models), which lets you update the model with new items or users, as well as recalculate existing items with new interactions.
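To make option 1 a bit more concrete, here's a minimal pure-Python sketch of the math behind recalculating a user's factors while the item factors stay fixed, using the standard implicit-feedback ALS normal equations. It is not implicit's own code: the names solve, fold_in_user and recommend_from_vector, the reg value, and treating interaction weights directly as confidences are all assumptions for illustration. Since it only needs the user's interactions, the same fold-in also produces a vector for a user the model has never seen.

```python
from typing import Dict, List, Sequence, Set

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fold_in_user(item_factors: Sequence[Vector], interactions: Dict[int, float],
                 reg: float = 0.01) -> Vector:
    """Solve (Y^T C_u Y + reg * I) x_u = Y^T C_u p_u with item factors Y fixed.

    interactions maps item index -> confidence c_ui (assumed >= 1). Items the
    user never touched have confidence 1 and preference 0, so they only enter
    through the Y^T Y term.
    """
    k = len(item_factors[0])
    # Y^T Y summed over every item
    a = [[sum(y[i] * y[j] for y in item_factors) for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for item, conf in interactions.items():
        y = item_factors[item]
        for i in range(k):
            b[i] += conf * y[i]  # preference p_ui = 1 for observed items
            for j in range(k):
                a[i][j] += (conf - 1.0) * y[i] * y[j]
    for i in range(k):
        a[i][i] += reg
    return solve(a, b)


def recommend_from_vector(item_factors: Sequence[Vector], user_vector: Vector,
                          exclude: Set[int], n: int = 10) -> List[int]:
    """Rank items by dot product with user_vector, skipping already-seen items."""
    scored = [
        (sum(u * f for u, f in zip(user_vector, y)), idx)
        for idx, y in enumerate(item_factors)
        if idx not in exclude
    ]
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:n]]


item_factors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
new_user = {0: 10.0}  # a user outside the model who interacted with item 0
vec = fold_in_user(item_factors, new_user)
print(recommend_from_vector(item_factors, vec, exclude=set(new_user)))  # [2, 3, 1]
```

In this sketch only the interactions drive the result: no user id is involved, which is why folding in is cheap enough to do per request.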

@victusfate
Author

victusfate commented Jan 26, 2022

This is great news! I'd love to compare the results with river-ml, since I have more experience with implicit.
When it's ready for review, it'd be great to see a small sample program showing live updates to the model for recommendations. Oh, it's already ready to try out; I'll get this on my schedule.

Also worth noting: I got the deployed system working well.

Every hour I gather all user-item ratings for a full training run (the snapshot model). When new servers come up, they load this model and then delta-train from a Redis sorted set of all user-item ratings since the last model snapshot. In addition, live models receive real-time updates via Redis pub/sub.

This way, at scale, I can have multiple predictor HTTP servers all yielding similar results. I can't guarantee they all receive every update in the same order, but they generally converge.
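The snapshot + delta-train + live-update flow can be sketched in plain Python. Everything here is hypothetical: RatingLog is an in-memory stand-in for the Redis sorted set (scored by timestamp), publish stands in for writing to that set plus a Redis pub/sub broadcast, and ToyModel is a placeholder that just sums ratings per item instead of a real implicit or river model.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(order=True, frozen=True)
class Rating:
    ts: float  # event timestamp, used as the sorted-set score
    user: str
    item: str
    value: float


class RatingLog:
    """In-memory stand-in for a Redis sorted set of ratings scored by timestamp."""

    def __init__(self) -> None:
        self._events: List[Rating] = []

    def add(self, event: Rating) -> None:
        bisect.insort(self._events, event)  # roughly ZADD key ts event

    def since(self, ts: float) -> List[Rating]:
        """Events strictly after ts, roughly ZRANGEBYSCORE key (ts +inf."""
        start = bisect.bisect_right([e.ts for e in self._events], ts)
        return self._events[start:]


class ToyModel:
    """Placeholder for an incrementally trainable model: sums ratings per item."""

    def __init__(self, snapshot: Dict[str, float]) -> None:
        self.scores: Dict[str, float] = defaultdict(float, snapshot)  # copies snapshot

    def update(self, event: Rating) -> None:
        self.scores[event.item] += event.value


def start_server(log: RatingLog, snapshot: Dict[str, float], snapshot_ts: float) -> ToyModel:
    """Load the hourly snapshot, then delta-train on everything logged after it."""
    model = ToyModel(snapshot)
    for event in log.since(snapshot_ts):
        model.update(event)
    return model


def publish(log: RatingLog, live_models: List[ToyModel], event: Rating) -> None:
    """Persist the rating first, then push it to running servers (the pub/sub leg)."""
    log.add(event)
    for model in live_models:
        model.update(event)


log = RatingLog()
server_a = start_server(log, {}, snapshot_ts=0.0)
publish(log, [server_a], Rating(50.0, "u1", "a", 1.0))
publish(log, [server_a], Rating(90.0, "u2", "b", 2.0))
snapshot, snapshot_ts = dict(server_a.scores), 100.0  # stands in for the hourly full training
publish(log, [server_a], Rating(120.0, "u1", "b", 1.0))
server_b = start_server(log, snapshot, snapshot_ts)  # a new server comes up later
publish(log, [server_a, server_b], Rating(130.0, "u3", "a", 3.0))
print(dict(server_a.scores) == dict(server_b.scores))  # True
```

The sketch ignores the window between replaying the log and subscribing; in a real deployment you'd likely subscribe first and skip replayed events already applied (for example by timestamp or event id), and servers can still see live updates in different orders.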
online-ml/river#803

@sorenrife

sorenrife commented Feb 14, 2022

What if a user is new, but the server can't fit them into the model yet (as @victusfate explained, a pub/sub flow that adds new users/items should preferably introduce some delay for performance reasons)? How could I make recommendations for this new user?

Should I call the recommend method with an arbitrary userid and pass the new user's few interactions as user_items? If so, would it make sense to make the userid parameter optional?

(I'm assuming this because I don't know how much the userid actually matters in the recommend method when the recalculate_user flag is true.)

@victusfate
Author

victusfate commented Feb 15, 2022

@sorenrife In my current deployment with implicit (only trained hourly at the moment), I ended up falling back to the most popular items for new users. I think you can take the same approach with live model updates by keeping a running popularity rank updated as ratings come in.

Something like this (snippets from my hourly training, where df is a pandas DataFrame):

    # total rating per item
    pr = df.groupby(constants.ITEM_COLUMN)[constants.RATING_COLUMN].sum()
    # min-max scale to [0, 1]; if every item has the same total, score them all 1.0
    span = pr.max() - pr.min()
    pr = (pr - pr.min()) / span if span else pr - pr.min() + 1.0
    # dict ordered from most to least popular
    self.item_popularity_map = pr.sort_values(ascending=False).to_dict()

and in the rankings method:

    def rankings(self, user_id: str, selected_items):
        """Score selected_items for user_id; unknown users fall back to popularity."""
        ranks = {}
        # map item names to model indices, skipping items the model has never seen
        selected_idx = [self.inv_item_map[item] for item in selected_items
                        if item in self.inv_item_map]

        # handle novel / unknown users with popularity rank
        if user_id not in self.inv_user_map:
            try:
                for k in selected_idx:
                    item_name = self.item_map[k]
                    # assumes item_popularity_map is keyed by item index
                    score = self.item_popularity_map[k]
                    ranks[item_name] = float(score)
            except Exception as e:
                print('ImplicitPredictor.rankings popularity exception', e)
        else:
            user_idx = self.inv_user_map[user_id]
            try:
                # score only the selected items for this known user
                rankings = self.model.rank_items(user_idx, self.user_items, selected_idx)
                for item_idx, prob in rankings:
                    item_name = self.item_map[item_idx]
                    ranks[item_name] = float(prob)
            except Exception as e:
                print('rankings exception', e)
        return ranks
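For the live-update case, here's a minimal sketch of keeping a popularity rank current as ratings arrive, using the same min-max scaling as the hourly pandas snippet. RunningPopularity and its methods are hypothetical names; in a real server, add_rating would presumably be called from the pub/sub handler, and normalized() would feed the unknown-user branch of rankings.

```python
from collections import defaultdict
from typing import Dict, List


class RunningPopularity:
    """Per-item rating totals kept current as ratings stream in."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    def add_rating(self, item: str, rating: float) -> None:
        self.totals[item] += rating

    def normalized(self) -> Dict[str, float]:
        """Min-max scale totals to [0, 1], most popular first."""
        if not self.totals:
            return {}
        lo, hi = min(self.totals.values()), max(self.totals.values())
        span = hi - lo
        ordered = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        # if every item has the same total, score them all 1.0 instead of dividing by zero
        return {item: (total - lo) / span if span else 1.0 for item, total in ordered}

    def top(self, n: int = 10) -> List[str]:
        ordered = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return [item for item, _ in ordered[:n]]


pop = RunningPopularity()
for item, rating in [("a", 1.0), ("b", 3.0), ("c", 2.0), ("a", 1.0)]:
    pop.add_rating(item, rating)
print(pop.top(1))  # ['b']
print(pop.normalized())
```

Recomputing min and max on every call is O(number of items); that's fine for modest catalogs, but a very large one might want to cache the scaled scores and refresh them periodically.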
