Add Cornac BPR deep dive notebook #937
Conversation
Staging to master
Check out this pull request on ReviewNB. You'll be able to see the Jupyter notebook diff and discuss changes.
Hi @yueguoguo, I see the failed checks. How do I get access to the logs to see the reasons?
We currently don't have a way for external contributors to trigger the tests from the forked branch, so these aren't failures due to your code. Thanks for putting this in. I'll take a look at it soon.
/azp run
Azure Pipelines could not run because the pipeline triggers exclude this branch/path.
@eisber, I'm trying to use the trick you told me about, but I'm getting this error:
Have you seen this error before? staging shouldn't be excluded
This is going to run on the fork, not staging, right? We would need to enable that somehow, but automatic triggers at pull request are just disabled.
Thanks @tqtg for the contribution! We may have to come back to this after sorting out some of the testing pipeline issues.
tests/unit/test_cornac_utils.py
Outdated
item = rating_true.iloc[1]["itemID"]
assert preds[(preds["userID"] == user) & (preds["itemID"] == item)][
    "prediction"
].values == pytest.approx(mf.rate(train_set.uid_map[user],
here we are cheating :-) because we are comparing two routines that are the same. What we want is to compare the prediction that comes from predict_rating
with a df explicitly created by us as a fixture, so you would need to generate a new df with the result.
Question, any reason to select MF instead of BPR for the test?
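A hedged sketch of the fixture-based comparison being suggested here, using pandas: the expected values are written out explicitly rather than recomputed with the routine under test. The column names follow the snippet above, but the values and the helper `assert_matches_fixture` are hypothetical, not taken from the repo.

```python
import pandas as pd

# Hypothetical expected output, written out explicitly as a fixture
# instead of being recomputed with the same scoring routine.
expected = pd.DataFrame(
    {"userID": [1, 1], "itemID": [10, 11], "prediction": [4.2, 3.1]}
)

def assert_matches_fixture(preds, expected, tol=1e-6):
    # Align rows on (userID, itemID) and compare predictions numerically.
    merged = preds.merge(expected, on=["userID", "itemID"], suffixes=("", "_expected"))
    assert len(merged) == len(expected)
    assert (merged["prediction"] - merged["prediction_expected"]).abs().max() < tol
```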
I think this test is still valid. We are testing ground-truth training data against the predictions from a trained MF model. In theory, MF should be able to overfit on this small input data, so there will be only a small approximation error here.
MF is used instead of BPR because I want to reuse tests from Surprise (MF is equivalent to SVD in Surprise). For the purpose of testing utility functions, it should be fine.
Hi @miguelgfierro ,
Tests are updated using some evaluation metrics. It might not be that clean since it involves the implementation of the metrics; however, I think it's reliable enough for unit tests.
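For illustration, a metric-backed unit test pins model output against a hand-computed metric value. Below is a minimal precision@k in plain Python — a generic sketch of the idea, not the repo's metric implementation.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are in the relevant set."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Example: 1 of the top-2 recommendations is relevant.
score = precision_at_k(["a", "b", "c"], relevant={"a", "c"}, k=2)  # -> 0.5
```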
tests/unit/test_cornac_utils.py
Outdated
train_set = cornac.data.Dataset.from_uir(rating_true.itertuples(index=False), seed=42)
mf.fit(train_set)

preds = predict_ranking(mf, rating_true, remove_seen=True)
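As a plain-Python illustration of what `remove_seen=True` is assumed to do in this utility — drop (user, item) pairs already observed in the training data from the ranked candidates (the actual implementation in the repo may differ):

```python
# (user, item) pairs observed during training -- hypothetical toy data.
train_pairs = {("u1", "i1"), ("u2", "i2")}

# All candidate (user, item) pairs the model could score.
candidates = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2")]

# remove_seen=True keeps only pairs not present in the training data.
unseen = [pair for pair in candidates if pair not in train_pairs]
```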
same here, you would need to input the result df as a fixture and compare it with the output of predict_ranking
updated!
tests/unit/test_cornac_utils.py
Outdated
assert preds["itemID"].dtypes == rating_true["itemID"].dtypes
user = preds.iloc[1]["userID"]
item = preds.iloc[1]["itemID"]
assert preds[(preds["userID"] == user) & (preds["itemID"] == item)][
same here
updated!
"\n",
"One usual approach for item recommendation is directly predicting a preference score $\\hat{x}_{u,i}$ given to item $i$ by user $u$. BPR uses a different approach by using item pairs $(i, j)$ and optimizing for the correct ranking given preference of user $u$, thus, there are notions of *positive* and *negative* items. The training data $D_S : U \\times I \\times I$ is defined as:\n",
"\n",
"$$D_S := \\{(u, i, j) \\mid i \\in I^{+}_{u} \\wedge j \\in I \\setminus I^{+}_{u}\\}$$\n",
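The training-set definition above can be made concrete with a small Python sketch. The users and items here are toy data, purely illustrative: for each user, every positive item is paired with every item outside the user's positive set.

```python
# All items I and each user's positive items I+_u (toy data).
items = {"i1", "i2", "i3"}
positives = {"u1": {"i1"}, "u2": {"i2", "i3"}}

# D_S := {(u, i, j) | i in I+_u and j in I \ I+_u}
D_S = {
    (u, i, j)
    for u, pos in positives.items()
    for i in pos
    for j in items - pos
}
```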
would you mind checking if the math is rendered correctly? I can't see it on GitHub
This is rendered properly on my browser.
All the formulas are rendered correctly on GitHub.
hey @tqtg I'm really impressed by how quickly you were able to understand our ways of working. You did a deep dive, added the tests, even added black. To be honest, not many people outside our team are able to ramp up that fast; even some folks at MS don't do it. I would like to ask you: how much time did you spend reading our documentation to be able to create this PR? Also, please tell us if you think something could be clearer, or if there is too much or too little info about anything.
Thank you @miguelgfierro for your kind words. To be honest, I didn't read the documentation but read through the codebase (it's quite easy to read and not that big though :D). To create the notebook, I mainly based it on other available notebooks and existing utilities. The whole project, to me, is quite clear and well structured, though some parts of the documentation are not rendered properly, I guess. I hope to contribute more to the project because it's very useful for the community!
Awesome @tqtg! Thanks again. Which part of the doc was rendered wrongly?
For example: https://microsoft-recommenders.readthedocs.io/en/latest/dataset.html
@tqtg good catch - it looks like the readthedocs build needs a pre-configured environment to successfully parse the source files. FYI @miguelgfierro
good catch @tqtg, thanks
@@ -0,0 +1,595 @@
{ |
updated!
Cool. OK @miguelgfierro, ready to merge?
this is awesome!
hey @tqtg, for large contributions like yours, we are adding the authors to this list: https://github.com/microsoft/recommenders/blob/staging/AUTHORS.md#contributors--sorted-alphabetically. Feel free to add your name if you want.
@miguelgfierro: can you tell me more about how to add myself? Would it be another PR from staging?
Description
Add Cornac Bayesian Personalized Ranking (BPR) deep dive notebook
Related Issues
#931
Checklist: