addition of item similarity measure - python version #1522
Yan, this is really nice. I have one point for discussion.
Codecov Report
@@             Coverage Diff             @@
##           staging    #1522      +/-   ##
===========================================
+ Coverage    62.03%   62.20%   +0.17%
===========================================
  Files           84       84
  Lines         8397     8441      +44
===========================================
+ Hits          5209     5251      +42
- Misses        3188     3190       +2
This is really good, Yan; you changed the code super quickly.
The only thing I see is that these changes also affect the notebooks, right? But the tests didn't fail, so are we not testing the diversity notebook?
The diversity notebook only uses the Spark version as an example. We do not have an example diversity notebook for the Python version.
This is super good, Yan.
@@ -696,46 +698,131 @@ def get_top_k_items(
    }


# diversity metrics
class PythonDiversityEvaluation:
    """Python Diversity Evaluator"""
def check_column_dtypes_diversity_serendipity(func):
Shouldn't we have this decorator private (with an '_' at the front of the name)?
We did not use _ in the existing code, e.g. the function "check_column_dtypes" does not have _ in front of its name. Therefore I did not add _, to stay consistent.
No, I meant adding just an underscore, without any quotation marks. Currently you see these methods in readthedocs, where they are probably not needed.
There is another issue with the docstrings: the args are missing in the Python functions, e.g. here. Before, they were documented inside the encapsulating class, but now each function needs them in its own docstring.
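To make the two review points above concrete, here is a minimal sketch of a private decorator that keeps the decorated function's docstring intact. It is not the code from this PR: the names `_check_required_columns` and `count_recommended_items` are hypothetical, plain dicts of lists stand in for DataFrames, and the check covers only column presence, not dtypes. The leading underscore marks the decorator as internal, which documentation generators such as Sphinx autodoc typically skip unless told otherwise, and `functools.wraps` carries the `Args` section through to the wrapped function.

```python
from functools import wraps


def _check_required_columns(func):
    """Hypothetical private decorator: check that the user and item columns exist.

    The leading underscore signals that this helper is internal.
    """

    @wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(train, reco, col_user="userID", col_item="itemID", **kwargs):
        # Validate both inputs before delegating to the metric function.
        for name, table in (("train", train), ("reco", reco)):
            missing = [c for c in (col_user, col_item) if c not in table]
            if missing:
                raise ValueError(f"Missing columns in {name}: {missing}")
        return func(train, reco, col_user=col_user, col_item=col_item, **kwargs)

    return wrapper


@_check_required_columns
def count_recommended_items(train, reco, col_user="userID", col_item="itemID"):
    """Count distinct recommended items (toy stand-in for a real metric).

    Args:
        train (dict): Training data, mapping column name to a list of values.
        reco (dict): Recommendations, in the same layout as ``train``.
        col_user (str): Name of the user column.
        col_item (str): Name of the item column.

    Returns:
        int: Number of distinct items in ``reco``.
    """
    return len(set(reco[col_item]))
```

Because the function is no longer a method of an evaluator class, its docstring has to list every argument itself, as the toy metric does here.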

The metric definitions/formulations are based on the following references with modification:
def check_column_dtypes_novelty_coverage(func):
Similarly, should this be private?
Good job, thanks!
There are some long lines (caught by flake8).
I used "black" to format the files.
Thanks @anargyri for catching many issues!