
addition of item similarity measure - python version #1522

Merged
merged 2 commits into staging on Sep 20, 2021

Conversation

YanZhangADS
Collaborator

Description

Related Issues

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • [x] I have added tests covering my contributions.
  • [x] I have updated the documentation accordingly.
  • [x] This PR is being made to staging branch and not to main branch.

Collaborator

@miguelgfierro miguelgfierro left a comment

Yan, this is really nice. I have one point for discussion.

recommenders/evaluation/python_evaluation.py (review thread, resolved)
@codecov-commenter

Codecov Report

Merging #1522 (20b0b53) into staging (1cab0e2) will increase coverage by 0.17%.
The diff coverage is 94.01%.

Impacted file tree graph

@@             Coverage Diff             @@
##           staging    #1522      +/-   ##
===========================================
+ Coverage    62.03%   62.20%   +0.17%     
===========================================
  Files           84       84              
  Lines         8397     8441      +44     
===========================================
+ Hits          5209     5251      +42     
- Misses        3188     3190       +2     
Impacted Files                                  Coverage Δ
recommenders/evaluation/python_evaluation.py    93.68% <94.01%> (-3.21%) ⬇️
recommenders/datasets/movielens.py              77.52% <0.00%> (+1.12%) ⬆️
recommenders/datasets/criteo.py                 86.79% <0.00%> (+3.77%) ⬆️
recommenders/datasets/split_utils.py            88.00% <0.00%> (+4.00%) ⬆️
recommenders/utils/spark_utils.py               96.00% <0.00%> (+8.00%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a3a9ac3...20b0b53. Read the comment docs.
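As a sanity check, the headline figures follow from the hit/line counts above. The displayed 62.20% (where 5251/8441 ≈ 62.208%) suggests the percentages are truncated rather than rounded; this small sketch assumes that and reproduces the report's numbers:

```python
def pct(hits, lines):
    # Truncate (not round) to two decimal places; 5251/8441 * 100 ≈ 62.208
    # is displayed as 62.20%, which is what suggests truncation here.
    return int(hits / lines * 10000) / 100

before = pct(5209, 8397)          # staging baseline: 62.03
after = pct(5251, 8441)           # with this PR's +44 lines, +42 hits: 62.20
delta = round(after - before, 2)  # +0.17, matching the report
```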

Collaborator

@miguelgfierro miguelgfierro left a comment

This is really good, Yan; you changed the code super quickly.
The only thing I see is that these changes also affect the notebooks, right? But the tests didn't fail, so are we not testing the diversity notebook?

@YanZhangADS
Collaborator Author

> This is really good, Yan; you changed the code super quickly.
> The only thing I see is that these changes also affect the notebooks, right? But the tests didn't fail, so are we not testing the diversity notebook?

The diversity notebook only uses the Spark version as an example; we do not have an example diversity notebook for the Python version.

Collaborator

@miguelgfierro miguelgfierro left a comment

This is super good, Yan.

@@ -696,46 +698,131 @@ def get_top_k_items(
}

# diversity metrics
class PythonDiversityEvaluation:
"""Python Diversity Evaluator"""
def check_column_dtypes_diversity_serendipity(func):
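For context, a decorator like this typically validates the metric's inputs before it runs. Below is a minimal sketch of the idea, not the actual implementation in python_evaluation.py (which also checks dtypes); the column defaults are hypothetical:

```python
import functools

def check_column_dtypes_diversity_serendipity(func):
    """Simplified sketch: validate input DataFrames before computing a metric.

    This version only verifies that the expected columns exist; the real
    decorator does more (e.g. dtype checks on the user/item columns).
    """
    @functools.wraps(func)
    def wrapper(train_df, reco_df, col_user="userID", col_item="itemID", **kwargs):
        for name, df in (("train_df", train_df), ("reco_df", reco_df)):
            missing = [c for c in (col_user, col_item) if c not in df.columns]
            if missing:
                raise ValueError(f"{name} is missing columns: {missing}")
        return func(train_df, reco_df, col_user=col_user, col_item=col_item, **kwargs)
    return wrapper
```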
Collaborator

Shouldn't this decorator be private (with an '_' at the front of the name)?

Collaborator Author

@YanZhangADS YanZhangADS Sep 21, 2021

We did not use a leading _ in the existing code; e.g. the function "check_column_dtypes" does not have one in front of its name. Therefore I didn't add _ here, to be consistent.

Collaborator

@anargyri anargyri Sep 21, 2021

No, I meant adding just an underscore, without any quotation marks. Currently you see these methods in readthedocs, where they are probably not needed.
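The convention being discussed can be demonstrated in a few lines (hypothetical function names): a leading underscore marks a name as private, so it is skipped by `from module import *` (absent an `__all__`) and, by default, omitted from Sphinx/readthedocs autodoc output.

```python
import types

# Build a throwaway module with one public and one underscore-prefixed function.
mod = types.ModuleType("demo")
exec(
    "def check_column_dtypes(func):\n"
    "    return func\n"
    "def _check_column_dtypes_diversity_serendipity(func):\n"
    "    return func\n",
    mod.__dict__,
)

# Names considered public, i.e. what "import *" would pick up:
public = [n for n in vars(mod) if not n.startswith("_")]
print(public)  # ['check_column_dtypes']
```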

Collaborator

There is another issue with the docstrings: the args are missing in the Python functions (e.g. here), because previously they were documented inside the encapsulating class, but now they are required on each standalone function.
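For example, a standalone metric function would now carry its own Google-style Args section (this is a hypothetical, heavily simplified version for illustration, not the repo's actual code):

```python
def catalog_coverage(train_df, reco_df, col_user="userID", col_item="itemID"):
    """Fraction of training items that appear in the recommendations.

    Args:
        train_df (pandas.DataFrame): training data with user/item interactions.
        reco_df (pandas.DataFrame): recommendations, one row per user/item pair.
        col_user (str): column name for user ids.
        col_item (str): column name for item ids.

    Returns:
        float: catalog coverage in [0, 1].
    """
    train_items = set(train_df[col_item])
    return len(set(reco_df[col_item]) & train_items) / len(train_items)
```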


The metric definitions/formulations are based on the following references with modification:
def check_column_dtypes_novelty_coverage(func):
Collaborator

Similarly, should this one be private?

Collaborator

@anargyri anargyri left a comment

Good job, thanks!

@anargyri anargyri merged commit d43e450 into staging Sep 20, 2021
@anargyri
Collaborator

There are some long lines (caught by flake).

@YanZhangADS
Collaborator Author

> There are some long lines (caught by flake).

I used "black" to format the files.
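Worth noting: black wraps most code at its configured line length but leaves long strings and comments alone, so flake8's E501 can still fire after formatting. A portable way to spot over-length lines (the 79-character limit matches flake8's default; the sample file is hypothetical):

```shell
# Write a sample file with one deliberately over-length (100-char) line.
{ printf 'short = 1\n'; printf 'x%.0s' $(seq 1 100); printf '\n'; } > /tmp/sample.py

# Report any line longer than 79 characters, mimicking flake8's E501 check.
awk 'length($0) > 79 { printf "%s:%d: E501 line too long (%d chars)\n", FILENAME, FNR, length($0) }' /tmp/sample.py
```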

@YanZhangADS
Collaborator Author

Thanks @anargyri for catching many issues!

5 participants