add mind utils #1247
…nto v-jinyi/add-news-reco-methods
@miguelgfierro Could you please help review the PR? |
@@ -0,0 +1,528 @@ | |||
{ |
Minor detail: would you mind changing the private function _download_and_extract_globe to a public download_and_extract_globe? In the other notebooks we don't import private functions.
@@ -0,0 +1,528 @@ | |||
{ |
Can you please move all imports to the first cell? We follow that convention in the rest of the notebooks.
@@ -0,0 +1,528 @@ | |||
{ |
Would you mind moving this function to the libraries, maybe into the MIND utils in reco_utils.dataset.mind? Would you also please explain what the regex does in the docstring?
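As an illustration of the kind of docstring being requested, here is a minimal sketch. The thread does not show the function or its regex, so the name word_tokenize, the pattern, and the behavior below are all assumptions chosen for the example, not the PR's code.

```python
import re


def word_tokenize(sent):
    """Split a sentence into lowercase word and punctuation tokens.

    Hypothetical example; the pattern is an assumption, not the PR's regex.
    The regex has two alternatives:
      - [\\w]+      one or more word characters (letters, digits, underscore)
      - [.,!?;|]   a single punctuation character from this set
    Whitespace and any other characters match neither alternative and are dropped.

    Args:
        sent (str): Input sentence.

    Returns:
        list: Tokens in order of appearance, or an empty list if sent is not a string.
    """
    pattern = re.compile(r"[\w]+|[.,!?;|]")
    if isinstance(sent, str):
        return pattern.findall(sent.lower())
    return []
```

Spelling out each alternative of the pattern in the docstring lets a reader see which characters become tokens without having to decode the regex.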
@@ -0,0 +1,528 @@ | |||
{ |
Same here: please move this to the utils and add docstrings. Also, it might be easier for our users to understand what this function does if its name is more explicit. Some ideas would be load_glove_matrix, generate_embeddings, generate_embedding_matrix, or any other name you think is better.
It would also be nice to have some comments in the code explaining what's going on there, e.g. what is the data in l[0] vs. l[1:]? People who are not familiar with the MIND dataset, like me :-), will appreciate those comments.
    wordvec = [float(x) for x in l[1:]]
    if word in word_dict:

In this part, it seems we don't need to build wordvec when word is not in word_dict, so wordvec = ... can move under the if statement. That means we can use a single if statement like:

    if len(word) > 0 and word in word_dict:
        # do something here

or, if we are sure word_dict doesn't include any word with len(word) == 0, simply:

    if word in word_dict:

without checking len(word) > 0.
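A minimal sketch of the suggestion above, assuming each GloVe line holds a token followed by its vector components (which is what l[0] and l[1:] would then refer to). The function name is borrowed from the names proposed earlier in the review; the signature, the lines argument, and the convention that row 0 is reserved for padding are hypothetical, not the PR's actual code.

```python
def load_glove_matrix(lines, word_dict, embedding_dim):
    """Build an embedding matrix for the words in word_dict (hypothetical sketch).

    Args:
        lines (iterable of str): GloVe-style lines, "<word> <v1> <v2> ...".
        word_dict (dict): Maps each word to its row index; row 0 is assumed
            to be reserved for padding, so indices run from 1 to len(word_dict).
        embedding_dim (int): Length of each embedding vector.

    Returns:
        tuple: (matrix as a list of rows, list of words found in the lines).
    """
    matrix = [[0.0] * embedding_dim for _ in range(len(word_dict) + 1)]
    found = []
    for line in lines:
        l = line.split()
        if not l:
            # Skip blank lines so l[0] is always defined.
            continue
        word = l[0]  # the token itself
        if word in word_dict:
            # Parse the vector (l[1:]) only for words we actually need.
            matrix[word_dict[word]] = [float(x) for x in l[1:]]
            found.append(word)
    return matrix, found
```

Parsing the floats inside the if block avoids converting vectors for the many vocabulary entries that never appear in word_dict.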
This is really good; I just have a few small formatting suggestions. Thanks @yjw1029!!
@@ -87,6 +87,19 @@ def test_wikidata_runs(notebooks, tmp): | |||
), | |||
) | |||
|
|||
@pytest.mark.notebooks |
How long does this test take? If it takes too long (e.g. more than a couple of minutes), it might be better to move it to the smoke tests.
hi @yjw1029, I would like to follow up to see whether you have seen the comments. Thanks! |
Sorry for the late update. I have already made the changes. |
great @yjw1029, thanks for the contribution! |
Description
add examples/01_prepare_data/prepare_mind_utils.ipynb to generate
Related Issues
#1182
#1238
Checklist:
staging and not master.