Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add mind utils #1247

Merged
Merged

Conversation

yjw1029
Copy link
Contributor

@yjw1029 yjw1029 commented Nov 23, 2020

Description

add examples/01_prepare_data/prepare_mind_utils.ipynb to generate

  • embedding.npy
  • word_dict.pkl
  • embedding_all.npy
  • word_dict_all.pkl
  • vert_dict.pkl
  • subvert_dict.pkl
  • uid2index.pkl

Related Issues

#1182
#1238

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.
  • This PR is being made to staging and not master.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@yjw1029
Copy link
Contributor Author

yjw1029 commented Nov 24, 2020

@miguelgfierro Could you please help review the PR?

@@ -0,0 +1,528 @@
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor detail, would you mind changing the private function _download_and_extract_globe to public download_and_extract_globe. In other notebooks we don't import private functions


Reply via ReviewNB

@@ -0,0 +1,528 @@
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please move all imports to the first cell? In the rest of the notebooks we follow that convention


Reply via ReviewNB

@@ -0,0 +1,528 @@
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind to move this function to the libraries? maybe in the mind utils from reco_utils.dataset.mind. Would you please explain what the regex does in the docstring?


Reply via ReviewNB

@@ -0,0 +1,528 @@
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, please move this to the utils and add docstrings. Also, it might be easier for our users to understand what this function does if the title is more explicit. Some ideas would be load_glove_matrix, generate_embeddings, generate_embedding_matrix or any other you think is better


Reply via ReviewNB

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also would be nice to have some comments in the codes what's going there. e.g. what's the data in l[0] vs l[1:] ? People who are not familiar w/ MIND dataset like me :-) will appreciate those comments.

Copy link
Collaborator

@loomlike loomlike Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wordvec = [float(x) for x in l[1:]]
if word in word_dict:

This part, seems we don't need to initialize wordvec if word is not in word_dict, meaning we can move wordvec = ... under if statement. That means, we can use 1 if-statement like:

if len(word) > 0 and word in word_dict:
do something here

or if we sure word_dict doesn't include any len(word) == 0, simply:

if word in word_dict:

w/o checking len(word) > 0

Copy link
Collaborator

@miguelgfierro miguelgfierro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really good, I have just small format suggestions. Thanks @yjw1029!!

@@ -87,6 +87,19 @@ def test_wikidata_runs(notebooks, tmp):
),
)

@pytest.mark.notebooks
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how long does this test take? if it takes too long (ex. more than a couple of minutes) it might be better to move it to smoke

@miguelgfierro
Copy link
Collaborator

hi @yjw1029, I would like to follow up to see whether you have seen the comments. Thanks!

@yjw1029
Copy link
Contributor Author

yjw1029 commented Jan 9, 2021

Sorry for the late update. Already make the changes.

@miguelgfierro
Copy link
Collaborator

great @yjw1029, thanks for the contribution!

@miguelgfierro miguelgfierro merged commit 2fc0bc3 into recommenders-team:staging Jan 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants