Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

word2vec doc-comment example of KeyedVectors usage broken #2669

Open
gojomo opened this issue Nov 5, 2019 · 6 comments · May be fixed by #2789
Open

word2vec doc-comment example of KeyedVectors usage broken #2669

gojomo opened this issue Nov 5, 2019 · 6 comments · May be fixed by #2789
Labels
difficulty easy Easy issue: required small fix documentation Current issue related to documentation good first issue Issue for new contributors (not required gensim understanding + very simple)

Comments

@gojomo
Copy link
Collaborator

gojomo commented Nov 5, 2019

The usage example in the word2vec.py doc-comment regarding KeyedVectors uses inconsistent paths and thus doesn't work.

https://github.com/RaRe-Technologies/gensim/blob/e859c11f6f57bf3c883a718a9ab7067ac0c2d4cf/gensim/models/word2vec.py#L73

https://github.com/RaRe-Technologies/gensim/blob/e859c11f6f57bf3c883a718a9ab7067ac0c2d4cf/gensim/models/word2vec.py#L76

If vectors were saved to a tmpfile-path based on the filename 'wordvectors.kv', they need to loaded from that same path, not some other local-directory file named 'model.wv'.

(Also, in my opinion the use of get_tmpfile() adds unnecessary extra complexity to this example. People usually don't want their models in a "temp" directory, which some systems will occasionally delete, so the examples might as well do the simplest possible thing: store in the current working directory with simple string filenames. The example code above this is also confused, because it creates a temp-file path, but then doesn't actually use it, choosing to do the simple & right thing with a local file instead.)

@gojomo gojomo added documentation Current issue related to documentation difficulty easy Easy issue: required small fix good first issue Issue for new contributors (not required gensim understanding + very simple) labels Nov 5, 2019
@piskvorky
Copy link
Owner

piskvorky commented Nov 6, 2019

I agree on the simplicity and local-ness. Choosing the file destination path is not the main point of that exposition (even if fixed to be consistent).

IIRC @mpenkov changed this recently, the reason being cleaner testing (I think). Cleaner testing is a distant second to cleaner exposition, but there may have been something else too. @mpenkov what were the pros/cons of that get-temp-dir obfuscation?

@mpenkov
Copy link
Collaborator

mpenkov commented Nov 6, 2019

I don't think this is one of my changes (e.g. git blame).

I'm -1 on using the CWD as scratch space for models. You never know what the CWD is when running the tutorials (it may be different each time they run the examples depending on how they're running them). We risk having gensim crap over the user's filesystem, which is sub-optimal.

We could write all examples to e.g. .gensim-scratch under the user's home, if using the temporary directory sounds like a bad idea. That will give users a permanent scratch space that they control, and can trash easily whenever they want, as opposed to hunting for temporary files gensim may have created in whatever happened to the the CWD at the time.

@piskvorky
Copy link
Owner

piskvorky commented Nov 6, 2019

I don't think this is one of my changes (e.g. git blame).

Sorry, my bad. I must have mixed it up with some other conversation.

I'm -1 on using the CWD as scratch space for models. You never know what the CWD is when running the tutorials …

Yes, that makes sense for comprehensive tutorials. But my understanding is this issue is about inline doc examples instead, little code snippets.

There I wouldn't expect any complex code scaffolding, top-to-bottom continuity and protracted explanations ("scratch spaces" etc). These will
a) fly over the heads of readers who don't understand filesystem paths,
b) obfuscate the actual point of the snippet for readers who do understand file paths.

A lose-lose situation.

@bvinesh
Copy link

bvinesh commented Apr 16, 2020

Much appreciated if someone could review the linked PR. It is my first one. Please let me know if you prefer a different approach to address. Thank you.

@gojomo
Copy link
Collaborator Author

gojomo commented Apr 16, 2020

@bvinesh - looks good, although I'm personally in the camp which thinks even creating/using a temp-dir is distracting overkill, for these tiny illustrative snippets. (I get the theoretical benefit of having them truly be executable and thus auto-testable, but if it requires extra unrealistic scaffolding, that's an offsetting loss in clarity for beginners.)

@bvinesh
Copy link

bvinesh commented Apr 17, 2020

@gojomo if all agree I can get rid of the temp dir altogether. Let me know. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
difficulty easy Easy issue: required small fix documentation Current issue related to documentation good first issue Issue for new contributors (not required gensim understanding + very simple)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants