-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
word2vec doc-comment example of KeyedVectors usage broken #2669
Comments
I agree on the simplicity and local-ness. Choosing the file destination path is not the main point of that exposition (even if fixed to be consistent). IIRC @mpenkov changed this recently, the reason being cleaner testing (I think). Cleaner testing is a distant second to cleaner exposition, but there may have been something else too. @mpenkov what were the pros/cons of that get-temp-dir obfuscation? |
I don't think this is one of my changes (e.g. I'm -1 on using the CWD as scratch space for models. You never know what the CWD is when running the tutorials (it may be different each time they run the examples depending on how they're running them). We risk having gensim crap over the user's filesystem, which is sub-optimal. We could write all examples to e.g. .gensim-scratch under the user's home, if using the temporary directory sounds like a bad idea. That will give users a permanent scratch space that they control, and can trash easily whenever they want, as opposed to hunting for temporary files gensim may have created in whatever happened to the the CWD at the time. |
Sorry, my bad. I must have mixed it up with some other conversation.
Yes, that makes sense for comprehensive tutorials. But my understanding is this issue is about inline doc examples instead, little code snippets. There I wouldn't expect any complex code scaffolding, top-to-bottom continuity and protracted explanations ("scratch spaces" etc). These will A lose-lose situation. |
Much appreciated if someone could review the linked PR. It is my first one. Please let me know if you prefer a different approach to address. Thank you. |
@bvinesh - looks good, although I'm personally in the camp which thinks even creating/using a temp-dir is distracting overkill, for these tiny illustrative snippets. (I get the theoretical benefit of having them truly be executable and thus auto-testable, but if it requires extra unrealistic scaffolding, that's an offsetting loss in clarity for beginners.) |
@gojomo if all agree I can get rid of the temp dir altogether. Let me know. Thanks. |
The usage example in the word2vec.py doc-comment regarding
KeyedVectors
uses inconsistent paths and thus doesn't work.https://github.com/RaRe-Technologies/gensim/blob/e859c11f6f57bf3c883a718a9ab7067ac0c2d4cf/gensim/models/word2vec.py#L73
https://github.com/RaRe-Technologies/gensim/blob/e859c11f6f57bf3c883a718a9ab7067ac0c2d4cf/gensim/models/word2vec.py#L76
If vectors were saved to a tmpfile-path based on the filename
'wordvectors.kv'
, they need to loaded from that same path, not some other local-directory file named 'model.wv'.(Also, in my opinion the use of
get_tmpfile()
adds unnecessary extra complexity to this example. People usually don't want their models in a "temp" directory, which some systems will occasionally delete, so the examples might as well do the simplest possible thing: store in the current working directory with simple string filenames. The example code above this is also confused, because it creates a temp-file path, but then doesn't actually use it, choosing to do the simple & right thing with a local file instead.)The text was updated successfully, but these errors were encountered: