Incorrect CBOW implementation in Gensim leads to inferior performance #3266

Closed
piskvorky opened this issue Nov 9, 2021 · 3 comments
Labels: bug, difficulty medium, impact LOW, reach MEDIUM


@piskvorky
Owner

Problem description

According to this article https://aclanthology.org/2021.insights-1.1.pdf:

[Screenshot of an excerpt from the article, taken 2021-11-09; image not preserved]

Steps/code/corpus to reproduce

I haven't tried to verify or reproduce this. Gensim's goal is to follow the original C implementation faithfully, which it does. So this is not a bug per se, but rather a question of whether, and how much, we want to deviate from the reference implementation. I'm in favour if the result is unambiguously better (more accurate, faster, no downsides).

Versions

All versions since the beginning of word2vec in Gensim.

@piskvorky added the bug, difficulty medium, reach MEDIUM, and impact LOW labels on Nov 9, 2021
@piskvorky
Owner Author

@gojomo @mpenkov WDYT?

@gojomo
Collaborator

gojomo commented Nov 15, 2021

I remain as unconvinced as when this was raised earlier this year. I'll add more comments there. I suggest this be closed as dupe.

@piskvorky
Owner Author

Thanks! I remember that discussion now.

Closing as duplicate of #1873.
