Incorrect CBOW implementation in Gensim leads to inferior performance #3266
Labels
bug
Issue described a bug
difficulty medium
Medium issue: required good gensim understanding & python skills
impact LOW
Low impact on affected users
reach MEDIUM
Affects a significant number of users
Problem description
According to this article https://aclanthology.org/2021.insights-1.1.pdf:
Steps/code/corpus to reproduce
I haven't tried to verify or reproduce this. Gensim's goal is to follow the original C implementation faithfully, which it does. So this is not a bug per se, but rather a question of whether, and how much, we want to deviate from the reference implementation. I'm in favour if the result is unambiguously better (more accurate, faster, no downsides).
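For context, here is a minimal pure-Python sketch of what I understand the article's claim to be: when the CBOW hidden vector is the mean of the context vectors, the gradient passed back to each context vector should be scaled by 1/n (n = number of context words), and the reference implementation is said to skip that scaling. This is my reading of the article, not something checked against Gensim's or word2vec.c's source. The helper name `cbow_context_updates` and its `normalize` flag are hypothetical and exist only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cbow_context_updates(context_vecs, output_vec, label, lr, normalize):
    """Updates for each context vector from one (output word, label) pair.

    Hypothetical illustration only, not Gensim code.
    label is 1 for the true target word and 0 for a negative sample.
    """
    n = len(context_vecs)
    dim = len(output_vec)
    # Hidden vector: mean of the context vectors (the averaging variant of CBOW).
    hidden = [sum(v[i] for v in context_vecs) / n for i in range(dim)]
    score = sum(h * o for h, o in zip(hidden, output_vec))
    # Gradient of the log-likelihood with respect to the score, times the learning rate.
    g = (label - sigmoid(score)) * lr
    # Gradient with respect to the hidden vector.
    grad_hidden = [g * o for o in output_vec]
    # Chain rule through the mean gives a 1/n factor per context vector.
    # normalize=False models the behaviour the article attributes to the
    # reference implementation (assumption, unverified here).
    scale = 1.0 / n if normalize else 1.0
    return [[scale * gh for gh in grad_hidden] for _ in context_vecs]


context = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [0.2, 0.2]]
output = [0.5, -0.5]
unnormalized = cbow_context_updates(context, output, 1, 0.025, normalize=False)
normalized = cbow_context_updates(context, output, 1, 0.025, normalize=True)
print(unnormalized[0], normalized[0])
```

With four context words, the unnormalized update is four times the normalized one, so the effective step size for context vectors would grow with the window size.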
Versions
All versions since the beginning of word2vec in Gensim.