GensimBackend #71

Closed
XiepengLi opened this issue Oct 29, 2021 · 6 comments

@XiepengLi
Author

AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0. Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead. See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4

vector_size = self.embedding_model.vector_size
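A minimal sketch of reading the embedding dimensionality without touching `.vocab`, along the lines of the `vector_size` suggestion above. `vector_size`, `index_to_key`, and `get_vector` are Gensim `KeyedVectors` names; the helper name `get_vector_size` and the fallback branch are assumptions for illustration and are not taken from KeyBERT.

```python
def get_vector_size(embedding_model):
    """Return the embedding dimensionality without using `.vocab`.

    Hypothetical helper for illustration; not part of KeyBERT or Gensim.
    """
    # KeyedVectors exposes the dimensionality directly as `vector_size`.
    size = getattr(embedding_model, "vector_size", None)
    if size is not None:
        return int(size)

    # Fallback (assumption): look up the first key and measure its vector.
    # `index_to_key` is the Gensim 4.x name for the ordered key list.
    first_key = embedding_model.index_to_key[0]
    return len(embedding_model.get_vector(first_key))
```

With a loaded model this could be called as `get_vector_size(ft)`, where `ft` is any `KeyedVectors` instance.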

@MaartenGr
Owner

Could you share how you have initialized the model? Also, which version of KeyBERT and Gensim are you using?

@MaartenGr
Owner

It has been a while since the last response, so I'll be closing this issue for now. That said, let me know if you are still experiencing this issue and I'll make sure to reopen it.

@dopc

dopc commented Aug 2, 2022

I have run into the same problem.

How I initialized the model:

import gensim.downloader as api
from keybert import KeyBERT

ft = api.load('fasttext-wiki-news-subwords-300')
kw_model = KeyBERT(model=ft)

The versions I am using:

gensim==4.2.0
keybert==0.6.0

And the error message:

File ~/miniconda3/envs/myenv/lib/python3.10/site-packages/keybert/backend/_gensim.py:53, in GensimBackend.embed(self, documents, verbose)
     40 def embed(self, documents: List[str], verbose: bool = False) -> np.ndarray:
     41     """Embed a list of n documents/words into an n-dimensional
     42     matrix of embeddings
     43 
   (...)
     50         that each have an embeddings size of `m`
     51     """
     52     vector_shape = self.embedding_model.word_vec(
---> 53         list(self.embedding_model.vocab.keys())[0]
     54     ).shape
     55     empty_vector = np.zeros(vector_shape[0])
     57     embeddings = []

File ~/miniconda3/envs/myenv/lib/python3.10/site-packages/gensim/models/keyedvectors.py:735, in KeyedVectors.vocab(self)
    733 @property
    734 def vocab(self):
--> 735     raise AttributeError(
    736         "The vocab attribute was removed from KeyedVector in Gensim 4.0.0.\n"
    737         "Use KeyedVector's .key_to_index dict, .index_to_key list, and methods "
    738         ".get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.\n"
    739         "See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4"
    740     )

AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.
Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.
See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4

BR.
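For reference, a rough sketch of how the word-vector lookup in the traceback might be written with the Gensim 4.x names from the error message. `key_to_index`, `get_vector`, and `vector_size` are `KeyedVectors` names; the function name `embed_documents`, whitespace tokenization, and mean pooling are assumptions for illustration and are not copied from KeyBERT's source. It is kept dependency-free here; real code would more likely use NumPy.

```python
from typing import List, Sequence


def embed_documents(embedding_model, documents: Sequence[str]) -> List[List[float]]:
    """Average word vectors per document using Gensim 4.x attribute names.

    Hypothetical sketch: tokenization and pooling choices are assumptions.
    """
    # `vector_size` gives the dimensionality without probing the vocabulary.
    empty_vector = [0.0] * embedding_model.vector_size

    embeddings = []
    for document in documents:
        # Gensim 4.x: membership is checked against `key_to_index`
        # instead of the removed `vocab` attribute.
        vectors = [
            embedding_model.get_vector(word)
            for word in document.split()
            if word in embedding_model.key_to_index
        ]
        if vectors:
            # Element-wise mean over the word vectors of this document.
            embeddings.append(
                [float(sum(column)) / len(vectors) for column in zip(*vectors)]
            )
        else:
            # No known words: fall back to a zero vector of the same size.
            embeddings.append(list(empty_vector))
    return embeddings
```

Each returned row corresponds to one input document, and documents with no in-vocabulary words map to an all-zero row.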

@MaartenGr
Owner

@dopc The way Gensim accesses the embeddings was changed in 4.0, and KeyBERT has not yet been updated to match. For now, using Gensim < 4.0 should work without any issues. I'll make sure it gets updated in the upcoming release.
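Until that release, a small guard along these lines could tell whether the installed Gensim still has `.vocab`. This is only a sketch: `importlib.metadata` is standard library, while the helper names `gensim_has_legacy_vocab` and `installed_gensim_version` are made up for illustration.

```python
import re
from importlib import metadata
from typing import Optional


def gensim_has_legacy_vocab(version: str) -> bool:
    """Return True if `version` predates Gensim 4.0.0, which removed `.vocab`."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    # Only the major version matters for this particular change.
    return int(match.group(1)) < 4


def installed_gensim_version() -> Optional[str]:
    """Return the installed Gensim version string, or None if not installed."""
    try:
        return metadata.version("gensim")
    except metadata.PackageNotFoundError:
        return None
```

For the versions reported above, `gensim_has_legacy_vocab("4.2.0")` is `False`, so the workaround would be to pin an older release, for example with `pip install "gensim<4.0"`.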

@dopc

dopc commented Aug 3, 2022

Thanks so much!

MaartenGr added a commit that referenced this issue Oct 11, 2022
MaartenGr mentioned this issue Oct 11, 2022
MaartenGr added a commit that referenced this issue Nov 3, 2022
* Added option to extract and pass word/document embeddings for faster iteration
* Focused on making the documentation a bit nicer (visualizations, etc. )
* Fixed #71
* Fixed #122, #136