rajicon/Estimator_Vectors

Estimator_Vectors

This library jointly trains word embeddings, context clue embeddings, and subword embeddings on a corpus. The word embeddings are used for standard embedding tasks, while the context clue and subword embeddings are used to estimate embeddings for out-of-vocabulary (OOV) words. The code is heavily based on gensim: https://radimrehurek.com/gensim/

For more information, see our IJCNN paper, "Estimator Vectors: OOV Word Embeddings based on Subword and Context Clue Estimate", by Raj Patel and Carlotta Domeniconi.
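
To give a rough feel for how subword and context clue embeddings can stand in for a missing word vector, here is a minimal, self-contained sketch. It is not this library's API or the paper's exact method: the function names, the 3 to 5 character n-gram range, the toy random vectors, and the simple averaging used to combine the two estimates are all assumptions made for illustration.

```python
import random


def char_ngrams(word, n_min=3, n_max=5):
    """Return character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def mean_vector(vectors, dim):
    """Average a list of vectors; return a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def estimate_oov(word, context, subword_vecs, context_vecs, dim):
    """Estimate an OOV vector from known subwords and surrounding words.

    Assumption: the two estimates are combined with a plain average.
    """
    sub = mean_vector(
        [subword_vecs[g] for g in char_ngrams(word) if g in subword_vecs], dim
    )
    ctx = mean_vector(
        [context_vecs[w] for w in context if w in context_vecs], dim
    )
    return [(s + c) / 2.0 for s, c in zip(sub, ctx)]


# Toy embedding tables with random values (hypothetical, not trained).
DIM = 4
rng = random.Random(0)
subword_vecs = {
    g: [rng.uniform(-1, 1) for _ in range(DIM)]
    for g in char_ngrams("running") + char_ngrams("jumping")
}
context_vecs = {
    w: [rng.uniform(-1, 1) for _ in range(DIM)]
    for w in ["she", "was", "very", "fast"]
}

# "runing" is a misspelling that never appeared in the toy vocabulary,
# but it shares several n-grams with "running".
oov_vector = estimate_oov(
    "runing", ["she", "was", "very", "fast"], subword_vecs, context_vecs, DIM
)
print(oov_vector)
```

In the library itself, the subword and context clue embeddings are learned jointly with the word embeddings rather than drawn at random; see the paper for how the estimates are actually formed.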

To use this code, you will need Cython and the gensim library. As mentioned earlier, this library is heavily based on gensim, and many of the files in this repository are derived from it.
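
As a quick sanity check before building, something like the following can confirm that both dependencies are importable. This is only a convenience sketch and not part of the repository; the helper name `missing_dependencies` is hypothetical.

```python
import importlib.util


def missing_dependencies(modules=("Cython", "gensim")):
    """Return the names of modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_dependencies()
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("Cython and gensim are available.")
```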

For usage, please refer to text8_example.py.

The code is very rough right now, but I will improve it as time goes on.
