Text embedding is the task of generating embeddings (low-dimensional vector representations, such as those produced by Word2Vec) for text. Several techniques exist for generating them, but many of them do not scale well. The technique presented here is highly scalable and works well on heterogeneous graphs (text networks).
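In the PTE paper cited below, the heterogeneous graph combines three bipartite networks. The word-word network counts co-occurrences within a local context window. The word-document network counts how often each word appears in each document. The word-label network counts word occurrences in the labeled documents of each class. The sketch below builds the edge weights of those three networks from tokenized documents. It is a minimal illustration and is not this repository's code: the function name `build_text_network`, the `window` default, and the choice to store each co-occurrence in both directions are assumptions.

```python
from collections import Counter
from typing import List, Optional, Tuple


def build_text_network(docs: List[List[str]], labels: List[Optional[int]], window: int = 5
                       ) -> Tuple[Counter, Counter, Counter]:
    """Build weighted edges for the word-word, word-document and word-label networks.

    docs:   tokenized documents
    labels: class label per document, or None for an unlabeled document
    window: number of following tokens treated as co-occurring context (assumed)
    """
    ww: Counter = Counter()  # (word, word) -> co-occurrence count
    wd: Counter = Counter()  # (word, doc index) -> term frequency
    wl: Counter = Counter()  # (word, label) -> frequency in documents with that label
    for doc_id, (tokens, label) in enumerate(zip(docs, labels)):
        for i, w in enumerate(tokens):
            # Pair w with the next `window` tokens; store both directions so the
            # word-word network is symmetric.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                ww[(w, tokens[j])] += 1
                ww[(tokens[j], w)] += 1
            wd[(w, doc_id)] += 1
            if label is not None:  # only labeled documents feed the word-label network
                wl[(w, label)] += 1
    return ww, wd, wl
```

Unlabeled documents still contribute to the word-word and word-document networks. This is how PTE can use both labeled and unlabeled text.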
This is an implementation of the paper "PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks" by Jian Tang, Meng Qu and Qiaozhu Mei (KDD’15, August 10-13, 2015, Sydney, NSW, Australia).
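PTE fits each bipartite network with a LINE-style objective. For a bipartite network between vertex sets A and B, it models p(v_i | v_j) with v_i in A and v_j in B. This probability is approximated with negative sampling, and edges are sampled with probability proportional to their weights. The sketch below trains a single bipartite network this way in NumPy. It is a hypothetical illustration, not this repository's Theano code. The name `train_bipartite`, its hyperparameters, and the noise distribution proportional to degree^0.75 (the choice used in LINE) are assumptions. In PTE the word vectors are shared across the word-word, word-document and word-label networks, and those networks are trained jointly. That joint step is not shown here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_bipartite(edges, weights, n_a, n_b, dim=16, steps=5000, k=5, lr=0.025, seed=0):
    """Sketch of negative-sampling training on one bipartite network G_AB.

    edges:   list of (i, j) pairs, i indexing side A and j indexing side B
    weights: edge weights w_ij (e.g. co-occurrence counts)
    Returns (U_a, U_b), the embeddings of the vertices in A and in B.
    """
    rng = np.random.default_rng(seed)
    U_a = rng.normal(scale=0.1, size=(n_a, dim))
    U_b = rng.normal(scale=0.1, size=(n_b, dim))

    w = np.asarray(weights, dtype=float)
    edge_p = w / w.sum()  # sample edges proportionally to their weight

    # Noise distribution over A (assumed, as in LINE): weighted degree ** 0.75.
    deg = np.zeros(n_a)
    for (i, _), wij in zip(edges, w):
        deg[i] += wij
    noise_p = deg ** 0.75
    noise_p /= noise_p.sum()

    for _ in range(steps):
        i, j = edges[rng.choice(len(edges), p=edge_p)]
        negs = rng.choice(n_a, size=k, p=noise_p)
        ctx = U_b[j].copy()
        grad_ctx = np.zeros(dim)
        # Target 1 for the observed edge (i, j), 0 for each negative sample from A.
        for t, target in [(i, 1.0)] + [(int(n), 0.0) for n in negs]:
            g = lr * (target - sigmoid(U_a[t] @ ctx))
            grad_ctx += g * U_a[t]   # accumulate the update for v_j
            U_a[t] += g * ctx        # gradient ascent on log-sigmoid terms
        U_b[j] += grad_ctx
    return U_a, U_b
```

Usage: `U_a, U_b = train_bipartite([(0, 0), (1, 0), (2, 1)], [3, 1, 2], n_a=3, n_b=2)`. Sampling edges by weight, instead of multiplying gradients by w_ij, keeps the step size stable when the weights vary widely.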
Using the text embeddings generated by the algorithm, we performed sentiment analysis on movie review data. The results match those described in the paper: we obtained about 89% accuracy.
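The PTE paper represents a piece of text by averaging the embeddings of its words and then trains a linear classifier on those vectors. The sketch below shows that pipeline for binary sentiment labels, using plain NumPy logistic regression as the classifier. It assumes a `word_vecs` dictionary that maps words to vectors. The helper names `doc_embedding`, `train_logreg` and `accuracy` are hypothetical. The classifier and settings in this repository's `test.py` may differ.

```python
import numpy as np


def doc_embedding(tokens, word_vecs, dim):
    """Average the vectors of in-vocabulary words (zero vector if none are known)."""
    vecs = [word_vecs[w] for w in tokens if w in word_vecs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean logistic loss; returns (w, b)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(positive)
        err = p - y
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


def accuracy(X, y, w, b):
    """Fraction of documents whose predicted label (score > 0) matches y."""
    return float(np.mean(((X @ w + b) > 0).astype(int) == y))
```

Usage: build `X = np.stack([doc_embedding(d, word_vecs, dim) for d in docs])`, fit `w, b = train_logreg(X_train, y_train)`, then report `accuracy(X_test, y_test, w, b)`.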
Requirements:
1. NumPy
2. SciPy
3. Theano
Steps to run the code:
1. python train.py
2. python test.py
Dataset used (Large Movie Review Dataset): http://ai.stanford.edu/~amaas/data/sentiment/
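The Large Movie Review Dataset unpacks into an `aclImdb` directory with `train/` and `test/` splits. Each split holds `pos/` and `neg/` folders with one review per `.txt` file, and the training split also has an unlabeled `unsup/` folder. The sketch below is a minimal loader for one split. The function name `load_imdb_split`, the lowercasing, the `<br />` stripping and the regular-expression tokenizer are all assumptions and may not match what `train.py` does.

```python
import os
import re
from typing import List, Tuple


def load_imdb_split(root: str, split: str = "train") -> Tuple[List[List[str]], List[int]]:
    """Read <root>/<split>/pos and <root>/<split>/neg; return (token lists, labels).

    Labels: 1 for positive reviews, 0 for negative ones.
    """
    docs: List[List[str]] = []
    labels: List[int] = []
    for label_name, label in (("pos", 1), ("neg", 0)):
        folder = os.path.join(root, split, label_name)
        for fname in sorted(os.listdir(folder)):
            if not fname.endswith(".txt"):
                continue
            with open(os.path.join(folder, fname), encoding="utf-8") as f:
                text = f.read()
            # Reviews contain HTML line breaks; drop them before tokenizing.
            text = text.replace("<br />", " ").lower()
            docs.append(re.findall(r"[a-z0-9']+", text))
            labels.append(label)
    return docs, labels
```

Usage: `docs, labels = load_imdb_split("aclImdb", "train")`.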