Text embedding is the task of generating low-dimensional vector representations of text. Several techniques exist for this (e.g., Word2Vec-style models), but they do not scale very well. The technique implemented here is highly scalable and works well on heterogeneous text networks.
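As a rough illustration of what the output looks like (a sketch, not the repository's code): once word vectors have been learned, a text's embedding can be computed by averaging the vectors of its words, which is how PTE embeds documents at inference time. All names and values below are illustrative.

```python
import numpy as np

# Toy vocabulary and randomly initialized word vectors; in practice
# these would be the embeddings learned by the training procedure.
rng = np.random.default_rng(0)
vocab = {"good": 0, "movie": 1, "bad": 2}
word_vecs = rng.normal(size=(len(vocab), 4))  # toy 4-dimensional vectors

def embed_text(tokens, word_vecs, vocab):
    """Average the word vectors of the in-vocabulary tokens."""
    idx = [vocab[t] for t in tokens if t in vocab]
    return word_vecs[idx].mean(axis=0)

doc_vec = embed_text(["good", "movie"], word_vecs, vocab)
print(doc_vec.shape)  # (4,) — one fixed-size vector per text
```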
This is an implementation of the following paper: PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks by Jian Tang, Meng Qu, Qiaozhu Mei. KDD’15, August 10-13, 2015, Sydney, NSW, Australia.
Using the text embeddings generated by the algorithm, we performed sentiment analysis on a movie-review dataset. The results match those reported in the paper: we obtained ~89% accuracy.
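The downstream classification step can be sketched as follows. This is not the repository's code: it trains a plain logistic regression (by gradient descent, in NumPy) on synthetic stand-in "document embeddings" to show how a linear classifier is fit on top of the learned text representations.

```python
import numpy as np

# Illustrative stand-ins: random vectors playing the role of learned
# document embeddings, with linearly separable synthetic labels.
rng = np.random.default_rng(0)
n, d = 200, 16
X = rng.normal(size=(n, d))            # stand-in text embeddings
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)     # synthetic pos/neg sentiment labels

# Plain batch gradient descent on the logistic loss.
w = np.zeros(d)
lr = 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # sigmoid predictions
    w -= lr * X.T @ (p - y) / n         # gradient of the logistic loss

acc = (((1.0 / (1.0 + np.exp(-(X @ w)))) > 0.5) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

In the real pipeline the features would be the PTE embeddings of the reviews and the labels would come from the IMDB dataset linked below.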
Dependencies:
1> NumPy
2> SciPy
3> Theano
Steps to run the code:
python train.py
python test.py
Link to the dataset used: http://ai.stanford.edu/~amaas/data/sentiment/