
Text classifier for Hierarchical Attention Networks for Document Classification

NeeMax/textClassifier

 
 


textClassifier

textClassifierHATT.py implements Hierarchical Attention Networks for Document Classification. Please see my blog for full details. Also see the Keras Google group discussion.

textClassifierConv implements Convolutional Neural Networks for Sentence Classification (Yoon Kim). Please see my blog for full details.

textClassifierRNN implements a bidirectional LSTM and a one-level attentional RNN. Please see my blog for full details.

update on 6/22/2017

The attention weights can be useful for identifying the words that matter most for the classification. Please see my latest update on the post. To derive them, all you need to do is run a forward pass up to the point right before the attention layer output. The results are not very promising so far. I will update the post once I have further results.
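As a rough illustration of what the attention weights are, here is a minimal pure-Python sketch of word-level attention in the style of the Hierarchical Attention Networks paper: each hidden state is projected through a tanh layer, scored against a context vector, and the scores are normalized with a softmax. The function name `attention_weights` and the parameters `W`, `b`, and `u` are hypothetical and chosen for this example; the attention layer in textClassifierHATT.py may differ in its details.

```python
import math

def attention_weights(hidden_states, W, b, u):
    """Return one softmax-normalized weight per time step.

    hidden_states: list of T vectors (e.g. the RNN outputs for T words)
    W, b:          projection matrix and bias (hypothetical parameters)
    u:             context vector that scores each projected state
    """
    scores = []
    for h in hidden_states:
        # Project the hidden state: proj = tanh(W h + b)
        proj = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        # Score the projection against the context vector
        scores.append(sum(p * u_k for p, u_k in zip(proj, u)))
    # Softmax over time steps (subtract the max for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: three "words" with 2-dimensional hidden states
states = [[0.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights = attention_weights(states, identity, [0.0, 0.0], [1.0, 0.0])
```

In this toy setup the second state receives the largest weight, which is the kind of signal one would inspect to find the words the model treats as important.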


This repo is forked from https://github.com/richliao/textClassifier. We found some issues in the original, so we updated textClassifierHATT to work with Python 2.7 and Keras 2.0.8.

# clone the repo
git clone {repo address}

# install the dependencies
cd textClassifier
pip install -r req.txt

# download the IMDB training data from Kaggle at the link below and keep the file in the working directory
https://www.kaggle.com/c/word2vec-nlp-tutorial/download/labeledTrainData.tsv
# download the GloVe word vectors
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip

# install nltk 'punkt' by running the following in the Python interpreter
>>> import nltk
>>> nltk.download('punkt')

# train the model
python textClassifierHATT.py

# note: if a Cython error occurs while installing word2vec, run
pip install --upgrade cython
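
For reference, the unzipped GloVe files are plain text, with one word per line followed by its space-separated vector components. The sketch below shows one way to parse that format into a dictionary. The function name `load_glove` is hypothetical, and which of the unzipped files the training script reads (for example `glove.6B.100d.txt`) is an assumption here, so check the script before relying on it.

```python
def load_glove(lines):
    """Parse GloVe-format lines ("word v1 v2 ...") into {word: [floats]}."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings

# Tiny in-memory sample in the same format as the GloVe text files
sample = ["the 0.1 -0.2 0.3", "movie 0.4 0.5 -0.6"]
vectors = load_glove(sample)

# With the real file (assumed name), the call would look like:
# with open("glove.6B.100d.txt", encoding="utf-8") as f:
#     vectors = load_glove(f)
```

Words from the training data that are missing from this dictionary need some fallback vector, such as random or zero initialization.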

Enjoy!
