README.rst
__init__.py
corpus.py
embedding_mixture.py
hyperparams.yml
model_setup.py
nlp_pipeline.py
preprocessor.py
util.py
util_deprecated.py
word_embedding.py

README.rst

lda2vec

The lda2vec model tries to mix the best parts of word2vec and LDA into a single framework. Word2vec captures relationships between words, but the resulting vectors are largely uninterpretable and don't represent documents. LDA, on the other hand, is quite interpretable but doesn't model local word relationships the way word2vec does.

This model builds both word and document topics and makes them interpretable. It can build topics over features as well as documents, and the topics can be supervised and used to predict another target.

lda2vec also includes more contexts and features than LDA. LDA dictates that words are generated by a document vector, but we might have all kinds of 'side-information' that could influence the topics. For example, a comment might be written about a particular item, at a particular time, and in a particular region; the item, time, and region are all features that could shape the topics.
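As a rough illustration of the idea above, the sketch below follows the formulation in the lda2vec paper referenced in this README: a document vector is a softmax-weighted mixture of topic vectors, and the context used to predict nearby words is the word vector plus the document vector (plus any extra feature vectors). It is a minimal pure-Python sketch, not this package's API. All names (``softmax``, ``mixture_vector``, ``context_vector``, ``region_weights``) and all numbers are hypothetical, and sharing one set of topic vectors between the document and the extra feature is a simplifying assumption.

```python
import math


def softmax(weights):
    """Turn unnormalized weights into proportions that sum to 1."""
    peak = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_vector(weights, topic_vectors):
    """Weighted sum of topic vectors, using softmax(weights) as proportions."""
    proportions = softmax(weights)
    dim = len(topic_vectors[0])
    return [
        sum(p * topic[i] for p, topic in zip(proportions, topic_vectors))
        for i in range(dim)
    ]


def context_vector(word_vector, *mixture_vectors):
    """Add the word vector to the document (and any other feature) vectors."""
    out = list(word_vector)
    for vec in mixture_vectors:
        out = [a + b for a, b in zip(out, vec)]
    return out


# Toy values, made up for illustration only.
topic_vectors = [[1.0, 0.0], [0.0, 1.0]]  # two topics in a 2-d embedding space
doc_weights = [0.0, 0.0]                  # document split evenly across topics
region_weights = [2.0, 0.0]               # hypothetical side-information feature

doc_vec = mixture_vector(doc_weights, topic_vectors)
region_vec = mixture_vector(region_weights, topic_vectors)
ctx = context_vector([0.5, -0.5], doc_vec, region_vec)
```

Because the mixture proportions sum to 1, they can be read like LDA topic proportions, while the context vector still lives in the same space as the word vectors.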

Adapted from @nateraw's version, which was adapted from @meereeum's, which in turn was adapted from @cemoody's code for the original paper by Chris Moody. See "Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec".
