lda2vec

The lda2vec model tries to combine the best parts of word2vec and LDA in a single framework. Word2vec captures relationships between words, but the resulting vectors are largely uninterpretable and do not represent documents. LDA, on the other hand, is quite interpretable, but it does not model local word relationships the way word2vec does.

This model learns topics over both words and documents and keeps them interpretable. Topics can be built over features as well as documents, and they can be supervised so that they also predict another target.

lda2vec also includes more contexts and features than LDA. LDA dictates that words are generated by a document vector, but we might have all kinds of 'side-information' that could influence the topics. For example, a comment might be written about a particular item, at a particular time, and in a particular region, and each of these could be used as a feature.
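As a rough illustration of how these pieces can fit together, the sketch below builds a context vector the way the lda2vec paper describes it: a word vector plus a document vector, where the document vector is a softmax-weighted mixture of shared topic vectors. The extra region component is an assumption added here to show how side-information could enter in the same way. All names (`word_vectors`, `doc_topic_logits`, `region_topic_logits`, `mixture_vector`, `context_vector`) are hypothetical and not taken from this repository, and the values are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration.
vocab_size, embed_dim, n_topics = 10, 8, 4
n_docs, n_regions = 3, 2

# In a real model these would be learned; here they are random.
word_vectors = rng.normal(size=(vocab_size, embed_dim))
topic_vectors = rng.normal(size=(n_topics, embed_dim))
doc_topic_logits = rng.normal(size=(n_docs, n_topics))

# Assumption: side-information (a region id) gets its own topic weights
# and its own topic vectors, mirroring the document component.
region_topic_vectors = rng.normal(size=(n_topics, embed_dim))
region_topic_logits = rng.normal(size=(n_regions, n_topics))


def softmax(x):
    """Numerically stable softmax over the last axis."""
    shifted = np.exp(x - x.max(axis=-1, keepdims=True))
    return shifted / shifted.sum(axis=-1, keepdims=True)


def mixture_vector(logits, topics):
    """Return (topic proportions, weighted sum of topic vectors)."""
    proportions = softmax(logits)
    return proportions, proportions @ topics


def context_vector(word_id, doc_id, region_id):
    """Sum the word, document, and region vectors into one context vector."""
    _, doc_vec = mixture_vector(doc_topic_logits[doc_id], topic_vectors)
    _, region_vec = mixture_vector(
        region_topic_logits[region_id], region_topic_vectors
    )
    return word_vectors[word_id] + doc_vec + region_vec


# The proportions are the interpretable part: one weight per topic.
proportions, doc_vec = mixture_vector(doc_topic_logits[0], topic_vectors)
context = context_vector(word_id=5, doc_id=0, region_id=1)
print("topic proportions for document 0:", np.round(proportions, 3))
print("context vector shape:", context.shape)
```

Because the document weights pass through a softmax, they are non-negative and sum to one, so they can be read like LDA topic proportions, while the word vectors stay in the same space as in word2vec.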

Adapted from @nateraw, which was adapted from @meereeum, which was adapted from @cemoody (code for the original paper by Chris Moody). See the paper: Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec.