A posteriori log-likelihood computed for a new document #3

Open
vressegu opened this issue Jun 27, 2018 · 0 comments

Dear Prof. Blei and collaborators,

I am trying to use your code to apply LDA to anomaly detection in a Bayesian framework.
However, I am not sure that the method "score" of the class LatentDirichletAllocation does what I want.

More specifically, using the notation of the paper Latent Dirichlet Allocation, Blei, Ng & Jordan (2003),
I have a corpus of documents D = { w_1, ..., w_M } for learning.
I would like to use smoothing to handle out-of-vocabulary issues.
So, the hyper-parameters of the LDA model I fit on D are \alpha and \eta.

Then, I want to do anomaly detection by using the LDA model as a Bayesian semi-supervised classifier.
I assume that all documents w_i of the initial corpus D belong to the class "normal" (class 1).
When I see a new document w_{test}, I try to determine whether it belongs to the class "normal" (class 1) or to the class "anomaly" (class -1).
To do this, I would like to compute the a posteriori probability
p ( w_{test} | D , \alpha , \eta ).
If it is too small, w_{test} is considered an anomaly.
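
To make the intended workflow concrete, here is a minimal sketch of what I have in mind, assuming the scikit-learn implementation of LatentDirichletAllocation (with doc_topic_prior and topic_word_prior playing the roles of \alpha and \eta) and a purely hypothetical threshold value; whether score() really returns the quantity I want is exactly my question below.

```python
# Minimal sketch of the intended anomaly-detection workflow.
# Assumption: scikit-learn's LatentDirichletAllocation; threshold is hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

normal_docs = ["first normal document", "second normal document"]  # corpus D
test_docs = ["a new document to classify"]                         # w_test

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(normal_docs)
X_test = vectorizer.transform(test_docs)

# doc_topic_prior and topic_word_prior correspond to \alpha and \eta
lda = LatentDirichletAllocation(n_components=10,
                                doc_topic_prior=0.1,    # \alpha
                                topic_word_prior=0.01,  # \eta
                                random_state=0)
lda.fit(X_train)  # fit on the "normal" corpus D

log_likelihood = lda.score(X_test)  # approximate log-likelihood of w_test
threshold = -1000.0                 # hypothetical, to be tuned on held-out normal documents
is_anomaly = log_likelihood < threshold
```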

However, I do not know whether the method "score" of the class LatentDirichletAllocation computes
p ( w_{test} | \alpha , \eta ) (formula 1)
= \int d\beta \, p ( w_{test} | \alpha , \beta ) p ( \beta | \eta )
or
p ( w_{test} | D , \alpha , \eta ) (formula 2)
= \int d\beta \, p ( w_{test} | \alpha , \beta ) p ( \beta | D , \eta ).
My intuition is that the score method may originally be intended for fitting \alpha and \eta on the corpus D, and thus it is not exactly what I need.
I think that p ( \beta | D , \eta ) (the a posteriori pdf of \beta, the distribution of words in each topic) contains much more information about the statistics of the corpus D than p ( \beta | \eta ) (the a priori pdf of \beta). Hence, it would be better suited for my classification problem.
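
In case score does not use the posterior over \beta, my fallback would be a crude plug-in approximation of (formula 2), continuing from the fitted lda and X_test in the sketch above: replace the integral over \beta by the posterior mean of \beta learned on D, and the integral over \theta by the per-document topic proportions returned by transform(). This is only a rough sketch, not the exact marginal likelihood.

```python
# Rough plug-in approximation of (formula 2), continuing from `lda` and `X_test`
# in the previous snippet.
# Assumption: the posterior mean of \beta is components_ normalized row-wise.
import numpy as np

beta_hat = lda.components_ / lda.components_.sum(axis=1, keepdims=True)  # ~ E[\beta | D, \eta]
theta_hat = lda.transform(X_test)                                        # per-document topic proportions

word_probs = theta_hat @ beta_hat  # p(word | doc), shape (n_docs, n_words)
counts = X_test.toarray()          # word counts of the test documents
log_lik = (counts * np.log(word_probs + 1e-12)).sum(axis=1)  # per-document log-likelihood
```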

So, could you please tell me whether the score method implements (formula 1) or (formula 2)?

Thank you in advance.

Kind Regards,
Valentin Resseguier
