Dear Prof. Blei and collaborators,
I am trying to use your code to apply LDA for anomaly detection in a Bayesian framework.
But I am not sure that the "score" method of the class "LatentDirichletAllocation" does what I want.
More specifically, using the notations of the paper Latent Dirichlet Allocation, Blei, Ng & Jordan (2003),
I have a corpus of documents D = { w_1, ..., w_M } for learning.
I would like to use smoothing to handle Out of Vocabulary issues.
So, the hyper-parameters I learned to fit my LDA model on D are \alpha and \eta.
Then, I want to do anomaly detection by using the LDA model as a Bayesian semi-supervised classifier.
I assume that all documents w_i of the initial corpus D belong to the class "normal" (class 1).
When I see a new document w_{test}, I want to determine whether it belongs to the class "normal" (class 1) or to the class "anomaly" (class -1).
To decide this, I would like to compute the a posteriori probability:
p ( w_{test} | D , \alpha , \eta )
If it is too small, w_{test} is considered an anomaly.
But I do not know whether the "score" method of the class "LatentDirichletAllocation" computes
p ( w_{test} | \alpha , \eta ) (formula 1)
= \int d \beta p ( w_{test} | \alpha , \beta ) p ( \beta | \eta )
or
p ( w_{test} | D , \alpha , \eta ) (formula 2)
= \int d \beta p ( w_{test} | \alpha , \beta ) p ( \beta | D , \eta )
My intuition is that the method score may be initially used for the fitting of \alpha and \eta on the corpus D and thus it is not exactly what I want to do.
I think that p ( \beta | D , \eta ), the a posteriori pdf of \beta (the distribution of words in each topic), contains much more information on the statistics of the corpus D than p ( \beta | \eta ), the a priori pdf of \beta. Hence, it would be better suited to my classification problem.
So, please could you tell me if the method score implements (formula 1) or (formula 2)?
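To make the setup concrete, here is a minimal sketch of the procedure I have in mind. It assumes the class in question is scikit-learn's `sklearn.decomposition.LatentDirichletAllocation`, with `doc_topic_prior` playing the role of \alpha and `topic_word_prior` the role of \eta. The toy count matrices, the prior values, the helper `doc_scores`, and the 1% threshold are all hypothetical choices made for illustration only. The sketch does not assume anything about which of the two formulas `score` corresponds to.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Toy bag-of-words counts standing in for the training corpus D
# (M documents, vocabulary of V words, K topics). Purely illustrative data.
M, V, K = 200, 30, 3
X_train = rng.poisson(1.0, size=(M, V))

# Fixed, hypothetical prior values: doc_topic_prior ~ \alpha, topic_word_prior ~ \eta.
lda = LatentDirichletAllocation(
    n_components=K,
    doc_topic_prior=0.1,
    topic_word_prior=0.01,
    learning_method="batch",
    random_state=0,
)
lda.fit(X_train)


def doc_scores(model, X):
    """Call model.score() on one document (one row of X) at a time."""
    return np.array([model.score(X[i:i + 1]) for i in range(X.shape[0])])


# Scores of the "normal" training documents, used to pick a threshold.
train_scores = doc_scores(lda, X_train)
threshold = np.quantile(train_scores, 0.01)  # hypothetical: lowest 1% counts as "too small"

# A new document w_test, scored with the same fitted model.
w_test = rng.poisson(1.0, size=(1, V))
test_score = lda.score(w_test)

# Class 1 = "normal", class -1 = "anomaly".
label = 1 if test_score >= threshold else -1
print(test_score, threshold, label)
```

The value returned by `score` is documented as an approximate log-likelihood, so the threshold here is applied on the log scale. Since the score also depends on the document length, comparing documents of very different lengths may require some normalization.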
Thank you in advance.
Kind Regards,
Valentin Resseguier