Topic Coherence for DTM (or any model) #808

bhargavvader · 2016-07-28T07:58:31Z

Wanted to know if there was any way to plug in the topics (topic-term distributions, doc-topic distributions, vocabulary counts, etc) to the coherence pipeline and get a measure.

Right now the notebook talks about using models (either the LdaModel or a wrapper), but if the model was trained through an external source, I am unsure how to do it.

pyLDAvis does this neatly in its prepare method, which accepts plain matrices as inputs if you don't have a model object.

This post also mentions using individual pipeline modules for your own coherence measure - is there any documentation/tutorial on how to do the same?

@dsquareindia , would love help regarding this.

devashishd12 · 2016-07-28T09:42:07Z

@bhargavvader thanks for raising the issue! I'll answer this in two parts.

Plugging in another model:
Apart from accepting the different models and wrappers, the pipeline also takes a list of tokenized topics as input. Sorry for not putting this feature into the notebook yet. You can find a good example of how to do this in the coherence pipeline tests (this line to be specific). You can simply plug the tokenized topics into the pipeline like it has been done here. So suppose I have a trained HDP model called hm, I can get the topic terms by:

topics = []
for topic_id, dist in hm.show_topics(formatted=False):
    # dist is a list of (term, probability) pairs; keep only the terms
    topics.append([term for term, prob in dist])

and then plug the topics into the coherence model. Can this be done with the DTM wrapper? Sorry, I don't have much experience with it.
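For a model trained outside gensim, where only a topic-term matrix and a vocabulary are available, the same list of tokenized topics can be built by hand. The following is a minimal sketch and not part of gensim: `top_terms`, `topic_term`, and `vocab` are hypothetical names, and the numbers are toy values chosen only for illustration.

```python
def top_terms(topic_term, vocab, topn=10):
    """Return the topn highest-weighted tokens for each topic row.

    topic_term: list of rows, one per topic, each a list of term weights.
    vocab: list of tokens, where vocab[i] names column i of topic_term.
    """
    topics = []
    for row in topic_term:
        # Rank column indices by descending weight.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        topics.append([vocab[i] for i in ranked[:topn]])
    return topics


# Toy inputs (hypothetical): 2 topics over a 5-word vocabulary.
vocab = ["human", "computer", "graph", "trees", "minors"]
topic_term = [
    [0.40, 0.35, 0.12, 0.08, 0.05],
    [0.05, 0.05, 0.40, 0.30, 0.20],
]

topics = top_terms(topic_term, vocab, topn=3)
```

The resulting `topics` is a list of token lists, the same shape as the one built from `hm` above, so it should be usable wherever the pipeline accepts tokenized topics (for example through the `topics` argument of `CoherenceModel`, together with a dictionary and a corpus or tokenized texts).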

Making your own coherence pipeline:
You can make your own custom coherence measure by using the different pipeline stages as individual modules. To be more precise, you can check the code here. The c_uci measure was built simply by choosing functions from the different modules. Some functions have not been coded yet, but they are pretty easy to add.
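To make the idea of separate pipeline stages concrete, here is a standalone sketch in plain Python. It does not use gensim's modules or reproduce their signatures; all function names are hypothetical. It chains four stages (segmentation, probability estimation, confirmation, aggregation) into a UMass-style score, assuming boolean document counts and that every topic word occurs in at least one document.

```python
import math
from itertools import combinations


def segment_one_pre(topic):
    """Segmentation: pair each word with every word ranked before it."""
    return [(topic[i], topic[j]) for i in range(1, len(topic)) for j in range(i)]


def boolean_document_counts(texts, words):
    """Probability estimation: count documents containing each word and each pair."""
    single = {w: 0 for w in words}
    joint = {}
    for doc in texts:
        present = sorted(set(doc) & words)
        for w in present:
            single[w] += 1
        for a, b in combinations(present, 2):
            key = frozenset((a, b))
            joint[key] = joint.get(key, 0) + 1
    return single, joint


def log_conditional(pair, single, joint, eps=1.0):
    """Confirmation: log((D(w_i, w_j) + eps) / D(w_j)); needs D(w_j) > 0."""
    w_i, w_j = pair
    co_count = joint.get(frozenset(pair), 0)
    return math.log((co_count + eps) / single[w_j])


def coherence(topic, texts):
    """Aggregation: arithmetic mean of the confirmation scores of all pairs."""
    pairs = segment_one_pre(topic)
    single, joint = boolean_document_counts(texts, set(topic))
    scores = [log_conditional(p, single, joint) for p in pairs]
    return sum(scores) / len(scores)


# Toy corpus (hypothetical), already tokenized.
texts = [
    ["human", "computer", "interface"],
    ["graph", "trees"],
    ["graph", "minors", "trees"],
    ["human", "computer"],
]

related = coherence(["graph", "trees"], texts)    # words that co-occur
unrelated = coherence(["human", "graph"], texts)  # words that never co-occur
```

Swapping any single stage, for instance a different segmentation or a different confirmation function, yields a different measure while the rest of the chain stays unchanged, which is the sense in which the stages act as individual modules.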

I hope this clarifies your doubts a little.

bhargavvader · 2016-07-28T10:13:56Z

Ah yes, this certainly clears things up, thanks @dsquareindia . :)
Would be awesome if you could do a notebook with more examples of both, will be helpful for other users as well!

devashishd12 · 2016-07-28T10:22:04Z

Yeah I'll do that asap. Please keep this open so that I can reference this in my PR later.

bhargavvader · 2016-08-18T20:32:13Z

@dsquareindia any update on the tutorial notebook for this?

I'll be opening PRs for an easy way to get topics ready for coherence for the DTM wrapper and python DTM, so I thought a more thorough notebook from your side on plugging in models (maybe for LSI/HDP/Mallet), along with a few examples of custom coherence pipelines, would be handy.

devashishd12 · 2016-08-21T10:18:52Z

Sorry for the delay @bhargavvader. I'll also write a blog post soon for this.

devashishd12 · 2016-08-22T07:02:59Z

Hey @bhargavvader I've also added a small gist here where I have included a small snippet of how to make your own pipeline. Hope this helps too.

bhargavvader · 2016-08-22T20:02:19Z

This is great! You should add it to the notebook.

bhargavvader · 2016-08-23T21:57:23Z

Thanks for all the help. I'm closing the issue.

devashishd12 mentioned this issue Aug 21, 2016

[MRG] Improved doc for texts, added HDP model to notebook #834

Merged

bhargavvader closed this as completed Aug 23, 2016

ghost mentioned this issue Jun 12, 2017

Identical topics #416

Closed
