Topic Coherence for DTM (or any model) #808

Closed
bhargavvader opened this issue Jul 28, 2016 · 8 comments

Comments

@bhargavvader
Contributor

Is there a way to plug topics (topic-term distributions, doc-topic distributions, vocabulary counts, etc.) directly into the coherence pipeline and get a measure?

Right now the notebook only covers passing in models (either LdaModel or a wrapper); I am unsure how to do it when the model was trained through an external source.

pyLDAvis does this neatly in its prepare method, which accepts plain matrices as inputs if you don't have a model object.

This post also mentions using individual pipeline modules to build your own coherence measure. Is there any documentation or tutorial on how to do the same?

@dsquareindia , would love help regarding this.

@devashishd12
Contributor

@bhargavvader thanks for raising the issue! I'll answer this in two parts.

Plugging in another model:
Apart from accepting the different models and wrappers, the pipeline also takes a list of tokenized topics as input. Sorry for not putting this feature into the notebook yet. You can find a good example of how to do this in the coherence pipeline tests (this line to be specific). You can simply plug the tokenized topics into the pipeline as is done there. So suppose I have a trained HDP model called hm; I can get the topic terms with:

topics = []
for topic_id, dist in hm.show_topics(formatted=False):
    # keep only the terms, dropping their probabilities
    topic = [term for term, prob in dist]
    topics.append(topic)

and then plug the topics into the coherence model. Can this be done with the DTM wrapper? Sorry, I don't have much experience with it.
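For a model trained outside gensim, the same kind of tokenized topic list can be built from a topic-term matrix and a vocabulary. The snippet below is a minimal sketch and not part of gensim: `top_terms_per_topic`, `topic_term`, and `vocab` are hypothetical names, and the data is made up for illustration. It keeps the highest-weighted terms of each row and returns them as tokens.

```python
def top_terms_per_topic(topic_term, vocab, topn=10):
    """Return one list of tokens per topic, highest-weight terms first.

    topic_term: list of rows, one per topic, one weight per vocabulary term
    vocab:      list of tokens, aligned with the columns of topic_term
    """
    topics = []
    for row in topic_term:
        if len(row) != len(vocab):
            raise ValueError("each topic row needs one weight per vocabulary term")
        # indices of the terms, sorted by descending weight
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        topics.append([vocab[i] for i in ranked[:topn]])
    return topics


# Toy data, made up for illustration (assumption, not from a real model)
vocab = ["human", "computer", "graph", "trees", "minors"]
topic_term = [
    [0.40, 0.35, 0.12, 0.08, 0.05],
    [0.05, 0.05, 0.40, 0.30, 0.20],
]

topics = top_terms_per_topic(topic_term, vocab, topn=3)
print(topics)
```

The resulting `topics` list has the same shape as the one built from `hm` above, so it could be handed to the coherence model through its `topics` argument, together with whatever texts, corpus, and dictionary the chosen measure needs.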

Making your own coherence pipeline:
You can build a custom coherence measure by using the different pipeline stages as individual modules. To be more precise, you can check the code here. The c_uci measure was built simply by choosing functions from the different modules. Some functions have not been coded yet, but they are pretty easy to add.
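To make the idea of stages concrete, here is a rough pure-Python sketch of a UCI-style pipeline: segmentation, probability estimation, confirmation measure, aggregation. It is only an illustration and not the gensim code. All function names are hypothetical, the texts are toy data, and as a simplifying assumption it counts co-occurrence per document instead of using a sliding window.

```python
import math
from itertools import combinations


def segment_pairs(topic):
    """Segmentation: split a topic into unordered word pairs."""
    return list(combinations(topic, 2))


def estimate_probabilities(texts, words):
    """Probability estimation: fraction of documents containing a word or a pair."""
    docs = [set(doc) for doc in texts]
    n = len(docs)
    single = {w: sum(w in d for d in docs) / n for w in words}
    joint = {
        (a, b): sum(a in d and b in d for d in docs) / n
        for a, b in combinations(words, 2)
    }
    return single, joint


def log_ratio(pair, single, joint, eps=1e-12):
    """Confirmation measure: PMI-style log ratio, smoothed by eps."""
    a, b = pair
    denominator = single[a] * single[b]
    if denominator == 0:
        return 0.0  # a word never occurs, so the pair carries no evidence
    return math.log((joint[pair] + eps) / denominator)


def coherence(topic, texts):
    """Aggregation: arithmetic mean of the pairwise confirmation scores."""
    if len(topic) < 2:
        raise ValueError("a topic needs at least two words")
    pairs = segment_pairs(topic)
    single, joint = estimate_probabilities(texts, topic)
    scores = [log_ratio(pair, single, joint) for pair in pairs]
    return sum(scores) / len(scores)


# Toy tokenized texts, made up for illustration
texts = [
    ["human", "computer", "interface"],
    ["graph", "trees"],
    ["graph", "minors", "trees"],
    ["human", "computer", "system"],
]

good = coherence(["graph", "trees"], texts)
mixed = coherence(["human", "trees"], texts)
print(good > mixed)
```

Swapping any one stage (for example a different segmentation or a different confirmation measure) while keeping the others is what gives a new coherence measure.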

I hope this clarifies your doubts a little.

@bhargavvader
Contributor Author

Ah yes, this certainly clears things up, thanks @dsquareindia . :)
It would be awesome if you could do a notebook with more examples of both; it will be helpful for other users as well!

@devashishd12
Contributor

Yeah I'll do that asap. Please keep this open so that I can reference this in my PR later.

@bhargavvader
Contributor Author

bhargavvader commented Aug 18, 2016

@dsquareindia any update on the tutorial notebook for this?

I'll be opening PRs that add an easy way to get topics ready for coherence from the DTM wrapper and the Python DTM, so I thought a more thorough notebook from your side on plugging in models (maybe for LSI/HDP/Mallet), with a few examples of custom coherence pipelines, would be handy.

@devashishd12
Contributor

Sorry for the delay @bhargavvader. I'll also write a blog post soon for this.

@devashishd12
Contributor

Hey @bhargavvader, I've also added a small gist here with a short snippet showing how to make your own pipeline. Hope this helps too.

@bhargavvader
Contributor Author

This is great! You should add it to the notebook.

@bhargavvader
Contributor Author

Thanks for all the help. I'm closing the issue.

@ghost ghost mentioned this issue Jun 12, 2017