Word2Vec models #26

wwymak · 2017-02-02T00:18:36Z

Construct a word2vec model from the tweets of specific groups of people (e.g. far right) and compare it with models trained on the overall twitterverse (e.g. http://fredericgodin.com/papers/Named%20Entity%20Recognition%20for%20Twitter%20Microposts%20using%20Distributed%20Word%20Representations.pdf)

Some things to try:
cluster and visualize tweets with t-SNE/k-means/PCA
predict hashtags from tweet vectors
do regression on tweet/hashtag vectors
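
A minimal sketch of the hashtag-prediction idea, assuming a tweet vector is simply the mean of its word vectors and that the predicted hashtag is the one whose vector has the highest cosine similarity to it. Both choices are assumptions made for illustration, not something settled in this issue. The 2-d vectors are made-up toy values; real ones would come from a trained word2vec model.

```python
import math


def average_vector(tokens, word_vectors):
    """Return the mean of the vectors of known tokens, or None if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_hashtag(tokens, word_vectors, hashtag_vectors):
    """Pick the hashtag whose vector is closest (by cosine) to the tweet vector."""
    tweet_vec = average_vector(tokens, word_vectors)
    if tweet_vec is None:
        return None
    return max(hashtag_vectors, key=lambda h: cosine(tweet_vec, hashtag_vectors[h]))


# Toy, hand-written vectors purely for illustration (not real embeddings).
toy_word_vectors = {
    "vote": [1.0, 0.1],
    "election": [0.9, 0.2],
    "goal": [0.1, 1.0],
    "match": [0.2, 0.9],
}
toy_hashtag_vectors = {"#politics": [1.0, 0.0], "#football": [0.0, 1.0]}

print(predict_hashtag(["vote", "in", "the", "election"], toy_word_vectors, toy_hashtag_vectors))
```

The same tweet vectors could be fed to a clustering or regression routine instead of the nearest-hashtag lookup.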

(notes from a chat with a colleague of mine who has done some NLP research.
The following are some of his recommendations:

word2vec is likely to give better results than e.g. CountVectorizer
use word2vec with skip-gram training for the tweets themselves
there is probably no need to remove stop words or tokenize tweets (but do remove punctuation)
convert emojis into words (e.g. "happy") to get better context
convert word2vec vectors into polar coordinates
train word2vec for hashtags from tweets using CBOW

His opinion is that gensim is a handy tool, but he also built some extra utilities for his work that may be useful: https://github.com/pelodelfuego/word2vec-toolbox )
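
The preprocessing recommendations above (strip punctuation, keep stop words, turn emojis into words) could be sketched roughly as follows. The emoji mapping is a hypothetical three-entry table, and keeping `#` and `@` so that hashtags and mentions survive is an assumption of this sketch rather than part of the advice. The final whitespace split is there only because most word2vec tooling expects a list of tokens per tweet.

```python
import string

# Hypothetical emoji-to-word table; a real one would need many more entries.
EMOJI_WORDS = {"😀": "happy", "😢": "sad", "😡": "angry"}

# Remove ASCII punctuation, but keep '#' and '@' (assumption: hashtags and
# mentions should stay recognisable as such).
PUNCT_TABLE = str.maketrans(
    "", "", string.punctuation.replace("#", "").replace("@", "")
)


def preprocess(tweet):
    """Lowercase a tweet, map emojis to words, strip punctuation, split on whitespace."""
    for emoji, word in EMOJI_WORDS.items():
        tweet = tweet.replace(emoji, f" {word} ")
    return tweet.lower().translate(PUNCT_TABLE).split()


print(preprocess("So happy!! 😀 #Election, @user."))
```

Stop words are deliberately left in, in line with the recommendation.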

I have been tinkering a bit with our data using gensim (it seems fairly easy to use, although I haven't yet looked at what falls out of it).

patrick-dd · 2017-02-04T20:28:06Z

Starting on this

hadoopjax · 2017-02-05T12:38:07Z

Great to hear, @patrick-dd, thanks for picking this up! I invited you to the D4D organization so you can be assigned the issue (helps us track who's working on what).

wwymak · 2017-02-05T16:56:34Z

Looks like @patrick-dd and I are going to work from the two different ends of the problem and, with luck, meet in the middle :) Just thought I'd add that anyone else who is interested is welcome, since it'll be useful to get different insights into this task.

wwymak added help wanted status-in-progress labels Feb 2, 2017
