Commit

Add "most_similar_to_given" method for KeyedVectors (piskvorky#1582)
* finished adding 2 new functions

* imported argmax to word2vec

* reformatted

* remove `most_similar_to_given` from w2v class

* Fix PEP8
TheMathMajor authored and horpto committed Oct 28, 2017
1 parent 15ff1d4 commit 4368cf4
Showing 1 changed file with 26 additions and 1 deletion.
gensim/models/keyedvectors.py (27 changes: 26 additions and 1 deletion)
@@ -71,7 +71,8 @@

 from numpy import dot, zeros, dtype, float32 as REAL,\
     double, array, vstack, fromstring, sqrt, newaxis,\
-    ndarray, sum as np_sum, prod, ascontiguousarray
+    ndarray, sum as np_sum, prod, ascontiguousarray,\
+    argmax

 from gensim import utils, matutils  # utility fnc for pickling, common scipy operations etc
 from gensim.corpora.dictionary import Dictionary
@@ -616,6 +617,30 @@ def similarity(self, w1, w2):
"""
return dot(matutils.unitvec(self[w1]), matutils.unitvec(self[w2]))

def most_similar_to_given(self, w1, word_list):
"""Return the word from word_list most similar to w1.
Args:
w1 (str): a word
word_list (list): list of words containing a word most similar to w1
Returns:
the word in word_list with the highest similarity to w1
Raises:
KeyError: If w1 or any word in word_list is not in the vocabulary
Example::
>>> trained_model.most_similar_to_given('music', ['water', 'sound', 'backpack', 'mouse'])
'sound'
>>> trained_model.most_similar_to_given('snake', ['food', 'pencil', 'animal', 'phone'])
'animal'
"""
return word_list[argmax([self.similarity(w1, word) for word in word_list])]

def n_similarity(self, ws1, ws2):
"""
Compute cosine similarity between two sets of words.
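The added method scores every candidate with `self.similarity` (cosine similarity of unit vectors) and returns the candidate at the `argmax` position. Below is a minimal standalone sketch of that selection logic, assuming plain NumPy arrays stored in a dict in place of a trained `KeyedVectors` model. The helper names (`unitvec`, `similarity`, `most_similar_to_given` as free functions) and the toy vectors are hypothetical illustrations, not gensim API.

```python
import numpy as np


def unitvec(v):
    """Scale v to unit L2 norm (assumes v is non-zero)."""
    return v / np.linalg.norm(v)


def similarity(vectors, w1, w2):
    """Cosine similarity between the vectors stored for w1 and w2."""
    return float(np.dot(unitvec(vectors[w1]), unitvec(vectors[w2])))


def most_similar_to_given(vectors, w1, word_list):
    """Return the entry of word_list whose vector is most cosine-similar to w1.

    A word missing from `vectors` raises KeyError from the dict lookup.
    An empty word_list makes np.argmax raise ValueError.
    """
    scores = [similarity(vectors, w1, word) for word in word_list]
    # np.argmax returns the first maximal index, so ties resolve to the
    # earliest candidate in word_list.
    return word_list[int(np.argmax(scores))]


# Hypothetical toy embeddings, chosen by hand for illustration only.
toy_vectors = {
    "music": np.array([1.0, 0.9, 0.0]),
    "sound": np.array([0.9, 1.0, 0.1]),
    "water": np.array([0.0, 0.1, 1.0]),
    "backpack": np.array([0.2, 0.0, 0.8]),
}

print(most_similar_to_given(toy_vectors, "music", ["water", "sound", "backpack"]))
# prints: sound
```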