Fix tf-idf (#2980)
Fix #2974
henry0312 authored and fchollet committed Jun 14, 2016
1 parent dc569e9 commit 53aaa84
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions keras/preprocessing/text.py
@@ -206,8 +206,10 @@ def sequences_to_matrix(self, sequences, mode='binary'):
 elif mode == 'binary':
     X[i][j] = 1
 elif mode == 'tfidf':
-    tf = np.log(c / len(seq))
-    df = (1 + np.log(1 + self.index_docs.get(j, 0) / (1 + self.document_count)))
+    # Use weighting scheme 2 in
+    # https://en.wikipedia.org/wiki/Tf%E2%80%93idf
+    tf = 1 + np.log(c)
+    df = np.log(1 + self.index_docs.get(j, 0) / (1 + self.document_count))
     X[i][j] = tf / df
 else:
     raise Exception('Unknown vectorization mode: ' + str(mode))
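
To make the effect of the change concrete, here is a minimal sketch comparing the weight produced by the two removed lines with the weight produced by the added lines. It is not part of the commit: the function names and the toy numbers are hypothetical, and it uses the standard-library `math.log` in place of `np.log` so that it runs without NumPy.

```python
import math


def tfidf_weight_old(count, seq_len, doc_freq, doc_count):
    """Weight as computed by the two removed lines (hypothetical helper)."""
    # count / seq_len is at most 1, so this log is never positive.
    tf = math.log(count / seq_len)
    df = 1 + math.log(1 + doc_freq / (1 + doc_count))
    return tf / df


def tfidf_weight_new(count, doc_freq, doc_count):
    """Weight as computed by the added lines (hypothetical helper)."""
    # count is at least 1, so tf is at least 1.
    tf = 1 + math.log(count)
    df = math.log(1 + doc_freq / (1 + doc_count))
    return tf / df


# Toy numbers, assumed for illustration only: a word that occurs 3 times
# in a 10-token sequence and appears in 2 of the 4 fitted documents.
old = tfidf_weight_old(count=3, seq_len=10, doc_freq=2, doc_count=4)
new = tfidf_weight_new(count=3, doc_freq=2, doc_count=4)

print("old weight:", old)
print("new weight:", new)
```

With these inputs the old weight is negative, because the log of a relative frequency below 1 is negative, while the new weight is positive. The sketch does not guard the case `doc_freq == 0`, where the new denominator is `log(1) = 0`.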