In the method where we delete n-grams and reduce history counts, I think the vocabulary needs to be cleaned up too (for instance, when a word's history count drops to zero).
The main idea of this method is to get rid of tokens and token sequences we consider irrelevant, in order to speed up reading the model from file and lookups within it. Always keeping every vocabulary entry defeats that purpose.
I'm still not sure about a good solution for reducing the vocabulary, but I think the history needs to be reduced further as well, perhaps using something like this:
Technically, if we are reducing the n-grams using the same threshold, then the words we are "throwing out" of ngramCounts will have the same or an even higher occurrence in historyCounts, and should therefore be safe to delete. Tell me what you think, @olekscode.
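For illustration, the pruning idea could be sketched roughly like this in Python. This is a toy Counter-based model, not the project's actual Smalltalk API; the names `ngram_counts`, `history_counts`, and `vocabulary` are hypothetical, and here the vocabulary cleanup keeps only words that still appear in some surviving n-gram (one possible reading of "clean up when the history count becomes zero"):

```python
from collections import Counter

def prune_model(ngram_counts: Counter, history_counts: Counter,
                vocabulary: set, threshold: int):
    """Drop n-grams seen fewer than `threshold` times, reduce the
    corresponding history counts, and remove vocabulary words that
    no longer occur in any surviving n-gram."""
    rare = [ng for ng, c in ngram_counts.items() if c < threshold]
    for ngram in rare:
        count = ngram_counts.pop(ngram)
        history = ngram[:-1]
        # Subtract the removed n-gram's count from its history;
        # delete the history entry once it reaches zero.
        history_counts[history] -= count
        if history_counts[history] <= 0:
            del history_counts[history]
    # Vocabulary cleanup: keep only words still used by some n-gram.
    surviving = {word for ng in ngram_counts for word in ng}
    vocabulary.intersection_update(surviving)
    return ngram_counts, history_counts, vocabulary
```

Under this sketch, any word whose every n-gram falls below the threshold disappears from all three structures at once, which matches the goal of shrinking both the on-disk model and the lookup tables together.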