Tuesday, August 5, 2014

Hashing trick for word dictionary

A hash function can be used to build a dictionary and convert text documents to a vector space model (VSM) representation, without storing the vocabulary itself. The dictionary size N has to be fixed in advance, and each document has to be tokenized into terms. The index of a term in the VSM is then Hash(term) mod N. More details at: http://en.wikipedia.org/wiki/Feature_hashing and http://www.shogun-toolbox.org/static/notebook/current/HashedDocDotFeatures.html
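
The steps above can be sketched in a few lines of Python. This is only an illustration under some assumptions of mine: the function names (tokenize, term_index, hashed_vector), the simple regex tokenizer, and the choice of MD5 as the hash are not taken from the linked pages. MD5 is used because Python's built-in hash() for strings is salted per process, so its indices would change between runs.

```python
import hashlib
import re


def tokenize(text):
    # Assumed tokenizer: lowercase, then split into runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def term_index(term, n):
    # Hash(term) mod N, using the first 8 bytes of the MD5 digest as an integer.
    # MD5 is an illustrative choice; any stable hash function would do.
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def hashed_vector(text, n):
    # Term-count vector of fixed length n; no term-to-index table is stored.
    vec = [0] * n
    for term in tokenize(text):
        vec[term_index(term, n)] += 1
    return vec


if __name__ == "__main__":
    print(hashed_vector("the cat sat on the mat", 16))
```

Different terms can land on the same index (a collision), in which case their counts are simply added together. A larger N makes collisions less frequent at the cost of longer vectors.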
