Technical notes

Tuesday, August 5, 2014

Hashing trick for word dictionary

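One can use a hash function to build a dictionary and convert text documents to a vector space model (VSM) representation. The dictionary size N has to be specified in advance, and the documents have to be tokenized into terms. Then Hash(term) mod N is the index of the term in the VSM, so no explicit term-to-index mapping needs to be stored. Different terms can collide on the same index; a larger N makes such collisions less likely. More details at: http://en.wikipedia.org/wiki/Feature_hashing and http://www.shogun-toolbox.org/static/notebook/current/HashedDocDotFeatures.html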
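The idea can be illustrated with a minimal Python sketch. The function names (term_index, hashed_vector), the whitespace tokenizer, and the choice of MD5 as the hash are assumptions made for this example, not something the post or the linked pages prescribe; any stable hash function would work the same way.

```python
import hashlib
from typing import List


def term_index(term: str, n_features: int) -> int:
    """Map a term to an index in [0, n_features) as Hash(term) mod N."""
    # MD5 is used only as a stable, well-distributed hash (an assumption
    # of this sketch); it is not used for any security purpose here.
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_features


def hashed_vector(text: str, n_features: int) -> List[int]:
    """Convert a document to a term-count vector of fixed length N."""
    vector = [0] * n_features
    # Naive tokenization: lowercase and split on whitespace.
    for term in text.lower().split():
        # Colliding terms simply add up in the same slot.
        vector[term_index(term, n_features)] += 1
    return vector


if __name__ == "__main__":
    doc = "the cat sat on the mat"
    print(hashed_vector(doc, 16))
```

A hash from hashlib is used instead of Python's built-in hash() because the built-in one is randomized per process for strings by default, which would give a different index for the same term on every run.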
Posted by Alexander at 2:49 AM