Technical notes: Rule of thumb for classification

Thursday, December 11, 2014

Rule of thumb for classification

There are quite a few machine learning classifiers, and it is usually hard to say which one is best until each has been tried on the given data and its performance measured. However, there are a few rules of thumb:

A linear classifier is the better choice when:

The data is sparse (a lot of zeros in the feature vector)
Feature engineering has been performed, or the features come from deep feature learning
The dataset is anywhere up to large, as long as it still fits on one machine
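
As a rough illustration of the sparse case, here is a minimal sketch of a linear classifier (a plain perceptron) over sparse feature dictionaries. The function names and the toy bag-of-words data are made up for this example; the point is only that both prediction and update touch just the non-zero features of an example.

```python
def predict(weights, features):
    """Return +1 or -1 from the sign of the sparse dot product."""
    score = sum(weights.get(name, 0.0) * value
                for name, value in features.items())
    return 1 if score > 0 else -1


def train_perceptron(examples, epochs=10):
    """Train on a list of (features_dict, label) pairs, label in {+1, -1}."""
    weights = {}
    for _ in range(epochs):
        for features, label in examples:
            if predict(weights, features) != label:
                # Only the non-zero features of this example are updated.
                for name, value in features.items():
                    weights[name] = weights.get(name, 0.0) + label * value
    return weights


# Toy data (hypothetical): each example stores only its non-zero features.
train = [
    ({"cheap": 1.0, "pills": 1.0}, 1),
    ({"meeting": 1.0, "tomorrow": 1.0}, -1),
    ({"cheap": 1.0, "offer": 1.0}, 1),
    ({"project": 1.0, "meeting": 1.0}, -1),
]
weights = train_perceptron(train)
```

The vocabulary can be arbitrarily large here, because features that never appear with a non-zero value never get a weight entry.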

A non-linear or kernel-based classifier is the better choice when:

There are only a few features (up to tens)
The data is big, with a lot of training examples
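
To make the few-features case concrete, here is a minimal sketch of a kernel-based classifier: a kernel perceptron with an RBF kernel, trained on the two-feature XOR problem, which no linear classifier can separate. The function names, the `gamma` value, and the toy data are assumptions chosen for illustration, not a recommended setup.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) similarity between two feature vectors."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


def decision(points, labels, alphas, x):
    """Weighted sum of kernel similarities to the training points."""
    return sum(alpha * y * rbf_kernel(p, x)
               for alpha, y, p in zip(alphas, labels, points))


def train_kernel_perceptron(points, labels, epochs=20):
    """Count, per training point, how often it was misclassified."""
    alphas = [0] * len(points)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(points, labels)):
            if decision(points, labels, alphas, x) * y <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return alphas


# XOR with two features: not linearly separable.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [-1, 1, 1, -1]
alphas = train_kernel_perceptron(points, labels)
```

A point `x` is then assigned to the positive class when `decision(points, labels, alphas, x)` is greater than zero.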

Bonus: how to manage an imbalanced training set:

Evaluation: area under the PR (precision-recall) curve rather than under the ROC curve
Negative subsampling
Weights for the imbalanced classes (also: the regularization parameter)
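
The three items above can be sketched in plain Python. This is only an illustration under stated assumptions: labels are 0 (negative) and 1 (positive), the function names are hypothetical, average precision is used as a common summary of the PR curve, and the weights are simple inverse class frequencies.

```python
import random
from collections import Counter


def subsample_negatives(examples, ratio=1.0, seed=0):
    """Keep all positives and at most ratio * (number of positives) negatives."""
    rng = random.Random(seed)
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    keep = min(len(negatives), int(ratio * len(positives)))
    return positives + rng.sample(negatives, keep)


def inverse_frequency_weights(labels):
    """Weight each class by total / (number of classes * class count)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy imbalanced set (made up): 5 positives, 95 negatives.
examples = [(i, 1) for i in range(5)] + [(i, 0) for i in range(5, 100)]
balanced = subsample_negatives(examples, ratio=2.0)
class_weights = inverse_frequency_weights([label for _, label in examples])
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

With these weights the rare class counts for more in the loss, which is the same effect that negative subsampling achieves by discarding data instead.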
