If anything comes out of Kaggle competitions, it is that Random Forests [4,5,6], while not consistently giving the best scores, have pretty good results most of the times.

Random Forest [4] typically is a technique of supervised learning in Machine Learning parlance.

Recently, some folks have made some sorts of connection with these techniques and compressive sensing or Johnson-Lindenstrauss. In one of the earliest case [1], while not in a random forest setting, the authors used the sparsity of the positive label within the larger label set to perform classification on the compressed labels and used L1 solvers to re-identify the actual labels. In [2], the authors adapted this technique to train random trees to randomized label features. In [3], the authors took a different approach to random forests by noting that random trees were a mapping from features into a new, very sparse, feature space. They use this node sparsity in the new feature space to compress trees and reduce the memory size of the entire forest. This is great, but what if randomization could be used to build compressed forests from the get go ? Not much has been done it seems on that front. Aside from the random picking of features in decision trees, Leo Breiman's original paper on Random Forest [4] contained another interesting approach reminiscent of random projections used in compressive sensing:

"...5. Random Forests Using Linear Combinations of Inputs

At a given node, L variables are randomly selected and added together with coefficients that are uniform random numbers on [-1,1]. F linear combinations are generated, and then a search is made over these for the best split. This procedure is called Forest-RC...."

One note that for large datasets, more randomized measurements yield similar or better accuracy than plain vanilla random forest or Adaboost.

*More on that later.*

[1] Multi-Label Prediction via Compressed Sensing by Daniel Hsu, Sham M. Kakade, John Langford, Tong Zhang (attendant slides )

[3] L1-based compression of random forest models , Joly, Arnaud; Schnitzler, François; Geurts, Pierre; Wehenkel, Louis or L1-based compression of random forest models , Joly, Arnaud; Schnitzler, François; Geurts, Pierre; Wehenkel, Louis

[5] Data Mining with Random Forests™, A Brief Overview to RandomForests, Dan Steinberg, Mikhail Golovnya, N. Scott Cardell

[6] Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning, A. Criminisi, J. Shotton, and E. Konukoglu

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

**Join the CompressiveSensing subreddit or the Google+ Community and post there !**

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## No comments:

Post a Comment