If you are reading Nuit Blanche, you probably know about Kaggle, the site where a whole slew of datasets are tested against different supervised learning algorithms. Kaggle changes everything because each of its competitions amounts to a distributed attack on a single dataset by different families of algorithms, each with its own biases. It is as if one were to test a compressive sensing dataset against all of these algorithms at once. This richness of viewpoints and angles of attack is rare in academia. In recent times, some algorithms have been shown to produce good results, such as CNNs for image-related tasks and Random Forests for other types of data. While CNNs retain the attention of everyone from academia to industry, Random Forests are, in my view, clearly the unexpected algorithm that seems to do well consistently on a number of tasks. I am not sure we could have gotten that insight from just reading the academic literature. Here are two competitions that are still ongoing and have retained my interest:
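To see why Random Forests travel so well across tasks, here is a minimal, self-contained sketch of the core idea (bootstrap sampling plus a random feature subset per tree, here reduced to depth-1 "stumps", with a majority vote). The toy dataset and all function names are illustrative, not from any Kaggle competition:

```python
# Minimal sketch of the Random Forest idea using only the standard library:
# bagged decision stumps, each trained on a bootstrap sample and a random
# subset of features, combined by majority vote. Illustrative only.
import random
from collections import Counter

random.seed(0)

def fit_stump(X, y, feat_idx):
    """Pick the best (feature, threshold) split among a random feature subset."""
    best = None
    for f in feat_idx:
        for t in sorted({x[f] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[f] <= t]
            right = [yi for x, yi in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            lpred = Counter(left).most_common(1)[0][0]
            rpred = Counter(right).most_common(1)[0][0]
            # Errors if each side predicts its majority class
            err = sum(yi != lpred for yi in left) + sum(yi != rpred for yi in right)
            if best is None or err < best[0]:
                best = (err, f, t, lpred, rpred)
    if best is None:  # degenerate bootstrap: fall back to a constant predictor
        maj = Counter(y).most_common(1)[0][0]
        return lambda x: maj
    _, f, t, lpred, rpred = best
    return lambda x: lpred if x[f] <= t else rpred

def fit_forest(X, y, n_trees=25):
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))  # usual sqrt(d) random-subspace size
    trees = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = random.sample(range(d), k)
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return lambda x: Counter(t(x) for t in trees).most_common(1)[0][0]

# Toy 2-feature dataset: class 1 iff both features are "large"
X = [(0, 0), (0, 1), (1, 0), (2, 2), (3, 2), (2, 3)]
y = [0, 0, 0, 1, 1, 1]
predict = fit_forest(X, y)
print([predict(x) for x in X])
```

The point the competitions keep making is visible even here: there is almost nothing to tune, and the ensemble is robust to individual weak trees.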
The Higgs Boson competition (1292 teams)
Why? Whichever algorithm does well in the competition will have the possibility of working on real data. With real data, there is the potential to discover more than the Higgs.
The Criteo competition (283 teams)
Why? It might be the case that "The best minds of my generation are thinking about how to make people click ads," but this is the first time we have public access to a large corpus of real CTR data.
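For readers unfamiliar with CTR prediction, the workhorse baseline on this kind of data is logistic regression over hashed categorical features, trained by online SGD. Here is a hedged sketch of that technique; the toy event stream, field names, and hyperparameters below are all made up for illustration and are not from the Criteo data:

```python
# Sketch of hashing-trick logistic regression, a common CTR baseline.
# Categorical field=value pairs are hashed into a fixed-size weight vector
# and the model is updated online with SGD on the log loss. Illustrative only.
import math

D = 2 ** 18        # size of the hashed feature space
w = [0.0] * D      # model weights
lr = 0.1           # learning rate

def features(event):
    """Hash each categorical field=value pair into one of D buckets."""
    return [hash(f"{k}={v}") % D for k, v in event.items()]

def predict(event):
    z = sum(w[i] for i in features(event))
    return 1.0 / (1.0 + math.exp(-max(min(z, 30.0), -30.0)))  # clipped sigmoid

def update(event, clicked):
    p = predict(event)
    for i in features(event):
        w[i] -= lr * (p - clicked)  # SGD step on the log-loss gradient

# Toy training stream: ads on site "news" get clicked, on "games" they don't
stream = [({"site": "news", "ad": "car"}, 1),
          ({"site": "games", "ad": "car"}, 0)] * 200
for event, clicked in stream:
    update(event, clicked)
```

The hashing trick is what makes this workable at CTR scale: memory is fixed at D weights no matter how many distinct categorical values show up in the stream.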