Friday, January 09, 2015

Fast, simple and accurate handwritten digit classification using extreme learning machines with shaped input-weights

I first heard about Extreme Learning Machines in a presentation on botnet detection by Joseph Ghafari at the Paris Machine Learning Meetup #6, Season 1 (a resource for this approach can be found here). These constructions are essentially nonlinear functions of random linear projections. Today, we have a paper that aims to show how one or a few iterations of these functions can provide results similar to those of deep neural networks.
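To make the idea concrete, here is a minimal sketch (my illustration, not code from the paper) of the core ELM recipe: the hidden layer is a fixed nonlinearity applied to a random, untrained linear projection of the input, and only the output weights are fit, with a single least-squares solve, on toy data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for a real dataset (hypothetical shapes).
X = rng.random((200, 64))                   # 200 inputs of dimension 64
T = np.eye(10)[rng.integers(0, 10, 200)]    # one-hot targets, 10 classes

# Random input weights: drawn once, never trained.
W = rng.standard_normal((500, 64))          # 500 hidden units

# Nonlinear function of the random linear projection.
H = np.tanh(X @ W.T)

# The only training step: least-squares solve for the output weights.
B, *_ = np.linalg.lstsq(H, T, rcond=None)

pred = np.argmax(H @ B, axis=1)
```

Because the hidden layer is fixed, "training" reduces to one linear solve, which is where the very short training times quoted below come from.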

Fast, simple and accurate handwritten digit classification using extreme learning machines with shaped input-weights by Mark D. McDonnell, Migel D. Tissera, André van Schaik, Jonathan Tapson
Deep networks have inspired a renaissance in neural network use, and are becoming the default option for difficult tasks on large datasets. In this report we show that published deep network results on the MNIST handwritten digit dataset can straightforwardly be replicated (error rates below 1%, without use of any distortions) with shallow 'Extreme Learning Machine' (ELM) networks, with a very rapid training time (~10 minutes). When we used distortions of the training set we obtained error rates below 0.6%. To achieve this performance, we introduce several methods for enhancing ELM implementation, which individually and in combination can significantly improve performance, to the point where it is nearly indistinguishable from deep network performance. The main innovation is to ensure each hidden-unit operates only on a randomly sized and positioned patch of each image. This form of random 'receptive field' sampling of the input ensures the input weight matrix is sparse, with about 90 percent of weights equal to zero, which is a potential advantage for hardware implementations. Furthermore, combining our methods with a small number of iterations of a single-batch backpropagation method can significantly reduce the number of hidden-units required to achieve a particular performance. Our close to state-of-the-art results for MNIST suggest that the ease of use and accuracy of ELM should cause it to be given greater consideration as an alternative to deep networks applied to more challenging datasets.  
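The abstract's main innovation is the shaped, sparse input-weight matrix: each hidden unit sees only a randomly sized and positioned patch of the image. A hedged sketch of that construction is below (my own simplified reading, not the authors' code; patch-size bounds and the ridge parameter are assumptions), again with regularized least squares for the output weights.

```python
import numpy as np

rng = np.random.default_rng(0)
side = 28                       # MNIST-like image side length
d = side * side
n_hidden = 400                  # small for illustration; the paper uses far more
n_classes = 10

# One random "receptive field" patch per hidden unit: nonzero random
# weights inside the patch, zeros elsewhere, so W is sparse.
W = np.zeros((n_hidden, d))
for i in range(n_hidden):
    ph, pw = rng.integers(3, 15, size=2)     # random patch height/width (assumed bounds)
    r = rng.integers(0, side - ph + 1)       # random patch position
    c = rng.integers(0, side - pw + 1)
    patch = np.zeros((side, side))
    patch[r:r + ph, c:c + pw] = rng.standard_normal((ph, pw))
    W[i] = patch.ravel()

def hidden(X):
    # ReLU nonlinearity on the sparse random projection.
    return np.maximum(X @ W.T, 0.0)

# Synthetic stand-in data; replace with MNIST pixels in practice.
X = rng.random((500, d))
y = rng.integers(0, n_classes, 500)
T = np.eye(n_classes)[y]

# Ridge-regularized least-squares solve for the output weights.
H = hidden(X)
lam = 1e-3                       # assumed regularization strength
B = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ T)

pred = np.argmax(hidden(X) @ B, axis=1)
sparsity = (W == 0).mean()       # fraction of zero input weights
```

With patch sides drawn between 3 and 14 pixels, the average patch covers roughly 10 percent of a 28x28 image, which is consistent with the abstract's figure of about 90 percent of input weights being zero.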

1 comment:

Dan Ofer said...

Is the code you used (e.g. for ELM kernels) available anywhere?