Everybody wants to be shallow ! Compared to the deep network frenzy, there is still the sense that some sort of regularization is at play and that a shallow network might hold the key to the right regularization. Here are some recent examples featured here:
- Enhanced Image Classification With a Fast-Learning Shallow Convolutional Neural Network
- Fast, simple and accurate handwritten digit classification using extreme learning machines with shaped input-weights
- Kernel Methods Match Deep Neural Networks on TIMIT
- but also Deep Fried Convnets, Sunday Morning Insight: The Regularization Architecture, Do Deep Nets Really Need to be Deep?
Here is another instance of improving the scalability of the Random Features approach by choosing better "random projections" so that they are more representative of the dataset being investigated. It slims the maps down, but it ain't doing much with regard to accuracy. I noted from the paper:
6.2 CNM as Neural Networks
One can view the proposed CNM framework from a different angle. If we ignore the original motivation of the work i.e., kernel approximation via Random Fourier Features, the proposed method can be seen as a shallow neural network with one hidden layer, with cos(·) as the activation function, and the SVM objective. It is interesting to note that such a “two-layer neural network”, which simulates certain shift-invariant kernels, leads to very good classification performance as shown in the experimental section. Under the neural network view, one can also use back-propagation as the optimization method, similar to the proposed alternating SGD, or use other types of activation functions such as the sigmoid, and ReLU functions. However the “network” then will no longer correspond to a shift-invariant kernel.
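The "shallow network" view described above can be made concrete with a small sketch (my own illustration, not the paper's code): a one-hidden-layer network whose weights are fixed at random and whose activation is cos(·) is exactly the Random Fourier Features map of Rahimi and Recht, and its feature inner products approximate a shift-invariant kernel, here the Gaussian (RBF) kernel.

```python
import numpy as np

# Sketch of the shallow-network view of Random Fourier Features:
# a single hidden layer with fixed random weights and cos(.) activation
# approximates the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
rng = np.random.default_rng(0)
d, D, sigma = 5, 2000, 1.0          # input dim, number of random features, bandwidth

W = rng.normal(scale=1.0 / sigma, size=(D, d))   # "hidden layer" weights, drawn once
b = rng.uniform(0.0, 2.0 * np.pi, size=D)        # random phases

def phi(x):
    """Random Fourier feature map: the hidden-layer activations."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
approx = phi(x) @ phi(y)                                        # feature inner product
exact = np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))
print(abs(approx - exact))   # shrinks as D grows, roughly O(1/sqrt(D))
```

The paper's point is that instead of freezing `W` at random, one can train it against the classification objective (their alternating SGD, or plain back-propagation under the neural-network view), at the cost of the map no longer corresponding to a fixed shift-invariant kernel.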
I know I am being provocative here, but is this really such a bad thing ? Anyway, here is the paper: Compact Nonlinear Maps and Circulant Extensions by Felix X. Yu, Sanjiv Kumar, Henry Rowley, Shih-Fu Chang
Kernel approximation via nonlinear random feature maps is widely used in speeding up kernel machines. There are two main challenges for the conventional kernel approximation methods. First, before performing kernel approximation, a good kernel has to be chosen. Picking a good kernel is a very challenging problem in itself. Second, high-dimensional maps are often required in order to achieve good performance. This leads to high computational cost in both generating the nonlinear maps, and in the subsequent learning and prediction process. In this work, we propose to optimize the nonlinear maps directly with respect to the classification objective in a data-dependent fashion. The proposed approach achieves kernel approximation and kernel learning in a joint framework. This leads to much more compact maps without hurting the performance. As a by-product, the same framework can also be used to achieve more compact kernel maps to approximate a known kernel. We also introduce Circulant Nonlinear Maps, which uses a circulant-structured projection matrix to speed up the nonlinear maps for high-dimensional data.
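The circulant trick mentioned at the end of the abstract is what buys the speed-up for high-dimensional data: a circulant matrix is fully determined by its first column, and multiplying by it is a circular convolution, computable via the FFT in O(d log d) instead of O(d²). A minimal sketch of that equivalence (my illustration, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
c = rng.normal(size=d)       # first column defines the whole circulant matrix

# Dense circulant matrix, built explicitly for comparison:
# column k is the first column rotated down by k positions.
C = np.column_stack([np.roll(c, k) for k in range(d)])

x = rng.normal(size=d)
dense = C @ x                # O(d^2) matrix-vector product
# Same product via the FFT: circulant multiplication is circular
# convolution, diagonalized by the Fourier transform -- O(d log d).
fast = np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

print(np.allclose(dense, fast))
```

The projection also needs only O(d) storage for the column `c` instead of O(d²) for a dense Gaussian matrix, which is where the "compact" in the title comes from for the circulant extension.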
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.