Nuit Blanche: ℓ1-regularized Neural Networks are Improperly Learnable in Polynomial Time

Monday, October 19, 2015

ℓ1-regularized Neural Networks are Improperly Learnable in Polynomial Time

Here is a very interesting find:

Theorem 3 presents a more general result, showing that any activation function that is sigmoid-like or ReLU-like leads to the computational hardness, even if the loss function ℓis convex

and from the conclusion:

Although the recursive kernel method doesn’t outperform the LeNet5 model, the experiment demonstrates that it does learn better predictors than fully connected neural networks such as the multi-layer perceptron. The LeNet5 architecture encodes prior knowledge about digit recogniition via the convolution and pooling operations; thus its performance is better than the generic architectures

ℓ1-regularized Neural Networks are Improperly Learnable in Polynomial Time by Yuchen Zhang, Jason D. Lee, Michael I. Jordan

We study the improper learning of multi-layer neural networks. Suppose that the neural network to be learned has k hidden layers and that the ℓ1-norm of the incoming weights of any neuron is bounded by L. We present a kernel-based method, such that with probability at least 1−δ, it learns a predictor whose generalization error is at most ϵ worse than that of the neural network. The sample complexity and the time complexity of the presented method are polynomial in the input dimension and in (1/ϵ,log(1/δ),F(k,L)), where F(k,L) is a function depending on (k,L) and on the activation function, independent of the number of neurons. The algorithm applies to both sigmoid-like activation functions and ReLU-like activation functions. It implies that any sufficiently sparse neural network is learnable in polynomial time.