Theorem 3 presents a more general result, showing that any activation function that is sigmoid-like or ReLU-like leads to the computational hardness, even if the loss function ℓis convex
Although the recursive kernel method doesn’t outperform the LeNet5 model, the experiment demonstrates that it does learn better predictors than fully connected neural networks such as the multi-layer perceptron. The LeNet5 architecture encodes prior knowledge about digit recogniition via the convolution and pooling operations; thus its performance is better than the generic architectures
ℓ1-regularized Neural Networks are Improperly Learnable in Polynomial Time by Yuchen Zhang, Jason D. Lee, Michael I. Jordan
We study the improper learning of multi-layer neural networks. Suppose that the neural network to be learned has
khidden layers and that the ℓ1-norm of the incoming weights of any neuron is bounded by L. We present a kernel-based method, such that with probability at least 1−δ, it learns a predictor whose generalization error is at most ϵworse than that of the neural network. The sample complexity and the time complexity of the presented method are polynomial in the input dimension and in (1/ϵ,log(1/δ),F(k,L)), where F(k,L)is a function depending on (k,L)and on the activation function, independent of the number of neurons. The algorithm applies to both sigmoid-like activation functions and ReLU-like activation functions. It implies that any sufficiently sparse neural network is learnable in polynomial time.
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.