## Sunday, January 04, 2015

### Sunday Morning Insight: The Great Convergence ?

While empirical evidence is great, we all yearn to have a disciplined approach to some of the most recent results in Machine Learning. The next two papers are instances of providing a Framework on how to go conceptually from things we know theoretically to things that seem to work well. The first paper is about Kernel Machines using L1 regularization while the second is about the convexity of Neural Networks, an important property as has been shown in compressive sensing.
Different cheese and an artificial line separating them....

Here are two papers that are showing us how theory is trying to catch up with empirically successful approaches.

We recently saw Fredholm Kernels do semi supervised Learning (i.e. taking into account the distrbution of the set of elements for which we do not have a corresponding sample to learn from), but we also had recently Sparse Random Features for Learning whereby an L1 solver is used to get components of a kernel machine. This last bit is kind of difficult to explain within the Hilbert Space Framework because it really is a Banach space. The earlier paper showed there was a good reason to investigate it, here is the more theoretical approach to it. How is the B going to change the food flavored solvers (Random Kitchen Sinks, FastFood, Deep Fried Convnets...) ? only time will tell.
Abstract: This article mainly studies with the constructions of reproducing kernel Banach spaces (RKBSs) that are the generalization of reproducing kernel Hilbert spaces (RKHSs). Firstly, we verify many advance properties of the general RKBSs such as density, continuity, implicit representation, imbedding, compactness, and representer theorems for learning methods. Next, we develop a new concept of generalized Mercer kernels to construct the $p$-norm RKBSs for $1\leq p\leq\infty$. The $p$-norm RKBSs preserve the same simple format as the Mercer representation of RKHSs. Moreover, the p-norm RKBSs are isometrically equivalent to the standard p-norm spaces of countable sequences; hence the $p$-norm RKBSs possess more abundant geometrical structures than RKHSs including sparsity. To be more precise, the suitable countable expansion terms of the generalized Mercer kernels can be used to represent the pairs of Schauder bases and biorthogonal systems of the $p$-norm RKBSs such that the generalized Mercer kernels become the reproducing kernels of the $p$-norm RKBSs. The theory of the generalized Mercer kernels also cover many well-known kernels, for instance, min kernels, Gaussian kernels, and power series kernels. Finally, we propose to solve the support vector machines in the $p$-norm RKBSs, which are to minimize the regularized empirical risks over the $p$-norm RKBSs. We show that the infinite-dimensional support vector machines in the $p$-norm RKBSs can be equivalently transferred into the finite dimensional convex optimization problems such that we can obtain the finite dimensional representations of their support vector machine solutions for practical applications and computer programs. In particular, we verify that some typical support vector machines in the $1$-norm RKBSs are equivalent to the classical $1$-norm sparse regressions.
In a different direction, we featured the slides earlier here, here is the attendant paper:

Breaking the Curse of Dimensionality with Convex Neural Networks by Francis Bach
We consider neural networks with a single hidden layer and non-decreasing homogeneous activation functions like the rectified linear units. By letting the number of hidden units grow unbounded and using classical non-Euclidean regularization tools on the output weights, we provide a detailed theoretical analysis of their generalization performance, with a study of both the approximation and the estimation errors. We show in particular that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace. Moreover, when using sparsity-inducing norms on the input weights, we show that high-dimensional non-linear variable selection may be achieved, without any strong assumption regarding the data and with a total number of variables potentially exponential in the number of ob-servations. In addition, we provide a simple geometric interpretation to the non-convex problem of addition of a new unit, which is the core potentially hard computational element in the framework of learning from continuously many basis functions. We provide simple conditions for convex relaxations to achieve the same generalization error bounds, even when constant-factor approxi-mations cannot be found (e.g., because it is NP-hard such as for the zero-homogeneous activation function). We were not able to find strong enough convex relaxations and leave open the existence or non-existence of polynomial-time algorithms.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.