My name is Igor Carron

Page Views on Nuit Blanche since July 2010

My papers on ArXiv:
Approximating Kernels at the speed of Light
&
Imaging with Nature

|| Reddit

||
Attendant references pages:
The Advanced Matrix Factorization Jungle Page ||

Paris Machine Learning
@Meetup.com || @Archives

Tuesday, April 30, 2019

Eigenvalue distribution of nonlinear models of random matrices

Here is a more mathematical way of dealing with nonlinearities in DNNs.

This paper is concerned with the asymptotic empirical eigenvalue distribution of a non linear random matrix ensemble. More precisely we consider M=1mYY∗ with Y=f(WX) where W and X are random rectangular matrices with i.i.d. centered entries. The function f is applied pointwise and can be seen as an activation function in (random) neural networks. We compute the asymptotic empirical distribution of this ensemble in the case where W and X have sub-Gaussian tails and f is real analytic. This extends a previous result where the case of Gaussian matrices W and X is considered. We also investigate the same questions in the multi-layer case, regarding neural network applications.

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.

Monday, April 29, 2019

Beyond Thunderspline: Mad Max: Affine Spline Insights into Deep Learning/ From Hard to Soft: Understanding Deep Network Nonlinearities via Vector Quantization and Statistical Inference

The following set of papers shows that most DNNs that have been working so far have specific kinds of nonlinearities that can be better understood thanks to the splines. Yes, peeeoppple, we are going back to the splines!

From the first paper, I like the fact that Breiman has an influence on something related to DNNs:
Leveraging a result from Breiman [1993], we prove in Appendix D.7 that the composition of just two MASOs has a universal approximation property.
Here is something of interest:

5. DNs are Template Matching Machines We now take a deeper look into the featurization/prediction process of (4.9) in order to bridge DNs and classical optimal classification theory via matched filters (aka “template matching”). We will focus on classification for concreteness.
Followed eventually by:

Visual inspection of the figures highlights that, as we progress through the layers of the DN, similar images become closer in VQ distance but further in Euclidean distance. In particular, the similarity of the first 3 panels in (a)–(c) of the figures, which replicate the same experiment but using a DN trained with correct labels, a DN trained with incorrect labels, and a DN that is not trained at all, indicates that the early convolution layers of the DN are partitioning the training images based more on their visual features than their class membership.
Here are the three papers:

We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs), which provide a powerful portal through which to view and analyze their inner workings. For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization. Going further, we propose a simple penalty term that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal with each other; this leads to significantly improved classification performance and reduced overfitting with no change to the DN architecture. The spline partition of the input signal space that is implicitly induced by a MASO directly links DNs to the theory of vector quantization (VQ) and K-means clustering, which opens up new geometric avenue to study how DNs organize signals in a hierarchical fashion. To validate the utility of the VQ interpretation, we develop and validate a new distance metric for signals and images that quantifies the difference between their VQ encodings. (This paper is a significantly expanded version of A Spline Theory of Deep Learning from ICML 2018.)

A Spline Theory of Deep Networks
We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs), which provide a powerful portal through which to view and analyze their inner workings. For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization. Going further, we propose a simple penalty term that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal with each other; this leads to significantly improved classification performance and reduced overfitting with no change to the DN architecture. The spline partition of the input signal space opens up a new geometric avenue to study how DNs organize signals in a hierarchical fashion. As an application, we develop and validate a new distance metric for signals that quantifies the difference between their partition encodings

Nonlinearity is crucial to the performance of a deep (neural) network (DN). To date there has been little progress understanding the menagerie of available nonlinearities, but recently progress has been made on understanding the r\^{o}le played by piecewise affine and convex nonlinearities like the ReLU and absolute value activation functions and max-pooling. In particular, DN layers constructed from these operations can be interpreted as {\em max-affine spline operators} (MASOs) that have an elegant link to vector quantization (VQ) and $K$-means. While this is good theoretical progress, the entire MASO approach is predicated on the requirement that the nonlinearities be piecewise affine and convex, which precludes important activation functions like the sigmoid, hyperbolic tangent, and softmax. {\em This paper extends the MASO framework to these and an infinitely large class of new nonlinearities by linking deterministic MASOs with probabilistic Gaussian Mixture Models (GMMs).} We show that, under a GMM, piecewise affine, convex nonlinearities like ReLU, absolute value, and max-pooling can be interpreted as solutions to certain natural hard'' VQ inference problems, while sigmoid, hyperbolic tangent, and softmax can be interpreted as solutions to corresponding soft'' VQ inference problems. We further extend the framework by hybridizing the hard and soft VQ optimizations to create a $\beta$-VQ inference that interpolates between hard, soft, and linear VQ inference. A prime example of a $\beta$-VQ DN nonlinearity is the {\em swish} nonlinearity, which offers state-of-the-art performance in a range of computer vision tasks but was developed ad hoc by experimentation. Finally, we validate with experiments an important assertion of our theory, namely that DN performance can be significantly improved by enforcing orthogonality in its linear filters.

Saturday, April 27, 2019

Saturday Morning Videos: Statistical Modeling for Shapes and Imaging, 11 March 2019 - 15 March 2019, IHP

** Nuit Blanche is now on Twitter: @NuitBlog **

Within this semester of «The Mathematics of Imaging» at IHP (from January 7th till April 5th , 2019), a second set of talks took place featuring the workshop on «Statistical Modeling for Shapes and Imaging» Here the videos of these talks, enjoy !

Photography Made Easy - Sylvain Paris - Workshop 2 - CEB T1 2019

Synthesizing stochastic microstructures for additive manufacturing - Sylvain Lefèbvre - Workshop 2 - CEB T1 2019

Detecting Overfitting of Deep Generative Networks via Latent Recovery. - Julien Rabin - Workshop 2 - CEB T1 2019

Interaction between invariant structures for shape analysis - Ron Kimmel - Workshop 2 - CEB T1 2019

3D Point Cloud Classification, Segmentation and Normal (...) - Michael Lindenbaum - Workshop 2 - CEB T1 2019

Total variation denoising with iterated conditional expectation - Cécile Louchet Workshop 2 - CEB T1 2019

Hybrid sparse stochastic processes and the resolution of (...) - Michael Unser - Workshop 2 - CEB T1 2019

Lipschitz-Killing curvatures of excursion sets for 2D (...) - Hermine Biermé - Workshop 2 - CEB T1 2019

Efficient sampling through variable splitting-inspired (...) - Pierre Chainais - Workshop 2 - CEB T1 2019

Statistical aspects of stochastic algorithms for entropic (...) - Jérémie Bigot - Workshop 2 - CEB T1 2019

Stochastic Approximation-based algorithms, when the Monte (...) - Gersende Fort - Workshop 2 - CEB T1 2019

37
PDEs on the Homogeneous Space of Positions and Orientations - Remco Duits - Workshop 2 - CEB T1 2019

Mixed-effect model for the spatiotemporal analysis of longitudinal (...) - Stéphanie Allassonnière - Workshop 2 - CEB T1 2019

A fast statistical colocalization method for 3D live cell imaging and super-resolution microscopy. - Charles Kervrann - Workshop 2 - CEB T1 2019

Functional Data Analysis Under Shape Constraints - Anuj Srivastava - Workshop 2 - CEB T1 2019

Multiple objects detection in biological images using a Marked Point Process Framework. - Xavier Descombes - Workshop 2 - CEB T1 2019

Création des mondes virtuels : Objets auto-similaires et distributions d'éléments - Marie-Paule Cani - Grand Public - CEB T1 2019

Maximum Entropy Models for Texture Synthesis - Arthur Leclaire - Workshop 2 - CEB T1 2019

From currents to oriented varifolds for data fidelity metrics; growth models for computational anatomy. - Irène Kaltenmark - Workshop 2 - CEB T1 2019

Kernel norms on normal cycles and the KeOps library for linear memory reductions over datasets. - Joan Glaunès - Workshop 2 - CEB T1 2019

Bayesian inference and convex geometry: theory, methods, and algorithms. - Marcelo Pereyra - Workshop 2 - CEB T1 2019

Gaussian random fields and anisotropy - Marianne Clausel - Workshop 2 - CEB T1 2019

The inconvenience of a single Universe - Jean-François Cardoso - Workshop 2 - CEB T1 2019

TeraLasso for sparse time-varying image modeling - Alfred Hero - Workshop 2 - CEB T1 2019

Modular large deformation and shape aware metrics in shape analysis: How to make things simple (and meaningful)? - Alain Trouvé - Workshop 2 - CEB T1 2019

Friday, April 26, 2019

Neumann Networks for Inverse Problems in Imaging

Instead of unfolding iterations of an algorithm into a network structure, today's paper looks at truncating a Neumann series. Another instance of The Great Convergence

Many challenging image processing tasks can be described by an ill-posed linear inverse problem: deblurring, deconvolution, inpainting, compressed sensing, and superresolution all lie in this framework. Traditional inverse problem solvers minimize a cost function consisting of a data-fit term, which measures how well an image matches the observations, and a regularizer, which reflects prior knowledge and promotes images with desirable properties like smoothness. Recent advances in machine learning and image processing have illustrated that it is often possible to learn a regularizer from training data that can outperform more traditional regularizers. We present an end-to-end, data-driven method of solving inverse problems inspired by the Neumann series, which we call a Neumann network. Rather than unroll an iterative optimization algorithm, we truncate a Neumann series which directly solves the linear inverse problem with a data-driven nonlinear regularizer. The Neumann network architecture outperforms traditional inverse problem solution methods, model-free deep learning approaches, and state-of-the-art unrolled iterative methods on standard datasets. Finally, when the images belong to a union of subspaces and under appropriate assumptions on the forward model, we prove there exists a Neumann network configuration that well-approximates the optimal oracle estimator for the inverse problem and demonstrate empirically that the trained Neumann network has the form predicted by theory.
also a video:

Related:

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.

Thursday, April 25, 2019

Why are Big Data Matrices Approximately Low Rank?

It used to be the most imaging scene were compressible and the JPEG standard was a daily reminder of that reality. In the following paper, we are getting sharper re-assurances about the data world around us. I note the use of tools like the \epsilon-rank is a reminder to \epsilon-pseudospectrum in non-normal settings. Taking this argument in the reverse, if you don't find a close-by low rank matrix next to your data matrix, maybe there is something wrong with your data. Data matrix low-rankedness is needed in several Matrix Factorization operations.

Matrices of (approximate) low rank are pervasive in data science, appearing in recommender systems, movie preferences, topic models, medical records, and genomics. While there is a vast literature on how to exploit low rank structure in these datasets, there is less attention on explaining why the low rank structure appears in the first place. Here, we explain the effectiveness of low rank models in data science by considering a simple generative model for these matrices: we suppose that each row or column is associated to a (possibly high dimensional) bounded latent variable, and entries of the matrix are generated by applying a piecewise analytic function to these latent variables. These matrices are in general full rank. However, we show that we can approximate every entry of an m×n matrix drawn from this model to within a fixed absolute error by a low rank matrix whose rank grows as O(log(m+n)). Hence any sufficiently large matrix from such a latent variable model can be approximated, up to a small entrywise error, by a low rank matrix.
And as George Linderman points out on Twitter:

Published in SIAM Jounral on Mathematics of Data Science.

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.

Wednesday, April 24, 2019

Enhanced Expressive Power and Fast Training of Neural Networks by Random Projections

Here is one way random projections can reduce training time in neural networks !

Random projections are able to perform dimension reduction efficiently for datasets with nonlinear low-dimensional structures. One well-known example is that random matrices embed sparse vectors into a low-dimensional subspace nearly isometrically, known as the restricted isometric property in compressed sensing. In this paper, we explore some applications of random projections in deep neural networks. We provide the expressive power of fully connected neural networks when the input data are sparse vectors or form a low-dimensional smooth manifold. We prove that the number of neurons required for approximating a Lipschitz function with a prescribed precision depends on the sparsity or the dimension of the manifold and weakly on the dimension of the input vector. The key in our proof is that random projections embed stably the set of sparse vectors or a low-dimensional smooth manifold into a low-dimensional subspace. Based on this fact, we also propose some new neural network models, where at each layer the input is first projected onto a low-dimensional subspace by a random projection and then the standard linear connection and non-linear activation are applied. In this way, the number of parameters in neural networks is significantly reduced, and therefore the training of neural networks can be accelerated without too much performance loss.

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.

Tuesday, April 23, 2019

CSHardware: Development of sparse coding and reconstruction subsystems for astronomical imaging, João Rino-Silvestre

João just let me know the following:
Salut Igor,
I've noticed that you have a blog dedicated to compressed sensing, so I thought that my MSc dissertation (in which I detail the development of subsystems and of a stand-alone instrument prototype) might be of interest to you and your community. I leave you the link below:
http://hdl.handle.net/10451/37722

Best regards,
João Rino-Silvestre

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.

Book: High-Dimensional Probability An Introduction with Applications in Data Science by Roman Vershynin

Found in the comment section of Terry's blogRoman Vershynin is writing a book titled: High-Dimensional Probability An Introduction with Applications in Data Science (University of California, Irvine March 25, 2019)

Here is the table of content:
Preface vi Appetizer: using probability to cover a geometric set
1 Preliminaries on random variables 6
1 1 Preliminaries on random variables 6
1.1 Basic quantities associated with random variables 6
1.2 Some classical inequalities 7
1.3 Limit theorems 9
1.4 Notes 12
2 Concentration of sums of independent random variables 13
2.1 Why concentration inequalities? 13
2.2 Hoeffding’s inequality 16
2.3 Chernoff’s inequality 19
2.4 Application: degrees of random graphs 21
2.5 Sub-gaussian distributions 24
2.6 General Hoeffding’s and Khintchine’s inequalities 29
2.7 Sub-exponential distributions 32
2.8 Bernstein’s inequality 37
2.9 Notes 40
3 Random vectors in high dimensions 42
3.1 Concentration of the norm 43
3.2 Covariance matrices and principal component analysis 45
3.3 Examples of high-dimensional distributions 50
3.4 Sub-gaussian distributions in higher dimensions 56
3.5 Application: Grothendieck’s inequality and semidefinite programming 60
3.6 Application: Maximum cut for graphs 66
3.7 Kernel trick, and tightening of Grothendieck’s inequality 70
3.8 Notes 74
4 Random matrices 76
4.1 Preliminaries on matrices 76
4.2 Nets, covering numbers and packing numbers 81
4.3 Application: error correcting codes 86
4.4 Upper bounds on random sub-gaussian matrices 89
4.5 Application: community detection in networks 93
4.6 Two-sided bounds on sub-gaussian matrices 97 iii iv Contents
4.7 Application: covariance estimation and clustering 99
4.8 Notes 103
5 Concentration without independence 105
5.1 Concentration of Lipschitz functions on the sphere 105
5.2 Concentration on other metric measure spaces 112
5.3 Application: Johnson-Lindenstrauss Lemma 118
5.4 Matrix Bernstein’s inequality 121
5.5 Application: community detection in sparse networks 129
5.6 Application: covariance estimation for general distributions 129
5.7 Notes 133
6 Quadratic forms, symmetrization and contraction 135
6.1 Decoupling 135
6.2 Hanson-Wright Inequality 139
6.3 Concentration of anisotropic random vectors 142
6.4 Symmetrization 145
6.5 Random matrices with non-i.i.d. entries 147
6.6 Application: matrix completion 148
6.7 Contraction Principle 151 6.8 Notes 154
7 Random processes 156
7.1 Basic concepts and examples 156
7.2 Slepian’s inequality 160
7.3 Sharp bounds on Gaussian matrices 167
7.4 Sudakov’s minoration inequality 170
7.5 Gaussian width 172
7.6 Stable dimension, stable rank, and Gaussian complexity 178
7.7 Random projections of sets 181
7.8 Notes 185
8 Chaining 187
8.1 Dudley’s inequality 187
8.2 Application: empirical processes 195
8.3 VC dimension 200
8.4 Application: statistical learning theory 212
8.5 Generic chaining 219
8.6 Talagrand’s majorizing measure and comparison theorems 223
8.7 Chevet’s inequality 225
8.8 Notes 227
9 Deviations of random matrices and geometric consequences 229
9.1 Matrix deviation inequality 229
9.2 Random matrices, random projections and covariance estimation 235
9.3 Johnson-Lindenstrauss Lemma for infinite sets 238
9.4 Random sections: M∗ bound and Escape Theorem 240
9.5 Notes
10 Sparse Recovery 246
10.1 High-dimensional signal recovery problems 246
10.2 Signal recovery based on M∗ bound 248
10.3 Recovery of sparse signals 250
10.4 Low-rank matrix recovery 254
10.5 Exact recovery and the restricted isometry property 256
10.6 Lasso algorithm for sparse regression 262
10.7 Notes 267
11 Dvoretzky-Milman’s Theorem 269
11.1 Deviations of random matrices with respect to general norms 269
11.2 Johnson-Lindenstrauss embeddings and sharper Chevet inequality 272
11.3 Dvoretzky-Milman’s Theorem 274
11.4 Notes 279
Bibliography 280
Index

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.

Monday, April 22, 2019

Videos: New Deep Learning Techniques, February 5 - 9, 2018, IPAM Workshop

Overview

In recent years, artificial neural networks a.k.a. deep learning have significantly improved the fields of computer vision, speech recognition, and natural language processing. The success relies on the availability of large-scale datasets, the developments of affordable high computational power, and basic deep learning operations that are sound and fast as they assume that data lie on Euclidean grids. However, not all data live on regular lattices. 3D shapes in computer graphics represent Riemannian manifolds. In neuroscience, brain activity (fMRI) is encoded on the structural connectivity network (sMRI). In genomics, the human body functionality is expressed through DNA, RNA, and proteins that form the gene regulatory network (GRN). In social sciences, people interact through networks. Eventually, data in communication networks are structured by graphs like the Internet or road traffic networks.

Deep learning that has originally been developed for computer vision cannot be directly applied to these highly irregular domains, and new classes of deep learning techniques must be designed. This is highly challenging as most standard data analysis tools cannot be used on heterogonous data domains. The workshop will bring together experts in mathematics (statistics, harmonic analysis, optimization, graph theory, sparsity, topology), machine learning (deep learning, supervised & unsupervised learning, metric learning) and specific applicative domains (neuroscience, genetics, social science, computer vision) to establish the current state of these emerging techniques and discuss the next directions.
This workshop will include a poster session; a request for posters will be sent to registered participants in advance of the workshop.
Here are the videos (and some slides):
Samuel Bowman (New York University)
Toward natural language semantics in learned representations

Emily Fox (University of Washington)
Interpretable and Sparse Neural Network Time Series Models for Granger Causality Discovery

Ellie Pavlick (University of Pennsylvania), Should we care about linguistics?

Leonidas Guibas (Stanford University), Knowledge Transport Over Visual Data

Yann LeCun (New York University), Public Lecture: Deep Learning and the Future of Artificial Intelligence

Alán Aspuru-Guzik (Harvard University), Generative models for the inverse design of molecules and materials

Daniel Rueckert (Imperial College), Deep learning in medical imaging: Techniques for image reconstruction, super-resolution and segmentation

Kyle Cranmer (New York University), Deep Learning in the Physical Sciences

Stéphane Mallat (École Normale Supérieure), Deep Generative Networks as Inverse Problems

Michael Elad (Technion - Israel Institute of Technology), Sparse Modeling in Image Processing and Deep Learning

Yann LeCun (New York University), Public Lecture: AI Breakthroughs & Obstacles to Progress, Mathematical and Otherwise

Xavier Bresson (Nanyang Technological University, Singapore), Convolutional Neural Networks on Graphs

Federico Monti (Universita della Svizzera Italiana), Deep Geometric Matrix Completion: a Geometric Deep Learning approach to Recommender Systems

Joan Bruna (New York University), On Computational Hardness with Graph Neural Networks

Jure Leskovec (Stanford University), Large-scale Graph Representation Learning

Arthur Szlam (Facebook), Composable planning with attributes

Yann LeCun (New York University), A Few (More) Approaches to Unsupervised Learning

Sanja Fidler (University of Toronto), Teaching Machines with Humans in the Loop
Raquel Urtasun (University of Toronto), Deep Learning for Self-Driving Cars

Pratik Chaudhari (University of California, Los Angeles (UCLA)), Unraveling the mysteries of stochastic gradient descent on deep networks

Stefano Soatto (University of California, Los Angeles (UCLA)), Emergence Theory of Deep Learning

Tom Goldstein (University of Maryland), What do neural net loss functions look like?

Stanley Osher (University of California, Los Angeles (UCLA)), New Techniques in Optimization and Their Applications to Deep Learning and Related Inverse Problems

Michael Bronstein (USI Lugano, Switzerland), Deep functional maps: intrinsic structured prediction for dense shape correspondence

Sainbayar Sukhbaatar (New York University), Deep Architecture for Sets and Its Application to Multi-agent Communication

Zuowei Shen (National University of Singapore), Deep Learning: Approximation of functions by composition

Wei Zhu (Duke University), LDMnet: low dimensional manifold regularized neural networks

Join the CompressiveSensing subreddit or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.