## Thursday, April 25, 2019

### Why are Big Data Matrices Approximately Low Rank?

It used to be that most imaging scenes were known to be compressible, and the JPEG standard was a daily reminder of that reality. The following paper gives sharper reassurances about the data world around us. I note that the use of tools like the ε-rank is reminiscent of the ε-pseudospectrum in non-normal settings. Turning the argument around, if you cannot find a low rank matrix close to your data matrix, maybe there is something wrong with your data. Approximate low-rankness of the data matrix is needed in several Matrix Factorization operations.

Matrices of (approximate) low rank are pervasive in data science, appearing in recommender systems, movie preferences, topic models, medical records, and genomics. While there is a vast literature on how to exploit low rank structure in these datasets, there is less attention on explaining why the low rank structure appears in the first place. Here, we explain the effectiveness of low rank models in data science by considering a simple generative model for these matrices: we suppose that each row or column is associated to a (possibly high dimensional) bounded latent variable, and entries of the matrix are generated by applying a piecewise analytic function to these latent variables. These matrices are in general full rank. However, we show that we can approximate every entry of an m×n matrix drawn from this model to within a fixed absolute error by a low rank matrix whose rank grows as O(log(m+n)). Hence any sufficiently large matrix from such a latent variable model can be approximated, up to a small entrywise error, by a low rank matrix.
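To get a feel for the claim in the abstract, here is a minimal numerical sketch, not the authors' code. It assumes a specific instance of the latent variable model: latent variables drawn uniformly from a bounded box (scaled so every vector has norm at most 1) and the analytic function exp(-||x - y||^2). The function names `latent_variable_matrix` and `entrywise_rank_upper_bound`, the latent dimension, and the tolerance are all illustrative choices. The truncated SVD is used as a convenient low rank approximation, so the returned value is only an upper bound on the smallest rank that achieves the requested entrywise error.

```python
import numpy as np


def latent_variable_matrix(m, n, dim=2, seed=0):
    """Build an m x n matrix X[i, j] = exp(-||x_i - y_j||^2).

    Illustrative assumption: the bounded latent variables x_i and y_j are
    uniform on [-1, 1]^dim, rescaled so that every vector has norm <= 1.
    """
    rng = np.random.default_rng(seed)
    rows = rng.uniform(-1.0, 1.0, size=(m, dim)) / np.sqrt(dim)
    cols = rng.uniform(-1.0, 1.0, size=(n, dim)) / np.sqrt(dim)
    # Pairwise squared distances between row and column latent variables.
    sq_dists = ((rows[:, None, :] - cols[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists)


def entrywise_rank_upper_bound(X, eps):
    """Smallest k such that the rank-k truncated SVD of X is within eps
    of X in every entry (maximum absolute error).

    The truncated SVD is optimal in spectral and Frobenius norm, not in
    the entrywise maximum norm, so this is only an upper bound on the
    smallest rank achieving entrywise error eps.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    approx = np.zeros_like(X)
    for k in range(len(s) + 1):
        # approx currently holds the sum of the first k singular triplets.
        if np.max(np.abs(X - approx)) <= eps:
            return k
        if k < len(s):
            approx += s[k] * np.outer(U[:, k], Vt[k])
    return len(s)


# Example: a 300 x 200 matrix and an entrywise tolerance of 0.01.
X = latent_variable_matrix(300, 200)
k = entrywise_rank_upper_bound(X, eps=1e-2)
print(f"shape = {X.shape}, rank needed for entrywise error 1e-2: at most {k}")
```

Rerunning the sketch with larger `m` and `n` at a fixed `eps` is one way to watch how slowly the required rank grows, which is the qualitative behavior the O(log(m+n)) bound describes.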
And as George Linderman points out on Twitter:

Published in SIAM Journal on Mathematics of Data Science.
