The aha moment probably occurred at the First French-German Mathematical Image Analysis Conference at IHP, as I was watching Rene Vidal explain his subspace clustering approach [6]. The idea is to describe the data by connecting each data point to nearby points while excluding the point itself. The decomposition goes as:
X = XZ
with the constraint diag(Z) = 0 in order to enforce the latter exclusion.
(Figure from [6])
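For the curious, here is a minimal sketch of what this self-expressive decomposition can look like in practice, in the spirit of the least-squares (LSR-type) formulations listed in [6]; the toy data matrix X, the regularization weight lam, and the column-by-column ridge solve are illustrative assumptions on my part, not code from the authors:

import numpy as np

def self_expressive_coeffs(X, lam=1e-2):
    # X: (d, n) data matrix, columns are data points.
    # For each point i, solve min_z ||x_i - X_{-i} z||^2 + lam ||z||^2
    # and write the coefficients into column i of Z, so diag(Z) = 0 holds exactly.
    d, n = X.shape
    Z = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]
        Xi = X[:, idx]                        # all points except x_i
        A = Xi.T @ Xi + lam * np.eye(n - 1)   # ridge-regularized Gram matrix
        b = Xi.T @ X[:, i]
        Z[idx, i] = np.linalg.solve(A, b)
    return Z

# toy usage: points drawn from two 1-D subspaces in R^3
rng = np.random.default_rng(0)
U1, U2 = rng.normal(size=(3, 1)), rng.normal(size=(3, 1))
X = np.hstack([U1 @ rng.normal(size=(1, 5)), U2 @ rng.normal(size=(1, 5))])
Z = self_expressive_coeffs(X)
print(np.round(np.abs(Z), 2))  # large entries mostly connect points from the same subspace

In the actual algorithms of [6], the matrix |Z| + |Z|^T is then used as an affinity matrix for spectral clustering.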
As explained two months ago in a paper by Julie Josse and Stefan Wager (Stable Autoencoding: A Flexible Framework for Regularized Low-Rank Matrix Estimation), one could also seek a low-rank Z instead of just diag(Z) = 0.
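A crude way to illustrate that low-rank idea is to project whatever self-expression matrix Z one has onto the set of rank-r matrices via a truncated SVD; this is only a sketch of the intuition, not the stable autoencoding estimator of Josse and Wager:

import numpy as np

def low_rank_projection(Z, r):
    # Best rank-r approximation of Z (in Frobenius norm) via truncated SVD.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]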
A relaxation of the diag(Z) = 0 constraint is Tr(Z) = 0, which, as we all recall from our Fluid Mechanics 101 courses, is a volume-preserving property (traceless matrices form the Lie algebra of the volume-preserving linear group). To say it another way, one could probably decompose Z as Z1 + Z2 such that Tr(Z1) = 0, with Z2 carrying the trace. There, component Z1 quantifies the deformation of the dataset while Z2 quantifies the volume change (a small volume change if Z2 is a low-rank matrix, as in the paper featured above). That last component could serve as a basis for autoencoders and might even shed additional light on how to devise nonlinear autoencoders [1,2,3,4,5]. It could probably serve as a proxy for exploration in the 'exploitation vs. exploration' conundrum.
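To make the deformation/volume analogy concrete, here is one possible reading of that split, with Z2 taken as the isotropic part that carries all of the trace; this particular choice is my assumption, not something pinned down above:

import numpy as np

def trace_split(Z):
    # One possible reading of the Z = Z1 + Z2 split discussed above:
    # Z2 is the isotropic (volume-change) part, Z1 the traceless (deformation) part.
    n = Z.shape[0]
    Z2 = (np.trace(Z) / n) * np.eye(n)   # carries all of the trace
    Z1 = Z - Z2                          # Tr(Z1) = 0 by construction
    return Z1, Z2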
At the very least, the decomposition ought to provide some insight into why data matrices are part of the Data Tsunami.
[1] Provable Bounds for Learning Some Deep Representations
[2] Autoencoders, Unsupervised Learning, and Deep Architectures by Pierre Baldi
[3] Sunday Morning Insight: The Great Convergence?
[4] Sunday Morning Insight: The Regularization Architecture
[5] In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning
[6] Recent subspace clustering algorithm implementations:
- LSR : Robust and Efficient Subspace Segmentation via Least Squares Regression by Canyi Lu, Hai Min, Zhong-Qiu Zhao, Lin Zhu, De-Shuang Huang, and Shuicheng Yan
- LRRSC : Subspace Clustering by Exploiting a Low-Rank Representation with a Symmetric Constraint by Jie Chen, Zhang Yi
- SSC : Sparse Subspace Clustering: Algorithm, Theory, and Applications by Ehsan Elhamifar, Rene Vidal.
- SMCE : Sparse Manifold Clustering and Embedding by Ehsan Elhamifar, Rene Vidal.
- Local Linear Embedding (LLE)