Friday, May 30, 2014

Fast and Robust Archetypal Analysis for Representation Learning - implementation -

Here is the second entry on Archetypal Analysis today, with a Matlab implementation within SPAMS (soon to be added to the Advanced Matrix Factorization Jungle Page):

From the Introduction section:

Our main objective is to rehabilitate a pioneer unsupervised learning technique called archetypal analysis [5], which is easy to interpret while providing good results in prediction tasks. It was proposed as an alternative to principal component analysis (PCA) for discovering latent factors from high-dimensional data. Unlike principal components, each factor learned by archetypal analysis, called archetype, is forced to be a convex combination of a few data points. Such associations between archetypes and data points are useful for interpretation. For example, clustering techniques provide such associations between data and centroids. It is indeed common in genomics to cluster gene expression data from several individuals, and to interpret each centroid by looking for some common physiological traits among individuals of the same cluster [7]. Interestingly, archetypal analysis is related to popular approaches such as sparse coding [18] and non-negative matrix factorization (NMF) [19], even though all these formulations were independently invented around the same time. Archetypal analysis indeed produces sparse representations of the data points, by approximating them with convex combinations of archetypes; it also provides a non-negative factorization when the data matrix is non-negative.
A natural question is why archetypal analysis did not gain a lot of success, unlike NMF or sparse coding. We believe that the lack of efficient available software has limited the deployment of archetypal analysis to promising applications; our goal is to address this issue...
The formulation section has a very nice writeup on how the linear algebra of this matrix factorization differs from and parallels that of NMF and sparse coding. Here is the paper: Fast and Robust Archetypal Analysis for Representation Learning by Yuansi Chen, Julien Mairal, Zaid Harchaoui
We revisit a pioneer unsupervised learning technique called archetypal analysis, which is related to successful data analysis methods such as sparse coding and non-negative matrix factorization. Since it was proposed, archetypal analysis did not gain a lot of popularity even though it produces more interpretable models than other alternatives. Because no efficient implementation has ever been made publicly available, its application to important scientific problems may have been severely limited. Our goal is to bring back into favour archetypal analysis. We propose a fast optimization scheme using an active-set strategy, and provide an efficient open-source implementation interfaced with Matlab, R, and Python. Then, we demonstrate the usefulness of archetypal analysis for computer vision tasks, such as codebook learning, signal classification, and large image collection visualization.

The implementation is part of the SPAMS software at:

In the Advanced Matrix Factorization Jungle Page, I listed the Archetypal decomposition with the following definition:

  • Archetypal Analysis: A ≈ DX with unknown D and X, where D = AB, and the columns of X and B are constrained to be nonnegative and sum to one (i.e., archetypes are convex combinations of data points, and data points are approximated by convex combinations of archetypes)
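To make the definition concrete, here is a minimal sketch of archetypal analysis in NumPy. It alternates simplex-constrained updates of the codes X and the mixing weights B via projected gradient steps; this is only an illustrative toy, not the active-set algorithm of the paper or the SPAMS implementation, and all function names below are made up for this sketch.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex
    (nonnegative entries summing to one)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def archetypal_analysis(A, k, n_iter=200, seed=0):
    """Toy archetypal analysis: A (d x n) ~ D X with D = A B,
    where columns of B (n x k) and X (k x n) lie on the simplex."""
    rng = np.random.default_rng(seed)
    d, n = A.shape
    B = rng.random((n, k)); B /= B.sum(axis=0)   # column-stochastic init
    X = rng.random((k, n)); X /= X.sum(axis=0)
    for _ in range(n_iter):
        D = A @ B                                # current archetypes
        # Update codes X: one projected-gradient step on ||A - D X||^2.
        L = np.linalg.norm(D.T @ D, 2) + 1e-8    # Lipschitz constant
        G = D.T @ (D @ X - A)
        X = np.apply_along_axis(project_simplex, 0, X - G / L)
        # Update weights B: one projected-gradient step on ||A - A B X||^2.
        Gb = A.T @ ((A @ B @ X - A) @ X.T)
        Lb = np.linalg.norm(A.T @ A, 2) * np.linalg.norm(X @ X.T, 2) + 1e-8
        B = np.apply_along_axis(project_simplex, 0, B - Gb / Lb)
    return A @ B, X
```

The simplex constraints are what distinguish this from NMF and sparse coding: both factors are not merely nonnegative but convex combinations, which is what makes each archetype directly interpretable as a mixture of actual data points.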

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.
