Friday, May 30, 2014

Probabilistic Archetypal Analysis - implementation -



Recent theoretical headway in NMF relies on the existence of pure components, an assumption called near-separability (see the recent The Why and How of Nonnegative Matrix Factorization). More recently, Random Projections for Non-negative Matrix Factorization also relied on the extreme points of point clouds:
However, the geometric interpretation remains valid and the approach gives non-negative factors U and V such that the columns of U are the extreme rays of a polyhedral cone that contains most of the columns of X....

So it is fascinating that there seems to be another line of inquiry, Archetypal Analysis, started back in 1994 by Cutler and Breiman, that performs a matrix factorization similar to NMF but with additional constraints, in that it represents
each individual member of a set of data vectors as a mixture (a constrained linear combination) of the pure types or archetypes of the data set.
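
To make the "constrained linear combination" concrete, here is a minimal sketch of the classical least-squares formulation: the archetypes Z = B X are convex combinations of the data points, and each data point is approximated by a convex combination A Z of the archetypes. This is only an illustration under stated assumptions, not the probabilistic method of Seth and Eugster and not the code found at the link given in this entry. The function names, the alternating projected-gradient solver and the step sizes are all choices made here for illustration.

```python
import numpy as np


def project_rows_to_simplex(M):
    """Euclidean projection of each row of M onto the probability simplex."""
    n, k = M.shape
    U = np.sort(M, axis=1)[:, ::-1]              # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0             # cumulative sums minus the target sum
    idx = np.arange(1, k + 1)
    rho = (U - css / idx > 0).sum(axis=1)        # size of the support of each projection
    theta = css[np.arange(n), rho - 1] / rho     # per-row shift
    return np.maximum(M - theta[:, None], 0.0)


def archetypal_analysis(X, k, n_iter=200, seed=0):
    """Hypothetical helper: minimize 0.5 * ||X - A B X||_F^2 with the rows
    of A (n x k) and B (k x n) constrained to the simplex."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = project_rows_to_simplex(rng.random((n, k)))
    B = project_rows_to_simplex(rng.random((k, n)))
    xxt_norm = np.linalg.norm(X @ X.T, 2)        # spectral norm, fixed across iterations
    losses = []
    for _ in range(n_iter):
        # Update A with the archetypes Z = B X held fixed (step 1 / Lipschitz constant).
        Z = B @ X
        step_a = 1.0 / (np.linalg.norm(Z @ Z.T, 2) + 1e-12)
        A = project_rows_to_simplex(A - step_a * (A @ Z - X) @ Z.T)
        # Update B with the mixing weights A held fixed.
        step_b = 1.0 / (np.linalg.norm(A.T @ A, 2) * xxt_norm + 1e-12)
        B = project_rows_to_simplex(B - step_b * A.T @ (A @ B @ X - X) @ X.T)
        losses.append(0.5 * np.linalg.norm(X - A @ B @ X) ** 2)
    return A, B, losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((60, 2))                      # toy data: 60 points in the plane
    A, B, losses = archetypal_analysis(X, k=3)
    print("archetypes (rows):")
    print(B @ X)
    print("final loss:", losses[-1])
```

The simplex constraints on both factors are what distinguish this from plain NMF: the archetypes cannot leave the convex hull of the data, and every observation is described by mixing proportions that sum to one.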
Archetypal Analysis has some relation to k-means as well. This advanced matrix decomposition will soon be added to the Advanced Matrix Factorization Jungle page, as this entry and the next feature new implementations of that algorithm. Here is the first: Probabilistic Archetypal Analysis by Sohan Seth, Manuel J. A. Eugster
Archetypal analysis represents a set of observations as convex combinations of pure patterns, or archetypes. The original geometric formulation of finding archetypes by approximating the convex hull of the observations assumes them to be real valued. This, unfortunately, is not compatible with many practical situations. In this paper we revisit archetypal analysis from the basic principles, and propose a probabilistic framework that accommodates other observation types such as integers, binary, and probability vectors. We corroborate the proposed methodology with convincing real-world applications on finding archetypal winter tourists based on binary survey data, archetypal disaster-affected countries based on disaster count data, and document archetypes based on term-frequency data. We also present an appropriate visualization tool to summarize archetypal analysis solution better.
The implementation is at: http://aalab.github.io/

