Monday, November 19, 2007

Monday Morning Algorithm Part 3: Compressed Sensing meets Machine Learning / Recognition via Sparse Representation Classification Algorithm

[Wired readers, you may want to read this summary]
The whole list of Monday Morning Algorithms is listed here with attendant .m files. If you want contribute, like Jort Gemmeke just did, please let me know. For those reading this on Sunday, well, it's Monday somewhere. Now on today's algorithm:

In a previous entry, we discovered that the Machine Learning Community was beginning to use Random projections (the ones that follow the Restricted Isometric Property!) not just to reduce tremendously the dimension of images for pattern recognition. The thinking has gone farther by making the clever argument that when searching for a match between a query and a dictionary composed of training examples, only a few elements will match thus yielding a sparse representation in that dictionary. And so using one of the Compressed Sensing reconstruction technique, one could in effect say that the result of the database query has become a linear programming question (for those of us using L1 reconstruction techniques). John Wright, Arvind Ganesh Balasubramanian, Allen Yang and Yi Ma proposed this in their study on face recognition [1]. The generic approach in Machine Learning has mostly been based on Nearest Neighbor computation.

Intrigued by this approach, Jort Gemmeke decided to contribute today's Monday Morning Algorithm. Jort decided to write Algorithm 1 (Recognition via Sparse Representation) of reference [1]. I mostly contributed in prettying it up and then some (i.e. all the good stuff is his and all the mistakes are mine). Thank you Jort . It uses GPSR ( Mário Figueiredo, Robert D. Nowak and Stephen Wright have made a new version available) but in the comment section you can also uncomment to access L-1 Magic, Sparsify or CVX.

In this example we have a dictionary made up of cosine functions of different frequencies making up ten different classes (i.e. ten different frequencies), the query is close to the third class. In every class there are two cosine functions of the same frequency but with different noise added to them. The query is free of noise. You need to have GPSR in your path to have it run.

clear all;
% Algorithm implementing Recognition via Sparse Representation
% or Algorithm 1 suggested in
% "Feature selection in face recognition: A sparse representation
% perspective" by Allen Y. Yang, John Wright, Yi Ma, and Shankar Sastry
% written by Jort Florent Gemmeke and Igor Carron.

% n total number of images in the training database
n = 20;
% m is dimension of the image/ambient
% space (number of samples in signal )
% (e.g. total "training" set) ..e.g. 101
m = size(x,2);
% d is the number of Compressed Sensing
% measurements, it is also the dimension of the feature
% space.
d = 7;
% k is the number of classification groups and for k subjects
k = 10;
% e.g. every group consists of kn = 20 samples.
kn = n/k;

% Building the training database/dictionary A of n elements
% Within that dictionary there are k classes
% Each of these classes correspond to functions:cos(i x)
% with added noise. i is the differentiator between classes.
% Within each class, any element is different from the other
% by some amount of noise.
% Each element is listed in a row. Each function-element is
% sampled m times.
for i=1:k
for j=1:kn
yy = noise*rand(1,m);

% In the following, every index i contains a vector
% of length m with nonzeros corresponding to
% columns in A belonging to class i (i in [1,k])
% Every row holds one element of a class,
% i.e. there are as many rows as elements in this
% dictionary/database.
for i=1:k
onesvecs(i,:)=[zeros(1,(i-1)*kn) ones(1,kn) zeros(1,n-((i-1)* kn + kn))];
% This is the element for which we want to
% know to what class it belongs to. Here we give
% the answer for the plot in x_true.
x_true = [zeros(1,4) 1 1 zeros(1,n-6)];
x_true = x_true/norm(x_true,1);
% Now the expression stating that y = cos(3.*x);
y = cos(3.*x)';
% Please note it is different from both entries
% in the dictionary as there is no noise.
% we could have put this as well: y = A * x_true'
% but it would have meant we were using the training
% function as a query test.
% Step 2
% Dimensionality reduction from 101 to 7 features
% using random projection.
R = randn(d,m);
Atilde = R * A;
ytilde = R * y;

% Normalize columns of Atilde and ytilde
for i=1:size(A,2)
ytilde = ytilde/norm(ytilde);

% Step 3
% call any L1 solver you want. Here we call GPSR
% ---- L1 Magic Call
%xp = l1qc_logbarrier(Atilde\ytilde, Atilde, [], ytilde, 0.01, 1e-3);
% ---- Sparsify Solver Call
%xp = greed_omp(ytilde,Atilde,n);
% ---- CVX Solver Call
%sigma = 0.05;
% variable xp(k);
% minimize(norm(xp,1));
% subject to
% norm(Atilde*xp - ytilde) lessthan sigma;
% ---- GPSR Solver call

xp = GPSR_BB(ytilde,Atilde, 0.01*max(abs(Atilde'*ytilde)),'Continuation',1);

% plot results of L1 solution
figure; plot(xp,'v');
title('L1 solution (v), initial signal (o)');
hold; plot(x_true,'o');

% Step 4
% Calculate residuals (formula 13)
residual = [];
for i=1:k
% only keeps nonzeros in xp where onesvec is nonzero
deltavec(:,i) = onesvecs(i,:)'.* xp;
% calculate residual on only a subset of Atilde
residual(i) = norm(ytilde-Atilde*deltavec(:,i));
figure; bar(residual); % plot residuals
title('The lowest bar indicates the class number to which the element belongs to')

Please note the substantial phase space reduction from 101 down to 7 and still a good classification capability with noise over signal ratio above 10 percent for signals in the library.

[1] Robust Face Recognition via Sparse Representation, John Wright, Arvind Ganesh Balasubramanian, Allen Yang and Yi Ma. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Updated version (November 8, 2007) of the MATLAB code is available here: GPSR_4.0

Photo Credit: Credits: ESA ©2007 MPS for OSIRIS Team MPS/UPD/LAM/IAA/RSSD/INTA/UPM/DASP/IDA, View of Earth by night from the Rosetta probe in the fly by that took place on November 13th. Rosetta is on 10-year journey to 67/P Churyumov-Gerasimenko. And No, Rosetta is not an asteroid.

No comments: