Showing posts with label monday morning algorithm.

Monday, February 08, 2010

Monday Morning Algorithm 17: Compressive Sensing Linear Black Box Discovery

On Saturday, I wrote about an algorithm that is trivial to implement with naive linear algebra. It is so trivial I am not sure it has a name. Anyway, I just implemented it and, much like the original l_1-magic tests, it looks like magic! I use the wording Linear Black Box thanks to Sergey Ten, but if anybody knows of a better name...

I was also corrected by Stephen, an anonymous reader, on how to introduce the problem, so let me use his significantly better introduction:

You are trying to infer an m x n matrix A with unknown elements. The only method of learning about this matrix is through applying matrix vector products Ax = y. That is, you construct a vector x, perform a multiplication Ax, and finally analyze the resulting vector y of observations.

In reality, the elements of matrix A consist of individual features of a physical system we are trying to learn about. Because this physical system A is linear, we know the act of physically measuring the system is equivalent to matrix vector multiplication Ax = y.


The trivial algorithm uses a series of x vectors equal to the canonical basis vectors to find the columns of A. In our case though, we have the additional information from the physics of the system that A is sparse [Update: However, we do not know the sparsity pattern of A]. We also have a constraint that the number of analog or physical matrix-vector Ax multiplications must be reduced. In the context of Linear Algebra, there is really little you can do with this sparsity information about A, and you are therefore stuck with having to perform n measurements. In Saturday's entry, I sketched a solution where the number of measurements could be of the order of k log(n), where k represents the sparsity of matrix A. Today's entry is a self-contained prototype of that implementation. It uses GPSR, but it could use any of the other algorithms listed in the reconstruction section of the big picture in compressive sensing. Here it is:

% This code implements a Linear Black Box problem
% when one knows that the unknown linear system is sparse.
% More information can be found here:
%
% http://nuit-blanche.blogspot.com/2010/02/cs-compressive-sensing-linear-algebra.html
%
% Written by Igor Carron
% This script/program is released under the Commons Creative Licence
% with Attribution Non-commercial Share Alike (by-nc-sa)
% http://creativecommons.org/licenses/by-nc-sa/3.0/
% Disclaimer: the short answer is that this script is for educational purposes only.
% More Disclaimer: http://igorcarron.googlepages.com/disclaimer
%


clear;
m = 1000;
n = 2000;
n1 = 50;

A = zeros(m,n);
A(2,1324) =3;
A(254,802)=25;
A(6,1)=12;

% 1. Start with a matrix X of n1 random probe columns (the size of X
% will be n x n1 for the remainder of the algorithm)

X = randn(n,n1);
Y = A*X;

% 2. Produce a random vector of size n x 1 that will be added
% as one of the columns of X,

v = randn(n,1);
X =[X v];

% 3. Increase n1 by 1

n1 = n1 + 1;

% 4. Physically compute the response
% y = A x (x is an n x 1 vector, y is an m x 1 vector)

y = A*v;

% 5. Add the newly found y to the columns of Y (an m x n1 matrix)

Y = [Y y];

Xt = X';
Yt =Y';

% 6. Solve for all vectors at_i (i goes from 1 to m)
% in yt_i = Xt at_i where yt_i is an n1 x 1 vector,
% Xt is an n1 x n matrix, at_i is an n x 1 vector.
% You may have noticed that Xt is underdetermined, i.e. too
% few rows compared to the number of columns, but there are
% many different solvers that solve this system of equations.

for i=1:m
ytilde = Yt(1:n1,i);
at_i = GPSR_BB(ytilde,Xt, 0.01*max(abs(Xt'*ytilde)),'Continuation',1);
At(:,i) = at_i;
end
AA = At';
% plot the true matrix, the reconstruction and the error
% (titles and color limits are set after mesh so they are not reset)
figure(1)
mesh(A); colormap(jet); colorbar
caxis([0 20]);
title('Initial matrix')
figure(2)
mesh(AA); colormap(jet); colorbar
title(strcat('Reconstructed from :',num2str(n1), ...
' measurements instead of :', num2str(n)))
figure(3)
mesh(abs(AA-A)); colormap(jet); colorbar
caxis([1e-10 10]);
title('Reconstruction error matrix')


% 7. If convergence is observed for At (i.e. for all
% the columns at_i) then stop, if not, go to step 2.

It is a prototype and it is simple, but I think it is a good start for playing around and extending it with additional features (such as the convergence loop of item 7, sketched below).
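
As a hint of what that loop could look like, here is a hedged sketch of the outer loop of item 7. It is not part of the prototype above; the names tol, converged and At_old are mine, and GPSR_BB is assumed to be on the path as in the script.

% Hedged sketch of the convergence loop of item 7 (not in the prototype above).
% tol, converged and At_old are my own names; GPSR_BB is assumed on the path.
At_old = zeros(n,m);
tol = 1e-3;
converged = false;
while ~converged
    v = randn(n,1); X = [X v]; n1 = n1 + 1;      % steps 2 and 3: add a random probe
    Y = [Y, A*v];                                % step 4: one more "physical" measurement
    Xt = X'; Yt = Y';
    for i = 1:m                                  % step 6: recover each row independently
        ytilde = Yt(1:n1,i);
        At(:,i) = GPSR_BB(ytilde, Xt, 0.01*max(abs(Xt'*ytilde)),'Continuation',1);
    end
    converged = norm(At - At_old,'fro') < tol*norm(At,'fro');   % step 7
    At_old = At;
end
AA = At';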

What is important to see from this simple example:
  • In the trivial setting, I would need 1000 measurements of my physical system to get the full matrix A. I would probably have to put much thought into how to do this in parallel in the analog system if I wanted to do it fast.
  • In the Compressive Sensing setting, I need about 50 measurements for an error of a few percent on the coefficients of A but, and this is the beauty of it, I can use all the computational power available on a supercomputer or on the cloud to find A. More specifically, each row-recovery problem is self-contained, so the reconstruction is embarrassingly parallel (see the sketch after this list). This is something I could not do in the trivial setting, i.e. no parallel supercomputer could help me perform my physical experiment faster! Also, as an aside, when was the last time you heard an experimentalist say she needed a cluster of exactly 1026 CPUs to get results from her experiment... yeah, never, I thought so.
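
Here is that sketch: a minimal parallel version of the step-6 loop. It assumes the Parallel Computing Toolbox is available and that GPSR_BB is on every worker's path; it is not part of the original script, just the same loop rewritten with parfor.

% Minimal parallel sketch of the step-6 loop (assumes the Parallel Computing
% Toolbox; GPSR_BB must be on the workers' path). Not in the original script.
parfor i = 1:m
    ytilde = Yt(1:n1,i);
    At(:,i) = GPSR_BB(ytilde, Xt, 0.01*max(abs(Xt'*ytilde)),'Continuation',1);
end
AA = At';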
The next step beyond this prototype would probably be to add constraints such as the positivity of the coefficients. You can download the code here. This algorithm and others are listed in the Monday Morning Algorithm series.

Credit photo: NASA, Landsat view of New Orleans.

Tuesday, April 22, 2008

Monday Morning Algorithm 16: S-POCS, Using Non Convex Sparse Signal Space Instead of the Convex l1 Ball Improves POCS Recovery

This is a first: Laurent Jacques is a guest blogger on this blog, and since he mentioned that he had worked on this on a Monday, I made this algorithm part of the Monday Morning Algorithm series. But more seriously, here is what he has to say:


I have spent some time today working with some CS reconstructions using the alternate projection algorithms (POCS) of Candes and Romberg [1] and Lu Gan [2].

In the context of CS where y = Phi x (length y = m, length x = N, Phi is m by N, x is K-sparse and Phi follows the RIP), [1] explains that the solution of a BP program lies at the unique (thanks to the RIP) intersection between the hyperplane P = {x : Phi x = y} and the l1 ball B of radius ||x||_1. Using the extra a priori information about the l1 norm of x, it is then possible to design an iterative alternate projection algorithm (or POCS) onto these two convex sets P and B in order to find their common intersection point, i.e. to recover x from y. The algorithm then uses two projectors: one involving the pseudo-inverse of Phi to project any signal u onto P, i.e. onto a solution of Phi x = y, and another using soft-thresholding so that the thresholded signal falls inside B.
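
To make the two projectors concrete, here is a minimal l1-POCS sketch under the assumptions above (Phi is m x N with m < N, y = Phi x, and R = ||x||_1 known exactly). It is my own illustration, not Laurent's cspocs_l1.m; the l1-ball projection below picks the soft-threshold level that brings the l1 norm down to R.

% Minimal l1-POCS sketch (illustration only, not cspocs_l1.m).
% Assumes Phi is m x N with m < N, y = Phi*x and R = ||x||_1 is known.
function xk = l1_pocs_sketch(Phi, y, R, niter)
Phip = pinv(Phi);                      % pseudo-inverse used by the projector onto P
xk = Phip * y;                         % start from the least-norm solution
for it = 1:niter
    % projection onto the l1 ball of radius R: soft-thresholding with the
    % threshold tau chosen so that the result has l1 norm exactly R
    if norm(xk,1) > R
        u = sort(abs(xk),'descend');
        cs = cumsum(u);
        rho = find(u > (cs - R)./(1:numel(u))', 1, 'last');
        tau = (cs(rho) - R)/rho;
        xk = sign(xk).*max(abs(xk) - tau, 0);
    end
    % projection onto the hyperplane P = {x : Phi x = y}
    xk = xk + Phip*(y - Phi*xk);
end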

This method (denoted hereafter by l1-POCS and implemented in cspocs_l1.m, with attendant description and example in the header) works very well, as shown in [1]. Experimentally however, I have observed that it is highly sensitive to potential mis-evaluation of ||x||_1. Initially, I wanted to somehow approximate this norm using y and RIP inequalities, but I gave up since simply multiplying or dividing ||x||_1 by a factor as small as 1.01 either breaks the recovery or prevents the algorithm from converging.

In [2], the author designed another algorithm made of two POCS-like stages. I have implemented only a simplified version of the first stage since it seems that the second stage is more specific to the final goal of [2], i.e. removing artifacts due both to noise on the measurements and to the use of a special blocked measurement matrix. In this first stage, the POCS structure is again an iterative alternate projection process between the previous hyperplane P and the (non-convex) space of K-sparse signals. A hard thresholding keeping the K largest components of a vector is used as the projector onto this second space, while the same pseudo-inverse projector as in l1-POCS is used for the projection onto P. Therefore, in [2], the a priori (input) extra information is not the l1 norm of x as for l1-POCS but the sparsity of x (as in CoSaMP). I have implemented this new "sparse POCS" (or S-POCS) approach in cspocs_K.m (description and example in the header too).

Even though the space of K-sparse signals is not convex, the new algorithm works surprisingly well: fewer iterations are needed when compared to l1-POCS, and small errors on K are acceptable. An input sparsity of K' = K+10 (if K is the sparsity of x) still works in the example provided with the algorithm. Moreover, if K' is less than K, S-POCS seems to converge very slowly to a wrong solution, while for K' = K the recovery is perfect and the rate of convergence is very fast (e.g. the quantity ||y - Phi x|| / ||x|| converges quickly below a user-defined tolerance value, with ||.|| the l2 norm). This suggests a brute-force method (not implemented) to obtain a similar algorithm without a priori knowledge of the sparsity level K: one could just run S-POCS for K' ranging between 1 and N and stop the loop when S-POCS converges "sufficiently well".
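
As a hedged sketch of that brute-force idea (not implemented in the archive), one could wrap S-POCS in a loop over K'. The call below assumes a signature like x = cspocs_K(Phi, y, Kp); check the header of cspocs_K.m for the actual interface, tolerance and iteration arguments.

% Brute-force sweep over the trial sparsity K' (sketch only; the cspocs_K
% signature below is an assumption, see the header of cspocs_K.m).
tol = 1e-6;
for Kp = 1:N
    x_hat = cspocs_K(Phi, y, Kp);                 % S-POCS with trial sparsity Kp
    if norm(y - Phi*x_hat)/norm(x_hat) < tol      % "converges sufficiently well"
        break;
    end
end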

A zip file containing both scripts featuring l1-POCS and S-POCS can be found here and in the Compressed Sensing codes section. After downloading the archive into a folder, you may want to run trial_s_pocs.m and trial_l1_pocs.m to convince yourselves.

References :
[1] E. J. Candès and J. Romberg. Practical signal recovery from random projections. Wavelet Applications in Signal and Image Processing XI, Proc. SPIE Conf. 5914.
[2] Lu Gan, Block compressed sensing of natural images. Proc. Int. Conf. on Digital Signal Processing (DSP), Cardiff, UK, July 2007.



Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

Monday, April 14, 2008

Monday Morning Algorithm 15: Building a Noiselet Basis and Measurement Matrix

For today's algorithm, Laurent Duval, a reader of this blog, decided to implement the noiselet basis of Coifman, Geshwind and Meyer [1] and share it. This basis has the particularity of being incoherent with the Haar basis and is therefore of particular interest in imaging within the compressed sensing framework (since pixels are Heaviside/Haar functions). It was mentioned before here and here. The script is available here and is also part of the Monday Morning Algorithm page and the Compressed Sensing code page.

function [noiselet_Matrix] = Build_Noiselets(n_Power)

% [noiselet_Matrix] = Build_Noiselets(n_Power)
% Builds an orthonormal noiselet matrix of size 2^n_Power x 2^n_Power
% In: n_Power (integer); default: n_Power = 5 (n_Sample = 2^n_Power = 32)
% Out: noiselet_Matrix (complex array); default (no output): display real and
% imaginary parts
% Example: noiselet_Matrix_1024 = Build_Noiselets(10);
%
% Comments: no optimization at all, suggestions welcome
% Creation: 2008/04/10
% Update: 2008/04/13
%
% Author: Laurent Duval
% URL: http://www.laurent-duval.eu

if nargin == 0
n_Power = 5;
end
n_Sample = 2^n_Power;

noiselet_Matrix = zeros(n_Sample,2*n_Sample-1);
noiselet_Matrix(:,1) = 1;
coef1 = 1 - i;
coef2 = 1 + i;
vect_Fill = zeros(n_Sample/2,1);
for i_Col = 1:n_Sample-1
vect_2x = [noiselet_Matrix(1:2:n_Sample,i_Col);vect_Fill];
vect_2x_1 = [vect_Fill;noiselet_Matrix(1:2:n_Sample,i_Col)];
noiselet_Matrix(:,2*i_Col) = coef1*vect_2x + coef2*vect_2x_1;
noiselet_Matrix(:,2*i_Col+1) = coef2*vect_2x + coef1*vect_2x_1;
end
noiselet_Matrix = 1/n_Sample*noiselet_Matrix(:,n_Sample:2*n_Sample-1);

if nargout == 0
figure(1)
subplot(1,2,1)
imagesc(real(noiselet_Matrix))
xlabel('Real part')
subplot(1,2,2)
imagesc(imag(noiselet_Matrix))
xlabel('Imaginary part')
end


Laurent is making some anagrams out of noiselets on his blog. As usual now, the disclaimer for using this script is listed here.

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

Reference: [1] R. Coifman, F. Geshwind, and Y. Meyer. Noiselets. Appl. Comp. Harmonic Analysis, 10:27–44, 2001.

Monday, April 07, 2008

Monday Morning Algorithm 14: A Comparison of the Reconstruction Capability of CoSaMP, OMP, Subspace Pursuit and Reweighted Lp in Compressive Sensing

Today's Monday Morning Algorithm is provided by David Mary, a reader of this blog. David was interested in checking the reconstruction efficiency of different algorithms used in Compressed Sensing / Compressive Sensing (if you want to know more about Compressed Sensing, check the Big Picture out). So he implemented several reconstruction algorithms and wrote a script (scripttf2dcosamp.m) that shows their capabilities with regard to the initial sparsity of the signal. It includes implementations of the algorithms listed in the title: CoSaMP, OMP, Subspace Pursuit and Reweighted Lp.


The original signal is an image. It features pixels of various intensities lying on a dark background (think of stars at night). It is obviously sparse in the direct/image space.

The measurement matrix is a 2D Random Partial Fourier Transform.

The goal is to find a sparse approximation of u_true knowing that b = phi*u_true (+ noise); a small setup sketch follows the list below:
  • b: obs vector,
  • phi : CS matrix,
  • u_true : target object
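
To fix ideas, here is the hedged sketch of such a test setup. It is my own construction, not David's scripttf2dcosamp.m: a sparse "stars at night" image and a random partial 2D Fourier measurement operator given as function handles; the sizes N, K and the 25% sampling rate are arbitrary choices of mine.

% Hedged sketch of the test setup (my construction, not scripttf2dcosamp.m).
N = 64; K = 20;
u_true = zeros(N);                          % dark background
idx = randperm(N*N);
u_true(idx(1:K)) = 0.5 + rand(1,K);         % a few bright "stars"
mask = rand(N) < 0.25;                      % keep about 25% of the Fourier samples
phi  = @(u) mask .* fft2(u)/N;              % partial 2D Fourier measurement
phit = @(b) real(ifft2(mask .* b))*N;       % its adjoint (for the chosen scaling)
b = phi(u_true);                            % noiseless observations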

To see the results, download this zip file in a folder, then enter scripttf2dcosamp at the prompt. You should see something like this.

Pretty neat, huh? Thank you David for sharing this. I will eventually put each of these implementations on the Compressed Sensing codes page.

I have also decided to create a disclaimer for all the codes produced/implemented here. In a former life, when my team competed for DARPA's Grand Challenge (we failed), I had decided to release some of our codes. This is the disclaimer we used:

# This code is released
# under the General
# Public Licence (GPL),
# DISCLAIMER : if you do not know
# what you are doing
# do not use this program on a
# motorized vehicle or any machinery
# for that matter. Even if you know
# what you are doing, we absolutely
# cannot be held responsible for your
# use of this program under ANY circumstances.
# The program is
# released for programming education
# purposes only i.e. reading
# the code will allow students
# to understand the flow of the program.
# We do not advocate nor condone
# the use of this program in
# any robotic systems. If you are thinking
# about including this software
# in a machinery of any sort, please don't do it.

Since this was focused on a robot, I have changed it a little to reflect other concerns as well.

DISCLAIMER : if you do not know what you are doing do not use this program. Do not use it on a motorized vehicle or any machinery for that matter. Even if you know what you are doing, we absolutely CANNOT be held responsible for your use of this program under ANY circumstances. The program is released for programming education purposes ONLY i.e. reading the code will allow students to understand the flow and the intent of an algorithm. The purpose of this code/program is to further human understanding in basic science.

We do not advocate nor condone the use of this program in any robotic systems nor in the implementation of any hardware. If you are thinking about including this software in a machinery or some larger software program of any sort, please don't do it, because you are on your own.
This disclaimer is now listed here. I know it sounds funny.

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

Monday, March 31, 2008

Monday Morning Algorithm 12: Boosted Subspace Pursuit using Re-weighted Lp for Compressive Sensing

[Warning: Before you read this entry, read this one first ]
Today's algorithm is going to be a composition of two algorithms already implemented. When I implemented the algorithm of Wei Dai and Olgica Milenkovic on Subspace Pursuit (Monday Morning Algorithm 10), my implementation and my simple example would work most of the time, but not all the time. I also noticed that the final solution was close to the expected solution. Then I remembered what Emmanuel Candes said in his lecture on re-weighted L1 (specifically, the last one entitled Applications, experiments and open problems): find a solution close enough and then use re-weighted L1 to refine it further. Since the re-weighted Lp solution technique of Rick Chartrand and Wotao Yin has also been implemented (Monday Morning Algorithm 2), I changed one line in it so that it takes the solution of the Subspace Pursuit as its initial guess (instead of the L2 solution used in the original algorithm). By running SPtrial from the initial Subspace Pursuit implementation one may get this result:


The solution is wrong but not that wrong. By using this solution as a guess into the re-weighted Lp algorithm, one eventually obtains:



function x = lp_re1(A,y,p,x0)
%
% lp-Reweighted Least Squares Problem Solver
% Algorithm implementing an approach suggested in
% "Iteratively Reweighted Algorithms for Compressive Sensing"
% by Rick Chartrand and Wotao Yin
% Algorithm implemented as featured in:
% http://math.lanl.gov/Research/Publications/Docs/chartrand-2008-iteratively.pdf
%
%
% lp_re solves problems of the following form:
% minimize ||x||_p
%
% subject to A*x = y
%
% where A and y are problem data and x is variable (described below).
%
% CALLING SEQUENCES
% [x] = lp_re1(A,y,p,x0);
%
% INPUT
% A : mxn matrix; input data. columns correspond to features.
%
% m : number of measurements ( or rows) of A
% n : dimension of ambient space (or columns) of A
%
% y : vector of dimension m; result.
%
% p : norm order; p = 1 is l1 norm
%
% x0 : guess for x, vector of size n
%
% OUTPUT
% x : vector of size n; the recovered unknown
%
% USAGE EXAMPLES
% x = lp_re1(A,y,p,x0);
%
% AUTHOR Igor Carron
% UPDATE ver. 0.1, Feb 7 2008
%
% Creative Commons Licence
% http://en.wikipedia.org/wiki/Creative_Commons
%
epsilon = 1
% u_0 is a guess given to the subroutine
u_0 = x0;
u_old = u_0;
j=0;
while epsilon > 10^(-8)
j;
epsilon;
j = j + 1
w = (u_old.^(2) + epsilon).^(p/2-1);
v = 1./w;
Q_n = diag(v,0);
tu = inv(A*Q_n*A');
u_new = Q_n * A' * tu * y;
if lt(norm(u_new - u_old,2),epsilon^(1/2)/100)
epsilon = epsilon /10;
end
u_old = u_new;
end
x=u_new;

This is obviously a toy example that can be started by running SP1trial. In larger cases, one needs to pay attention to the issue Joel Tropp mentioned earlier (fast multiplies), as lp_re1 is really slow otherwise. The code and example can be found on the Monday Morning Algorithm page.
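
For reference, the composition itself boils down to two calls; this is a hedged usage sketch assuming SP.m (Monday Morning Algorithm 10) and the lp_re1.m above are on the path, and that phi, y, K and n come from a setup like the one in SPtrial.m.

% Hedged usage sketch of the composition (assumes SP.m and lp_re1.m are on
% the path, with phi, y, K, n set up as in SPtrial.m).
[x_sp, That] = SP(K, phi, y, n);      % subspace pursuit: close but maybe not exact
x_hat = lp_re1(phi, y, 0.9, x_sp);    % reweighted Lp refinement, warm-started at x_sp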


Credit:NASA/JPL/GSFC/SwRI/SSI, Enceladus Heat map as seen by Cassini 19 days ago.

Monday, March 24, 2008

Monday Morning Algorithm 11: Sparse Measurement Matrix using Scrambled Block Hadamard Ensemble


In a previous entry, we discovered the paper by Lu Gan, Thong Do and Trac Tran that provides us with a compact measurement matrix, as described in Fast compressive imaging using scrambled block Hadamard ensemble. We are going to implement this sparse measurement matrix as we have previously done with a similar entry in the Monday Morning Algorithm series. This measurement matrix is a cornerstone in the field of Compressed Sensing. If you don't know about Compressed Sensing, you may want to check this out. As usual, all errors are mine and if you find a mistake in the implementation, please let me know. If you also know how to run evals quietly in Matlab, I am also a taker (one suggestion is sketched after the code below).

function [wfinal]=sbhe(k,hd,mt)
%
% Function implementing measurement matrix as given in
% "Fast compressive imaging using scrambled block Hadamard ensemble."
% by Lu Gan, Thong Do and Trac Tran
%
% Written by Igor Carron.
%
% Input:
% k : number of hadamard matrix blocks
% hd: size of Hadamard matrix (must be a size accepted by Matlab's hadamard, e.g. a power of 2)
% mt: number of columns of wfinal
% Output:
% wfinal: matrix with k*hd columns and mt rows
%
% k=10;
% hd = 32;
% mt = 30;
ntotal = hd * k;
wh = hadamard(hd);
st1='';
for i=1:k-1
st1 = strcat(st1,'wh,');
end
st1= strcat(st1,'wh');
eval(['w = blkdiag(' st1 ')']);
u = randperm(ntotal);
for ii=1:ntotal
wnew(:,ii)=w(:,u(ii));
end
%figure(1)
%spy(w)
%figure(2)
%spy(wnew)
vv = [];
while size(vv,1)~=mt
v2 = cat(2,vv,round(1+rand(mt,1)*(ntotal-1)));
vv = unique(v2);
end
m = size(vv,1);
for j=1:m
wfinal(j,:) = wnew(vv(j),:);
end
%spy(wfinal)

The m file can be downloaded from here or here.
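
Regarding the question about running the eval quietly: one way, which is my suggestion and not part of the paper's code, is to avoid eval altogether by expanding a cell array into blkdiag.

% Building the same block-diagonal matrix without eval (my suggestion).
blocks = repmat({wh}, 1, k);     % k copies of the hd x hd Hadamard block
w = blkdiag(blocks{:});          % identical to the eval-constructed w, no echo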

Credit Photo: ESA/ MPS/DLR/IDA, Venus’s atmosphere, taken by the Venus Monitoring Camera (VMC) during Venus Express orbit number 458 on 23 July 2007.

Friday, March 14, 2008

Monday Morning Algorithm 10: Subspace Pursuit for Compressive Sensing.

[Warning: Before you read this entry, read this one first ]
I know this is not Monday, but if you read this on Monday, it'll look like it was posted on Monday. Today's algorithm has to do with Compressed Sensing. In Compressed Sensing, the idea is that you can sample a signal much below the Nyquist rate if you have the prior knowledge that it is sparse. Since most signals are sparse, this technique should eventually become mainstream. If you want to know more about Compressed Sensing, check the Big Picture out.

If you have been obtaining Compressed Sensing measurements, the most difficult task is then to decode that information into the actual signal. You can use any of the amazing reconstruction codes produced so far or you could implement a simple one proposed by Wei Dai and Olgica Milenkovic in Subspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity. [Update: Wei Dai provided me with the original code used in the paper and asked me to put it on this site ]

The algorithm is pretty straightforward from the explanation given in the paper. Here is the SP.m file, but the full set of m files implementing this technique can be found either in the local CS code section or in the Monday Morning Algorithm archive. Once you have downloaded all the functions into one directory, you can run trialSP directly from the prompt. It runs a simple example with a vector of 290 elements and 3 non-zero elements. As expected from the log(K) asymptotics, I am pretty much amazed at the speed of this thing. If you find a bug, please do let me know.

function [xfinal,That]=SP(K, phi, y,nn)
%
% This is an example using the Subspace Pursuit
% Algorithm of Wei Dai and Olgica Milenkovic
% "Subspace Pursuit for Compressive Sensing: Closing the
% Gap Between Performance and Complexity"
%
% Written by Igor Carron
% http://nuit-blanche.blogspot.com
% Creative Commons Licence

% Initialization
%
u = phi' * y;
[That,phiThat]=deft(phi,u,K);
y_r = resid(y,phiThat);
%
%
% Iteration
%
while y_r ~= 0
u1 = phi' * y_r;
[Tprime1,phiTprime1]=deft(phi,u1,K);
[Tprime,phiTprime]=uni(Tprime1,That,phi);
phiTprimecross = cros(phiTprime);
x_pprime = phiTprimecross * y;
[Ttilde,phiTtilde] = deft(phi,x_pprime,K);
ytilde_r = resid(y,phiTtilde);
if gt(norm(ytilde_r) ,norm(y_r))
break
else
That = Ttilde;
y_r = ytilde_r;
end
end
%
% Output
%
phiThatcross = cros(phiThat);
x_That = phiThatcross * y;
That;
xfinal = zeros(nn,1);
for ii = 1:K
xfinal(That(ii))=x_That(ii);
end
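
Here is a minimal usage sketch, assuming the helper functions deft.m, resid.m, uni.m and cros.m from the archive are on the path; the problem sizes mirror what trialSP does, but the specific numbers below are mine.

% Minimal usage sketch (assumes deft.m, resid.m, uni.m, cros.m are on the path).
nn = 290; K = 3; m = 90;
phi = randn(m, nn);
x_true = zeros(nn,1); x_true([5 40 200]) = [3 -2 7];
y = phi * x_true;
[xfinal, That] = SP(K, phi, y, nn);
norm(xfinal - x_true)            % should be numerically zero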

Tuesday, January 15, 2008

Monday Morning Algorithm 9: Implementation of Sparse Measurement Matrices for Compressive Sampling

Ok, so it's not Monday; at least I thought about the algorithm on Monday morning, so there. Last week I mentioned the work of Radu Berinde and Piotr Indyk on Sparse Recovery Using Sparse Random Matrices (an older version of this paper is here). Since this is potentially groundbreaking work with regard to the hardware implementation of Compressive Sampling, I decided to implement a non-optimized version of the measurement matrix construction of the paper. As usual, if you find a mistake please let me know. Also, all the codes produced in the Monday Morning Algorithm series can be found here.


clear
% Algorithm implementing the Sparse Measurement Matrix
% construction of Radu Berinde and Piotr Indyk in
% "Sparse Recovery Using Sparse Random Matrices"
% d is the number of ones per column
% m is the number of rows
% n is the number of columns
% generally m should be much less than n
% n is limited by the binomial coefficient C(m,d), i.e.
% factorial(m)/(factorial(d)*factorial(m-d))
% Written by Igor Carron
d=3
% m rows
m=5;
% n columns
n=10;
% with these parameters it works with n=10
% but it does not work with n greater than 10
A=zeros(m,n);
jj = 0;
for i=1:n
while sum(A(:,i),1)~=d | jj==47
jj=0;
A(:,i)=zeros(m,1);
for j=1:d
A(round((m-1)*rand(1,1))+1,i)=1;
end
v1 = diag(A(:,i))'*ones(m,n);
w = A - v1;
for j=1:i-1
if w(:,j)== zeros(m,1)
jj = 47;
end
end
end
end
% Measurement Matrix is A

Some people might think there is an error in jj = 47 and that it should be jj = 42. They are wrong. Also, while this is a binary matrix, it can be used with non-binary objects. [Update: I have made it a function and it is available here. It is working as nicely as the normal measurement matrices on the example set I used it on.]

Credit Photo: NASA/JPL/ Space Science Institute, Saturn and Titan on June 9, 2006.

Tuesday, December 25, 2007

Monday Morning Algorithm 8: An Example of Compressed Measurements Speeding Up Diffusion Maps


In the previous installment of the Monday Morning Algorithm, we featured the diffusion map method. I also mentioned that one could use compressed sensing and obtain the same result, i.e. the ability to sort several high dimensional images because only one parameter is being varied. When going through this exercise, one can note that one of the most expensive operations is computing the inter-distances between all the elements of the set. If every picture is 10 MP, one has to evaluate the difference between elements made of ten million numbers. In Compressed Sensing, one learns that by projecting a large image onto the right kind of random projections, one can keep the amount of information in the original image. We are going to reuse our last algorithm, but we will be projecting the original images of Buzz Lightyear onto a smaller space. We will be using a Gaussian ensemble and will be projecting from an image of dimension 57600 down to 100 compressed measurements. Let us note that this is not an efficient operation unless we have direct access to compressed measurements or unless one does extra operations with the compressed measurements.


clear
% Stephane Lafon, “Diffusion maps and geometric harmonics”,
% Ph.D. dissertation, Yale University (May
% 2004)
% written by Cable Kurwitz and Igor Carron
% This algorithm shows how random projections
% used in compressed sensing provide the same
% results as using the original images.
%
% Number of Random Projections
number_RP =100;
% get all the files in jpg
d = dir('buzz*.jpg')
ntotal = size(d,1)
for i=1:ntotal
d(i).name
end
Img = imread(d(1).name);
s=size(Img,1);
w=size(Img,2);
hd=size(Img,3);
swhd=s*w*hd;
%reshape these images into one vector each
for ii=1:ntotal
Img = imread(d(ii).name);
b1(:,ii) = double(reshape(Img, swhd,1));
end
%
% Compressed sensing part
sa=randn(number_RP,swhd);
for i=1:swhd
sa(:,i)=sa(:,i)./norm(sa(:,i));
end
b=sa*b1;
%b=b1;
%
%
for i=1:ntotal
for j=1:ntotal
D1(i,j)=norm(b(:,i)-b(:,j),2);
end
end
%
n1=ntotal;
D_sum=0;
for l = 1:n1;
D_sum = D_sum + min(nonzeros(D1(l,:)));
end;
epsilon = D_sum/n1/10; %
%
% Step 1
%
kernel_1 = exp(-(D1./epsilon).^1);
%
% Step 2
%
one_V = ones(1,n1);
%
% Step 3
%
p = kernel_1*one_V';
kernel_2 = kernel_1./((p*p').^1);
%
% Step 4
%
v = sqrt(kernel_2*one_V');
%
% Step 5
%
K = kernel_2./(v*v');
%
% Step 6
%
%% Singular Value Decomposition
[U S V] = svd(K);
% Normalization
for i=1:ntotal-1
phi(:,i) = U(:,i)./U(:,1); %
end
eig_funct = phi(:,1:6);
%
% The eigenvalues of the Laplace-Beltrami operator are approximated by those of K,
% and its eigenfunctions phi_i are approximated by phi(:,i) = U(:,i)./U(:,1)
%
vv=eig(K);

% Initial order of Buzz images
figure(1)
for i=1:ntotal
subplot(3,4,i)
imshow(d(i).name)
end

X = vv(ntotal-1).*phi(:,2);
% sorting the Buzz images using the first
% eigenfunctions the Laplace Beltrami operator
[Y,I] = sort(X);
% d(I).name
%
figure(2)
for i=1:ntotal
subplot(3,4,i)
imshow(d(I(i)).name)
end
%
figure(3)
plot(vv(ntotal-1).*phi(:,2),'.')
title(' Buzz Lightyear')
xlabel('Index')
ylabel('Lambda_1 phi_1')
%
figure(4)
plot(vv(ntotal-1).*phi(:,2),vv(ntotal-2).*phi(:,3),'.')
title(' Buzz Lightyear')
xlabel('Lambda_1 phi_1')
ylabel('Lambda_2 phi_2')
%
figure(5)
plot(eig(K),'*')
xlabel('Index of eigenvalue')
ylabel('Value of eigenvalue')
title(' Eigenvalues of K')
vv=eig(K);
%
It looks like 100 projections are enough to recover the same results as the original analysis. One can note that figuring out the number of random projections needed would require the type of analysis that evaluates the intrinsic dimensionality of the set, as mentioned here by the folks at Rice.

All the Monday Morning Algorithms (and attendant codes/files) are listed here.

Tuesday, December 18, 2007

Monday Morning Algorithm 7: Diffusion Maps for Dimensionality Reduction and Manifold Parametrization.

There aren't that many dimensionality reduction approaches that also try to provide some parametrization of the manifold. The diffusion map approach is one of them. It tries to approximate a Laplace-Beltrami operator on the manifold. This is computed through the algorithm on page 33 of Stephane Lafon's dissertation [1]. Extensions can be found here and interactive animated examples can be found here. For today's algorithm, we will use images of Buzz Lightyear rotating around a single axis. The file is here. Obviously this is unoptimized code. For better implementations of diffusion maps and other dimensionality reduction techniques, this site is a good choice.

What do we do here? We first compute the inter-distances between frames, then plug them into a kernel which eventually provides an idea of how connected each frame is to the others. Using a thermal heat transfer analogy, a frame is a node and the new distance from any one node to another is inversely proportional to the number of nodes between the two. The new metric induced by this distance enables a new parametrization of the manifold. In our case, the manifold is made of one object rotating around one axis. Parametrization of the underlying manifold should yield a one-parameter family since only one dimension is being varied between frames (one rotation). In order to show this, we use the first eigenfunction of the Laplace-Beltrami operator and sort each frame according to its projection on that eigenfunction. The sorting along that dimension arranges the Buzz Lightyear frames into a continuous rotation from one end to the other.

clear
% Stephane Lafon, “Diffusion maps and geometric harmonics”,
% Ph.D. dissertation, Yale University (May
% 2004)
% written by Cable Kurwitz and Igor Carron
%
% get all the files in jpg
d = dir('buzz*.jpg')
ntotal = size(d,1)
for i=1:ntotal
d(i).name
end
Img = imread(d(1).name);
s=size(Img,1);
w=size(Img,2);
hd=size(Img,3);
swhd=s*w*hd;
%reshape these images into one vector each
for ii=1:ntotal
Img = imread(d(ii).name);
b1(:,ii) = double(reshape(Img, swhd,1));
end
%
% Compressed sensing part
%sa=randn(20,swhd);
%b=sa*b1;
b=b1;
%
%
for i=1:ntotal
for j=1:ntotal
D1(i,j)=norm(b(:,i)-b(:,j),2);
end
end
%
n1=ntotal;
D_sum=0;
for l = 1:n1;
D_sum = D_sum + min(nonzeros(D1(l,:)));
end;
epsilon = D_sum/n1/10; %
%
% Step 1
%
kernel_1 = exp(-(D1./epsilon).^1);
%
% Step 2
%
one_V = ones(1,n1);
%
% Step 3
%
p = kernel_1*one_V';
kernel_2 = kernel_1./((p*p').^1);
%
% Step 4
%
v = sqrt(kernel_2*one_V');
%
% Step 5
%
K = kernel_2./(v*v');
%
% Step 6
%
%% Singular Value Decomposition
[U S V] = svd(K);
% Normalization
for i=1:ntotal-1
phi(:,i) = U(:,i)./U(:,1); %
end
eig_funct = phi(:,1:6);
%
% The eigenvalues of the Laplace-Beltrami operator are approximated by those of K,
% and its eigenfunctions phi_i are approximated by phi(:,i) = U(:,i)./U(:,1)
%
vv=eig(K);

% Initial order of Buzz images
figure(1)
for i=1:ntotal
subplot(3,4,i)
imshow(d(i).name)
end

X = vv(ntotal-1).*phi(:,2);
% sorting the Buzz images using the first
% eigenfunctions the Laplace Beltrami operator
[Y,I] = sort(X);
% d(I).name
%
figure(2)
for i=1:ntotal
subplot(3,4,i)
imshow(d(I(i)).name)
end
%
figure(3)
plot(vv(ntotal-1).*phi(:,2),'.')
title(' Buzz Lightyear')
xlabel('Index')
ylabel('Lambda_1 phi_1')
%
figure(4)
plot(vv(ntotal-1).*phi(:,2),vv(ntotal-2).*phi(:,3),'.')
title(' Buzz Lightyear')
xlabel('Lambda_1 phi_1')
ylabel('Lambda_2 phi_2')
%
figure(5)
plot(eig(K),'*')
xlabel('Index of eigenvalue')
ylabel('Value of eigenvalue')
title(' Eigenvalues of K')
vv=eig(K);
%


Please note the ability to test the same algorithm by first reducing the frames through random projections (one needs to delete some % signs to uncomment the corresponding lines). It is pretty obvious that one would need a robust metric in the first place instead of L2; one could foresee the use of a specifically designed metric (Improving Embeddings by Flexible Exploitation of Side Information) on top of the reduction operated with the random projections. The specifically designed metric would alleviate the need to come up with the ad-hoc epsilon of this method.

The diffusion mapping has been taken to another level by Mauro Maggioni with diffusion wavelets and Sridhar Mahadevan with Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions.


Reference: [1] S. Lafon, “Diffusion maps and geometric harmonics”, Ph.D. dissertation, Yale University (May 2004)

Monday, December 10, 2007

Monday Morning Algorithm Part 6: The Fast Johnson-Lindenstrauss Transform

Last year, Nir Ailon and Bernard Chazelle described a new algorithm that improves the ability to perform the Johnson-Lindenstrauss transform. It is presented in Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform (slides (ppt) are here). The summary can be described in the Fast Johnson-Lindenstrauss Transform Lemma:

Fix any set X of n vectors in R^d, epsilon < 1, and p either 1 or 2. Let Phi be the FJLT. With probability at least 2/3, we have for all x in X:

(1 - epsilon) * alpha_p * ||x||_2 <= ||Phi x||_p <= (1 + epsilon) * alpha_p * ||x||_2

This dimension reduction technique improves a classic algorithm by Johnson and Lindenstrauss from the mid 80's. Their technique is the first to offer an asymptotic improvement, and has already been used in the design of efficient algorithms for nearest neighbor searching and for high dimensional linear algebraic numerical computations. A better technique has recently surfaced as well. A good description of this new technique can be found in this presentation by Edo Liberty.

I have implemented a version of it by following the instructions in the paper, but I am not convinced the implementation is right, as the results do not look that sharp to me. I haven't had much time to debug it, so here it is; if you find a mistake, please let me know:

clear
% Fast Johnson-Lindenstrauss Transform
% as described by Nir Ailon and Bernard Chazelle
% in their paper:
% "Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss
% Transform"
% The algorithm uses the PHD transform of paragraph 2
% The transform is a k x d projection
% for every set S of n points in R^d
% Let us define an example case
% All columns of G are sets of points
% i.e. we have 1000 points of dimension 256
n = 1000;
d = 256;
% let us examine a reduction from d = 256 to k = 150
%
k = 150;
G = 180 * rand(d,n);
% k = O(epsilon^(-2) * log(n))
epsilon = sqrt(log(n)/k)
%
% epsilon is really O( sqrt(log(n)/k) )
%
%
% Projection in dim k << d
% let us take d = 2^w
w = log2(d);
%
%
% Defining P (k x d)
% first define q
%
q = min((log(n))^2/d,1);
%
P = rand(k,d);
P = (P < q);
P1 = 1/q * randn(k,d);
P = P1 .* P;
%
% Defining H (d x d)
H = [1];
for i=1:w
H = [ H H;H -H];
end
H = 1/d^(0.5)* H;
%
% Defining D ( d x d)
vec1=rand(d,1);
v1 = ((vec1 > 0.5) - 1*(vec1<=0.5));
D = diag(v1);
%
%
%
FJLT = P * H * D;
u = FJLT * G(:,5);
v = FJLT * G(:,36);
norm(G(:,5)-G(:,36))^2 * k * (1-epsilon)
norm( u - v)^2
norm(G(:,5)-G(:,36))^2 * k * (1+epsilon)

Sunday, December 02, 2007

Monday Morning Algorithm part 5: The 1-D Experimental Probabilistic Hypersurface


The Experimental Probabilistic Hypersurface (EPH) is a tool initially designed by Bernard Beauzamy. Let me first try to say why it is useful. Imagine you have a function of several variables, say 100, and it takes 2 hours to evaluate that function. At the end of an 8-hour day, the engineer will have processed at best 4 function evaluations. It will take a long time to do any type of sensitivity study.

In order to have a faster turnaround, one could foresee a scheme where one would do a linear interpolation between points (of dimension 100) that we already know. This would provide an idea of what the function looks like in a larger region. Unfortunately, this interpolation scheme in dimension 100 would by itself take too much time. Irrespective of this time constraint, there is nothing that says this function should be linear or even continuous.

In 2004, Bernard Beauzamy made this original construction, whereby the function to be evaluated is represented as a multidimensional probability distribution. For every set of (100) parameters that has already been evaluated, the result of the EPH for this function is a Dirac. For all other sets of parameters that have not been evaluated, the result of the EPH is a distribution whose shape is as peaked (resp. flat) as one is close to (resp. far from) the sets of parameters for which the function has already been evaluated. Olga Zeydina, a Ph.D. candidate, is writing her thesis on the subject. Most information on this construction can be found in the document section of the Robust Mathematical Modeling program at SCM. Of particular interest are the initial practical implementation [Report 1] and a newer implementation [Report 2].

Today's algorithm is featured in two files: one implements the EPH, the other uses the EPH function to produce graphs from this document. In this entry, I list the EPH function file. The second file can be found in the MMA repository.

function [t,pxx]=EPH(xi,theta,x,t1,tau,xe)
% Algorithm developed by Bernard Beauzamy
% and Olga Zeydina
%
% This implementation was written by
% Igor Carron
%
% Inputs
% xi: a parameter set where the function
% has been evaluated
%
% theta: the result of the
% function evaluation for parameter set xi
%
% x: bounding value for parameter set xi
%
% t1: bounding value for result of
% function evaluation
%
% tau: is the increment in the interval
% defined by the bounding value of t1
%
%
% xe: Query parameter set. We want to
% know the value of the function at
% this point.
%
% Outputs
%
% t: interval where the distribution is
% defined
%
% pxx: probability distribution sought
% sampled over t.
%
N =size(xi,2)
%
x1_min= x(1,1)
x1_max= x(1,2)
%
t_min= t1(1,1);
t_max= t1(1,2);
%
%
nu=size([t_min:tau:t_max],2)-1
t=[t_min:tau:t_max];
%
dmax(1)=max(abs(x(1,1)-xi(1)),abs(x(1,2)-xi(1)));
lambd(1)=log(nu)/dmax(1);
for i=2:N
dmax(i)=max(abs(x(1,1)-xi(i)),abs(x(1,2)-xi(i)));
if dmax(i)>dmax(i-1)
lambd(i)=log(nu)/dmax(i);
else
lambd(i)=lambd(i-1);
end
end
nxe=size(xe,2);
options = optimset('TolFun',1e-3);
for jxx=1:nxe
for i=1:N
d(i)=abs(xe(jxx)-xi(i));
end
for i=1:N
a =pi/tau.^2*exp(1-2*lambd(i)*d(i));
lamn=lambd(i)*d(i);
at2=sum(exp(-a.*t.^2));
at3=sum(t.^2.*exp(-a.*t.^2));
diffe=abs(at2-1/tau*(pi/a).^(1/2))/(1/tau*(pi/a).^(1/2));
if abs(diffe)<1e-10
areal(i) = a;
else
areal(i) = fzero(@(x) fina(x,t,lamn),a,options);
end
b = 1/2-lambd(i)*d(i);
breal(i) = log(1/(sum(exp(-areal(i).*t.^2))));
aa(i)=areal(i);
bb(i)=breal(i);
end
for i=1:N
pro(i,:)=d;
end
prd=cumprod(d,2);
for i=1:N
pro(i,i)=1.0;
end
xpro=cumprod(pro,2);
pioverdi=xpro(:,N);
d11s=sum(pioverdi);
for j=1:nu+1
for i=1:N
pij1(i,j)=exp(-aa(i)*(t(j)-theta(i)).^2 + bb(i));
end
pxx(jxx,j)=1/d11s*(pioverdi'*pij1(:,j));
end
end

To run this function, one needs to run the MMA5.m file. This is a simple 1-D implementation. One can obviously foresee its application to 2-D or n-D functions. But more importantly, one could use the resident expert knowledge of the engineer and the algorithm of last week [1] to produce a new metric in the parameter space and then use the EPH. If you are using it for some serious work, you may want to join us in the Robust Mathematical Modeling program. This probabilistic approach to function evaluation can also be used as a storage mechanism. Another issue most engineers face is that, after many function evaluations, it generally is difficult to summarize all the findings; this approach solves that problem. One can even implement a way for the engineer to add parameters as he/she goes.


Saturday, November 24, 2007

Monday Morning Algorithm Part 4: Improving Embeddings by Flexible Exploitation of Side Information

[ All Monday Morning Algorithms are listed here]
In Machine Learning, one is often faced with using the Euclidean distance as the de-facto way of comparing objects or features. This is like comparing apples and oranges: little contextual side information is provided, even though the right comparison really depends on the application. Comparing apples and oranges might be OK if you care about eating one fruit a day, but it might be extremely counterproductive if you care about comparing the Vitamin C content of fruits without making a difference between the two. One way to address this is to construct a specific metric associated with the end goal you have in mind, i.e. to create a Mahalanobis distance function. The main issue is how to impart any type of similarity or dissimilarity between features using prior knowledge of the problem at hand. Ali Ghodsi, Dana Wilkinson and Finnegan Southey have begun answering this question by building a metric based on side information in Improving Embeddings by Flexible Exploitation of Side Information. We are going to feature this algorithm today with Cable Kurwitz (as usual, all the good stuff is his and the mistakes are mine). The conclusion of the article reads:


Many machine learning techniques handle complex data by mapping it into new spaces and then applying standard techniques, often utilizing distances in the new space. Learning a distance metric directly is an elegant means to this end. This research shows how side information of two forms, class equivalence information and partial distance information, can be used to learn a new distance as a preprocessing step for embedding. The results demonstrate that side information allows us to capture attributes of the data within the embedding in a controllable manner. Even a small amount of side information can improve the embedding’s structure. Furthermore, the method can be kernelized and also used to capture multiple attributes in the same embedding. Its advantages over existing metric learning methods are a simpler optimization formulation that can be readily solved with off-the-shelf software and greater flexibility in the nature of the side information one can use. In all, this technique represents a useful addition to our toolbox for informed embeddings.

In order to implement this algorithm you have to have Michael Grant, Stephen Boyd, and Yinyu Ye's CVX program installed.

clear
% simple example where x1 and x2 are very
% close and x3 and x4 are very far apart
% the dimension space is 5
n =5;
x1=[1 2 3 4 5]';
x2=[0.1 2.1 12 56 100; 7 1.9 98 200 1]';
x3=[1 4 7 8 19]';
x4=[123 43 7.1 29 133]';
% computing elements in the loss function
for i=1:2
xs1(:,i)=x1-x2(:,i);
end
xt1 = zeros(n*n,1);
for i=1:2
xt1= xt1 + vec(xs1(:,i)*xs1(:,i)') ;
end
for i=1:1
xs2=x3-x4
end
for i=1:1
xt2= vec(xs2*xs2') ;
end


cvx_begin
variable A(n,n) symmetric;
minimize(vec( A )' *(xt1-xt2))
subject to
A == semidefinite(n);
trace(A) == 1;
cvx_end
% checking results
% difference between vector supposedly close to each other
for i=1:2
xr1(i)=(x1-x2(:,i))'*A*(x1-x2(:,i))
end
% results is close to zero for both distance(x1 and the
% two vectors x2)
%
% now difference between vectors that are supposedly far
% from each other
for i=1:1
(x3-x4)'*A*(x3-x4)
end
% results are very far for the two vectors x3 and x4 as
% expected.
% Actual results:
% xr1 =
%
% 2.7221e+003
%
%
%xr1 =
%
% 1.0e+003 *
%
% 2.7221 0.0067
%
%
%ans =
%
% 2.8875e+004


It's a nifty little tool. From there you can use your favorite dimensionality reduction technique.


Some other algorithms I will be featuring in the future Monday Morning Algorithm Series include: The Fast Johnson Lindenstrauss Transform, the Experimental Probabilistic Hypersurface, Diffusion methods for dimensionality reduction and others....

Credit Photo: Chinese Space Agency, Xinhua, First photo of the Moon from the Chinese made Chang'e lunar orbiter yesterday.

Monday, November 19, 2007

Monday Morning Algorithm Part 3: Compressed Sensing meets Machine Learning / Recognition via Sparse Representation Classification Algorithm

[Wired readers, you may want to read this summary]
The whole list of Monday Morning Algorithms is listed here with attendant .m files. If you want contribute, like Jort Gemmeke just did, please let me know. For those reading this on Sunday, well, it's Monday somewhere. Now on today's algorithm:

In a previous entry, we discovered that the Machine Learning community was beginning to use random projections (the ones that follow the Restricted Isometry Property!) not just to tremendously reduce the dimension of images for pattern recognition. The thinking has gone farther with the clever argument that, when searching for a match between a query and a dictionary composed of training examples, only a few elements will match, thus yielding a sparse representation in that dictionary. And so, using one of the Compressed Sensing reconstruction techniques, one could in effect say that the result of the database query has become a linear programming question (for those of us using L1 reconstruction techniques). John Wright, Arvind Ganesh Balasubramanian, Allen Yang and Yi Ma proposed this in their study on face recognition [1]. The generic approach in Machine Learning has mostly been based on Nearest Neighbor computation.

Intrigued by this approach, Jort Gemmeke decided to contribute today's Monday Morning Algorithm. Jort decided to write Algorithm 1 (Recognition via Sparse Representation) of reference [1]. I mostly contributed in prettying it up and then some (i.e. all the good stuff is his and all the mistakes are mine). Thank you, Jort. It uses GPSR (Mário Figueiredo, Robert D. Nowak and Stephen Wright have made a new version available [2]), but in the comment section you can also uncomment calls to L1-Magic, Sparsify or CVX.

In this example we have a dictionary made up of cosine functions of different frequencies making up ten different classes (i.e. ten different frequencies), the query is close to the third class. In every class there are two cosine functions of the same frequency but with different noise added to them. The query is free of noise. You need to have GPSR in your path to have it run.

clear all;
% Algorithm implementing Recognition via Sparse Representation
% or Algorithm 1 suggested in
% "Feature selection in face recognition: A sparse representation
% perspective" by Allen Y. Yang, John Wright, Yi Ma, and Shankar Sastry
% written by Jort Florent Gemmeke and Igor Carron.

% n total number of images in the training database
n = 20;
% m is dimension of the image/ambient
% space (number of samples in signal )
% (e.g. total "training" set), e.g. 101
x=[0:0.01:1];
m = size(x,2);
% d is the number of Compressed Sensing
% measurements, it is also the dimension of the feature
% space.
d = 7;
% k is the number of classification groups and for k subjects
k = 10;
% e.g. every group consists of kn = 20 samples.
kn = n/k;

% Building the training database/dictionary A of n elements
% Within that dictionary there are k classes
% Each of these classes correspond to functions:cos(i x)
% with added noise. i is the differentiator between classes.
% Within each class, any element is different from the other
% by some amount of noise.
% Each element is listed in a row. Each function-element is
% sampled m times.
noise=0.1;
for i=1:k
for j=1:kn
yy = noise*rand(1,m);
A(:,(i-1)*kn+j)=cos(i.*x)+yy;
end
end

% In the following, every index i contains a vector
% of length m with nonzeros corresponding to
% columns in A belonging to class i (i in [1,k])
% Every row holds one element of a class,
% i.e. there are as many rows as elements in this
% dictionary/database.
for i=1:k
onesvecs(i,:)=[zeros(1,(i-1)*kn) ones(1,kn) zeros(1,n-((i-1)* kn + kn))];
end
% This is the element for which we want to
% know to what class it belongs to. Here we give
% the answer for the plot in x_true.
x_true = [zeros(1,4) 1 1 zeros(1,n-6)];
x_true = x_true/norm(x_true,1);
% Now the expression stating that y = cos(3.*x);
y = cos(3.*x)';
% Please note it is different from both entries
% in the dictionary as there is no noise.
% we could have put this as well: y = A * x_true'
% but it would have meant we were using the training
% function as a query test.
%
%
%
% Step 2
% Dimensionality reduction from 101 to 7 features
% using random projection.
R = randn(d,m);
Atilde = R * A;
ytilde = R * y;

% Normalize columns of Atilde and ytilde
for i=1:size(A,2)
Atilde(:,i)=Atilde(:,i)/norm(Atilde(:,i),2);
end
ytilde = ytilde/norm(ytilde);

% Step 3
% call any L1 solver you want. Here we call GPSR
%
% ---- L1 Magic Call
%xp = l1qc_logbarrier(Atilde\ytilde, Atilde, [], ytilde, 0.01, 1e-3);
%
% ---- Sparsify Solver Call
%xp = greed_omp(ytilde,Atilde,n);
%
% ---- CVX Solver Call
%sigma = 0.05;
%cvx_begin
% variable xp(k);
% minimize(norm(xp,1));
% subject to
% norm(Atilde*xp - ytilde) <= sigma;
%cvx_end
%
% ---- GPSR Solver call

xp = GPSR_BB(ytilde,Atilde, 0.01*max(abs(Atilde'*ytilde)),'Continuation',1);

% plot results of L1 solution
figure; plot(xp,'v');
title('L1 solution (v), initial signal (o)');
hold; plot(x_true,'o');

%
% Step 4
% Calculate residuals (formula 13)
residual = [];
for i=1:k
% only keeps nonzeros in xp where onesvec is nonzero
deltavec(:,i) = onesvecs(i,:)'.* xp;
% calculate residual on only a subset of Atilde
residual(i) = norm(ytilde-Atilde*deltavec(:,i));
end
figure; bar(residual); % plot residuals
title('The lowest bar indicates the class number to which the element belongs')


Please note the substantial reduction of the feature space, from 101 down to 7, while retaining a good classification capability with a noise-to-signal ratio above 10 percent for signals in the library.

[1] Robust Face Recognition via Sparse Representation, John Wright, Arvind Ganesh Balasubramanian, Allen Yang and Yi Ma. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Updated version (November 8, 2007) of the MATLAB code is available here: GPSR_4.0

Photo Credit: Credits: ESA ©2007 MPS for OSIRIS Team MPS/UPD/LAM/IAA/RSSD/INTA/UPM/DASP/IDA, View of Earth by night from the Rosetta probe in the fly by that took place on November 13th. Rosetta is on 10-year journey to 67/P Churyumov-Gerasimenko. And No, Rosetta is not an asteroid.

Monday, November 12, 2007

Monday Morning Algorithm Part deux: Reweighted Lp for Non-convex Compressed Sensing Reconstruction

Today's algorithm implements the one found in Iteratively Reweighted Algorithms for Compressive Sensing by Rick Chartrand and Wotao Yin, as mentioned in an earlier post. The con of the algorithm is that it solves a non-convex problem and therefore there is no provable way of showing you have achieved an optimal solution. The pros are:
  • you don't need semi-definite programming
  • you do better than the reweighted L1 technique and
  • you do it faster
  • all operations rely on simple linear algebra which can then be optimized for the problem at hand.
The algorithm should be usable in Matlab and Octave. If you want to contribute a small algorithm next week or any week, please let me know. So here is the beast:

clear
% Algorithm implementing an approach suggested in
% "Iteratively Reweighted Algorithms for Compressive Sensing"
% by Rick Chartrand and Wotao Yin
%
% n is dimension of the ambient space (number of signal samples)
% m is the number of Compressed Sensing measurements
% p is the defining the order of the metric p=1 is l1 norm....
n= 100;
m = 40;
p = 0.9;
% Setting up the problem where one produces
% an input for the problem to be solved
phi = rand(m,n);
u_true = [1 10 120 90 950 4 100 1000 20 30 zeros(1,n-10)]';
u_noise = 10^(-4)* rand(n,1);
u_true = u_true + u_noise;
b = phi * u_true;
%
% solving for phi * u = b , where u is now the
%unknown to be found by the IRlp
%
epsilon = 1
% u_0 is the L_2 solution which would be exact if m = n,
% but here m is less than n
u_0 = phi\b;
u_old = u_0;
j=0
while epsilon > 10^(-8)
j = j + 1
w=(u_old.^(2) + epsilon).^(p/2-1);
v=1./w;
Q_n=diag(v,0);
tu=inv(phi*Q_n*phi');
u_new = Q_n * phi' * tu * b;
if lt(norm(u_new - u_old,2),epsilon^(1/2)/100)
epsilon = epsilon /10;
end
u_old = u_new;
end
% One plot showing the superposition of the least-square
% solution u_0 (symbol v), the
% reweighted Lp solution (symbol *) and the
% true solution (symbol o)
plot(u_0,'v')
title('Least-squares u_0 (v), reweighted Lp (*), initial signal (o)')
hold
plot(u_new,'*')
plot(u_true,'o')


As usual, if you find a mistake or something interesting after playing with it, please let me know. Also, if you have a second and a half, please answer these two questions; thanks in advance.








Credit Photo: NASA/JPL/Space Science Institute, Ring Tableau, Saturn's Moon: Mimas, Epimetheus, Daphnis

Monday, November 05, 2007

Monday Morning Algorithm Part 1: Fast Low Rank Approximation using Random Projections

I am not sure I will be able to do this often, but it is always interesting to start the week with an interesting algorithm. So I decided to implement one at the beginning of every week. Generally, these algorithms are described in papers and sometimes, when one is lucky, they are implemented in Matlab, Octave or equivalent. The beauty of these is certainly underscored when they are readily implemented and you can wow yourself by using them right on the spot. I will try to make it so that one can cut and paste the rest of the entry and run it directly in their Matlab or Octave installation.
I will generally try to focus on small algorithms (not more than 10 important lines) relevant to some of the issues tackled and mentioned in this blog, namely Compressed Sensing, randomized algorithms, Bayesian computations, dimensionality reduction...
This week, the script I wrote in Matlab is relevant to finding a low-rank approximation to a matrix with more rows than columns (overdetermined systems are like that). One could readily use the SVD command in Matlab/Octave, but it can take a long time for large matrices. This algorithm uses properties of random matrices to do the job in less time than the traditional SVD-based method. I initially found it in [3]; the original reference is [1] and it is used in the context of LSI in [2].

clear
% preparing the problem
% trying to find a low-rank approximation to A, an m x n matrix
% where m >= n
m = 1000;
n = 900;
% first let's produce example A
A = rand(m,n);
%
% beginning of the algorithm designed to find a low-rank approximation of A
% let us define that rank to be equal to k
k = 50;
% R is an m x l matrix drawn from a N(0,1)
% where l is such that l > c log(n)/ epsilon^2
%
l = 100;
% timing the random algorithm
trand =cputime;
R = randn(m,l);
B = 1/sqrt(l)* R' * A;
[a,s,b]=svd(B);
Ak = A*b(:,1:k)*b(:,1:k)';
trandend = cputime-trand;
% now timing the normal SVD algorithm
tsvd = cputime;
% doing it the normal SVD way
[U,S,V] = svd(A,0);
Aksvd= U(1:m,1:k)*S(1:k,1:k)*V(1:n,1:k)';
tsvdend = cputime -tsvd;
%
%
% relative error between the two computations in percent
rel = norm(Ak-Aksvd)/norm(Aksvd)*100
% gain in time
gain_in_time = tsvdend - trandend
% the random algorithm is faster by
tsvdend/trandend


If you find any mistake, please let me know.

References:
[1] The Random Projection Method, Santosh S. Vempala
[2] Latent Semantic Indexing: A Probabilistic Analysis (1998) Christos H. Papadimitriou, Prabhakar Raghavan, Hisao Tamaki, Santosh Vempala
[3] The Random Projection Method; chosen chapters from DIMACS vol.65 by Santosh S. Vempala, presentation by Edo Liberty October 13, 2006
Photo Credit: ESA/ASI/NASA/Univ. of Rome/JPL/Smithsonian, This image shows the topographic divide between the Martian highlands and lowlands. The mysterious deposits of the Medusae Fossae Formation are found in the lowlands along the divide. The radar sounder on ESA's Mars Express orbiter, MARSIS, has revealed echoes from the lowland plains buried by these mysterious deposits.
