Friday, November 30, 2007

Non-Negative Matrix Factorization with Sparsity Constraint for Spectral Unmixing applied to Space Situational Awareness

Ever since the Nature Article of Daniel Lee and Sebastian Seung (Learning the parts of objects by non-negative matrix factorization), there has a been steady progress in implementing ever increasingly sophisticated algorithms performing the Non-Negative Matrix Factorization. Patrik Hoyer came up with the Non-negative Matrix Factorization with sparseness constraints and we now have Shift Invariant Sparse Coding of Image and Music Data by Morten Mørup and Mikkel Schmidt.

Closer to some of the interest expressed in this blog is Space Situational Awareness. Bob Plemmons has used this method to perform the spectral unmixing from data sensed for the Maui Hyperspectral sensors. Much of his work is mentioned in [1] . 

By looking up from the ground, and knowing the particular reflectance as a function of wavelength of particular elements such as aluminum, black paints,... one should be able to identify specific elements orbiting the Earth. It is important because the population of earth orbiting elements is constantly changing and needs to be appraised often. Space debris can have a significant impact on other orbiting spacecrafts and satellites.

Since Bob mentions the Maui shots of the Columbia spacecraft, I wonder if those shots were taken with the same hyperspectral sensor ?

Because in case it is, we can see that most of the Shuttle is white.....

whereas our camera that was on top of the Spacehab module was very shiny (it was behind the white box on the roof).

It would be of interest to see if the method can do a good job at spatial resolution. I can provide more information if needed. From there we could infer if there were missing pieces from the RCC panels as shown from the Southwest Research Institute foam impact experiment.

Another Spectral Unmixing method pushing the sparsity constraint and a new heuristic is introduced by Argyrios Zymnis, Seung-Jean Kim, Joelle Skaf, Mario Parente, and Stephen Boyd in Hyperspectral Image Unmixing via Alternating Projected Subgradients . The abstract reads:

We formulate the problem of hyperspectral image unmixing as a nonconvex optimization problem, similar to nonnegative matrix factorization. We present a heuristic for approximately solving this problem using an alternating projected subgradient approach. Finally, we present the result of applying this method on the 1990 AVIRIS image of Cuprite, Nevada and show that our results are in agreement with similar studies on the same data.

On a related note, ESA makes available imagery from ENVISAT/MERIS to whoever applies for it for research purposes.

[1] Low-Rank Nonnegative Factorizations for Spectral Imaging Applications, Bob Plemmons

Thursday, November 29, 2007

Morphological Diversity using Sparse Constraints

Jerome Bobin just announced the opening of a site featuring most of his work and attendant Matlab code relevant to Morphological Diversity which is introduced as:

Recent advances in harmonic analysis and signal processing have advocated the use of overcomplete signal representations. The ability of such redundant signal dictionary to lead to very sparse representations has already been used successfully in various fields. In this context, morphological diversity has emerged as an effective signal analysis tool. The gist of morphological diversity consists in modeling signals as the linear combination of several so-called morphological components...

Recovering the morphological components from their combination then relies on the incoherence (morphological diversity) of their respective sparse representation (the DCT and Curvelet tight frame). Morphological Component Analysis (MCA) has been devised to solve this harsh recovery problem.

In the general case, a wide range of signal representations can be accounted for such as wavelets, contourlets, bandlets, wave atoms or curvelets.

Morphological diversity and Morphological Component Analysis (MCA) then turns to be a privileged tool for sparse signal representation.

This website has be designed so as to give a wide review of morphological diversity ‘s ability. This particular and effective framework has already been extended and applied to a very large range of applications in signal and image processing from texture/contour separation to source separation or compressed sensing. We really think that morphological diversity has just begun a fruitful and long life in signal processing.

The hyperspectral part of the website mentions the use of Mars Data and the upcoming availability of a code that does Blind Source Separation using Spatial and Spectral Sparsity Constraints. On a related note, ESA now provides access to Earth hyperspectral data from MERIS and Proba.

Credit Photo: ESA/DLR/FU Berlin (G. Neukum), Hyperspectral photo of Phobos, one of Mars' Moon taken by the Mars Express probe three years ago.

Outliers in the Long Tail: The case of Autism.

When reading subjects like this one, I am still amazed that some people can live off blogging and that other people think they can do that too based purely on traffic based ads. It is important to realize that sometimes what is important is quality traffic. This is often quantified in click-through traffic but oftentimes, the important feature of a blog is to bring awareness on a subject that is difficult to communicate on. So the whole issue of power laws on the net as featured in the long tail argument of Chris Anderson does not seem to take into account the quality aspect of it. In other words, while the linking behavior and traffic do follow a power law while applications on Facebook do not, the most important question should be:

Is there a business case for using the web where the amount of traffic is not directly proportional to income or recognition ?

Let me take an example featured in this blog. In some cases, people don't know about a subject or the wording associated with that subject but they are very much willing to know more because it affects them. Case in point Autism (it could be any other subject). I have made different postings on the business case to be made for an eye tracking device to detect or quantify autism ( Part I, Part II, Part III). When I look back at the log of queries leading to this blog (after these entries were posted), there is a fair amount of low level queries with the following keywords:

  • " infant eye tracking",
  • "baby not eye tracking well",
  • "eye tracking autism",
  • "baby's eyes not tracking",
  • "newborn eye tracking",
  • "detecting autism early",
  • "eye tracking kids",
  • "babies with eye tracking problems",
  • "autism and follow eyes",
  • "autism and eye tracking importance",
  • "autism and eye movements"........
I would bet that a fair amount of these queries comes from parents who think that something is amiss with their kids. Linking the eye tracking behavior to Autism has probably given them some headstart with regards to getting a diagnosis. Even though, it may not be a diagnosis of autism, we know from studies that eye tracking is important in the learning process in infants. In a more recent entry, I mentioned the difficulty of getting a diagnosis of Autism using current testing techniques. That post has brought a steady stream of queries along the lines of the following keywords:

  • "adi-r and ados diagnosis",
  • "autism difficulty with standardized testing",
  • "ados failed to detect asd",
  • "age at which autism diagnosed",
  • "pdd-nos stability of diagnosis"
In both cases, the traffic is seldom large but I am sure the entries have had some amount of influence on the readers. [ In a related note, Aleks Jakulin and Masanao Yajima are in the process of writing another installment of the Venn Diagram blog entry that was started on this blog (current update is here). Much interesting information should come out of their work. This is important work to many people: medical doctors, families and researchers. To medical doctors and families, it points to the weakness of the current testing system. To researchers, making sense of this data should allow them to pinpoint areas of the tests that need refinement and those areas that are too coarse. ]

With one out of 166 kids being in the Autistic Spectrum Disorder, it would seem to me there is ample work for people just trying to make sense of new findings (and not delivering "solutions" which is the current market). Doctors or families while being directly affected, sometimes do not have the means to interpret the new findings and need somebody who is specialized in this area. I have had several hits from other countries than the U.S. and I would guess the "market" is much larger than what the traffic shows.

In all, only the web can provide access to less than 1/100 of the population and I would venture that some percentage of that population (my guess is 10 percent) is willing to pay for a service that they cannot get otherwise. Eventually instead of a long tail or power law, we should really think of it in a different manner not unlike that featured by Dan Bricklin

which reminds me of the Plouffe's inverter graphs that shows how digits in numbers follow power laws (Benford's law):

One can clearly see outliers in the tail.

P.S. I am currently not running ads in this blog is very much linked to the fact that the keyword "Autism" features many ads for products for which nobody can guarantee anything.

Saturday, November 24, 2007

Monday Morning Algorithm Part 4: Improving Embeddings by Flexible Exploitation of Side Information

[ All Monday Morning Algorithms are listed here]
In Machine Learning, one is often faced with using the Euclidian distance as the de-facto way of comparing objects or features. This is like comparing Apples and Oranges, little contextual side information is provided since it really depends on the application. Comparing Apples and Oranges might be Ok if you are caring about eating one fruit a day, but it might be extremely non-productive if you care about how comparing Vitamin C content between fruits without making a difference between the two. One way to do this is to construct a specific metric associated with the end goal you have in mind and create a Mahalanobis distance function. The main issue is how do you impart any type of similarity and dissimilarity between features with any type of prior knowledge of the problem at hand. Ali Ghodsi, Dana Wilkinson and Finnegan Southey have begun answering this question by building a metric based on side information in Improving Embeddings by Flexible Exploitation of Side Information. We are going to feature this algorithm today with Cable Kurwitz (as usual all the good stuff is his and the mistakes are mine). The conclusion of the article reads

Many machine learning techniques handle complex data by mapping it into new spaces and then applying standard techniques, often utilizing distances in the new space. Learning a distance metric directly is an elegant means to this end. This research shows how side information of two forms, class equivalence information and partial distance information, can be used to learn a new distance as a preprocessing step for embedding. The results demonstrate that side information allows us to capture attributes of the data within the embedding in a controllable manner. Even a small amount of side information can improve the embedding’s structure. Furthermore, the method can be kernelized and also used to capture multiple attributes in the same embedding. Its advantages over existing metric learning methods are a simpler optimization formulation that can be readily solved with off-the-shelf software and greater flexibility in the nature of the side information one can use. In all, this technique represents a useful addition to our toolbox for informed embeddings.

In order to implement this algorithm you have to have Michael Grant, Stephen Boyd, and Yinyu Ye's CVX program installed.

% simple example where x1 and x2 are very
% close and x3 and x4 are very far apart
% the dimension space is 5
n =5;
x1=[1 2 3 4 5]';
x2=[0.1 2.1 12 56 100; 7 1.9 98 200 1]';
x3=[1 4 7 8 19]';
x4=[123 43 7.1 29 133]';
% computing elements in the loss function
for i=1:2
xt1 = zeros(n*n,1);
for i=1:2
xt1= xt1 + vec(xs1(:,i)*xs1(:,i)') ;
for i=1:1
for i=1:1
xt2= vec(xs2*xs2') ;

variable A(n,n) symmetric;
minimize(vec( A )' *(xt1-xt2))
subject to
A == semidefinite(n);
trace(A) == 1;
% checking results
% difference between vector supposedly close to each other
for i=1:2
% results is close to zero for both distance(x1 and the
% two vectors x2)
% now difference between vectors that are supposedly far
% from each other
for i=1:1
% results are very far for the two vectors x3 and x4 as
% expected.
% Actual results:
% xr1 =
% 2.7221e+003
%xr1 =
% 1.0e+003 *
% 2.7221 0.0067
%ans =
% 2.8875e+004

It's a nifty little tool. From there you can use you favorite dimensionality reduction technique.

Some other algorithms I will be featuring in the future Monday Morning Algorithm Series include: The Fast Johnson Lindenstrauss Transform, the Experimental Probabilistic Hypersurface, Diffusion methods for dimensionality reduction and others....

Credit Photo: Chinese Space Agency, Xinhua, First photo of the Moon from the Chinese made Chang'e lunar orbiter yesterday.

Friday, November 23, 2007

StrangeMaps, Nuclear Power, HASP, ML in Space, GPU with Matlab

Taking a break from Compressed Sensing for a moment, here is a strangely addictive site reviewing different types of maps. It's called Strangemaps. Quite timely my first blog entry was about maps and it was about four years ago. Google Maps did not exist then. Things changed indeed, now environmentalists are cheering for nuclear power this is quite a turn of event for those of us who have been in the nuclear engineering field.
Talking about Mississippi, LSU is accepting application for the next HASP flight and the deadline is December 18th. The big difference between a HASP flight and that of a sounding balloon can be found here:

A sounding balloon project I am following with much interest is project WARPED. They have an excellent technical description of what they are doing.

In other news from the KDnuggets weekly e-mail, some items caught my attention: Maybe the birth of a new kind of Journalism in What Will Journalist-Programmers Do?and there is a CFP on the subject of Machine Learning in Space, (Mar 31)

Via GPGPU, AMD is talking about introducing double precision in frameworks using GPUs (Graphics Cards). On the other hand, the NVIDIA CUDA compiler now has a Matlab plug-in where whenever you use a function in Matlab like FFT, this plug-in takes over the job and send it to the Graphics card (GPU). Maybe we can do faster reconstruction in Compressed Sensing using a Partial Fourier Transform and a GPU. Oops I did it again, I spoke about CS.

Tuesday, November 20, 2007

Compressed Sensing: Sensor Networks, Learning Compressed Sensing by Uncertain Component Analysis, Sparsity or Positivity ?, New CVX build

Two new items in the Rice Repository on Sensor Networks:

Shuchin Aeron, Manqi Zhao, and Venkatesh Saligrama, Sensing capacity of sensor networks: Fundamental tradeoffs of SNR, sparsity, and sensing diversity.

The abstract reads:

A fundamental question in sensor networks is to determine the sensing capacity – the minimum number of sensors necessary to monitor a given region to a desired degree of fidelity based on noisy sensor measurements. In the context of the so called compressed sensing problem sensing capacity provides bounds on the maximum number of signal components that can be identified per sensor under noisy measurements. In this paper we show that sensing capacity is a function of SNR, signal sparsity—the inherent complexity/dimensionality of the underlying signal/information space and its frequency of occurrence and sensing diversity – the number of modalities of operation of each sensor. We derive fundamental tradeoffs between SNR, sparsity, diversity and capacity. We show that the capacity is a monotonic function of SNR and diversity. A surprising result is that as sparsity approaches zero so does the sensing capacity irrespective of diversity. This implies for instance that to reliably monitor a small number of targets in a given region requires disproportionately large number of sensors.

Shuchin Aeron, Manqi Zhao, and Venkatesh Saligrama, On sensing capacity of sensor networks for the class of linear observation, fixed SNR models.
The abstract reads:
In this paper we address the problem of finding the sensing capacity of sensor networks for a class of linear observation models and a fixed SNR regime. Sensing capacity is defined as the maximum number of signal dimensions reliably identified per sensor observation. In this context sparsity of the phenomena is a key feature that determines sensing capacity. Precluding the SNR of the environment the effect of sparsity on the number of measurements required for accurate reconstruction of a sparse phenomena has been widely dealt with under compressed sensing. Nevertheless the development there was motivated from an algorithmic perspective. In this paper our aim is to derive these bounds in an information theoretic set-up and thus provide algorithm independent conditions for reliable reconstruction of sparse signals. In this direction we first generalize the Fano’s inequality and provide lower bounds to the probability of error in reconstruction subject to an arbitrary distortion criteria. Using these lower bounds to the probability of error, we derive upper bounds to sensing capacity and show that for fixed SNR regime sensing capacity goes down to zero as sparsity goes down to zero. This means that disproportionately more sensors are required to monitor very sparse events. We derive lower bounds to sensing capacity (achievable) via deriving upper bounds to the probability of error via adaptation to a max-likelihood detection set-up under a given distortion criteria. These lower bounds to sensing capacity exhibit similar behavior though there is an SNR gap in the upper and lower bounds. Subsequently, we show the effect of correlation in sensing across sensors and across sensing modalities on sensing capacity for various degrees and models of correlation. Our next main contribution is that we show the effect of sensing diversity on sensing capacity, an effect that has not been considered before. Sensing diversity is related to the effective coverage of a sensor with respect to the field. In this direction we show the following results (a) Sensing capacity goes down as sensing diversity per sensor goes down; (b) Random sampling (coverage) of the field by sensors is better than contiguous location sampling (coverage). In essence the bounds and the results presented in this paper serve as guidelines for designing efficient sensor network architectures.

I also found Learning Compressed Sensing by Yair Weiss, Hyun Sung Chang and William Freeman
The abstract reads:

Compressed sensing [7], [6] is a recent set of mathematical results showing that sparse signals can be exactly reconstructed from a small number of linear measurements. Interestingly, for ideal sparse signals with no measurement noise, random measurements allow perfect reconstruction while measurements based on principal component analysis (PCA) or independent component analysis (ICA) do not. At the same time, for other signal and noise distributions, PCA and ICA can significantly outperform random projections in terms of enabling reconstruction from a small number of measurements. In this paper we ask: given a training set typical of the signals we wish to measure, what are the optimal set of linear projections for compressed sensing ? We show that the optimal projections are in general not the principal components nor the independent components of the data, but rather a seemingly novel set of projections that capture what is still uncertain about the signal, given the training set. We also show that the projections onto the learned uncertain components may far outperform random projections. This is particularly true in the case of natural images, where random projections have vanishingly small signal to noise ratio as the number of pixels becomes large.
This is an important paper as it expand ICA/PCA to underdetermined systems through the use of they call Uncertain Component Analysis.

Unrelated to the previous subject, Morten Morup shows in this poster the extension and state of the art technique in Non-Negative Matrix Factorization. The poster (entitled Extensions of non negative matrix factorization NMF to Higher Order Data) shows where the use of the regularizing ell1 norm provides the ability to do Sparse Coding NMF i.e. find the sparsest positive factorization. Ever since the Nature Article of Daniel Lee and Sebastian Seung (Learning the parts of objects by non-negative matrix factorization), much has been done to improve the subject by Non Negative Matrices. And so while the work of Morup is extremely important, it would seem that the positivity can be assured only by an argument of sparsity. This is the intriguing especially in light of this article by Alfred Bruckstein, Michael Elad, and Michael Zibulevsky entitled "A Non-Negative and Sparse Enough Solution of an Underdetermined Linear System of Equations is Unique". The abstract reads:

In this paper we consider an underdetermined linear system of equations Ax = b with non-negative entries of A and b, and the solution x being also required to be non- negative. We show that if there exists a sufficiently sparse solution to this problem, it is necessarily unique. Furthermore, we present a greedy algorithm - a variant of the matching pursuit - that is guaranteed to find this sparse solution. We also extend the existing theoretical analysis of the basis pursuit problem, i.e. min kxk1 s.t. Ax = b, by studying conditions for perfect recovery of sparse enough solutions. Considering a matrix A with arbitrary column norms, and an arbitrary monotone element-wise concave penalty replacing the ell1-norm objective function, we generalize known equivalence results. Beyond the desirable generalization that this result introduces, we show how it is exploited to lead to the above-mentioned uniqueness claim.

In other words, if it is sparse enough, then it is unique and one can drop the L1 penalty constraint as a non negativity constraint is sufficient. Wow. (Thanks Jort for the link.)

In a different area, Michael Grant, Stephen Boyd, and Yinyu Ye released a new build (Version 1.1, Build 565) for CVX ( Matlab Software for Disciplined Convex Programming).

Monday, November 19, 2007

Monday Morning Algorithm Part 3: Compressed Sensing meets Machine Learning / Recognition via Sparse Representation Classification Algorithm

[Wired readers, you may want to read this summary]
The whole list of Monday Morning Algorithms is listed here with attendant .m files. If you want contribute, like Jort Gemmeke just did, please let me know. For those reading this on Sunday, well, it's Monday somewhere. Now on today's algorithm:

In a previous entry, we discovered that the Machine Learning Community was beginning to use Random projections (the ones that follow the Restricted Isometric Property!) not just to reduce tremendously the dimension of images for pattern recognition. The thinking has gone farther by making the clever argument that when searching for a match between a query and a dictionary composed of training examples, only a few elements will match thus yielding a sparse representation in that dictionary. And so using one of the Compressed Sensing reconstruction technique, one could in effect say that the result of the database query has become a linear programming question (for those of us using L1 reconstruction techniques). John Wright, Arvind Ganesh Balasubramanian, Allen Yang and Yi Ma proposed this in their study on face recognition [1]. The generic approach in Machine Learning has mostly been based on Nearest Neighbor computation.

Intrigued by this approach, Jort Gemmeke decided to contribute today's Monday Morning Algorithm. Jort decided to write Algorithm 1 (Recognition via Sparse Representation) of reference [1]. I mostly contributed in prettying it up and then some (i.e. all the good stuff is his and all the mistakes are mine). Thank you Jort . It uses GPSR ( Mário Figueiredo, Robert D. Nowak and Stephen Wright have made a new version available) but in the comment section you can also uncomment to access L-1 Magic, Sparsify or CVX.

In this example we have a dictionary made up of cosine functions of different frequencies making up ten different classes (i.e. ten different frequencies), the query is close to the third class. In every class there are two cosine functions of the same frequency but with different noise added to them. The query is free of noise. You need to have GPSR in your path to have it run.

clear all;
% Algorithm implementing Recognition via Sparse Representation
% or Algorithm 1 suggested in
% "Feature selection in face recognition: A sparse representation
% perspective" by Allen Y. Yang, John Wright, Yi Ma, and Shankar Sastry
% written by Jort Florent Gemmeke and Igor Carron.

% n total number of images in the training database
n = 20;
% m is dimension of the image/ambient
% space (number of samples in signal )
% (e.g. total "training" set) ..e.g. 101
m = size(x,2);
% d is the number of Compressed Sensing
% measurements, it is also the dimension of the feature
% space.
d = 7;
% k is the number of classification groups and for k subjects
k = 10;
% e.g. every group consists of kn = 20 samples.
kn = n/k;

% Building the training database/dictionary A of n elements
% Within that dictionary there are k classes
% Each of these classes correspond to functions:cos(i x)
% with added noise. i is the differentiator between classes.
% Within each class, any element is different from the other
% by some amount of noise.
% Each element is listed in a row. Each function-element is
% sampled m times.
for i=1:k
for j=1:kn
yy = noise*rand(1,m);

% In the following, every index i contains a vector
% of length m with nonzeros corresponding to
% columns in A belonging to class i (i in [1,k])
% Every row holds one element of a class,
% i.e. there are as many rows as elements in this
% dictionary/database.
for i=1:k
onesvecs(i,:)=[zeros(1,(i-1)*kn) ones(1,kn) zeros(1,n-((i-1)* kn + kn))];
% This is the element for which we want to
% know to what class it belongs to. Here we give
% the answer for the plot in x_true.
x_true = [zeros(1,4) 1 1 zeros(1,n-6)];
x_true = x_true/norm(x_true,1);
% Now the expression stating that y = cos(3.*x);
y = cos(3.*x)';
% Please note it is different from both entries
% in the dictionary as there is no noise.
% we could have put this as well: y = A * x_true'
% but it would have meant we were using the training
% function as a query test.
% Step 2
% Dimensionality reduction from 101 to 7 features
% using random projection.
R = randn(d,m);
Atilde = R * A;
ytilde = R * y;

% Normalize columns of Atilde and ytilde
for i=1:size(A,2)
ytilde = ytilde/norm(ytilde);

% Step 3
% call any L1 solver you want. Here we call GPSR
% ---- L1 Magic Call
%xp = l1qc_logbarrier(Atilde\ytilde, Atilde, [], ytilde, 0.01, 1e-3);
% ---- Sparsify Solver Call
%xp = greed_omp(ytilde,Atilde,n);
% ---- CVX Solver Call
%sigma = 0.05;
% variable xp(k);
% minimize(norm(xp,1));
% subject to
% norm(Atilde*xp - ytilde) lessthan sigma;
% ---- GPSR Solver call

xp = GPSR_BB(ytilde,Atilde, 0.01*max(abs(Atilde'*ytilde)),'Continuation',1);

% plot results of L1 solution
figure; plot(xp,'v');
title('L1 solution (v), initial signal (o)');
hold; plot(x_true,'o');

% Step 4
% Calculate residuals (formula 13)
residual = [];
for i=1:k
% only keeps nonzeros in xp where onesvec is nonzero
deltavec(:,i) = onesvecs(i,:)'.* xp;
% calculate residual on only a subset of Atilde
residual(i) = norm(ytilde-Atilde*deltavec(:,i));
figure; bar(residual); % plot residuals
title('The lowest bar indicates the class number to which the element belongs to')

Please note the substantial phase space reduction from 101 down to 7 and still a good classification capability with noise over signal ratio above 10 percent for signals in the library.

[1] Robust Face Recognition via Sparse Representation, John Wright, Arvind Ganesh Balasubramanian, Allen Yang and Yi Ma. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Updated version (November 8, 2007) of the MATLAB code is available here: GPSR_4.0

Photo Credit: Credits: ESA ©2007 MPS for OSIRIS Team MPS/UPD/LAM/IAA/RSSD/INTA/UPM/DASP/IDA, View of Earth by night from the Rosetta probe in the fly by that took place on November 13th. Rosetta is on 10-year journey to 67/P Churyumov-Gerasimenko. And No, Rosetta is not an asteroid.

Thursday, November 15, 2007

Compressed Sensing: SpaRSA and Image Registration using Random Projections

Mário Figueiredo, Robert D. Nowak and Stephen Wright the authors of GPSR are coming up with a faster method for reconstructing solutions from Compressed Sensing measurements. The algorithm is presented in this preprint: Sparse Reconstruction by Separable Approximation the abstract reads:

Finding sparse approximate solutions to large underdetermined linear systems of equations is a common problem in signal/image processing and statistics. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), wavelet-based deconvolution and reconstruction, and compressed sensing (CS) are a few well-known areas in which problems of this type appear. One standard approach is to minimize an objective function that includes a quadratic (ℓ2) error term added to a sparsity-inducing (usually ℓ1) regularizer. We present an algorithmic framework for the more general problem of minimizing the sum of a smooth convex function and a nonsmooth, possibly nonconvex, sparsity-inducing function. We propose iterative methods in which each step is an optimization subproblem involving a separable quadratic term (diagonal Hessian) plus the original sparsity-inducing term. Our approach is suitable for cases in which this subproblem can be solved much more rapidly than the original problem. In addition to solving the standard ℓ2 − ℓ1 case, our approach handles other problems, e.g., ℓp regularizers with p less than 1, or group-separable (GS) regularizers. Experiments with CS problems show that our approach provides state-of-the-art speed for the standard ℓ2 − ℓ1 problem, and is also efficient on problems with GS regularizers

The algorithm seems to promise a faster capability than GPSR_BB. Let us note the potential use of non-convex terms in the regularization term. Thank you Jort.

Another paper caught my attention: Fast Global Image Registration Using Random Projections by Dennis Healy, Jr. and Gustavo Rohd. The abstract reads:

We describe a new method for efficiently computing the global optimum of least squares registration problems based on the recently developed theory of signal processing using random projections. The method is especially attractive for large scale registration problems where the goal is to register many images to a standard template. We test our new algorithm using real images of cells’ nuclei and show our method can outperform more traditional dimension reduction methods such as projections onto lower dimensional B-spline function spaces.

This is a very nice application where image registration uses the work of Mike Wakin and Richard Baraniuk on Random Projection of Smooth manifolds. Nice.

On a totally unrelated note, on Monday, the Monday Morning Algorithm series will feature the algorithm presented here.

Wednesday, November 14, 2007

Compressed Sensing: Another deterministic code, a LASSO-CS root finder and a tree based greedy algorithm

It looks like I overlooked a fifth deterministic code for measurement in Compressed Sensing. It is featured in Annihilating filter-based decoding in the compressed sensing framework by Ali Hormati and Martin Vetterli
the abstract reads:

Recent results in compressed sensing or compressive sampling suggest that a relatively small set of measurements taken as the inner product with universal random measurement vectors can well represent a source that is sparse in some fixed basis. By adapting a deterministic, non-universal and structured sensing device, this paper presents results on using the annihilating filter to decode the information taken in this new compressed sensing environment. The information is the minimum amount of nonadaptive knowledge that makes it possible to go back to the original object. We will show that for a $k$-sparse signal of dimension $n$, the proposed decoder needs $2k$ measurements and its complexity is of $O(k^2)$ whereas for the decoding based on the $\ell_1$ minimization, the number of measurements needs to be of $O(k\log(n))$ and the complexity is of $O(n^3)$. In the case of noisy measurements, we first denoise the signal using an iterative algorithm that finds the closest rank $k$ and Toeplitz matrix to the measurements matrix (in Frobenius norm) before applying the annihilating filter method. Furthermore, for a $k$-sparse vector with known equal coefficients, we propose an algebraic decoder which needs only $k$ measurements for the signal reconstruction. Finally, we provide simulation results that demonstrate the performance of our algorithm.

From the same lab, but the document is private: Efficient and Stable Acoustic Tomography Using Sparse Reconstruction Methods by Ivana Jovanovic, Ali Hormati, Luciano Sbaiz and Martin Vetterli. The abstract reads:
We study an acoustic tomography problem and propose a new inversion technique based on sparsity. Acoustic tomography observes the parameters of the medium that influence the speed of sound propagation. In the human body, the parameters that mostly influence the sound speed are temperature and density, in the ocean - temperature and current, in the atmosphere - temperature and wind. In this study, we focus on estimating temperature in the atmosphere using the information on the average sound speed along the propagation path. The latter is practically obtained from travel time measurements. We propose a reconstruction algorithm that exploits the concept of sparsity. Namely, the temperature is assumed to be a linear combination of some functions (e.g. bases or set of different bases) where many of the coefficients are known to be zero. The goal is to find the non-zero coefficients. To this end, we apply an algorithm based on linear programming that under some constrains finds the solution with minimum l0 norm. This is actually equivalent to the fact that many of the unknown coefficients are zeros. Finally, we perform numerical simulations to assess the effectiveness of our approach. The simulation results confirm the applicability of the method and demonstrate high reconstruction quality and robustness to noise.
I also found the presentation of this paper from Ewout van den Berg and Michael Friedlander entitled " In pursuit of a root". The presentation is much clearer about what they are talking about ( I am a visual person). In essence they try to solve a Basis Pursuit problem by a Lasso and rely on the fact that the correspondence between the two problems is continuous and differentiable. Very nice.

Similarly, this presentation-talk by Minh Do entitled "Practical Compressed Sensing" puts a new light on the possible extension of compressed sensing. Namely the fact that not only most signals are sparse in a good basis of representation but most coefficients of importance lie in a tree across different scales

Minh Do has already published a scheme based on this observation called: Tree-Based Orthogonal Matching Pursuit Algorithm For Signal Reconstruction with Chinh La. I am curious on how the implementation will fare in comparison to other reconstruction techniques.

Photo Credit: JAXA, Japanese probe Kayuga transmitted this HDTV of an Earth rise above the Moon
at 2:52 p.m. on November 7, 2007 (JST).The "Earth-rise" movie is here, and the "Earth-set" movie is here.

Tuesday, November 13, 2007

Sparsity: What is it good for ?

The solution of an underdetermined system of linear equations is generally too large (i.e. it generally is an entire subspace as opposed to a unique solution). When you have too many unknowns and not enough equations, experience tells you to wait for additional equations or constraints to obtain a potentially unique solution. In Compressed Sensing, sparsity of the solution is the element that provides an ability to put additional constraints in order to find a unique solution.

But sparsity can be also used for other purpose as shown by this excellent study by Jerome Bobin, Jean-Luc Starck, Jalal Fadili and Yassir Moudden in Sparsity and Morphological Diversity in Blind Source Separation where their approach "takes advantage of this “morphological diversity” to differentiate between the sources with accuracy". Evidently, the reconstruction techniques parallels the ones found in Compressed Sensing. The abstract reads:

Over the last few years, the development of multi-channel sensors motivated interest in methods for the coherent processing of multivariate data. Some specific issues have already been addressed as testified by the wide literature on the so-called blind source separation (BSS) problem. In this context, as clearly emphasized by previous work, it is fundamental that the sources to be retrieved present some quantitatively measurable diversity. Recently, sparsity and morphological diversity have emerged as a novel and effective source of diversity for BSS. We give here some new and essential insights into the use of sparsity in source separation and we outline the essential role of morphological diversity as being a source of diversity or contrast between the sources. This paper introduces a new BSS method coined Generalized Morphological Component Analysis (GMCA) that takes advantages of both morphological diversity and sparsity, using recent sparse overcomplete or redundant signal representations. GMCA is a fast and efficient blind source separation method. We present arguments and a discussion supporting the convergence of the GMCA algorithm. Numerical results in multivariate image and signal processing are given illustrating the good performance of GMCA and its robustness to noise.

The GMCALab matlab code for the implementation of this algorithm can be found here.

Monday, November 12, 2007

Monday Morning Algorithm Part deux: Reweighted Lp for Non-convex Compressed Sensing Reconstruction

Today's algorithm implements the algorithm found in Iteratively Reweighted Algorithms for Compressive Sensing by Rick Chartrand and Wotao Yin as mentioned in an earlier post. The cons for the algorithm is that it solves a non-convex problem and therefore there is no provable way of showing you achieved an optimal solution. The pros are:
  • you don't need semi-definite programming
  • you do better than the reweighted L1 technique and
  • you do it faster
  • all operations rely on simple linear algebra which can then be optimized for the problem at hand.
The algorithm should be usable on Matlab and Octave. If you want to contribute a small algorithm next week or any week, please let me know. So here is the beast:

% Algorithm implementing an approach suggested in
% "Iteratively Reweighted Algorithms for Compressive Sensing"
% by Rick Chartrand and Wotao Yin
% n is dimension of the ambient space (number of signal samples)
% m is the number of Compressed Sensing measurements
% p is the defining the order of the metric p=1 is l1 norm....
n= 100;
m = 40;
p = 0.9;
% Setting up the problem where one produces
% an input for the problem to be solved
phi = rand(m,n);
u_true = [1 10 120 90 950 4 100 1000 20 30 zeros(1,n-10)]';
u_noise = 10^(-4)* rand(n,1);
u_true = u_true + u_noise;
b = phi * u_true;
% solving for phi * u = b , where u is now the
%unknown to be found by the IRlp
epsilon = 1
% u_0 is the L_2 solution which would be exact if m = n,
% but here m is less than n
u_0 = phi\b;
u_old = u_0;
while epsilon > 10^(-8)
j = j + 1
w=(u_old.^(2) + epsilon).^(p/2-1);
u_new = Q_n * phi' * tu * b;
if lt(norm(u_new - u_old,2),epsilon^(1/2)/100)
epsilon = epsilon /10;
u_old = u_new;
% One plot showing the superposition of the least-square
% solution u_0 (symbol v), the
% reweighted Lp solution (symbol *) and the
% true solution (symbol o)
title('Least-squares u_0 (v), reweighted Lp (*)...
, initial signal symbol (o)')

As usual if you find a mistake or something interesting after playing with it, please let me know. Also, I you have a second and half, please answer these two questions, thanks in advance.

Credit Photo: NASA/JPL/Space Science Institute, Ring Tableau, Saturn's Moon: Mimas, Epimetheus, Daphnis

Sunday, November 11, 2007

Compressed Sensing: Algorithms for Deterministic Compressed Sensing

Prompted by a commenter, I realized that there are now four, [update: strike that, make that five] different ways of building deterministic algorithms for the measurement matrices in Compressed Sensing. Terry Tao had previously introduced the subject. One can also read his take on DeVore's construction. There are currently at least three other lines of research yielding deterministic compressed sensing measurement matrices, that of Mark Iwen as mentioned before and Piotr Indyk as Muthu Muthukrishnan is pointing out in this entry. So to sumarize we have four constructions :

For the last entry, Emmanuel Candès made a presentation at IMA ( Compressive Sampling and Frontiers in Signal Processing ) highlithing the connections of Compressed Sensing with information and coding theory.

Photo Credit: JAXA/NHK, First HDTV view of the moon from the Kayuga probe of the Japanese Space Agency this past week. The whole movie can be seen here.

Saturday, November 10, 2007

Compressed Sensing: Sparse PCA and a Compressed Sensing Search Engine

Following up on their paper entitled Full Regularization Path for Sparse Principal Component Analysis, Alexandre d'Aspremont, Francis Bach, Laurent El Ghaoui have produced a new paper where they describe a greedy algorithm to perform their Sparse PCA analysis (as opposed to their previous approach using semidefinite programming). The new paper is: Optimal Solutions for Sparse Principal Component Analysis
the abstract reads:

Given a sample covariance matrix, we examine the problem of maximizing the variance explained by a linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This is known as sparse principal component analysis and has a wide array of applications in machine learning and engineering. We formulate a new semidefinite relaxation to this problem and derive a greedy algorithm that computes a full set of good solutions for all target numbers of non zero coefficients, with total complexity O(n^3), where n is the number of variables. We then use the same relaxation to derive sufficient conditions for global optimality of a solution, which can be tested in O(n^3) per pattern. We discuss applications in subset selection and sparse recovery and show on artificial examples and biological data that our algorithm does provide globally optimal solutions in many cases.
The code PathSPCA: A fast greedy algorithm for sparse PCA can be found here.

I have set up a search capability on the right side of this blog that uses the new capability for Google search to search through this page and ALL the sites I am linking to. Since I am making an effort at linking to all the researchers involved in Compressed Sensing, searching though this interface enables searching through most websites relevant to this subject.

Photo Credit: NASA, The Earth and the Moon as seen from Galileo en route to Jupiter.

Wednesday, November 07, 2007

Compressed Sensing: Hardware Implementation, Part IV, Hyperspectral Coded Aperture, Connection between the CS camera and the primary visual cortex ?

Ashwin Wagadarikar mentions a page dedicated to his work on hyperspectral imaging using coded/dispersive apertures and using compressed sensing reconstruction techniques to recover data. It adds to a previous entry on the explosive birth of coded aperture on designs using either a single or dual disperser. The paper on the single disperser is now available on that page or you can directly request a preprint from Ashwin. It is an excellent read and as mentioned before, they used the Gradient Projection for Sparse Reconstruction (GPSR) method to perform the reconstructions.

Marco Duarte
answers by a comment in a previous entry on their multiscale random projection
where I mentioned that there was no description of the regularizing kernels:

"..For our paper on multiscale random projections, our regularization kernel relies on coarse pixelation, since we were applying our technique on the single pixel camera. This is to say that the random measurements are obtained at different resolutions to get the projections of the different regularizations of the target image..."
The ability to command the pixel on the DMD allows for some interesting convolutions. In other words, in order to deal with the non-differentiability of Image Appearance Manifolds, the CS camera has to do smoothing projections at different scales. This process is not unlike the parallel I had made on the current state of the art in compressed sensing and its potential use in the modeling of the primary visual cortex. The thesis of Thomas Serre makes it clear that random projections seem to be in the play (at least in his model that takes into account most of what we know about biological processes). He also shows that those projections are done at different scales as well.

Compressed Sensing: Extending Reed-Solomon codes.

Mehmet Akçakaya and Vahid Tarokh have written a series of papers where they use the theoretical framework of algebraic coding/decoding to do Compressive sampling. The measurement matrices are what they call Vandermonde frames that generalize Reed-Solomon codes using Vandermonde matrices. Reconstruction of the signal can be computed using many well known Reed-Solomon type decoding algorithms. In the noiseless case, they have better recovery results (in regards to sparsity) than previously.

Shannon Theoretic Limits on Noisy Compressive Sampling by Mehmet Akçakaya, Vahid Tarokh. The abstract reads:
In this paper, we study the number of measurements required to recover a sparse signal in ${\mathbb C}^M$ with $L$ non-zero coefficients from compressed samples in the presence of noise. For a number of different recovery criteria, we prove that $O(L)$ (an asymptotically linear multiple of $L$) measurements are necessary and sufficient if $L$ grows linearly as a function of $M$. This improves on the existing literature that is mostly focused on variants of a specific recovery algorithm based on convex programming, for which $O(L\log(M-L))$ measurements are required. We also show that $O(L\log(M-L))$ measurements are required in the sublinear regime ($L = o(M)$).

On Sparsity, Redundancy and Quality of Frame Representations by Mehmet Akçakaya, Vahid Tarokh. The abstract reads:

We consider approximations of signals by the elements of a frame in a complex vector space of dimension N and formulate both the noiseless and the noisy sparse representation problems. The noiseless representation problem is to find sparse representations of a signal r given that such representations exist. In this case, we explicitly construct a frame, referred to as the Vandermonde frame, for which the noiseless sparse representation problem can be solved uniquely using O(N2) operations, as long as the number of non-zero coefficients in the sparse representation of r is N for some 0    0.5, thus improving on a result of Candes and Tao [4]. The noisy sparse representation problem is to find sparse representations of a signal r satisfying a distortion criterion. In this case, we establish a lower bound on the trade-off between the sparsity of the representation, the underlying distortion and the redundancy of any given frame.
A Frame Construction and A Universal Distortion Bound for Sparse Representations by Mehmet Akçakaya, Vahid Tarokh.The abstract reads:
We consider approximations of signals by the elements of a frame in a complex vector space of dimension N and formulate both the noiseless and the noisy sparse representation problems. The noiseless representation problem is to find sparse representations of a signal r given that such representations exist. In this case, we explicitly construct a frame, referred to as the Vandermonde frame, for which the noiseless sparse representation problem can be solved uniquely using O(N^2) operations, as long as the number of non-zero coefficients in the sparse representation of r is N for some 0    0.5. It is known that   0.5 cannot be relaxed without violating uniqueness. The noisy sparse representation problem is to find sparse representations of a signal r satisfying a distortion criterion. In this case, we establish a lower bound on the trade-off between the sparsity of the representation, the underlying distortion and the redundancy of any given frame.

Tuesday, November 06, 2007

Compressed Sensing: stopping criterion, random multiscale projections, manifold learning, sparse image reconstruction

The Rice Compressed Sensing Repository has some new preprints/articles:

Sparse signal reconstruction from noisy compressive measurements using cross validation by Petros Boufounos, Marco Duarte, and Richard Baraniuk.
Compressive sensing is a new data acquisition technique that aims to measure sparse and compressible signals at close to their intrinsic information rate rather than their Nyquist rate. Recent results in compressive sensing show that a sparse or compressible signal can be reconstructed from very few incoherent measurements. Although the sampling and reconstruction process is robust to measurement noise, all current reconstruction methods assume some knowledge of the noise power or the acquired signal to noise ratio. This knowledge is necessary to set algorithmic parameters and stopping conditions. If these parameters are set incorrectly, then the reconstruction algorithms either do not fully reconstruct the acquired signal (underfitting) or try to explain a significant portion of the noise by distorting the reconstructed signal (overfitting). This paper explores this behavior and examines the use of cross validation to determine the stopping conditions for the optimization algorithms. We demonstrate that by designating a small set of measurements as a validation set it is possible to optimize these algorithms and reduce the reconstruction error. Furthermore we explore the trade-off between using the additional measurements for cross validation instead of reconstruction.
This paper sets up a cross validation procedure that tries to remove the heuristics of the residual stopping criterion used in either Basis Pursuit or Greedy algorithms.

Multiscale random projections for compressive classification by Marco Duarte, Mark Davenport, Michael Wakin, Jason Laska, Dharmpal Takhar, Kevin Kelly, and Richard Baraniuk

We propose a framework for exploiting dimension-reducing random projections in detection and classification problems. Our approach is based on the generalized likelihood ratio test;
in the case of image classification, it exploits the fact that a set of images of a fixed scene under varying articulation parameters forms a low-dimensional, nonlinear manifold. Exploiting
recent results showing that random projections stably embed a smooth manifold in a lower-dimensional space, we develop the multiscale smashed filter as a compressive analog of the familiar matched filter classifier. In a practical target classification problem using a single-pixel camera that directly acquires compressive image projections, we achieve high classification rates using many fewer measurements than the dimensionality
of the images.
It looks to me like a very nice outgrowth of the smashed filter introductory paper mentioned here. In this paper though, some thoughts is given to the fact that target identification must be done at different scales because images appearance manifolds are not smooth manifolds as initially shown by Donoho and Grimes. In this paper, they use regularizing kernels of different supports to provide the ability to smooth out image manifolds and then use nearest neighbor approaches to compare unknown objects and pose to libraries of known objects and pose. No description of these kernels is given though.

Random projections for manifold learning [Technical Report] by Chinmay Hegde, Michael Wakin, and Richard Baraniuk.

We propose a novel method for linear dimensionality reduction of manifold modeled data. First, we show that with a small number M of random projections of sample points in RN belonging to an unknown K-dimensional Euclidean manifold, the intrinsic dimension (ID) of the sample set can be estimated to high accuracy. Second, we rigorously prove that using only this set of random projections, we can estimate the structure of the underlying manifold. In both cases, the number of random projections required is linear in K and logarithmic in N, meaning that K <>
This paper tries to answer a central problem is dimensionality reduction. What is the intrinsic dimension of a manifold using random projections ? This is a useful tool.

Multichannel image estimation via simultaneous orthogonal matching pursuit, Ray Maleh and Anna Gilbert,
The abstract reads:
In modern imaging systems, it is possible to collect information about an image on multiple channels. The simplest example is that of a color image which consists of three channels (i.e. red, green, and blue). However, there are more complicated situations such as those that arise in hyperspectral imaging. Furthermore, most of these images are sparse or highly compressible. We need not measure thoroughly on all the channels in order to reconstruct information about the image. As a result, there is a great need for efficient algorithms that can simultaneously process a few measurements on all channels. In this paper, we discuss how the Simultaneous Orthogonal Matching Pursuit (SOMP) algorithm can reconstruct multichannel images from partial Fourier measurements, while providing more robustness to noise than multiple passes of ordinary Orthogonal Matching Pursuit (OMP) on every channel. In addition, we discuss the use of SOMP in extracting edges from images that are sparse in the total variational sense and extend the ideas presented in this paper to outline how sparse-gradient multichannel images can be recovered by this powerful algorithm.

Sparse gradient image reconstruction done faster by Ray Maleh, Anna Gilbert, and Martin Strauss, The abstract reads:
In a wide variety of imaging applications (especially medical imaging), we obtain a partial set or subset of the Fourier transform of an image. From these Fourier measurements, we want to reconstruct the entire original image. Convex optimization is a powerful, recent solution to this problem. Unfortunately, convex optimization in its myriad of implementations is computationally expensive and may be impractical for large images or for multiple images. Furthermore, some of these techniques assume that the image has a sparse gradient (i.e., that the gradient of the image consists of a few nonzero pixel values) or that the gradient is highly compressible. In this paper, we demonstrate that we can recover such images with GRADIENTOMP, an efficient algorithm based upon Orthogonal Matching Pursuit (OMP), more effectively than with convex optimization. We compare both the qualitative and quantitative performance of this algorithm to the optimization techniques.

Again convexity is a good way of doing business because there are currently mathematically provable bounds. But it also looks like there is some heuristics showing that a non-convex problem can provide faster solution. I am going to have to come back to that at some point because it looks like there are many methods that seem to trade convexity for faster, non-convex and heuristically optimal solution techniques.

Compressed sensing image reconstruction via recursive spatially adaptive filtering by
Karen Egiazarian, Alessandro Foi, and Vladimir Katkovnik, the abstract reads:
We introduce a new approach to image reconstruction from highly incomplete data. The available data are assumed to be a small collection of spectral coefficients of an arbitrary linear transform. This reconstruction problem is the subject of intensive study in the recent field of “compressed sensing” (also known as “compressive sampling”). Our approach is based on a quite specific recursive filtering procedure. At every iteration the algorithm is excited by injection of random noise in the unobserved portion of the spectrum and a spatially adaptive image denoising Þlter, working in the image domain, is exploited to attenuate the noise and reveal new features and details out of the incomplete and degraded observations. This recursive algorithm can be interpreted as a special type of the Robbins-Monro stochastic approximation procedure with regularization enabled by a spatially adaptive filter. Overall, we replace the conventional parametric modeling used in CS by a nonparametric one. We illustrate the effectiveness of the proposed approach for two important inverse problems from computerized tomography: Radon inversion from sparse projections and limited-angle tomography. In particular we show that the algorithm allows to achieve exact reconstruction of synthetic phantom data even from a very small number projections. The accuracy of our reconstruction is in line with the best results in the compressed sensing field.
In light of this entry, two other extended previews grabbed my interest:
Not found on the Rice page yet are the following:

Sparsity in Time–Frequency Representations , by Gotz Pfander and Holger Rauhut, the abstract reads:

We consider signals and operators in finite dimension which have sparse time-frequency representations. As main result we show that an S-sparse Gabor representation in Cn with respect to a random unimodular window can be recovered by Basis Pursuit with high probability provided that S  Cn/ log(n). Our results are applicable to the channel estimation problem in wireless communications and they establish the usefulness of a class of measurement matrices for compressive sensing.
Note on sparsity in signal recovery and in matrix identification by Gotz Pfander, the abstract reads:

We describe a connection between the identification problem for matrices with sparse representations in given matrix dictionaries and the problem of sparse signal recovery. This allows the application of novel compressed sensing techniques to operator identification problems such as the channel measurement problem in communications engineering.

Credit Photo: NASA, the International Space Station as seen from the Shuttle (STS-120) shortly after 5:00 AM (CST) yesterday.