
Sunday, December 02, 2007
Monday Morning Algorithm Part 5: The 1-D Experimental Probabilistic Hypersurface

Compressed Sensing: Restricted Isometry Properties and Nonconvex Compressive Sensing
In previous work, numerical experiments showed that lp minimization with 0 < p < 1 recovers sparse signals from fewer linear measurements than does l1 minimization. It was also shown that a weaker restricted isometry property is sufficient to guarantee perfect recovery in the lp case. In this work, we generalize this result to an lp variant of the restricted isometry property, and then determine how many random, Gaussian measurements are sufficient for the condition to hold with high probability. The resulting sufficient condition is met by fewer measurements for smaller p.
Reference:
[1] Gilles Pisier, The Volume of Convex Bodies and Banach Space Geometry.
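For those who want to play with this at home, here is a minimal sketch (mine, not the authors' code) of the usual iteratively reweighted least squares heuristic for lp minimization with p less than 1; the problem sizes, the smoothing schedule and the number of iterations are arbitrary choices:
% Minimal IRLS sketch for min ||x||_p^p subject to A*x = b with 0 < p < 1.
% This is a generic heuristic, not the implementation used in the paper.
m = 40; n = 100; k = 8; p = 0.5;             % arbitrary problem sizes
A = randn(m,n);                              % Gaussian measurement matrix
idx = randperm(n);
x0 = zeros(n,1); x0(idx(1:k)) = randn(k,1);  % k-sparse ground truth
b = A*x0;                                    % measurements
x = A\b;                                     % least-squares initialization
epsil = 1;                                   % smoothing parameter
for iter = 1:100
w = (x.^2 + epsil).^(1 - p/2);               % weights, roughly |x|^(2-p)
W = diag(w);
x = W*A'*((A*W*A')\b);                       % weighted least-squares update
epsil = max(epsil/10, 1e-8);                 % gradually sharpen the weights
end
norm(x - x0)/norm(x0)                        % small if recovery succeeded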
Friday, November 30, 2007
Non-Negative Matrix Factorization with Sparsity Constraint for Spectral Unmixing applied to Space Situational Awareness


We formulate the problem of hyperspectral image unmixing as a nonconvex optimization problem, similar to nonnegative matrix factorization. We present a heuristic for approximately solving this problem using an alternating projected subgradient approach. Finally, we present the result of applying this method on the 1990 AVIRIS image of Cuprite, Nevada and show that our results are in agreement with similar studies on the same data.
Reference:
[1] Bob Plemmons, Low-Rank Nonnegative Factorizations for Spectral Imaging Applications.
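As a rough illustration of the X = W*H skeleton the abstract refers to, here is a minimal sketch on synthetic data. For simplicity it uses the classic Lee-Seung multiplicative updates rather than the authors' sparsity-constrained alternating projected subgradient heuristic, and all sizes are arbitrary:
% Minimal NMF sketch: approximate X by W*H with W,H >= 0.
% Plain Lee-Seung multiplicative updates, no sparsity constraint;
% this is NOT the authors' unmixing heuristic.
X = abs(randn(50,200));                  % stand-in for hyperspectral data
r = 5;                                   % assumed number of endmembers
W = rand(size(X,1),r);                   % endmember signatures (learned)
H = rand(r,size(X,2));                   % abundances (learned)
for iter = 1:200
H = H .* (W'*X) ./ (W'*W*H + eps);       % multiplicative update for H
W = W .* (X*H') ./ (W*H*H' + eps);       % multiplicative update for W
end
relative_error = norm(X - W*H,'fro')/norm(X,'fro')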
Thursday, November 29, 2007
Morphological Diversity using Sparse Constraints

Recent advances in harmonic analysis and signal processing have advocated the use of overcomplete signal representations. The ability of such redundant signal dictionaries to lead to very sparse representations has already been used successfully in various fields. In this context, morphological diversity has emerged as an effective signal analysis tool. The gist of morphological diversity consists in modeling signals as the linear combination of several so-called morphological components... Recovering the morphological components from their combination then relies on the incoherence (morphological diversity) of their respective sparse representations (the DCT and the Curvelet tight frame). Morphological Component Analysis (MCA) has been devised to solve this harsh recovery problem.
In the general case, a wide range of signal representations can be accounted for such as wavelets, contourlets, bandlets, wave atoms or curvelets.
Morphological diversity and Morphological Component Analysis (MCA) then turn out to be privileged tools for sparse signal representation.
This website has been designed to give a broad overview of what morphological diversity can do. This particular and effective framework has already been extended and applied to a very wide range of applications in signal and image processing, from texture/contour separation to source separation or compressed sensing. We really think that morphological diversity has just begun a fruitful and long life in signal processing.
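To make the idea concrete, here is a toy sketch of an MCA-style separation (my own simplification, not the MCALab code): a 1-D signal made of a few spikes plus an oscillation that is sparse in the DCT is split into its two morphological components by alternating soft-thresholding in the two dictionaries (dct/idct below come from the Signal Processing Toolbox):
% Toy MCA-style separation: y = spikes + oscillation, the spikes being
% sparse in the Dirac basis and the oscillation sparse in the DCT.
% Simplified sketch, not the MCALab implementation.
N = 256;
idx = randperm(N);
spikes = zeros(N,1); spikes(idx(1:5)) = 3;     % spike component
c = zeros(N,1); c(11) = 5; osc = idct(c);      % one DCT atom
y = spikes + osc;                              % observed mixture
soft = @(v,t) sign(v).*max(abs(v)-t,0);        % soft-thresholding operator
xs = zeros(N,1); xo = zeros(N,1);              % component estimates
lambda = max(abs(y));                          % initial threshold
for it = 1:100
xs = soft(y - xo, lambda);                     % update the spike part
xo = idct(soft(dct(y - xs), lambda));          % update the DCT part
lambda = max(0.1, lambda*0.95);                % lower threshold, stay above cross-talk
end
% xs should now hold (most of) the spikes and xo the oscillation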
The hyperspectral part of the website mentions the use of Mars Data and the upcoming availability of a code that does Blind Source Separation using Spatial and Spectral Sparsity Constraints. On a related note, ESA now provides access to Earth hyperspectral data from MERIS and Proba.
Credit Photo: ESA/DLR/FU Berlin (G. Neukum), hyperspectral photo of Phobos, one of Mars' moons, taken by the Mars Express probe three years ago.
Outliers in the Long Tail: The case of Autism.
Is there a business case for using the web where the amount of traffic is not directly proportional to income or recognition?
Let me take an example featured in this blog. In some cases, people don't know about a subject or the wording associated with it, but they are very much willing to know more because it affects them. Case in point: autism (it could be any other subject). I have written several posts on the business case for an eye-tracking device to detect or quantify autism (Part I, Part II, Part III). When I look back at the log of queries leading to this blog (after these entries were posted), there is a fair number of low-level queries with the following keywords:
- " infant eye tracking",
- "baby not eye tracking well",
- "eye tracking autism",
- "baby's eyes not tracking",
- "newborn eye tracking",
- "detecting autism early",
- "eye tracking kids",
- "babies with eye tracking problems",
- "autism and follow eyes",
- "autism and eye tracking importance",
- "autism and eye movements"........
- "adi-r and ados diagnosis",
- "autism difficulty with standardized testing",
- "ados failed to detect asd",
- "age at which autism diagnosed",
- "pdd-nos stability of diagnosis"
With one out of 166 kids being on the autism spectrum, it would seem to me there is ample work for people just trying to make sense of new findings (and not delivering "solutions", which is the current market). Doctors or families, while directly affected, sometimes do not have the means to interpret the new findings and need somebody who specializes in this area. I have had several hits from countries other than the U.S., and I would guess the "market" is much larger than what the traffic shows.
In all, only the web can provide access to a population of less than 1 in 100, and I would venture that some percentage of that population (my guess is 10 percent) is willing to pay for a service they cannot get otherwise. Eventually, instead of a long tail or power law, we should really think of it in a different manner, not unlike the one featured by Dan Bricklin

which reminds me of the Plouffe's Inverter graphs that show how digits in numbers follow power laws (Benford's law):

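For the curious, the first-digit distribution behind Benford's law is easy to reproduce; here is a quick snippet of mine comparing it to the leading digits of the powers of 2:
% Benford's law: the leading digit d occurs with probability log10(1 + 1/d).
d = 1:9;
benford = log10(1 + 1./d)                      % theoretical frequencies
v = 2.^(1:1000);                               % powers of 2
lead = floor(v ./ 10.^floor(log10(v)));        % leading digits
empirical = histc(lead, d)/length(lead)        % observed frequencies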
P.S. The fact that I am currently not running ads on this blog is very much linked to the fact that the keyword "Autism" attracts many ads for products for which nobody can guarantee anything.
Saturday, November 24, 2007
Monday Morning Algorithm Part 4: Improving Embeddings by Flexible Exploitation of Side Information

In Machine Learning, one is often faced with using the Euclidean distance as the de facto way of comparing objects or features. This is like comparing apples and oranges: little contextual side information is provided, since what matters really depends on the application. Comparing apples and oranges might be OK if you care about eating one fruit a day, but it might be extremely unproductive if you care about comparing vitamin C content between fruits without distinguishing between the two. One way to address this is to construct a metric tailored to the end goal you have in mind and create a Mahalanobis distance function. The main issue is how to impart any type of similarity and dissimilarity between features using prior knowledge of the problem at hand. Ali Ghodsi, Dana Wilkinson and Finnegan Southey have begun answering this question by building a metric based on side information in Improving Embeddings by Flexible Exploitation of Side Information. We are going to feature this algorithm today with Cable Kurwitz (as usual all the good stuff is his and the mistakes are mine). The conclusion of the article reads
Many machine learning techniques handle complex data by mapping it into new spaces and then applying standard techniques, often utilizing distances in the new space. Learning a distance metric directly is an elegant means to this end. This research shows how side information of two forms, class equivalence information and partial distance information, can be used to learn a new distance as a preprocessing step for embedding. The results demonstrate that side information allows us to capture attributes of the data within the embedding in a controllable manner. Even a small amount of side information can improve the embedding’s structure. Furthermore, the method can be kernelized and also used to capture multiple attributes in the same embedding. Its advantages over existing metric learning methods are a simpler optimization formulation that can be readily solved with off-the-shelf software and greater flexibility in the nature of the side information one can use. In all, this technique represents a useful addition to our toolbox for informed embeddings.
In order to implement this algorithm you have to have Michael Grant, Stephen Boyd, and Yinyu Ye's CVX program installed.
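As I read the paper, the small CVX program below solves the following semidefinite program (my notation), where S is the set of pairs declared similar and D the set of pairs declared dissimilar:

\min_{A} \; \sum_{(i,j)\in S} (x_i - x_j)^\top A \,(x_i - x_j) \;-\; \sum_{(i,j)\in D} (x_i - x_j)^\top A \,(x_i - x_j)
\quad \text{subject to} \quad A \succeq 0, \;\; \mathrm{trace}(A) = 1.

The objective is assembled in the code through the identity vec(A)'*vec(uu') = u'*A*u.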
clear
% simple example where x1 and the two columns of x2 are
% declared similar (close) while x3 and x4 are declared
% dissimilar (far apart); the ambient dimension is 5
n =5;
x1=[1 2 3 4 5]';
x2=[0.1 2.1 12 56 100; 7 1.9 98 200 1]';
x3=[1 4 7 8 19]';
x4=[123 43 7.1 29 133]';
% computing elements in the loss function
for i=1:2
xs1(:,i)=x1-x2(:,i);
end
xt1 = zeros(n*n,1);
for i=1:2
xt1= xt1 + vec(xs1(:,i)*xs1(:,i)') ;
end
xs2 = x3 - x4;
xt2 = vec(xs2*xs2');
cvx_begin
variable A(n,n) symmetric;
minimize(vec( A )' *(xt1-xt2))
subject to
A == semidefinite(n);
trace(A) == 1;
cvx_end
% checking results
% distances between the vectors declared similar to each other
for i=1:2
xr1(i)=(x1-x2(:,i))'*A*(x1-x2(:,i))
end
% these distances (x1 against the two columns of x2) are small
% compared to the distance between the dissimilar pair below
%
% now difference between vectors that are supposedly far
% from each other
(x3-x4)'*A*(x3-x4)
% this distance is much larger for the dissimilar pair x3, x4,
% as expected.
% Actual results:
% xr1 =
%
% 2.7221e+003
%
%
%xr1 =
%
% 1.0e+003 *
%
% 2.7221 0.0067
%
%
%ans =
%
% 2.8875e+004
It's a nifty little tool. From there you can use your favorite dimensionality reduction technique.
Some other algorithms I will be featuring in the future Monday Morning Algorithm Series include: The Fast Johnson Lindenstrauss Transform, the Experimental Probabilistic Hypersurface, Diffusion methods for dimensionality reduction and others....
Credit Photo: Chinese Space Agency, Xinhua. First photo of the Moon from the Chinese-made Chang'e lunar orbiter, taken yesterday.
Friday, November 23, 2007
StrangeMaps, Nuclear Power, HASP, ML in Space, GPU with Matlab

In other news from the KDnuggets weekly e-mail, some items caught my attention: maybe the birth of a new kind of journalism in What Will Journalist-Programmers Do?, and there is a CFP on the subject of Machine Learning in Space (Mar 31).
Via GPGPU, AMD is talking about introducing double precision in frameworks using GPUs (graphics cards). On the other hand, the NVIDIA CUDA compiler now has a Matlab plug-in: whenever you use a Matlab function like FFT, the plug-in takes over the job and sends it to the graphics card (GPU). Maybe we can do faster reconstruction in Compressed Sensing using a Partial Fourier Transform and a GPU. Oops, I did it again, I spoke about CS.
Tuesday, November 20, 2007
Compressed Sensing: Sensor Networks, Learning Compressed Sensing by Uncertain Component Analysis, Sparsity or Positivity ?, New CVX build
Shuchin Aeron, Manqi Zhao, and Venkatesh Saligrama, Sensing capacity of sensor networks: Fundamental tradeoffs of SNR, sparsity, and sensing diversity.
The abstract reads:
A fundamental question in sensor networks is to determine the sensing capacity – the minimum number of sensors necessary to monitor a given region to a desired degree of fidelity based on noisy sensor measurements. In the context of the so called compressed sensing problem sensing capacity provides bounds on the maximum number of signal components that can be identified per sensor under noisy measurements. In this paper we show that sensing capacity is a function of SNR, signal sparsity—the inherent complexity/dimensionality of the underlying signal/information space and its frequency of occurrence and sensing diversity – the number of modalities of operation of each sensor. We derive fundamental tradeoffs between SNR, sparsity, diversity and capacity. We show that the capacity is a monotonic function of SNR and diversity. A surprising result is that as sparsity approaches zero so does the sensing capacity irrespective of diversity. This implies for instance that to reliably monitor a small number of targets in a given region requires disproportionately large number of sensors.
Shuchin Aeron, Manqi Zhao, and Venkatesh Saligrama, On sensing capacity of sensor networks for the class of linear observation, fixed SNR models.
The abstract reads:
In this paper we address the problem of finding the sensing capacity of sensor networks for a class of linear observation models and a fixed SNR regime. Sensing capacity is defined as the maximum number of signal dimensions reliably identified per sensor observation. In this context sparsity of the phenomena is a key feature that determines sensing capacity. Precluding the SNR of the environment the effect of sparsity on the number of measurements required for accurate reconstruction of a sparse phenomena has been widely dealt with under compressed sensing. Nevertheless the development there was motivated from an algorithmic perspective. In this paper our aim is to derive these bounds in an information theoretic set-up and thus provide algorithm independent conditions for reliable reconstruction of sparse signals. In this direction we first generalize the Fano’s inequality and provide lower bounds to the probability of error in reconstruction subject to an arbitrary distortion criteria. Using these lower bounds to the probability of error, we derive upper bounds to sensing capacity and show that for fixed SNR regime sensing capacity goes down to zero as sparsity goes down to zero. This means that disproportionately more sensors are required to monitor very sparse events. We derive lower bounds to sensing capacity (achievable) via deriving upper bounds to the probability of error via adaptation to a max-likelihood detection set-up under a given distortion criteria. These lower bounds to sensing capacity exhibit similar behavior though there is an SNR gap in the upper and lower bounds. Subsequently, we show the effect of correlation in sensing across sensors and across sensing modalities on sensing capacity for various degrees and models of correlation. Our next main contribution is that we show the effect of sensing diversity on sensing capacity, an effect that has not been considered before. Sensing diversity is related to the effective coverage of a sensor with respect to the field. In this direction we show the following results (a) Sensing capacity goes down as sensing diversity per sensor goes down; (b) Random sampling (coverage) of the field by sensors is better than contiguous location sampling (coverage). In essence the bounds and the results presented in this paper serve as guidelines for designing efficient sensor network architectures.
I also found Learning Compressed Sensing by Yair Weiss, Hyun Sung Chang and William Freeman
The abstract reads:
Compressed sensing [7], [6] is a recent set of mathematical results showing that sparse signals can be exactly reconstructed from a small number of linear measurements. Interestingly, for ideal sparse signals with no measurement noise, random measurements allow perfect reconstruction while measurements based on principal component analysis (PCA) or independent component analysis (ICA) do not. At the same time, for other signal and noise distributions, PCA and ICA can significantly outperform random projections in terms of enabling reconstruction from a small number of measurements. In this paper we ask: given a training set typical of the signals we wish to measure, what are the optimal set of linear projections for compressed sensing ? We show that the optimal projections are in general not the principal components nor the independent components of the data, but rather a seemingly novel set of projections that capture what is still uncertain about the signal, given the training set. We also show that the projections onto the learned uncertain components may far outperform random projections. This is particularly true in the case of natural images, where random projections have vanishingly small signal to noise ratio as the number of pixels becomes large.
Unrelated to the previous subject, Morten Morup shows in this poster the extensions of and state-of-the-art techniques in Non-Negative Matrix Factorization. The poster (entitled Extensions of Non-negative Matrix Factorization NMF to Higher Order Data) shows how the use of the regularizing l1 norm provides the ability to do Sparse Coding NMF, i.e. find the sparsest positive factorization. Ever since the Nature article of Daniel Lee and Sebastian Seung (Learning the parts of objects by non-negative matrix factorization), much has been done to improve on nonnegative matrix factorization. And so, while the work of Morup is extremely important, it would seem that positivity can be assured only by an argument of sparsity. This is intriguing, especially in light of this article by Alfred Bruckstein, Michael Elad, and Michael Zibulevsky entitled "A Non-Negative and Sparse Enough Solution of an Underdetermined Linear System of Equations is Unique". The abstract reads:
In this paper we consider an underdetermined linear system of equations Ax = b with non-negative entries of A and b, and the solution x being also required to be non-negative. We show that if there exists a sufficiently sparse solution to this problem, it is necessarily unique. Furthermore, we present a greedy algorithm - a variant of the matching pursuit - that is guaranteed to find this sparse solution. We also extend the existing theoretical analysis of the basis pursuit problem, i.e. min ||x||_1 s.t. Ax = b, by studying conditions for perfect recovery of sparse enough solutions. Considering a matrix A with arbitrary column norms, and an arbitrary monotone element-wise concave penalty replacing the l1-norm objective function, we generalize known equivalence results. Beyond the desirable generalization that this result introduces, we show how it is exploited to lead to the above-mentioned uniqueness claim.
In other words, if the solution is sparse enough, then it is unique and one can drop the l1 penalty: a non-negativity constraint is sufficient. Wow. (Thanks Jort for the link.)
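A quick way to see this claim in action is the following toy experiment of mine (not from the paper): build an underdetermined nonnegative system whose solution is sparse and nonnegative, then solve it with plain nonnegative least squares, with no l1 term anywhere:
% Toy check of the "sparse enough + nonnegative => unique" claim:
% recover a sparse nonnegative x from A*x = b using nonnegativity only.
m = 30; n = 80; k = 4;                       % arbitrary sizes
A = abs(randn(m,n));                         % nonnegative matrix
idx = randperm(n);
x0 = zeros(n,1); x0(idx(1:k)) = rand(k,1) + 0.5;   % sparse nonnegative truth
b = A*x0;
xhat = lsqnonneg(A, b);                      % nonnegativity only, no l1 penalty
recovery_error = norm(xhat - x0)/norm(x0)    % small when uniqueness holds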
In a different area, Michael Grant, Stephen Boyd, and Yinyu Ye released a new build (Version 1.1, Build 565) for CVX ( Matlab Software for Disciplined Convex Programming).
Monday, November 19, 2007
Monday Morning Algorithm Part 3: Compressed Sensing meets Machine Learning / Recognition via Sparse Representation Classification Algorithm

The whole list of Monday Morning Algorithms is listed here with attendant .m files. If you want to contribute, like Jort Gemmeke just did, please let me know. For those reading this on Sunday, well, it's Monday somewhere. Now on to today's algorithm:
In a previous entry, we discovered that the Machine Learning community was beginning to use random projections (the ones that satisfy the Restricted Isometry Property!) to, among other things, tremendously reduce the dimension of images for pattern recognition. The thinking has gone further with the clever argument that, when searching for a match between a query and a dictionary composed of training examples, only a few elements will match, thus yielding a sparse representation in that dictionary. And so, using one of the Compressed Sensing reconstruction techniques, one could in effect say that the result of the database query has become a linear programming question (for those of us using L1 reconstruction techniques). John Wright, Arvind Ganesh Balasubramanian, Allen Yang and Yi Ma proposed this in their study on face recognition [1]. The generic approach in Machine Learning has mostly been based on Nearest Neighbor computation.
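As I understand reference [1], the classification rule implemented below can be summarized as follows (my paraphrase):

\hat{x} = \arg\min_x \|x\|_1 \;\; \text{subject to} \;\; \|\tilde{A}x - \tilde{y}\|_2 \le \varepsilon,
\qquad \mathrm{class}(y) = \arg\min_i \|\tilde{y} - \tilde{A}\,\delta_i(\hat{x})\|_2,

where \tilde{A} = RA and \tilde{y} = Ry are the randomly projected dictionary and query, and \delta_i(\hat{x}) keeps only the coefficients of \hat{x} associated with class i. In the code, GPSR solves the l1 step and the residual loop at the end implements the argmin over classes.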
Intrigued by this approach, Jort Gemmeke decided to contribute today's Monday Morning Algorithm. Jort decided to write Algorithm 1 (Recognition via Sparse Representation) of reference [1]. I mostly contributed in prettying it up and then some (i.e. all the good stuff is his and all the mistakes are mine). Thank you Jort. It uses GPSR (Mário Figueiredo, Robert D. Nowak and Stephen Wright have made a new version available), but in the comment section you can also uncomment calls to L1-Magic, Sparsify or CVX.
In this example we have a dictionary made up of cosine functions of different frequencies making up ten different classes (i.e. ten different frequencies), the query is close to the third class. In every class there are two cosine functions of the same frequency but with different noise added to them. The query is free of noise. You need to have GPSR in your path to have it run.
clear all;
% Algorithm implementing Recognition via Sparse Representation
% or Algorithm 1 suggested in
% "Feature selection in face recognition: A sparse representation
% perspective" by Allen Y. Yang, John Wright, Yi Ma, and Shankar Sastry
% written by Jort Florent Gemmeke and Igor Carron.
% n total number of images in the training database
n = 20;
% m is the dimension of the image/ambient
% space (number of samples in each signal
% of the "training" set), here 101
x=[0:0.01:1];
m = size(x,2);
% d is the number of Compressed Sensing
% measurements, it is also the dimension of the feature
% space.
d = 7;
% k is the number of classification groups (classes/subjects)
k = 10;
% e.g. every group consists of kn = n/k = 2 samples.
kn = n/k;
% Building the training database/dictionary A of n elements
% Within that dictionary there are k classes
% Each of these classes correspond to functions:cos(i x)
% with added noise. i is the differentiator between classes.
% Within each class, any element is different from the other
% by some amount of noise.
% Each element is listed in a row. Each function-element is
% sampled m times.
noise=0.1;
for i=1:k
for j=1:kn
yy = noise*rand(1,m);
A(:,(i-1)*kn+j)=cos(i.*x)+yy;
end
end
% In the following, row i of onesvecs is a vector
% of length n with ones at the positions of the
% columns of A belonging to class i (i in [1,k])
% and zeros elsewhere. There is one such row per
% class, so onesvecs is a k x n matrix.
for i=1:k
onesvecs(i,:)=[zeros(1,(i-1)*kn) ones(1,kn) zeros(1,n-((i-1)* kn + kn))];
end
% This is the element for which we want to
% know to which class it belongs. Here we give
% the ground truth answer, used in the plot, in x_true.
x_true = [zeros(1,4) 1 1 zeros(1,n-6)];
x_true = x_true/norm(x_true,1);
% Now the expression stating that y = cos(3.*x);
y = cos(3.*x)';
% Please note it is different from both entries
% in the dictionary as there is no noise.
% we could have put this as well: y = A * x_true'
% but it would have meant we were using the training
% function as a query test.
%
%
%
% Step 2
% Dimensionality reduction from 101 to 7 features
% using random projection.
R = randn(d,m);
Atilde = R * A;
ytilde = R * y;
% Normalize columns of Atilde and ytilde
for i=1:size(A,2)
Atilde(:,i)=Atilde(:,i)/norm(Atilde(:,i),2);
end
ytilde = ytilde/norm(ytilde);
% Step 3
% call any L1 solver you want. Here we call GPSR
%
% ---- L1 Magic Call
%xp = l1qc_logbarrier(Atilde\ytilde, Atilde, [], ytilde, 0.01, 1e-3);
%
% ---- Sparsify Solver Call
%xp = greed_omp(ytilde,Atilde,n);
%
% ---- CVX Solver Call
%sigma = 0.05;
%cvx_begin
% variable xp(n);
% minimize(norm(xp,1));
% subject to
% norm(Atilde*xp - ytilde) <= sigma;
%cvx_end
%
% ---- GPSR Solver call
xp = GPSR_BB(ytilde,Atilde, 0.01*max(abs(Atilde'*ytilde)),'Continuation',1);
% plot results of L1 solution
figure; plot(xp,'v');
title('L1 solution (v), initial signal (o)');
hold; plot(x_true,'o');
%
% Step 4
% Calculate residuals (formula 13)
residual = [];
for i=1:k
% only keeps nonzeros in xp where onesvec is nonzero
deltavec(:,i) = onesvecs(i,:)'.* xp;
% calculate residual on only a subset of Atilde
residual(i) = norm(ytilde-Atilde*deltavec(:,i));
end
figure; bar(residual); % plot residuals
title('The lowest bar indicates the class to which the query belongs')
Please note the substantial dimensionality reduction, from 101 samples down to 7 features, while still retaining a good classification capability even though the noise-to-signal ratio is above 10 percent for the signals in the library.
[1] Robust Face Recognition via Sparse Representation, John Wright, Arvind Ganesh Balasubramanian, Allen Yang and Yi Ma. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] Updated version (
Photo Credit: ESA ©2007 MPS for OSIRIS Team MPS/UPD/LAM/IAA/RSSD/INTA/UPM/DASP/IDA