Wednesday, April 30, 2008

CS: A code, linearized Bregman iteration,Norms of random submatrices, Fast Fourier Sampling, Performance of l0 approximation, Radar Reconstruction

I mentioned earlier the article by G. Hosein Mohimani, Massoud Babaie-Zadeh and Christian Jutten entitled Fast Sparse Representation based on Smoothed l0 norm. G. Hosein Mohimani forwarded me the code implementing the algorithm. It is also available here.

Another reconstruction code can be found in this new paper entitled: Linearized Bregman Iterations for Compressed Sensing by Jian-Feng Cai, Stanley Osher and Zuowei Shen. The abstract reads;
Finding a solution of a linear equation Au = f with various minimization properties
arises from many applications. One of such applications is compressed sensing, where an efficient and robust-to-noise algorithm to find a minimal l1 norm solution is needed. This means that the algorithm should be tailored for large scale and completely dense matrices A, while Au and AT u can be computed by fast transforms and the solution to seek is sparse. Recently, a simple and fast algorithm based on linearized Bregman iteration was proposed in [26, 30] for this purpose. This paper is to analyze the convergence of linearized Bregman iterations and the minimization properties of their limit. Based on our analysis here, we derive also a new algorithm that is proven to be convergent with a rate. Furthermore, the new algorithm is as simple and fast as the algorithm given in [26, 30] in approximating a minimal l1 norm solution of Au = f as shown by numerical simulations. Hence, it can be used as another choice of an efficient tool in compressed sensing.
Of related interest to the random matrices results: Norms of random submatrices and sparse approximation by Joel Tropp. The abstract reads:

Many problems in the theory of sparse approximation require bounds on operator norms of a random submatrix drawn from a fixed matrix. The purpose of this note is to collect estimates for several different norms that are most important in the analysis of l1 minimization algorithms. Several of these bounds have not appeared in detail.

And On the Conditioning of Random Subdictionaries by Joel Tropp. The abstract reads:
An important problem in the theory of sparse approximation is to identify well-
conditioned subsets of vectors from a general dictionary. In most cases, current results do not apply unless the number of vectors is smaller than the square root of the ambient dimension, so these bounds are too weak for many applications. This paper shatters the square-root bottleneck by focusing on random subdictionaries instead of arbitrary subdictionaries. It provides explicit bounds on the extreme singular values of random subdictionaries that hold with overwhelming probability. The results are phrased in terms of the coherence and spectral norm of the dictionary, which capture information about its global geometry. The proofs rely on standard tools from the area of Banach space probability. As an application, the paper shows that the conditioning of a subdictionary is the major obstacle to the uniqueness of sparse representations and the success of l1 minimization techniques for signal recovery. Indeed, if a fixed subdictionary is well conditioned and its cardinality is slightly smaller than the ambient dimension, then a random signal formed from this subdictionary almost surely has no other representation that is equally sparse. Moreover, with overwhelming probability, the maximally sparse representation can be identifid via l1 minimization. Note that the results in this paper are not directly comparable with recent work on subdictionaries of random dictionaries.

On a different note, one can use the knowledge of the sparsity of the result of a transform to produce a faster algorithm for that transform. Anna Gilbert, Martin Strauss, and Joel Tropp present A Tutorial on Fast Fourier Sampling. The abstract reads:

Having obtained all N Fourier coefficients, it is straightforward to locate the m nonzero frequencies and their coefficients. Although you can compute the DFT efficiently by means of the fast Fourier transform (FFT), the fact remains that you must compute a very large number of zero coefficients when the signal involves few frequencies. This approach seems rather inefficient. The discrete uncertainty principle [8] suggests that it might be possible to use fewer samples from the signal. Indeed, if the spectrum of a length-N discrete-time signal contains only m nonzero frequencies, then the time domain has at least N/m nonzero positions. As a result, even if we sample the signal at relatively few points in time, the samples should carry significant information about the spectrum of the signal. This article describes a computational method, called the Fourier sampling algorithm, that exploits this insight [10]. The algorithm takes a small number of (correlated) random samples from a signal and processes them efficiently to produce an approximation of the DFT of the signal. The algorithm offers provable guarantees on the number of samples, the running time, and the amount of storage. As we will see, these requirements are exponentially better than the FFT for some cases of interest. This article describes in detail how to implement a version of Fourier sampling, it presents some evidence of its empirical performance, and it explains the theoretical ideas that underlie the analysis. Our hope is that this tutorial will allow engineers to apply Fourier sampling to their own problems. We also hope that it will stimulate further research on practical implementations and extensions of the algorithm.

This tutorial makes use of the results in A Deterministic Sub-linear Time Sparse Fourier Algorithm via Non-adaptive Compressed Sensing Methods by Mark Iwen. The abstract reads:

We study the problem of estimating the best B term Fourier representation for a given frequency-sparse signal (i.e., vector) A of length N much superior to B. More precisely, we investigate how to deterministically identify B of the largest magnitude frequencies of ˆ A, and estimate their coefficients, in polynomial(B, logN) time. Randomized sub-linear time algorithms, which have a small (controllable) probability of failure for each processed signal, exist for solving this problem. However, for failure intolerant applications such as those involving mission-critical hardware designed to process many signals over a long lifetime, deterministic algorithms with no probability of failure are highly desirable. In this paper we build on the deterministic Compressed Sensing results of Cormode and Muthukrishnan (CM) [26, 6, 7] in order to develop the first known deterministic sublinear time sparse Fourier Transform algorithm suitable for failure intolerant applications. Furthermore, in the process of developing our new Fourier algorithm, we present a simplified deterministic Compressed Sensing algorithm which improves on CM’s algebraic compressibility results while simultaneously maintaining their results concerning exponential decay.

and the Ann Arbor Fast Fourier Transform (AAFFT) in the Fast Approximation Algorithms page.

Finally there is Performance of the l0 approximation in a general dictionary by Francois Malgouyres, Mila Nikolova. The abstract reads:
We consider the minimization of the number of non-zero coefficients (the ℓ0 “norm”) of the representation of a data set in terms of an arbitrary dictionary under any fidelity constraint. This (nonconvex) optimization problem naturally leads to the sparsest representations, compared with other functionals. Our goal is to measure the sets of data yielding an exactly K-sparse solution— involving K nonzero components—from data uniformly distributed on a domain defined by any norm. We describe these sets of data and derive bounds on their Lebesgue measure. They lead to bounds on the probability of getting an exactly K-sparse solution.

One final note, Lee Potter, Phil Schniter, and Justin Ziniel just released Sparse Reconstrution for Radar. It is an interesting paper because it gives reasons as to why they are not using compressed sensing! (paragraph 1.2)

Credit photo: NASA, Hirise view of Hellas Planitia on Mars.

Tuesday, April 29, 2008

CS: Structurally Random Matrices, Combining Geometry And Combinatorics, Alternating l1 Relaxation, Fast Random Projections using Lean Walsh Transforms

Today is a good day for understanding the different measurements means in Compressed Sensing. We have one paper showing us some sort of improvement over current ad-hoc measurements like partial Fourier Transforms and we also have another paper that makes the bridge between geometric and combinatorial measurement matrices. I will update the measurement section of the big picture accordingly.

Trac Tran released his ICASSP presentation and associated software. It is here (thanks to the Rice site for the pointer). In the compressed file, one can find the paper entitled: Fast Compressive Sampling With Structurally Random Matrices by Thong T. Do, Trac Tran and Lu Gan

This paper presents a novel framework of fast and efficient compressive sampling based on the new concept of structurally random matrices. The proposed framework provides four important features. (i) It is universal with a variety of sparse signals. (ii) The number of measurements required for exact reconstruction is nearly optimal. (iii) It has very low complexity and fast computation based on block processing and linear filtering. (iv) It is developed on the provable mathematical model from which we are able to quantify trade-offs among streaming capability, computation/memory requirement and quality of reconstruction. All currently existing methods only have at most three out of these four highly desired features. Simulation results with several interesting structurally random matrices under various practical settings are also presented to verify the validity of the theory as well as to illustrate the promising potential of the proposed

As pointed out in the illuminating presentation slides, here are the issues:

Hence, they seem to use an approach is a little reminiscent of the three stage PHD approach of Nir Ailon and Bernard Chazelle, and Edo Liberty when implementing a Fast Johnson-Lindenstrauss transform (Monday Morning Algorithm Part 6: The Fast Johnson-Lindenstrauss Transform). This approach is also ideally conducive to the operator approach implemented in Sparco.

The graph above shows a typical Structural Random Matrice but the slides have other combinations leading to a family of Sparse Structurally Random Matrices. I also note that the author is looking forward to deterministic compressive sampling by replacing the random downsampler by a deterministic downsampler or a deterministic lattice of measurements such as the one by Yves Meyer and Basarab Matei.

Do you recall this previous entry: Compressed Sensing: There is no theory behind it right now, but does it work ? well it looks like there really is a theory behind it. Radu Berinde, Anna Gilbert, Piotr Indyk, Howard Karloff and Martin Strauss just released the important: Combining Geometry And Combinatorics: A Unified Approach to Sparse Signal Recovery ( but also here). The abstract reads:
There are two main algorithmic approaches to sparse signal recovery: geometric and combinatorial. The geometric approach starts with a geometric constraint on the measurement matrix  and then uses linear programming to decode information about x from x. The combinatorial approach constructs  and a combinatorial decoding algorithm to match. We present a unified approach to these two classes of sparse signal recovery algorithms. The unifying elements are the adjacency matrices of high-quality unbalanced expanders. We generalize the notion of Restricted Isometry Property (RIP), crucial to compressed sensing results for signal recovery, from the Euclidean norm to the ℓp norm for p  close to 1, and then show that unbalanced expanders are essentially equivalent to RIP-p matrices. From known deterministic constructions for such matrices, we obtain new deterministic measurement matrix constructions and algorithms for signal recovery which, compared to previous deterministic algorithms, are superior in either the number of measurements or in noise tolerance.
It's a dense paper, I had to read it several times but it has some far reaching consequences. From the conclusion of the paper (emphasis are mine):
We show in this paper that the geometric and the combinatorial approaches to sparse signal recovery are different manifestations of a common underlying phenomenon. Thus, we are able to show a unified perspective on both approaches—the key unifiying elements are the adjacency matrices of unbalanced expanders.
In most of the recent applications of compressed sensing, a physical device instantiates the measurement of x and, as such, these applications need measurement matrices which are conducive to physical measurement processes. This paper shows that there is another, quite different, large, natural class of measurement matrices, combined with the same (or similar) recovery algorithms for sparse signal approximation. These measurement matrices may or may not be conducive to physical measurement processes but they are quite amenable to computational or digital signal measurement. Our work suggests a number of applications in digital or computational “sensing” such as efficient numerical linear algebra and network coding.
The preliminary experimental analysis exhibits interesting high-dimensional geometric phenomena as well. Our results suggest that the projection of polytopes under Gaussian random matrices is similar to that of projection by sparse random matrices, despite the fact that Gaussian random matrices are quite different from sparse ones.

One now wonders how these two papers are going to overlap, could we use some of the matrices designed in the second paper as global/local randomizers in the first paper?

Following up on a previous entry, Stephane Chretien, now has a preprint of An Alternating l_1 Relaxation for compressed sensing.

Finally, in line with the first two papers, here is one by Edo Liberty, Nir Ailon, Amit Singer entitled Fast Random Projections using Lean Walsh Transforms. The abstract reads:

We present a k x d random projection matrix that is applicable to vectors x element of R^d in O(d) operations if d superior or equal to k^2(k+\delta') . Here, k is the minimal Johnson Lindenstrauss dimension and \delta' is arbitrarily small. The projection succeeds, with probability 1 - 1/n, in preserving vector lengths, up to distortion \epsilon, for all vectors such that ||x||_infty inferior or equal to ||x||_2 k^(1/2) d^(-\delta)(for arbitrary small \delta). Sampling based approaches are either not applicable in linear time or require a bound on ||x||_infty that is strongly dependant on d. Our method overcomes these shortcomings by rapidly applying dense tensor power matrices to incoming vectors.

Of noted interest is the following comment when dealing with dimension higher than 3:

These matrices are constructed using tensor products and can be applied to any vector in Rd in linear time, i.e, in O(d).

Credit: NASA/JPL/Space Science Institute, the Sun shines behind Saturn, a view we can never see from Earth. Released: October 11, 2006

Monday, April 28, 2008

CS: Necessary and Sufficient Conditions on Sparsity Pattern Recovery, Seeing Speech: Capturing Vocal Tract Shaping using Real-time MR and a talk

Here is a fascinating paper . We are generally accustomed to big O notation for the number of measurements necessary to recover signals in CS, this paper addresses the issue of the functional dependence of the constant that goes into the big O notation as a function of the signal-to-noise ratio (SNR) and mean-to-average ratio (MAR). The paper is Necessary and Sufficient Conditions on Sparsity Pattern Recovery and was written by Alyson Fletcher, Sundeep Rangan, and Vivek Goyal. The abstract reads:

The problem of detecting the sparsity pattern of a k-sparse vector in Rn from m random noisy measurements is of interest in many areas such as system identification, denoising, pattern recognition, and compressed sensing. This paper addresses the scaling of the number of measurements m, with signal dimension n and sparsity-level nonzeros k, for asymptotically-reliable detection. We show a necessary condition for perfect recovery at any given SNR for all algorithms, regardless of complexity, is m = (k log(n − k)) measurements. Conversely, it is shown that this scaling of (k log(n − k)) measurements is sufficient for a remarkably simple “maximum correlation” estimator. Hence this scaling is optimal and does not require more sophisticated techniques such as lasso or matching pursuit. The constants for both the necessary and sufficient conditions are precisely defined in terms of the minimum-to average ratio of the nonzero components and the SNR. The necessary condition improves upon previous results for maximum likelihood estimation. For lasso, it also provides a necessary condition at any SNR and for low SNR improves upon previous work. The sufficient condition provides the first asymptotically-reliable detection guarantee at finite SNR.

Much more can be gathered from reading the paper.

The next paper does not make a specific headline with respect to Compressed Sensing. This is good, it means that the Compressed Sensing approach is becoming mature (at least in fMRI) to the point where it enables new ways of obtaining data thereby producing new ways of matching different types of data (here MRI and voice recording). It's a paper by Erik Bresch, Yoon-Chul Kim, Krishna S. Nayak, Dani Bird, and Shrikanth Narayanan entitled Seeing Speech: Capturing Vocal Tract Shaping using Real-time Magnetic Resonance Imaging (also here). The beginning of the paper reads:

Understanding human speech production is of great interest from engineering, linguistic, and several other research points of view. While several types of data available to speech understanding studies lead to different avenues for research, in this article we focus on real-time (RT) magnetic resonance imaging (MRI) as an emerging technique for studying speech production. We discuss the details and challenges of RT magnetic resonance (MR) acquisition and analysis, and modeling approaches that make use of MRI data for studying speech production.

Here is an interesting new approach whereby compressed sensing is used not to reconstruct an image but to grab essential characteristics of the phenomena being monitored. This is along the lines of the manifold techniques (red arrows in the graph in the Compressed Sensing Big Picture graphic). As we all know the reconstruction is a testimony on how many samples are needed to get an information, the real eldorado is in using these compressed measurements directly. In fMRI, there has much work in finding the right trajectories in the k-space to produce adequate full images, the next steps is obviously a patchwork of techniques using different measurement tools and machine learning techniques to enable the right data fusion process and thereby produce new methodologies and new discoveries ( I have yet to write on the process which I think needs a very good data fusion effort).

On a different note, Trac Tran will give a talk at Simon Fraser University on Fast, Efficient, and Practical Algorithms for Compressed Sensing. Let us note the emergence of a new greedy algorithm: GOMP.

Date: Thursday, May 22, 2008
Time: , 3:00 pm to 4:00 pm
Location: SFU ASSC-1: Room 10041

It is going to be listed on the CS calendar. The abstract of the talk is:

In the conventional uniform sampling framework, the Shannon/Nyquist theorem tells us to sample a signal at a rate at least two times faster than its bandwidth for the original signal to be perfectly reconstructed from its samples. Recently, compressed sensing has emerged as a revolutionary signal sampling paradigm which shows that Shannon theorem is indeed overly pessimistic for signals with a high degree of sparsity or compressibility. The compressed sensing framework demonstrates that a small number of random linear projections, called measurements, contains sufficient information for signal reconstruction, even exactly. The two key components of compressed sensing are: (i) the sensing matrix at the encoder must be highly incoherent with the sparsifying signal transformation; and (ii) sophisticated non-linear algorithms such as basis pursuit or orthogonal matching pursuit are employed at the decoder to recover the sparsest signal from the received measurements.

The first part of this talk gives an overview of the new compressed sensing framework along with the most elegant breakthrough results in the field. The second part focuses on two recent compressed sensing discoveries from the JHU Digital Signal Processing Lab. Particularly, a fast and efficient sampling algorithm for compressed sensing based on structurally random matrices will be presented. Our proposed sampling scheme provides several crucial features for practical implementation: fast computable, memory efficient, streaming capable, and hardware friendly while retaining comparable theoretical performance bounds with current state-of-the-art techniques. Secondly, at the decoder side, we present a novel iterative reconstruction algorithm for compressed sensing called Generalized Orthogonal Matching Pursuit (GOMP) that can adaptively, at each iteration step, admit new atoms to join the current selected set from a small candidate set while discard from the selected set atoms that might be highly regarded in previous steps. Simulation results show that GOMP’s performance far exceeds the best existing iterative algorithms with reasonable complexity overhead. Finally, future research directions in compressed sensing are also discussed if time permits.

Friday, April 25, 2008

CS: Coded Aperture, Terahertz Imaging, Finite Rate of Innovation from Noisy Samples, Prior image constrained CS, MRI enhancement, IT and L1

My continued interest in using off the shelf photographic equipment (see CoSaMP, CVX , Mapping and Search and Rescue) with coded aperture (see Learning Bases from Scratch, an Optical Heterodyning Camera and RandomTime Coded Imaging, and What Can Nonlinear Reconstruction Techniques Bring to Coded Aperture in Space and in Nuclear Medicine ?) to produce additional data led me to this new paper Ankit Mohan, Xiang Huang, Ramesh Raskar and Jack Tumblin in Sensing Increased Image Resolution Using Aperture Masks. The abstract reads:

We present a technique to construct increased-resolution images from multiple photos taken without moving the camera or the sensor. Like other super-resolution techniques, we capture and merge multiple images, but instead of moving the camera sensor by sub-pixel distances for each image, we change masks in the lens aperture and slightly defocus the lens. The resulting capture system is simpler, and tolerates modest mask registration errors well. We present a theoretical analysis of the camera and image merging method, show both simulated results and actual results from a crudely modified consumer camera, and compare its results to robust ‘blind’ methods that rely on uncontrolled camera displacements.
This is a very interesting development. I am sure I will come back to it later. The attendant website is here.

Talking about coded aperture and following up on the interest of the astronomy community to look deeper in Compressed Sensing, the ADA conference series organized by ESA is opening to it as can be seen in their welcoming opening statement.

Held regularly since 2001, the ADA conference series is focused on algorithms, signal and data processing. The program includes invited and contributed talks as well as posters. This conference series has been characterized by a range of innovative themes, including curvelet transforms, and clustering in cosmology, while at the same time remaining closely linked to front-line open problems and issues in astrophysics and cosmology. Herschel and PLANCK will be launched in 2008, and so ADA-V will focus in particular on:

  • inverse problems such as map-making and component separation
  • multi-wavelength data analysis

Recent developments in harmonic analysis, especially the "compressed sensing" theory may have major impact on the way we collect, transfer and analyze the data. The ADA-V conference will have an invited expert from this field in order to diffuse these new ideas in the astronomical community.

The invited expert is Emmanuel Candes. Making a presentation in Crete must be bad for jetlag but somebody's got to do it.

From, I found the following: Estimating Signals with Finite Rate of Innovation from Noisy Samples: A Stochastic Algorithm by Vincent Yan Fu Tan, Vivek Goyal. The abstract reads:

As an example of the recently-introduced concept of rate of innovation, signals that are linear combinations of a finite number of Diracs per unit time can be acquired by linear filtering followed by uniform sampling. However, in reality, samples are rarely noiseless. In this paper, we introduce a novel stochastic algorithm to reconstruct a signal with finite rate of innovation from its noisy samples. Even though variants of this problem has been approached previously, satisfactory solutions are only available for certain classes of sampling kernels, for example kernels which satisfy the Strang–Fix condition. In this paper, we consider the infinite-support Gaussian kernel, which does not satisfy the Strang–Fix condition. Other classes of kernels can be employed. Our algorithm is based on Gibbs sampling, a Markov chain Monte Carlo (MCMC) method. Extensive numerical simulations demonstrate the accuracy and robustness of our algorithm.
I also found this actual use of CS in Terahertz imaging by some fo the folks at Rice: Terahertz Imaging with Compressed Sensing and Phase Retrieval by Wai Lam Chan, Matthew Moravec, Richard Baraniuk, and Daniel Mittleman. The abstract reads:
We describe in this paper a novel, high-speed pulsed terahertz (THz) Fourier imaging system based on compressed sensing (CS), a new signal processing theory which allows image reconstruction with fewer samples than traditionally required. Using CS, we successfully reconstruct a 64 × 64 image of an object with pixel size 1.4 mm using a randomly chosen subset of the 4096 pixels which define the image in the Fourier plane, and observe improved reconstruction quality when we apply phase correction. For our chosen image, only about 12% of the pixels are required for reassembling the image. In combination with phase retrieval, our system has the capability to reconstruct images with only a small subset of Fourier amplitude measurements, and thus has potential application in THz imaging with continuous-wave (CW) sources.
One can note the 12% sampling required to get back an OK image (second column of images) using SPGL1 for the reconstruction algorithm. And finally, two papers from the medical imaging community:

The first one I could not find the paper, just the PubMed reference: Prior image constrained compressed sensing (PICCS): a method to accurately reconstruct dynamic CT images from highly undersampled projection data sets by Guang-Hong Chen, Jie Tang, and Shuai Leng
When the number of projections does not satisfy the Shannon/Nyquist sampling requirement, streaking artifacts are inevitable in x-ray computed tomography (CT) images reconstructed using filtered backprojection algorithms. In this letter, the spatial-temporal correlations in dynamic CT imaging have been exploited to sparsify dynamic CT image sequences and the newly proposed compressed sensing (CS) reconstruction method is applied to reconstruct the target image sequences. A prior image reconstructed from the union of interleaved dynamical data sets is utilized to constrain the CS image reconstruction for the individual time frames. This method is referred to as prior image constrained compressed sensing (PICCS). In vivo experimental animal studies were conducted to validate the PICCS algorithm, and the results indicate that PICCS enables accurate reconstruction of dynamic CT images using about 20 view angles, which corresponds to an under-sampling factor of 32. This undersampling factor implies a potential radiation dose reduction by a factor of 32 in myocardial CT perfusion imaging.
and Compressed sensing for resolution enhancement of hyperpolarized (13)C flyback 3D-MRSI. Hu S, Lustig M, Chen AP, Crane J, Kerr A, Kelley DA, Hurd R, Kurhanewicz J, Nelson SJ, Pauly JM, Vigneron DB, The abstract reads:

High polarization of nuclear spins in liquid state through dynamic nuclear polarization has enabled the direct monitoring of (13)C metabolites in vivo at very high signal-to-noise, allowing for rapid assessment of tissue metabolism. The abundant SNR afforded by this hyperpolarization technique makes high-resolution (13)C 3D-MRSI feasible. However, the number of phase encodes that can be fit into the short acquisition time for hyperpolarized imaging limits spatial coverage and resolution. To take advantage of the high SNR available from hyperpolarization, we have applied compressed sensing to achieve a factor of 2 enhancement in spatial resolution without increasing acquisition time or decreasing coverage. In this paper, the design and testing of compressed sensing suited for a flyback (13)C 3D-MRSI sequence are presented. The key to this design was the undersampling of spectral k-space using a novel blipped scheme, thus taking advantage of the considerable sparsity in typical hyperpolarized (13)C spectra. Phantom tests validated the accuracy of the compressed sensing approach and initial mouse experiments demonstrated in vivo feasibility
I note the following interesting bit:
At this point, it is interesting to consider the connection between compressed sensing and existing techniques in NMR, such as maximum entropy [15-16] and minimum area [17] reconstruction, used for the related problem of computation of spectra from short, noisy data records. Recently, Stern et al showed that a specific form of iterative thresholding, a technique similar to maximum entropy and minimum area reconstruction, is equivalent to the minimum l1 norm reconstruction in compressed sensing [18]. Additionally, Stern explains how l1 norm reconstruction gives insight into the performance of maximum entropy and minimum area reconstruction. Thus, compressed sensing could be viewed as a generalization of existing NMR techniques.

Following up on that, I have found only the abstract of the paper by Stern, Donoho and Hoch at PubMed: "NMR data processing using iterative thresholding and minimum l(1)-norm reconstruction" by Stern AS, Donoho DL, Hoch JC. The abstract reads:
Iterative thresholding algorithms have a long history of application to signal processing. Although they are intuitive and easy to implement, their development was heuristic and mainly ad hoc. Using a special form of the thresholding operation, called soft thresholding, we show that the fixed point of iterative thresholding is equivalent to minimum l(1)-norm reconstruction. We illustrate the method for spectrum analysis of a time series. This result helps to explain the success of these methods and illuminates connections with maximum entropy and minimum area methods, while also showing that there are more efficient routes to the same result. The power of the l(1)-norm and related functionals as regularizers of solutions to under-determined systems will likely find numerous useful applications in NMR.

Compressed Sensing: Alternating l_1 Reconstruction Algorithm.

Laurent Duval pointed out to this presentation by Stephane Chretien on Tighter relaxations for the sparsest recovery problem where he presents a new reconstruction called Alternating l_1 reconstruction algorithm:

The main problem addressed in subsequent works and still of great interest now is the one of increasing the value of k for which every k-signal can be reconstructed exactly for a given pair (n,m). We now present a generalization of the l_1 relaxation which we call the Alternating l_1 relaxation.

Using Lemarechal and Oustry’s guidance given in “Semidefinite relaxations of combinatorial optimization problems”. He uses "Lagrangian duality as a convenient framework for building convex relaxations to hard nonconvex optimization problems" and then goes on to explain his method. He then makes a comparison with the reweighted L1 method (mentioned here in Reweighted L1 and a nice summary on Compressed Sampling). Could we just hope it's faster than reweighted l1, please... pretty please?

I have added it to the list of reconstruction algorithms in the big picture section in the categories of code not implemented yet.

Thursday, April 24, 2008

Compressed Sensing: Nuclear Norm for Rank Minimization, Random Matrix Theory, Fundamental Limits on Sensing Capacity for Sensor Networks and CS

We mentioned the type of work that draws a strong parallel between the sparse approximation and rank minimization before (CS is not just Compressed Sampling nor Compressed Sensing. and in Trace: What is it good for ? - How Compressed Sensing relates to Dimensionality Reduction), Babak Hassibi, Benjamin Recht, and Weiyu Xu now go a little further in Necessary and Sufficient Condtions for Success of the Nuclear Norm Heuristic for Rank Minimization. The abstract reads:

Rank minimization–minimizing the rank of a matrix subject to constraints–is a challenging problem that arises in many control applications including controller design, realization theory and model reduction. The general formulation of rank minimization subject to convex constraints is NP-hard, and for most practical problems there are no efficient exact algorithms. A popular heuristic algorithm is to replace the rank function with the sum of the singular values of the decision variable. In this paper, we provide a necessary and sufficient condition that quantifies when this heuristic successfully finds the minimum rank solution of a linear constraint set. We further show that most of the problems of interest in control can be formulated as rank minimization subject to such linear constraints. We additionally provide a probability distribution over instances of the affine rank minimization problem such that instances sampled from this distribution satisfy our necessary conditions for success with overwhelming probability provided the number of constraints is appropriately large. Finally we give empirical evidence that these probabilistic bounds are realistic in numerical simulations.

There is also now the second version of the CoSAMP paper by Deanna Needel and Joel Tropp.

The subject of Random Matrices is central to some aspect of the measurement problems in Compressed Sensing. For the theoretically inclined, here is a small compilation of what is happening in the field:

Let us first note two entries in Terry Tao's blog:Lewis lectures and On the permanent of a random Bernoulli matrix.

Following ChapterZero links, I found these lecture notes by Roman Vershynin on Deanna Needel page. The course is entitled: Non-Asymptotic Random Matrix Theory
(updated thanks to an anonymous commenter)

Lecture Notes

Attention: this new tutorial partially supersedes the old lecture notes below, see the publications page. However, geometric topics (Lectures 12-14) are not covered in the new tutorial. They are going to be included in the Lecture notes in geometric functional analysis which are being written.
These notes were taken by students, and they have not been edited.
Lecture 1. Background, Techniques, Methods. 
Lecture 2. Concentration of Measure. 
Lecture 3. Concentration of Measure (cont'd). 
Lecture 4. Dimension Reduction. 
Lecture 5. Subgaussian Random Variables. 
Lecture 6. Norm of a Random Matrix. 
Lecture 7. Largest, Smallest, Singular Values of Random Rectangular Matrices. 
Lecture 8. Dudley's Integral Inequality. 
Lecture 9. Applications of Dudley's Inequality -- Sharper Bounds for Random Matrices. 
Lecture 10. Slepian's Inequality - Sharpness Bounds for Gaussian Matrices. 
Lecture 11. Gordon's Inequality. 
Lecture 12. Sudakov's Minoration. 
Lecture 13. Sections of Convex Sets via Entropy and Volume. 
Lecture 14. Sections of Convex Sets via Entropy and Volume (cont'd). 
Lecture 15. Invertibility of Square Gaussian Matrices, Sparse Vectors. 
Lecture 16. Invertibility of Gaussian Matrices and Compressible/Incompressible Vectors. 
Lecture 17. Invertibility of Subgaussian Matrices -- Small Ball Probability via the Central Limit Theorem. 
Lecture 18. Strong Invertibility of Subgaussian Matrices and Small Ball Probability via Arithmetic Progressions. 
Lecture 19. Small Ball Probability via Sum-Sets. 
Lecture 20. The Recurrence Set (Ergodic Approach). 

Other sites of interest include that of Jiri Matousek and this post by Suresh Venkatasubramanian and Piotr Indyk on the concentration of measure

Following up on their work in sensor networks and compressed sensing (Compressed Sensing: Sensor Networks, Learning Compressed Sensing by Uncertain Component Analysis, Sparsity or Positivity ?, New CVX build) Shuchin Aeron, Manqi Zhao, and Venkatesh Saligrama just released on the following preprint: Fundamental Limits on Sensing Capacity for Sensor Networks and Compressed Sensing. The abstract reads:
Modern applications of signal processing problems arising in sensor networks require efficient sensing of multi-dimensional data or phenomena. In this context it becomes important to understand fundamental performance limits of both sensing and communication between sensors. In this paper we focus primarily on sensing aspects. We propose the notion of sensing capacity to characterize the performance and effectiveness of various sensing configurations. We define Sensing Capacity as the maximal number of signal dimensions reliably identified per sensor deployed. The inverse of sensing capacity is the compression rate, i.e., the number of measurements required per signal dimension for accurate reconstruction, a concept that is of interest in compressed sensing(CS). Using information theoretic arguments we quantify sensing capacity (compression rate) as a function of information rate of the event domain, SNR of the observations, desired distortion in the reconstruction and diversity of sensors. In this paper we consider fixed SNR linear observation models for sensor network (SNET) and CS scenarios for different types of distortions motivated by detection, localization and field estimation problems. The principle difference between the SNET and CS scenarios is the way in which the signal-tonoise ratio (SNR) is accounted. For SNET scenarios composed of many sensors, it makes sense to impose fixed SNR for each individual sensor which leads to row-wise normalization of the observation matrix. On the other hand CS has dealt with column-wise normalization, i.e., the case where the vector valued observation corresponding to each signal component is normalized. The two cases lead to qualitatively different results. In the SNET scenario sensing capacity is generally small and correspondingly the number of sensors required does not scale linearly with the target density(sparsity) in contrast to the CS setting. Specifically, the number of measurements are generally proportional to the signal dimension and is weakly dependent on target density or sparsity. This raises questions on compressive gains in power limited sensor network applications based on sparsity of the underlying source domain. For compressed sensing scenario, we also illustrate gaps between existing state of the art convex programming methods and optimal estimators.

Image Credit: NASA/JPL/Space Science Institute, Saturn F-ring as seen by Cassini on March 31, 2007, at a distance of about 2 million kilometers (1.2 million miles) from Saturn.

Compressed Sensing: Sparse Representations: From Source Separation to Compressed Sensing, the video.

Remi Gribonval 's HDR presentation entitled "Sparse Representations: From Source Separation to Compressed Sensing" and featured yesterday (in Compressed Sensing: Building Dictionaries, What is it good for ?) is also in a video and in an audio only format. The accompanying slides are here. The presentation is in French but the slides are in English. Let us note the great work of putting both slides and the speaker in the same video.

Wednesday, April 23, 2008

Compressed Sensing: Building Dictionaries, What is it good for ?

In the big picture on Compressed Sensing, a significant amount of time is spent on figuring out the difference between what the theorems can provably show and what in practice works pretty often. Let us recall that the initial results showing that the L1 regularization provided the sparsest decomposition were in found in geophysics. Only afterwards did it catch the attention of more mathematically inclined folks. This is why surprising results such as the one by Laurent Jacques and others are always welcome because they will eventually be proven for some category of signal or other parameters. Until then we are left with a lot of tweaking and trials.

Part of the big picture is built on the assumption that once the dictionaries have been found, one can directly proceed into the sensing (measurement) part with the knowledge that the two ensembles (the dictionary and the measurement ensemble) are somehow incoherent. The intent is then to produce some sort of hardware that will in effect project as efficiently as possible on this measurement ensemble. The expectation is that the projection measurement will be linear and will therefore reduce power consumption at the sensor level.

But how do you produce the dictionaries in the first place and what type of properties should they have ? Initially the thought is to use known signals, decompose them in some sort of atoms and then cross one's fingers and hope for the best. In particular, hope that with different training data, one would obtain the same dictionary. Well, the story is a little bit more complicated than that, as eloquently presented by Remi Gribonval in his Habilitation a Diriger Des Recherches. In France, this document is part of a process whereby upon completion, one is allowed to become the thesis advisor of Ph.D and M.S. students. The title is in French but the main body of the report is in English:

Sur quelques problèmes mathématiques de modélisation parcimonieuse.
[The presentation slides of that defense (more visual) can be found here (and they are in English)]

The style of the document uses "I" pretty often as the goal of the document is to feature what accomplishments the individual has made in the field to the reviewers. In all, this review is an excellent snapshot of what we know and what we do not know in that field. Because it is an important document, I have added it to the CS big picture page. Maybe instead of being called the big picture, the page should be renamed: Compressed Sensing Technology Watch.

As made clear in the document, Remi Gribonval is one of the co-author of the Matching Pursuit Toolkit listed in the code section. Let us note the incredible capabilities of this software:

MPTK provides an implementation of Matching Pursuit which is:

  • FAST: e.g., extract 1.5 million atoms from a 1 hour long, 16kHz audio signal (15dB extracted) in 0.25x real time on a Pentium IV@2.4GHz, out of a dictionary of 178M Gabor atoms. Such incredible speed makes it possible to process "real world" signals.

  • FLEXIBLE: multi-channel, various signal input formats, flexible syntax to describe the dictionaries => reproducible results, cleaner experiments.

  • OPEN: modular architecture => add your own atoms ! Free distribution under the GPL.

But if you think this is fast, wait till people can get their hands on the one of these Red cameras at 12 MP, 100 fps and want to process these movies! Talk about data flood.... We had a similar camera we wanted to put on the International Space Station but our main issue was data storage. Data transmission could really only be done by carrying the hard drives back to Earth with the Space Shuttle. TDRSS was never designed for that purpose.

The table of content of the document features the following paragraphs:

1 Audio source separation
1.1 Introduction
1.2 Evaluation of audio source separation algorithms
1.2.1 Performance measures
1.2.2 Oracle source separation estimators
1.3 Multichannel separation by sparse decomposition
1.3.1 Identification of a mixing matrix
1.3.2 Separation by sparse decomposition, given the mixing matrix
1.3.3 Dictionary design
1.4 Single-channel source separation
1.4.1 "Morphological" source diversity through sparsity ?
1.4.2 Gaussian Mixture Models
1.5 Conclusion and perspectives
2 Sparse approximation with redundant dictionaries
2.1 Introduction
2.2 Example : approximation with an orthonormal basis
2.3 Jackson and Bernstein inequalities
2.4 Approximation with general redundant dictionaries
2.4.1 Sparsity spaces
2.4.2 Approximation spaces
2.4.3 General Jackson type embeddings
2.5 A quest for Bernstein inequalities
2.5.1 Approximation with framelet systems
2.5.2 Approximation with localized frames ?
2.5.3 Approximation with decomposable incoherent dictionaries
2.6 Conclusion
3 Algorithms for sparse approximation
3.1 Introduction
3.2 Recovery of sparse representations by global optimization
3.2.1 Recovery in arbitrary incoherent dictionaries
3.2.2 Simultaneous recovery with l_tau representations
3.2.3 From representations to approximations : stable recovery
3.3 Matching Pursuit
3.3.1 The Matching Pursuit ToolKit (MPTK)
3.3.2 Stable recovery with Matching Pursuit
3.4 Bridging the gap between theory and practice
3.4.1 How incoherent are real-life dictionaries ?
3.4.2 Why is incoherence too pessimistic a tool ?
3.4.3 Structured approximation
3.4.4 Recovery with high probability
3.5 Conclusion and perspectives

Tuesday, April 22, 2008

Monday Morning Algorithm 16: S-POCS, Using Non Convex Sparse Signal Space Instead of the Convex l1 Ball Improves POCS Recovery

This is a first, Laurent Jacques is a guest blogger on this blog and since he mentioned that he had worked on it on Monday I made this algorithm part of the Monday Morning Algorithm series. But more seriously, here is what he has to say :

I have spent some time today working with some CS reconstructions using the alternate projection algorithms (POCS) of Candes and Romberg [1] and Lu Gan [2].

In the context of CS where y = Phi x (length y = m, length x = N, Phi is m by N, x is K sparse and Phi follows the RIP), [1] explains that the solution of a BP program lies at the unique (thanks to RIP) intersection between the hyperplane P={x : Phi x = y} and the l1 ball B of radius ||x||_1. Using the extra a-priori information about the l1 norm of x, it is then possible to design an iterative alternate projection algorithm (or POCS) onto these two convex sets P and B in order to find their common intersection point ( i.e. to recover x from y.). The algorithm uses then two projectors : one involving the pseudo-inverse of Phi to project any signal u on P. i.e. to a solution of Phi x = y, and another using soft-thresholding so that the new thresholded signal falls inside B.

This method (denoted hereafter by l1-POCS and implemented in cspocs_l1.m with attendant description and example in the header) works very well as shown in [1]. Experimentally however, I have observed that it is highly sensitive to potential mis-evaluation of ||x||_1. Initially, I wanted to somehow approximate this norm using y and RIP inequalities, but I gave up since a simple multiplication or division of ||x||_1 by the small factor of 1.01 either breaks the recovery algorithm or does not converge.

In [2], the author designed another algorithm made of two POCS-like stages. I have implemented only a simplified version of the first stage since it seems that the second stage is more specific to the final goal of [2], i.e. removing artifacts due both to noise on the measurements and to the use of a special blocked measurement matrix. In this first stage, the POCS structure is again an iterative alternate projection process between the previous hyperplane P and the (non convex) space of K-sparse signals. A hard thresholding keeping the K largest components of a vector is used as the projector on this second space, while the same pseudo-inverse projector than in l1-POCS is used for projection onto P. Therefore, in [2], the a priori (input) extra information is not the l1 norm of x as for l1-POCS but the sparsity of x (as in CoSaMP). I have implemented this new "sparse POCS" (or S-POCS) approach in cspocs_K.m (description and example in the header too).

Even though the space of K-sparse signals is not convex, the new algorithm works surprisingly well : fewer iterations are needed when compared to l1-POCS, and small errors on K are acceptable. An input sparsity of K' = K+10 (if K is the sparsity of x) still works in the example provided with the algorithm. Moreover, If K' is less than K, S-POCS seems to converge very slowly to a wrong solution, while for K' = K the recovery is perfect and the rate of convergence is very fast (e.g. the quantity || y - Phi x || / ||x|| converges quickly below a user defined tolerance value, with ||.|| the l2 norm). This suggests a brute force method (not implemented) to create a similar algorithm but without the a priori knowledge of the sparsity level K. Indeed, one could just test the S-POCS above on a K' ranging between 1 and N and stop this loop when S-POCS converges "sufficiently well".

A zip file containing both scripts featuring l1-POCS and S-POCS can be found here and in the Compressed Sensing codes section. After downloading the archive into a folder, you may want to run trial_s_pocs.m and trial_l1_pocs.m to convince yourselves.

References :
[1] E. J. Candès and J. Romberg. Practical signal recovery from random projections. Wavelet Applications in Signal and Image Processing XI, Proc. SPIE Conf. 5914.
[2] Lu Gan, Block compressed sensing of natural images. Proc. Int. Conf. on Digital Signal Processing (DSP), Cardiff, UK, July 2007.

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

Compressed Sensing: Learning Dictionaries with MoTIF

In the big picture on Compressed Sensing, we noted that in order to perform this new kind of sampling, we first needed to know that the signal can be compressible with respect to a family of functions. How do we find these families of functions ? well, first build a dictionary of atoms using some training datasets of the known signal. I missed this series of paper for some reason, but here they are ( I could not find a software implementation readily available) :

MoTIF : an Efficient Algorithm for Learning Translation Invariant Dictionaries by Philippe Jost, Pierre Vandergheynst, Sylvain Lesage, and Rémi Gribonval. The abstract reads:
The performances of approximation using redundant expansions rely on having dictionaries adapted to the signals. In natural high-dimensional data, the statistical dependencies are, most of the time, not obvious. Learning fundamental patterns is an alternative to analytical design of bases and is nowadays a popular problem in the field of approximation theory. In many situations, the basis elements are shift invariant, thus the learning should try to find the best matching filters. We present a new algorithm for learning iteratively generating functions that can be translated at all positions in the signal to generate a highly redundant dictionary.
A more recent paper deals with a more difficult problem of dictionaries of multimodal signals in Learning Multi-Modal Dictionaries by Holger Rauhut, Karin Schnass, Gianluca Monaci, Philippe Jost, Pierre Vandergheynst, Boris Mailhé, Sylvain Lesage, and Rémi Gribonval. The abstract reads:

Real-world phenomena involve complex interactions between multiple signal modalities. As a consequence, humans are used to integrate at each instant perceptions from all their senses in order to enrich their understanding of the surrounding world. This paradigm can be also extremely useful in many signal processing and computer vision problems involving mutually related signals. The simultaneous processing of multimodal data can, in fact, reveal information that is otherwise hidden when considering the signals independently. However, in natural multimodal signals, the statistical dependencies between modalities are in general not obvious. Learning fundamental multimodal patterns could offer deep insight into the structure of such signals. In this paper, we present a novel model of multimodal signals based on their sparse decomposition over a dictionary of multimodal structures. An algorithm for iteratively learning multimodal generating functions that can be shifted at all positions in the signal is proposed, as well. The learning is defined in such a way that it can be accomplished by iteratively solving a generalized eigenvector problem, which makes the algorithm fast, flexible, and free of user-defined parameters. The proposed algorithm is applied to audiovisual sequences and it is able to discover underlying structures in the data. The detection of such audio-video patterns in audiovisual clips allows to effectively localize the sound source on the video in presence of substantial acoustic and visual distractors, outperforming state-of-the-art audiovisual localization algorithms.

Some additional explanation is given in Sylvain Lesage's Ph.D thesis and defense (100 MB ppt) (both in French) . I noted the following in the previous paper:

In this work, we analyze a broad class of multimodal signals exhibiting correlations along time. In many different research fields, the temporal correlation between multimodal data is studied: in neuroscience, electroencephalogram (EEG) and functional magnetic resonance imaging (fMRI) data are jointly analyzed to study brain activation patterns [5].

This passage got me thinking about another set of data that is currently unfolding and links to another subject area of this blog, but that is the subject of another entry.

And then finally, the issue of trying to figure out an approximation to a signal using these redundant dictionaries is treated in Atoms of all channels, unite! Average case analysis of multi-channel sparse recovery using greedy algorithms by Rémi Gribonval , Holger Rauhut, Karin Schnass, Pierre Vandergheynst. The abstract reads:

This paper provides new results on computing simultaneous sparse approximations of multichannel signals over redundant dictionaries using two greedy algorithms. The first one, p-thresholding, selects the S atoms that have the largest p-correlation while the second one, p-simultaneous matching pursuit (p-SOMP), is a generalisation of an algorithm studied by Tropp in [28]. We first provide exact recovery conditions as well as worst case analyses of all algorithms. The results, expressed using the standard cumulative coherence, are very reminiscent of the single channel case and, in particular, impose stringent restrictions on the dictionary. We unlock the situation by performing an average case analysis of both algorithms. First, we set up a general probabilistic signal model in which the coefficients of the atoms are drawn at random from the standard gaussian distribution. Second, we show that under this model, and with mild conditions on the coherence, the probability that p-thresholding and p-SOMP fail to recover the correct components is overwhelmingly small and gets smaller as the number of channels increases. Furthermore, we analyse the influence of selecting the set of correct atoms at random. We show that, if the dictionary satisfies a uniform uncertainty principle [5], the probability that simultaneous OMP fails to recover any sufficiently sparse set of atoms gets increasingly smaller as the number of channels increases. To conclude, we study the robustness of these algorithms to an imperfect knowledge of the dictionary, a situation met in sparsity-based blind source separation where the dictionary, which corresponds to a mixing matrix, is only approximately known. In this framework, we estimate the probability of failure of the considered algorithms as a function of the similarity between the reference dictionary and the approximate one, which we measure with the smallest correlation between corresponding pairs of atoms.
Of particular interest is this tidbit:

To illustrate the role of sparsity, let us introduce the coherence of the dictionary, i.ethe strongest correlation between any two distinct vectors in : $$\mu = max_{i<>j} |\phi_i . \phi_j|$$. Schematically, if a signal is a superposition of less than 1/μ elements of , this representation is unique and can be recovered by standard algorithms.

Credit: NASA/JPL/University of Arizona, Disappearing Dunes on Mars

Compressed Sensing: What Can Nonlinear Reconstruction Techniques Bring to Coded Aperture in Space and in Nuclear Medicine ?

Last week, at a presentation Jean-Luc Starck mentioned the French-Chinese ECLAIRs mission [1] still on the drawing board (it is actually very advanced as it will launch in 2009) that may benefit from some of the thinking that currently goes in the reconstruction techniques used in Compressed Sensing. As mentioned before here and here, coded aperture masks have been used in the past forty years because there was a linear reconstruction technique available. Compressed Sensing now enlarges this field by providing non-linear reconstruction techniques. And so the question is: How far will the field change because of these new reconstruction techniques ? Let me try to mention the issues that may be aided and the attendant questions that may need trade studies evaluations.

* In the current configuration of ECLAIRs, it seems that the current mask is not optimal with regards to the linear reconstruction paradigm, can the nonlinear reconstruction technique provided better recovery with the same mask or with a different mask that does not rely on the linear results ?

* In Roberto Accorsi's thesis [2], it is painfully obvious that one needs to think of the coded aperture in nuclear systems in a slightly different way than when considering light based instrumentation. This is because radiation goes through everything. Using Olivier Godet's thesis [3] one can see that similar issues pop up in the space based system. In the case of ECLAIRs, some of the weight added to the camera has to do with shielding the camera on the side. On top of this, the mask itself has to be designed so that it removes 95% of the rays passing through it in the 4-50 keV band.

Since, we can model most of the rays going through the camera (Olivier Godet's thesis used Geant and other Monte Carlo codes), is the mask over-designed ? Since radiation is a linear process, could we get more information by delineating some of the signal going through the mask, in other words, can we reduce the 95% mark down to 50% and use the nonlinear reconstruction techniques. This could initially be a mass saving issue but I am sure that this type of activity would do in advancing the state of the art in gamma ray camera for medical purposes on earth.

* Finally, as shown in the presentation of ECLAIRs, the satellite is there to respond to unanticipated bursts. Most data goes through the X-band for transmission, but the VHF band is used to alert the ground that an event is unfolding.

Can the alert system be a simple one based on the smaller number of compressed measurements directly taken by the coded aperture ?

[1] The ECLAIRs micro-satellite for multi-wavelength studies of gamma-ray burst prompt emission
[2] Roberto Accorsi, Design of near-field coded aperture cameras for high resolution medical and industrial gamma-ray imaging. June 2001, MIT.
[3] Olivier Godet, Simulations de la Camera d'imagerie grand champ d'ECLAIRs