Tuesday, May 06, 2008

CS: Iterative Hard Thresholding for Compressed Sensing and Stagewise Weakly Gradient Pursuits, A necessary and sufficient condition for l1 recovery

Three new papers from Thomas Blumensath and Mike Davies on two different reconstruction algorithms. Both algorithm are implemented in the Sparsify package.

Iterative Hard Thresholding for Compressed Sensing, also at arXiv. The abstract reads:

Compressed sensing is a technique to sample compressible signals below the Nyquist rate, whilst still allowing near optimal reconstruction of the signal. In this paper we present a theoretical analysis of the iterative hard thresholding algorithm when applied to the compressed sensing recovery problem. We show that the algorithm has the following properties (made more precise in the main text of the paper)
  • It gives near-optimal error guarantees.
  • It is robust to observation noise.
  • It succeeds with a minimum number of observations.
  • It can be used with any sampling operator for which the operator and its adjoint can be computed.
  • The memory requirement is linear in the problem size.
  • Its computational complexity per iteration is of the same order as the application of the measurement operator or its adjoint.
  • It requires a fixed number of iterations depending only on the logarithm of a form of signal to noise ratio of the signal.
  • Its performance guarantees are uniform in that they only depend on properties of the sampling operator and signal sparsity.
From the paper:

In this paper, it is shown that one of these algorithms (termed from now on IHTs) has similar performance Iterative Hard Thresholding for Compressed Sensing guarantees to those of CoSaMP.
They then make a specific claim with regards to CoSaMP:

The number of iterations required for IHTs is logarithmical in the signal to noise ratio. This means that for noiseless observations, IHTs would require an infinite number of iterations to reduce the error to zero. This is a well known property of algorithms that use updates of the form y[n] + T (x − y[n]). CoSaMP on the other hand is guaranteed to estimate y with precision 20\epsilon s in at most 6(s + 1) iterations, however, to achieve this, CoSaMP requires the solution to an inverse problem in each iteration, which is costly. IHTs does not require the exact solution to an inverse problem. If CoSaMP is implemented using fast partial solutions to the inverse problems, the iteration count guarantees become similar to the ones derived here for IHTs.
If you recall Joel Tropp mentions the need to use a conjugate gradient technique that is supposed to converge in about 3 steps. I'd like to think that this blog contributed in getting this very helpful statement of comparison between what seem to be very fast algorithms (but I have been wrong before :-))

They have also devised a greedy algorithm in two papers:
Stagewise Weak Gradient Pursuits. Part I: Fundamentals and Numerical Studies. The abstract reads:
Finding sparse solutions to underdetermined inverse problems is a fundamental challenge encountered in a wide range of signal processing applications, from signal acquisition to source separation. Recent theoretical advances in our understanding of this problem have further increased interest in their application to various domains. In many areas, such as for example medical imaging or geophysical data acquisition, it is necessary to find sparse solutions to very large underdetermined inverse problems. Fast methods have therefore to be developed. In this paper, we promote a greedy approach. In each iteration, several new elements are selected. The selected coefficients are then updated using a conjugate update direction. This is an extension of the previously suggested Gradient Pursuit framework to allow an even greedier selection strategy. A large set of numerical experiments, using artificial and real world data, demonstrate the performance of the method. It is found that the approach performs consistently better than other fast greedy approaches, such as Regularised Orthogonal Matching Pursuit and Stagewise Orthogonal Matching Pursuit and is competitive with other fast approaches, such as those based on ℓ1 minimisation. It is also shown to have the unique property to allow a smooth trade-off between signal sparsity (or observation dimension) and computational complexity. Theoretical properties of the method are studied in a companion paper [1].

Stagewise Weak Gradient Pursuits. Part II: Theoretical Properties
. The abstract reads:

In a recent paper [1] we introduced the greedy Gradient Pursuit framework. This is a family of algorithms designed to find sparse solutions to underdetermined inverse problems. One particularly powerful member of this family is the (approximate) conjugate gradient pursuit algorithm, which was shown to be applicable to very large data sets and which was found to perform nearly as well as the traditional Orthogonal Matching Pursuit algorithm. In Part I of this paper [2], we have further extended the Gradient Pursuit framework and introduced a greedier stagewise weak selection strategy that selects several elements per iteration. Combining the conjugate gradient update of [1] with this selection strategy led to a very fast algorithm, applicable to large scale problems, which was shown in [2] to perform well in a wide range of applications. In this paper, we study theoretical properties of the approach. In particular, we present a proof that shows that the conjugate gradient update is guaranteed to be better than a simple gradient update. A corollary of this result is that the convergence rate of the conjugate gradient pursuit algorithm is at least as fast as that of the approach based on the gradient. The other contribution of this paper is to derive theoretic guarantees that give conditions under which the stagewise weak selection strategy can be used to exactly recover sparse vectors from a few measurements, that is, guarantees under which the algorithm is guaranteed to find the sparsest solution.

Also to note, Laurent Duval forwarded me a very interesting paper entitled A necessary and sufficient condition for exact recovery by l_1 minimization by Charles Dossal. Thanks to several readers, here is the abstract:

The minimum l1-norm solution to an underdetermined system of linear equations y = Ax, is often, remarkably, also the sparsest solution to that system. Since the seminal work of Donoho and co-workers, we have witnessed a flurry of research activity which has focused on sufficient conditions ensuring a unique sparsest solution, in both noiseless and noisy settings. This sparsity-seeking property is of interest in many practical areas such as image and signal processing, communication and information theory, etc. However, most of these sufficient conditions are either too pessimistic although easily computable (e.g. bounds with mutual coherence), or sharp but difficult to check in practice. In this paper, we provide a necessary and sufficient condition for x to be identifiable for a large set of matrices A; that is to be the unique sparsest solution to the l1-norm minimization problem. Furthermore, we prove that this sparsest solution is stable under a reasonable perturbation of the observations y. We also propose an efficient semi-greedy algorithm to check our condition for any vector x. We present numerical experiments showing that our condition is able to predict almost perfectly all identifiable solutions x, whereas other previously proposed criteria are too pessimistic and fail to identify properly some identifiable vectors x. Beside the theoretical proof, this provides empirical evidence to support the sharpness of our condition.

Credit: NASA/JPL/University of Arizona, Martian North Polar Layered Deposits in Springtime, Image acquired on March 19th, 2008.


Anonymous said...

There's also the slides from a talk given by Charles Dossal and Gabriel Peyre here.

JackD said...

Funny to see that Iterative Hard Thresholding quoted above is equivalent (but not equal) to the S-POCS algo proposed before in the Monday Morning Algos if you replace the pseudoinverse (used to project solution onto the hyperplane Phi x = y) by the transposition of the measurement matrix Phi^T.

Laurent J.