## Thursday, June 30, 2011

### ICASSP 2011 videos

Petros boufounos let us know on the twitter that the ICASSP 2011 videos of the talks given there are out on this site. Enjoy!

### The Generalized Approximate Message Passing (GAMP) MATLAB package

Jort Gemmeke continues on providing us some information from SPARS11. Here is the latest refering to the last post about an implementation of the Approximate Message Passing (AMP). There is another, probably more sturdy and supported instance, the

Oops, missed this one somehow. Referred to in Volkan's oral presentation today.

http://gampmatlab.sourceforge.net

Jort

From the Gamp wiki:
Overview:   Generalized Approximate Message Passing (GAMP) is an approximate, but computationally efficient method for estimation problems withlinear mixing. In the linear mixing problem an unknown vector, , with independent components, is first passed through linear transform and then observed through a general probabilistic, componentwise measurement channel to yield a measurement vector . The problem is to estimate and from and . This problem arises in a range of applications including compressed sensing.
Optimal solutions to linear mixing estimation problems are, in general, computationally intractable as the complexity of most brute force algorithms grows exponentially in the dimension of the vector . GAMP approximately performs the estimation through a Gaussian approximation of loopy belief propagation that reduces the vector-valued estimation problem to a sequence of scalar estimation problems on the components of the vectors and .
This project is intended to develop the theory and applications of GAMP. We also provide open source MATLAB software for others who would like to experiment with the algorithm.

The contributors of this fine package include:
• Jason Parker, Ohio State
Thanks Jort.

### Approximate Message Passing (AMP) Code for Compressive Sensing.

Jort Gemmeke sent me the following:

Hi Igor,

David Donoho had a nice plenary here at SPARS2011 about approximate message passing for CS. After a bit of googling the subject, I stumbled over a matlab solver. Although its a very simple algorithm, still nice that someone implemented it already...you can find it here:

http://people.epfl.ch/ulugbek.kamilov

Maybe something for your big picture pages? Unless I didn't look well enough, I didnt see it there.

Cheers,

Jort

Jort is right, it is not there. Ulugbek Kamilov did make available his code featuring the Approximate Message Passing (AMP) code and I will feature it soon in the big pictureUlugbek Kamilov's master's Thesis is Optimal Quantization for Sparse Reconstruction with Relaxed Belief Propagation. The abstract of the thesis reads:

Compressive sensing theory has demonstrated that sparse signals can be recovered from a small number of random linear measurements. However, for practical purposes, like storage, transmission, or processing with modern digital equipment, continuous-valued compressive sensing measurements need to be quantized. In this thesis we examine the topic of optimal quantization of compressive sensing measurements under reconstruction with messagepassing algorithms by following the work on generalization of relaxed belief propagation (BP) for arbitrary measurement channels. Relaxed BP is an iterative reconstruction algorithm proposed for the task of estimation from random linear measurements. It was inspired by the traditional belief propagation algorithm widely used in decoding of low-density parity-check (LDPC) codes. One of the aspects that makes relaxed belief propagation so appealing is the state evolution framework, which predicts asymptotic error behavior of the algorithm. We utilize the predictive capability of the framework to design mean-square optimal scalar quantizers under relaxed BP signal reconstruction. We demonstrate that error performance of the reconstruction can be signiﬁcantly improved by using state evolution optimized quantizers, compared to quantizers obtained via traditional design schemes. We ﬁnally propose relaxed BP as a practical algorithm for reconstruction from measurements digitized with binned quantizers, which further improve error performance of the reconstruction.

The attendant code for AMP and phase transition related computation is here. I just tried and compared it with SL0, a fast code on its own but relying on finding the SVD of a potentially large matrix and the results are indeed very impressive for a 1-d signal of size 100,000. The code I used:

clear
% Number of iterations of the algorithm
T = 1000;

% Stopping criterion (tolerance for successfull decoding)
tol = 1e-4;
m = 200;
k = 2;
N = 100000;

A = (1/sqrt(m)) .* randn(m, N);

% Sparse signal (with uniform distribution of non-zeros)
x = [sign(rand(k,1) - 0.5); zeros(N-k,1)];
x = x(randperm(N));

% Generate Measurements
y = A*x;
tstart = tic;
% Recover x using AMP
x1 = reconstructAmp(A, y, T, tol);
t_elapsed_AMP = toc(tstart)
sigma_min = 0.00004;
sdf=0.95;
tstart = tic;
x2= SL0(A, y, sigma_min,sdf);
t_elapsed_SL0 = toc(tstart)
figure(1)
plot(x1,'*')
figure(2)
plot(x2,'o')

t_elapsed_AMP =

3.2762

t_elapsed_SL0 =

172.2139
Thanks Jort.

## Wednesday, June 29, 2011

### Multicore LASSO

Danny Bickson let me know the following about a recent blog entry:
Today we released very efficient multicore code for computing LASSO / sprase logistic regression, following our ICML paper. The code is called via a matlab wrapper.

Let us note the paper uses a battery of tests coming from compressive sensing. Thanks Danny! The project's page is here.

## Tuesday, June 28, 2011

### Open Problems in Data Streams, Property Testing, and Related Topics

I just received an email and it's for you!

Hi Igor,

.....We were wondering if we could ask you for a favor. As you know, there was a workshop on sublinear algorithms organized in Bertinoro about a month ago. During the workshop we came up with a list of open problems in the area. The list is here:

and the relevant blog article is here:

http://polylogblog.wordpress.com/2011/06/14/open-problems-the-bertinoro-and-kanpur-lists/

We could certainly use insights and help from other (related) fields to solve the problems. Would it be possible for you to post the link on your blog ? This would help reaching out.

Thanks!

Thanks guys! Reading through the question list, there are indeed numerous questions related in some shape or another to Compressive sensing. Check it out and RT on the Twitter.

Credit: NASA/JPL/Space Science InstituteN00173198.jpg was taken on June 25, 2011 and received on Earth June 25, 2011. The camera was pointing toward TITAN at approximately 2,239,177 kilometers away, and the image was taken using the CL1 and CB3 filters.

## Monday, June 27, 2011

### Jim Gray's Search

Joe Hellerstein has a blog entry on his recent CACM paper on Jim Gray's search. Other entries related to this search can be found here.

### Spring-Summer School on Random matrices - Stochastic geometry - Compressed sensing Presentations

The organizers of the Spring-Summer School on Random matrices - Stochastic geometry - Compressed sensing at IHP last week, have made available the slides of their two speakers:

Thanks Djalil Chafaï, Alain Pajor, Alexandre Tsybakov and Christiane Lafargue.

## Sunday, June 26, 2011

### SPARS11

Evidently, if you are at SPARS11 and want to report on the meeting, you are welcome to send Nuit Blanche your blurb. You can also use the #spars11 tag on the Twitter. In the meantime, you may want to check the fantastic book of abstracts

Lots of good times to be had in Edinburgh!

## Friday, June 24, 2011

### Compressive Sensing Literature for the Week-End.

Here are three CS papers for the weekend and an intriguing paper on BCI utilization with no calibration period. Enjoy!

Sub-Nyquist Sampling: Bridging Theory and Practice by Moshe Mishali, Yonina Eldar. The abstract reads:
Sampling theory encompasses all aspects related to the conversion of continuous-time signals to discrete streams of numbers. The famous Shannon-Nyquist theorem has become a landmark in the development of digital signal processing. In modern applications, an increasingly number of functions is being pushed forward to sophisticated software algorithms, leaving only those delicate finely-tuned tasks for the circuit level.
In this paper, we review sampling strategies which target reduction of the ADC rate below Nyquist. Our survey covers classic works from the early 50's of the previous century through recent publications from the past several years. The prime focus is bridging theory and practice, that is to pinpoint the potential of sub-Nyquist strategies to emerge from the math to the hardware. In that spirit, we integrate contemporary theoretical viewpoints, which study signal modeling in a union of subspaces, together with a taste of practical aspects, namely how the avant-garde modalities boil down to concrete signal processing systems. Our hope is that this presentation style will attract the interest of both researchers and engineers in the hope of promoting the sub-Nyquist premise into practical applications, and encouraging further research into this exciting new frontier.

OFDM pilot allocation for sparse channel estimation by Pooria Pakrooh, Arash Amini, Farrokh Marvasti. The abstract reads:
In communication systems, efficient use of the spectrum is an indispensable concern. Recently the use of compressed sensing for the purpose of estimating Orthogonal Frequency Division Multiplexing (OFDM) sparse multipath channels has been proposed to decrease the transmitted overhead in form of the pilot subcarriers which are essential for channel estimation. In this paper, we investigate the problem of deterministic pilot allocation in OFDM systems. The method is based on minimizing the coherence of the submatrix of the unitary Discrete Fourier Transform (DFT) matrix associated with the pilot subcarriers. Unlike the usual case of equidistant pilot subcarriers, we show that non-uniform patterns based on cyclic difference sets are optimal. In cases where there are no difference sets, we perform a greedy search method for finding a suboptimal solution. We also investigate the performance of the recovery methods such as Orthogonal Matching Pursuit (OMP) and Iterative Method with Adaptive Thresholding (IMAT) for estimation of the channel taps.

Tight Measurement Bounds for Exact Recovery of Structured Sparse Signals by Nikhil Rao, Benjamin Recht, Robert Nowak. The abstract reads:
Standard compressive sensing results state that to exactly recover an s sparse signal in R^p, one requires O(s\cdotlog p) measurements. While this bound is extremely useful in practice, often real world signals are not only sparse, but also exhibit structure in the sparsity pattern. We focus on group-structured patterns in this paper. Under this model, groups of signal coefficients are active (or inactive) together. The groups are predefined, but the particular set of groups that are active (i.e., in the signal support) must be learned from measurements. We show that exploiting knowledge of groups can further reduce the number of measurements required for exact signal recovery, and derive near optimal bounds for the same. The number of measurements needed only depends on the number of groups under consideration, and not the particulars of the groups (e.g., compositions, sizes, extents, overlaps, etc.). The results are also shown to predict experimental performance quite well.

A Subject-Independent Brain-Computer Interface based on Smoothed, Second-Order Baselining by Boris Reuderink, Jason Farquhar, Mannes Poel, Anton Nijholt.

Image Credit: NASA/JPL/Space Science Institute
N00172964.jpg was taken on June 22, 2011 and received on Earth June 23, 2011. The camera was pointing toward TITAN at approximately 817,420 kilometers away, and the image was taken using the CL1 and CB3 filters. This image has not been validated or calibrated.

## Thursday, June 23, 2011

### What's Next ?

Here is a short story of a venture capitalist talking about The Making of Lytro, a company that wants to sell light field cameras. The main selling aspect is take a picture now and worry later. We know of other technologies that can even provide low cost modulation but it does not seem to have caught the imaging scene by storm.

In the imaging world, I know of two companies that are developing something based on compressive sensing .i.e. use a different types of modulation than the one (refractive) developed by Lytro to investigate niche markets:

Inview in particular is developing products based on technology developed at Rice where the modulation is performed with a DMD as in the single pixel camera. Works seems to be in the Infrared region where pixels are expensive. Applied Quantum Technologies seems to be focused on coded masks as developed in the CASSI system by the folks at Duke. That technology is showing interesting applications in hyperspectral imaging. Are there any larger markets that could be served with these compressive imaging technologies ?

That brings me to the next question: when I take a shot, do I generally worry about refocusing ? what do you worry about ?

On a totally different subject, Danny Bickson with his graphlab algorithm is making it to 4th place on the KDD Yahoo! contest. For those of you interested in using GraphLab (a way of performing parallel computation on many different cores), here is a small tutorial on Jacobi method in GraphLab. It would seem that one could think of using this technique for solving very large matrices when using SL0 for instance since the most expensive part of that algorithm is an SVD computation if I recall. Definitely something sparseman should consider if Google is interested in his Exacycle request

## Wednesday, June 22, 2011

### Convex Geometry, Stoichiometry and Combinatorial Chemistry

While on Twitter, I came across this arxiv entry on approaching chemistry through linear algebra: Convex Geometry and Stoichiometry by Jer-Chin (Luke) Chuang. The abstract reads:

We demonstrate the benefits of a convex geometric perspective for questions on chemical stoichiometry. We show that the balancing of chemical equations, the use of "mixtures" to explain multiple stoichiometry, and the half-reaction for balancing redox actions all yield nice convex geometric interpretations. We also relate some natural questions on reaction mechanisms with the enumeration of lattice points in polytopes. Lastly, it is known that a given reaction mechanism imposes linear constraints on observed stoichiometries. We consider the inverse question of deducing reaction mechanism consistent with a given set of linear stoichiometric restrictions.

I see many things here that I did not appreciate before about chemistry. With the lens of compressive sensing, I see underdetermined systems, larger than rank one nullspaces ( a good thing when looking for sparsest solutions), polytopes and convex geometry. And then there is this issue of combinatorial chemistry where I  wonder aloud whether if my initial dislike of chemistry is not grounded in rules and conventions that maybe are the sparsest solution of a combinatorial problem. Maybe it's time to talk to a chemist ?

### This Week in Compressive Sensing

Namrata Vaswani just let me know of her recent paper and code:

This work studies the recursive robust principal components' analysis (PCA) problem. Here, "robust" refers to robustness to both independent and correlated sparse outliers, although we focus on the latter. A key application where this problem occurs is in video surveillance where the goal is to separate a slowly changing background from moving foreground objects on-the-fly. The background sequence is well modeled as lying in a low dimensional subspace, that can gradually change over time, while the moving foreground objects constitute the correlated sparse outliers. In this and many other applications, the foreground is an outlier for PCA but is actually the "signal of interest" for the application; where as the background is the corruption or noise. Thus our problem can also be interpreted as one of recursively recovering a time sequence of sparse signals in the presence of large but spatially correlated noise.
This work has two key contributions. First, we provide a new way of looking at this problem and show how a key part of our solution strategy involves solving a noisy compressive sensing (CS) problem. Second, we show how we can utilize the correlation of the outliers to our advantage in order to even deal with very large support sized outliers. The main idea is as follows. The correlation model applied to the previous support estimate helps predict the current support. This prediction serves as "partial support knowledge" for solving the modified-CS problem instead of CS. The support estimate of the modified-CS reconstruction is, in turn, used to update the correlation model parameters using a Kalman filter (or any adaptive filter). We call the resulting approach "support-predicted modified-CS".
The Recursive Projected Compressive Sensing code is available and hosted on this website. The main idea of the approach reads:

Main idea:
This work studies the recursive robust principal components' analysis (PCA) problem. Here, robust" refers to robustness to both independent and correlated sparse outliers, although we focus on the latter. A key application where this problem occurs is in video surveillance where the goal is to separate a slowly changing background from moving foreground objects on-the-fly. The background sequence is well modeled as lying in a low dimensional subspace, that can gradually change over time, while the moving foreground objects constitute the correlated sparse outliers. In this and many other applications, the foreground is an outlier for PCA but is actually the signal of interest" for the application; where as the background is the corruption or noise. Thus our problem can also be interpreted as one of recursively recovering a time sequence of sparse signals in the presence of large but spatially correlated noise.

This work has two key contributions. First, we provide a new way of looking at this problem and show how a key part of our solution strategy involves solving a noisy compressive sensing (CS) problem. Second, we show how we can utilize the correlation of the outliers to our advantage in order to even deal with very large support sized outliers. The main idea is as follows. The correlation model applied to the previous support estimate helps predict the current support. This prediction serves as partial support knowledge" for solving the modified-CS problem instead of CS. The support estimate of the modified-CS reconstruction is, in turn, used to update the correlation model parameters using a Kalman filter (or any adaptive filter). We call the resulting approach support-predicted modified-CS".

From the paper, I note the following excerpt:
In this work we misuse terminology a little and use “compressive sensing” or “CS” to refer to the ℓ1 minimization problem.

Thanks you for acknowledging this. While we are on the L1 regularization subject, did you know that the Youtube Video editor enabled Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths, well you do now. You can learn more here on the Google Research blog and test it here.

Focusing back on comrpressive sensing, we have the following papers:

Efficient Incremental Analysis of On-Chip Power Grid via Sparse Approximation by Pei Sun and Xin Li, Ming-Yuan Ting. The abstract reads:

In this paper, a new sparse approximation technique is proposed for incremental power grid analysis. Our proposed method is motivated by the observation that when a power grid network is locally updated during circuit design, its response changes locally and, hence, the incremental “change” of the power grid voltage is almost zero at many internal nodes, resulting in a unique sparse pattern. An efficient Orthogonal Matching Pursuit (OMP) algorithm is adopted to solve the proposed sparse approximation problem. In addition, several numerical techniques are proposed to improve the numerical stability of the proposed solver, while simultaneously maintaining its high efficiency. Several industrial circuit examples demonstrate that when applied to incremental power grid analysis, our proposed approach achieves up to 130 runtime speed-up over the traditional Algebraic Multi-Grid (AMG) method, without surrendering any accuracy.

Matrix Co-Factorization on Compressed Sensing by Jiho Yoo and Seungjin Choi. The abstract reads:
In this paper we address the problem of matrix factorization on compressively-sampled measurements which are obtained by random projections. While this approach improves the scalability of matrix factorization, its performance is not satisfactory. We present a matrix co-factorization method where compressed measurements and a small number of uncompressed measurements are jointly decomposed, sharing a factor matrix. We evaluate the performance of three matrix factorization methods in terms of Cram´er-Rao bounds, including: (1) matrix factorization on uncompressed data (MF); (2) matrix factorization on compressed data (CSMF); (3) matrix co-factorization on compressed and uncompressed data (CS-MCF). Numerical experiments demonstrate that CS-MCF improves the performance of CS-MF, emphasizing the useful behavior of exploiting side information (a small number of uncompressed measurements).

Ramsey theory reveals the conditions when sparse coding on subsampled data is unique by Christopher J. Hillar, Friedrich T. Sommer. The abstract reads:

Sparse coding or dictionary learning has been widely used to reveal the sparse underlying structure of many kinds of sensory data. A related advance in signal processing is compressed sensing, a theory explaining how sparse data can be subsampled below the Nyquist-Shannon limit and then efficiently recovered from these subsamples. Here we study whether the conditions for recovery in compressed sensing are sufficient for dictionary learning to discover the original sparse causes of subsampled data. Using combinatorial Ramsey theory, we completely characterize when the learned dictionary matrix and sparse representations of subsampled data are unique (up to the natural equivalences of permutation and scaling). Surprisingly, uniqueness is shown to hold without any assumptions on the learned dictionaries or inferred sparse codes. Our result has implications for the learning of overcomplete dictionaries from subsampled data and has potential applications in data analysis and neuroscience. For instance, it identifies sparse coding as a possible learning mechanism for establishing lossless communication through severe bottlenecks, which might explain how different brain regions communicate through axonal fiber projections.

In this paper, we consider the problem of compressed sensing where the goal is to recover almost all the sparse vectors using a small number of fixed linear measurements. For this problem, we propose a novel partial hard-thresholding operator that leads to a general family of iterative algorithms. While one extreme of the family yields well known hard thresholding algorithms like ITI (Iterative Thresholding with Inversion) and HTP (Hard Thresholding Pursuit), the other end of the spectrum leads to a novel algorithm that we call Orthogonal Matching Pursuit with Replacement (OMPR). OMPR, like the classic greedy algorithm OMP, adds exactly one coordinate to the support at each iteration, based on the correlation with the current residual. However, unlike OMP, OMPR also removes one coordinate from the support. This simple change allows us to prove that OMPR has the best known guarantees for sparse recovery in terms of the Restricted Isometry Property (a condition on the measurement matrix). In contrast, OMP is known to have very weak performance guarantees under RIP. Given its simple structure, we are able to extend OMPR using locality sensitive hashing to get OMPR-Hash, the first provably sub-linear (in dimensionality) algorithm for sparse recovery. Our proof techniques are novel and flexible enough to also permit the tightest known analysis of popular iterative algorithms such as CoSaMP and Subspace Pursuit. We provide experimental results on large problems providing recovery for vectors of size up to million dimensions. We demonstrate that for large-scale problems our proposed methods are more robust and faster than existing methods.

Total Variation Minimization Based Compressive Wideband Spectrum Sensing for Cognitive Radios by Yipeng Liu, Qun Wan. The abstract reads:
Wideband spectrum sensing is a critical component of a functioning cognitive radio system. Its major challenge is the too high sampling rate requirement. Compressive sensing (CS) promises to be able to deal with it. Nearly all the current CS based compressive wideband spectrum sensing methods exploit only the frequency sparsity to perform. Motivated by the achievement of a fast and robust detection of the wideband spectrum change, total variation mnimization is incorporated to exploit the temporal and frequency structure information to enhance the sparse level. As a sparser vector is obtained, the spectrum sensing period would be shorten and sensing accuracy would be enhanced. Both theoretical evaluation and numerical experiments can demonstrate the performance improvement.

Efficient Two-Stage Group Testing Algorithms for DNA Screening by Michael Huber. The abstract reads:
Group testing algorithms are very useful tools for DNA library screening. Building on recent work by Levenshtein (2003) and Tonchev (2008), we construct in this paper new infinite classes of combinatorial structures, the existence of which are essential for attaining the minimum number of individual tests at the second stage of a two-stage disjunctive testing algorithm.

Sparse spike train deconvolution is a classical inverse problem which gave rise to many deterministic and stochastic algorithms since the mid-80’s. In the past decade, sparse approximation has been an intensive ﬁeld of research, leading to the development of a number of algorithms including greedy strategies and convex relaxation methods. Spike train deconvolution can be seen as a speciﬁc sparse approximation problem, where the observation matrix contains highly correlated columns and where the focus is set on the exact recovery of the spike locations. The objective of this paper is to evaluate the performance of algorithms proposed in both ﬁelds in terms of detection statistics, with Monte-Carlo simulations of spike deconvolution problems.

Image Credit: NASA/JPL/Space Science Institute, N00172883.jpg was taken on June 18, 2011 and received on Earth June 20, 2011. The camera was pointing toward HELENE, and the image was taken using the CL1 and CL2 filters.

## Tuesday, June 21, 2011

### Two Solvers: R1Magic and YALL1 Group

I just came across a CS solver written in R as shown in this presentation: Compressive Sampling with R: A Tutorial by Mehmet Suzen. The R1Magic package is here. A reference manual is here.

YALL1 now include a group sparsity solving capability. Let us look at what he can do. From the webpage:

YALL1 package now includes:
• YALL1 Basic
• , a solver for sparse reconstruction: Version 1.3, Apr 07, 2011.
• YALL1 Group
• , a solver for group/joint sparse reconstruction: Version 1.0, June 09, 2011. Wiki
YALL1 Basic
solves the following L1-minimization problems:
(BP) min ||Wx||w,1 s.t. Ax = b
(L1/L1) min ||Wx||w,1 + (1/ν)||Ax - b||1 (L1/L2) min ||Wx||w,1 + (1/2ρ)||Ax - b||22 (L1/L2con) min ||Wx||w,1, s.t. ||Ax - b||2 <= δ
(BP+) min ||x||w,1 s.t. Ax = b and x >= 0
(L1/L1+) min ||x||w,1 + (1/ν)||Ax - b||1 s.t. x >= 0
(L1/L2+) min ||x||w,1 + (1/2ρ)||Ax - b||22 s.t. x >= 0
(L1/L2con+) min ||x||w,1, s.t. ||Ax - b||2 <= δ, x >= 0
where
• A
•  is an m-by-n matrix with m << n,
• the solution x (or its representation Wx) is supposed to be (approximately) sparse,
• the data and solution can be real or complex, (If complex, then no non-negativity constraint is allowed)
• a unitary sparsifying basis W is optional,
• the 1-norm can be optionally weighted by a nonnegative vector w.
Go to discussions and Q&As.
Supported Features
• Multiple types of A
• explicit matrix
• ensembles of fast transforms such as FFT, DCT, wavelets
• Both real and complex data
• Both sparse and compressible signals
• Non-negative signals
YALL1 Group
The group sparsity code solves the following model
(GroupBP) min sumi wi ||x_gi||2 s.t. Ax = b
where g1, g2, … are groups of coordinates and w1,w2, … are their weights.
Joint sparsity is a special case of group sparsity for recovering X = [x1,x2, ..., xl] where the xi‘s share a common sparse support.
(JointBP) min sumi wi ||Xi,:||2 s.t. AX = B.
Supported Features
• Multiple types of A
• explicit matrix
• ensembles of fast transforms such as FFT, DCT, wavelets
• general function handle
• Groups can overlap
• The union of groups does not need to cover all coordinates
• Easy to modify for the support of complex numbers and your signals
Contributors
Yin Zhang*, Wei Deng, Junfeng Yang, and Wotao Yin.
* The original author of YALL1 (beta 1 – 6).
Tech report
YALL1 Basic: Alternating Direction Algorithms for L1 Problems in Compressive Sensing, Rice CAAM Report TR09-37, 2009.
YALL1 Group/Joint Sparsity: Group Sparse Optimization by Alternating Direction Method, Rice CAAM Report TR11-06, 2011.