Monday, February 28, 2011

What Island Is Next ?

When I referred to islands of knowledge recently, I meant that research is populated by a series of communities that are each self-sufficient. As time goes by, some works provide a connection between these communities, a little as if the water were receding and a new landscape emerged, revealing the connection between some of these island communities.
For instance, in compressive sensing, L1 solvers were already recognized as sparsity promoting back in the early 1970s, but it took the work of Tao, Candes, Romberg and Donoho to see the relationship between these solvers and to provide admissible conditions for measurement matrices. Think of the mathematician as changing the landscape by draining the waters while the engineer sails from one island to the other. The mathematician gets a hint that there ought to be a connection between these islands as the engineer gets things working, artificially connecting these communities by sea. The engineer becomes trusted by all when the deeper connections become obvious as the water recedes.

If you are a mathematician changing the landscape of knowledge: What island is next ?
If, like a pirate, you are sailing from one island to the next:  What are you waiting for ?

PS: The concept of self-sustaining communities as islands is not far from a concept of James Gleick in his new book, reviewed in the New York Review of Books:
 It is our task as humans to bring meaning back into this wasteland. As finite creatures who think and feel, we can create islands of meaning in the sea of information

Side view of the drawings shown above. 

Sunday, February 27, 2011

What Are You Waiting For ?

Bob Cringely reminded me of this defining moment when two DEC executives looked at a dismantled PC on their desk:
...But the most telling story about Olsen that Avram tells isn’t in his blog.  It was about how he and Ken Olsen bought one of the first IBM PCs and disassembled it on a table in Olsen’s office.
“He was amazed at the crappy power supply,” Avram said, “that it was so puny.  Olsen thought that if IBM used such poor engineering then Digital didn’t have anything to worry about.”
Clearly Olsen was wrong.
I find this little story very telling because it shows Olsen in 1980 very much stuck in the late 1960s when it came to what mattered and what didn’t in a computer.  Yes, having a robust power supply was good for computer reliability, but not as important as having a great operating system and applications. But that’s not the way Olsen saw it.  In a world that had to that point been dominated by computer companies building expensive products aimed primarily at engineers, he simply had no concept of a computer as a consumer product....

What would somebody making expensive hyperspectral imagers say about a webcam and some prisms? What would an optical engineer say about a microscope with no lens? Very likely something along the lines of what was said in that DEC office. I have actually asked that question to some specialists; a shrug is a nice way of framing their answers. To no avail; I still like being the candid one in these discussions. But going back to the issue of making it in the real world, one has to face the unbearable sluggishness of how things move along. More precisely, Jerry Weinberg pointed out that

Most of the time, for most of the world, no matter how hard people work at it, nothing of significance happens.

And as much as Hollywood wants to convince you otherwise, there are few aha moments in the journey of maturing a technology. You're in the dip, and the reality is that most technologies take some extreme combativeness to be born. More precisely, most of them don't survive the climb up the TRL scale.

When John Mankins created a scale of Technology Readiness Levels, he was mainly trying to rationalize the different investments NASA was undertaking in a wide series of technologies. Now we know that NASA funds mostly in the realm of TRL 3-6 for its R&D, while other agencies like the NSF focus on TRL 0-1 and maybe 2 (if you're lucky). John did not mention TRL 0, as it could be a placeholder for theoretical work that does not precisely hinge on an actual technology. Using this scale, here is how I see the life and death of technologies, with some examples related to compressive sensing.

All the theoretical work fits into the left-hand side of the graph. In order to prove to the world that CS is real, you can always point to the TRL 9 MRI example or the Herschel PACS camera. However, those examples are accidents, in that either the space camera was built to be underused in the first place, or, in MRI, you were already sampling in the right phase space. Aside from these accidents, one has to push things to the right. Most other technologies described in the hardware page go from 1 to 3-4, and some won't survive the process because they are in direct competition with other technologies. Take the example of the single pixel camera. One of the reasons for using it is that the detecting element is expensive (say IR or terahertz sensors); some technology breakthrough could change that and make a compressive-sensing-enabled scanning system irrelevant. What is truly amazing with compressive sensing is the ability to use an extraordinary set of tools, such as the many reconstruction solvers, dictionary learning and the Donoho-Tanner phase transition, to see if there is a fighting chance that the technology can go to the next level. What are you waiting for?

Saturday, February 26, 2011

CS Meetings: BASP, ICML Structured Sparsity, SADH

Yves Wiaux sent me the following about a meeting on September 4-9, 2011.

The international BASP Frontiers workshop was created to promote synergies between the Astronomical and Biomedical communities, around selected common challenges for signal processing. The 2011 workshop is organized by the EPFL Signal Processing Laboratories and will take place in September [04-09] in a very nice resort in the Swiss Alps [Villars] close to Lausanne and Lake Geneva. It will focus on Fourier Sampling for Magnetic Resonance and Radio Interferometric Imaging.
Do not hesitate to have a look at the website [BASP Frontiers website:], where you will find all required information regarding scientific context, committees, venue, keynote and invited participants [including R. Baraniuk, E. Candes, V. Goyal, T. Strohmer, and P. Vandergheynst], program, and registration [opening April 1]. If you would like to participate, please contact us [BASP email contact:] and provide information as detailed on the "invitation request page". Do not hesitate either to disseminate this announcement.
Y. Wiaux, Workshop Chair

Thanks Yves.

Mladen Kolar also sent me the following:

Dear Igor,
I've been following your blog for a while and it has been very useful for keeping me on top of ongoing research in sparsity. I believe that the following workshop may be of interest to readers of your blog -- Structured Sparsity: Learning and Inference Workshop [1]. It is organized by Francis Bach, Han Liu, Guillaume Obozinski, Eric Xing and myself. The workshop is organized as a part of the 28th International Conference on Machine Learning [2] (June 28 through July 2, 2011). Our invited speakers mainly come from machine learning and statistics community, but we also hope to attract some signal processing folks due to the commitment of prof. Volkan Cevher. Call for contributions can be found here [3].
We would highly appreciate if you could publicize the workshop.
Kind regards,
Thanks Mladen.

In other news, the Duke University Workshop on Sensing and Analysis of High-Dimensional Data (SAHD) is planned for July 26-28, 2011. The meeting will be held at the Washington Duke Inn and Golf Club, adjacent to the Duke campus. The meeting is being organized and hosted by the following Duke faculty: David Brady, Robert Calderbank, Lawrence Carin, Ingrid Daubechies, David Dunson, Mauro Maggioni and Rebecca Willett.

Since there have been a few, I am listing some of the meetings related to CS on this CS meetings page.

Friday, February 25, 2011

CS: Sparse Signal Recovery with Temporally Correlated Source Vectors Using Sparse Bayesian Learning, Dagstuhl presentations, JOINC 2011

Zhilin Zhang just sent me the following:
Hi, Igor,
I am a reader of your Nuit Blanche, and have benefited a lot from it. Recently I have done some work on joint sparsity recovery, also called the multiple measurement vector (MMV) model. As you know, the MMV model is an extension of the basic CS model (single measurement vector model). Most algorithms for the basic CS model can be extended to the MMV model by imposing an $\ell_p$ norm on the rows of the solution matrix, e.g. $\ell_2$ and $\ell_{\infty}$. However, these norm operations are blind to the structures of each row of the solution matrix. In other words, they ignore (do not exploit) the correlations of elements in each row. In my paper, I derived two algorithms exploiting these correlations and achieved much better performance. There are also some interesting phenomena in the paper (see Fig. 7 and Fig. 8), and I want to know if there exist any theories that can explain them.
The paper can be downloaded from the link:
Thank you for your attention. Any comments are welcome.
Best regards,

I then asked Zhilin if his codes would be released at some point in time. Zhilin kindly answered with:

... I will release the codes on my homepage once the paper is accepted. But anybody can email to me to ask for the codes before that time...

We address the sparse signal recovery problem in the context of multiple measurement vectors (MMV) when elements in each nonzero row of the solution matrix are temporally correlated. Existing algorithms do not consider such temporal correlations and thus their performance degrades significantly with the correlations. In this work, we propose a block sparse Bayesian learning framework which models the temporal correlations. In this framework we derive two sparse Bayesian learning (SBL) algorithms, which have superior recovery performance compared to existing algorithms, especially in the presence of high temporal correlations. Furthermore, our algorithms are better at handling highly underdetermined problems and require less row-sparsity on the solution matrix. We also provide analysis of the global and local minima of their cost function, and show that the SBL cost function has the very desirable property that the global minimum is at the sparsest solution to the MMV problem. Extensive experiments also provide some interesting results that motivate future theoretical research on the MMV model.

Mário Figueiredo also sent me the following:
Hi Igor,
This workshop (which took place a couple of weeks ago in Dagstuhl, Germany), may be of interest to Nuit Blanche:
The slides of many of the talks are available here:
Best regards,

The presentations related to compressed sensing are listed below:

Thanks Mário

Last year, the Journées Imagerie Optique Non Conventionnelle at ESPCI featured some folks from Saint Etienne (or was it Clermont?) who were doing the same type of holographic work as above (except they did not use the sparsity of the scene). I was struck by how CS could be used for that purpose; well, it looks like this is already undertaken. Now the question is how you change the sensor as a result. This year, the meeting will take place at ESPCI again on March 28-29. The featured presentations will be:

« Mesure de la matrice de transmission d'un milieu complexe : application à l'imagerie » (Measuring the transmission matrix of a complex medium: application to imaging), Sylvain Gigan, Institut Langevin, Paris.
« Détection et localisation de défauts dans des environnements bruités » (Detection and localization of defects in noisy environments), Josselin Garnier, Université Paris VII.

We mentioned these two research areas before.

Thursday, February 24, 2011

Five continents in two days

This map represents the location of the readers who came directly to the blog in the past two days, about 800 hits. Let us not forget the 962 readers getting the same news through their feedreaders (about 2,500 views in total, or about 2.5 views per reader for 3 entries, i.e. this group of nearly a thousand readers is exposed to most entries) and finally the roughly 360 people getting the same information directly by e-mail.

The LinkedIn Group on Compressive Sensing grew by 10 people in a week and has now reached 731 members (who's going to be the 1000th member?). I note that I am now not the only one who asks and answers questions. This is good news with regard to how this community of people from many different backgrounds (students, researchers, engineers at companies, theoreticians and more applied folks) is getting above a critical threshold.

If you know anybody who needs to be exposed to some of the information mentioned here, you can tell them to join the Linkedin Group, use this Feed ( ) or follow this twitter stream.

DT-Fest, Universal and Efficient Compressed Sensing Strategy through Spread Spectrum Modulation, Cosparse Analysis Modeling - Uniqueness and Algorithms, Sparse recovery for spherical harmonic expansions, Robust Matrix Completion with Corrupted Columns, the last space shuttle flight

I am slowly getting back online, but I noticed a definite upsurge of papers now integrating the Donoho-Tanner phase transition (or similar) in their analysis. So today is not a Deuterium-Tritium fest but a Donoho-Tanner phase transition fest. Enjoy:

We propose a universal and efficient compressed sensing strategy based on the use of a spread spectrum technique. The method essentially consists in a random pre-modulation of the signal of interest followed by projections onto randomly selected vectors of an orthonormal basis. Firstly, we show that the effectiveness of the technique is induced by a decrease of coherence between the sparsity and the sensing bases. Secondly, we prove that the sensing scheme is universal for a family of sensing bases, for which the number of measurements needed for accurate recovery is optimal and independent of the sparsity matrix. It is also efficient as sensing matrices with fast matrix multiplication algorithms can be used. These results are confirmed experimentally through analyses of the phase transition of the ℓ1-minimization problem. Finally, we illustrate the effectiveness of the technique by a study of its application to radio interferometry and magnetic resonance imaging.
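A minimal numpy sketch of the coherence argument in the abstract, for the Fourier-Fourier ("sparsity-sensing" bases) pair the authors discuss: a random +/-1 pre-modulation collapses the mutual coherence between the two bases from its maximal value sqrt(n) down to a few units. The sizes and the +/-1 sequence are illustrative assumptions on my part:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256
F = np.fft.fft(np.eye(n)) / np.sqrt(n)   # unitary DFT, used as the sensing basis
Psi = F.conj().T                          # sparsity basis: inverse DFT (Fourier-Fourier pair)

def coherence(sensing, sparsity):
    # mutual coherence mu = sqrt(n) * max_ij |<phi_i, psi_j>|, ranging in [1, sqrt(n)]
    return np.sqrt(n) * np.abs(sensing @ sparsity).max()

c = rng.choice([-1.0, 1.0], size=n)       # random +/-1 pre-modulation sequence
mu_plain = coherence(F, Psi)              # sqrt(n): the worst possible pair
mu_mod = coherence(F @ np.diag(c), Psi)   # pre-modulation decorrelates the bases
print("coherence without modulation:", round(mu_plain, 2))
print("coherence with modulation   :", round(mu_mod, 2))
```

Lower coherence is what lets the number of measurements needed for recovery become essentially independent of the sparsity basis, which is the universality claim of the paper.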

I asked the authors the following:

 Do you think the use of the DT phase transition is central in your argument/paper ?

Pierre  said:

For me yes: it is the argument we are somehow trying to optimize by getting as close as possible to the phase transition of the Dirac Fourier pair

Yves followed through with:

... and, as a sparkling example, it is nice to see that in the presence of a random pre-modulation the DT phase transition is reached even for the Fourier-Fourier pair (we mean "sparsity-sensing" bases pair)
Remi concluded with:
From a strict theoretical perspective, as far as I know the DT phase transition is only proved for Gaussian matrices. Here, it serves as a benchmark. Empirically we get very close to it using pre-modulation, even in the Fourier-Fourier pair. This is appealing in practice and also raises once again the theoretical question of the universality of the DT phase transition for L1 with a wide range of random systems.
That last fact hit me in the face three years ago when Jared presented his findings at Texas A&M University. In one of his slides, he had included the sparse measurement matrices of Piotr et al. (mentioned here as well). I think I recall my jaw dropped while I was at the same time thinking: Whaaaat the..... Since then, it seems to have been more universal than one could think, but I can't talk about it for the moment.
Thanks Yves, Remi and Pierre.

In the past decade there has been a great interest in a synthesis-based model for signals, based on sparse and redundant representations. Such a model assumes that the signal of interest can be composed as a linear combination of few columns from a given matrix (the dictionary). An alternative analysis-based model can be envisioned, where an analysis operator multiplies the signal, leading to a cosparse outcome. In this paper, we consider this analysis model, in the context of a generic missing data problem (e.g., compressed sensing, inpainting, source separation, etc.). Our work proposes a uniqueness result for the solution of this problem, based on properties of the analysis operator and the measurement matrix. This paper also considers two pursuit algorithms for solving the missing data problem, an L1-based and a new greedy method. Our simulations demonstrate the appeal of the analysis model, and the success of the pursuit techniques presented.
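The synthesis/analysis distinction in the abstract can be shown in two lines of numpy: under a finite-difference analysis operator, a piecewise-constant signal has no zero entries itself, yet it is "cosparse", i.e. its analysis coefficients are sparse. The operator and signal below are the standard textbook example, not the paper's:

```python
import numpy as np

n = 64
Omega = np.eye(n - 1, n, 1) - np.eye(n - 1, n)   # finite-difference analysis operator
x = np.concatenate([np.full(20, 2.0), np.full(30, -1.0), np.full(14, 0.5)])

# x is dense (no zero entries), so it looks nothing like a synthesis-sparse signal
# in the canonical basis; its analysis coefficients contain only the two jumps.
print("nonzeros in x        :", np.count_nonzero(x))       # 64
print("nonzeros in Omega @ x:", np.count_nonzero(Omega @ x))  # 2
```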

Sparse recovery for spherical harmonic expansions by Holger Rauhut and Rachel Ward. The abstract reads:
We show that sparse spherical harmonic expansions can be efficiently recovered from a small number of randomly chosen samples on the sphere. To establish the main result, we verify the restricted isometry property of an associated preconditioned random measurement matrix using recent estimates on the uniform growth of Jacobi polynomials.

Robust Matrix Completion with Corrupted Columns by Yudong Chen, Huan Xu, Constantine Caramanis, and Sujay Sanghavi. The abstract reads:
This paper considers the problem of matrix completion, when some number of the columns are arbitrarily corrupted, potentially by a malicious adversary. It is well-known that standard algorithms for matrix completion can return arbitrarily poor results, if even a single column is corrupted. What can be done if a large number, or even a constant fraction of columns are corrupted? In this paper, we study this very problem, and develop an efficient algorithm for its solution. Our results show that with a vanishing fraction of observed entries, it is nevertheless possible to succeed in performing matrix completion, even when the number of corrupted columns grows. When the number of corruptions is as high as a constant fraction of the total number of columns, we show that again exact matrix completion is possible, but in this case our algorithm requires many more – a constant fraction – of observations. One direct application comes from robust collaborative filtering. Here, some number of users are so-called manipulators, and try to skew the predictions of the algorithm. Significantly, our results hold without any assumptions on the number, locations or values of the observed entries of the manipulated columns. In particular, this means that manipulators can act in a completely adversarial manner.
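For readers new to the underlying problem, here is a hedged sketch of *plain* matrix completion (a soft-impute-style singular value thresholding iteration), the baseline that breaks down under corrupted columns. It is not the robust algorithm of the paper, and all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
n, r = 40, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-2 ground truth
mask = rng.random((n, n)) < 0.6                                 # observe ~60% of entries

# Soft-impute style iteration: enforce observed entries, then shrink singular values.
X = np.zeros((n, n))
for _ in range(300):
    X[mask] = M[mask]                        # keep the observed entries
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    sv = np.maximum(sv - 0.5, 0.0)           # singular value soft-thresholding
    X = (U * sv) @ Vt
rel = np.linalg.norm(X - M) / np.linalg.norm(M)
print("relative completion error:", round(rel, 3))
```

Replacing even one column of M with adversarial values can derail this kind of iteration, which is the failure mode the paper's robust formulation is designed to survive.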

The last launch of Space Shuttle Discovery is supposed to take place at 15:30 Texas time, ending a series of launches that started thirty years ago. You can watch the launch here.

Tuesday, February 22, 2011

SPARS11: Signal Processing with Adaptive Sparse Structured Representations

Jared Tanner sent me the following this past week:

Dear Igor,

... Earlier I emailed you about the SPARS11 conference taking place in Edinburgh from June 27th to the 30th, 2011. The webpage has just been updated to accept registration and abstract submission. Would you mind posting an update and link to the event site?  Also, if there is a calendar  of events that it could be posted on that would be superb.
The conference website is:
Important Dates:
April 1st, abstract submission (one page)
April 21st, notification for contributed talks and posters
May 15th, registration and fees due (70 GBP for students, 140 for faculty)
The Plenary speakers are:
Indeed it looks like a good event to attend!

Monday, February 21, 2011

This is not a hardware-only or solver-only problem ...

There is a gap of five years between these two shots [1] [2]:

Let me say something stupid based on nothing more than a hunch. If both images need TV regularization to be reconstructed, and similar regularizations are used to rid one of multiplicative noise [3][4][5], then maybe, just maybe, these experiments suffer from multiplicative noise?

Since both of them use a DMD chip, I can see several sources for that type of noise. My foray into MEMS yielded a few facts. Among them was that the switching of the mirrors on the TI DMD [6], being electromechanical in nature (this is the point of MEMS), is likely to have some amount of stabilization issues. This issue is unimportant to the human eye, and is part of why the DMD is so successful in digital projectors; however, while tiny, it directly affects the coefficients of the measurement matrix.

Unless we do a better job of quantifying the effect of multiplicative noise on the Donoho-Tanner phase transition, we are bound to have the same photos as above in five years. Do we want that? Of course not. Should we be scared that this noise is so bad that it will make any CS solution worthless? No, surely not: it is not because the folks in the solver business are valiantly coming up with results that are suboptimal due to the physics of the image acquisition that we cannot do anything. We want to design systems that can determine what this noise is as the image is being acquired.

[1] Michael Wakin, Jason Laska, Marco Duarte, Dror Baron, Shriram Sarvotham, Dharmpal Takhar, Kevin Kelly, and Richard Baraniuk, An Architecture for Compressive Imaging (Proc. International Conference on Image Processing -- ICIP 2006, Atlanta, GA, Oct. 2006)
[2] Active illumination single-pixel camera based on compressive sensing, Filipe Magalhães, Francisco M. Araújo, Miguel V. Correia, Mehrdad Abolbashari, and Faramarz Farahi
[3] Multiplicative Noise Cleaning via a Variational Method Involving Curvelet Coefficients by Sylvain Durand, Jalal Fadili, and Mila Nikolova
[4] Multiplicative Noise Removal Using L1 Fidelity on Frame Coefficients by Sylvain Durand, Jalal Fadili, and Mila Nikolova
[5] Multiplicative Noise Removal Using Variable Splitting and Constrained Optimization, José M. Bioucas-Dias, and Mário A. T. Figueiredo.
[6] Digital Light Processing TM for High-Brightness, High-Resolution Applications, Larry J. Hornbeck

Friday, February 11, 2011

Nuit Blanche Unplugged

I am going to unplug for a while and get some fresh air around Grenoble, where I'll be thinking about matrix completion, calibration, the Donoho-Tanner transition diagram and multiplicative noise. In the meantime, here are some blogs/papers to reflect on. Enjoy!
Back in December, I asked: What was the most interesting paper on Compressive Sensing you read in 2010? Here is a compilation of y'all's answers:

One of you readers recently let me know of the February Fourier Talks being held on February 17 and 18 at the University of Maryland's Norbert Wiener Center for Harmonic Analysis and Applications. Thank you, anonymous reader.

Recent entries I'll probably be re-reading include:

And remember these series of random measurements, they rotate in a randomized carousel .....

Wednesday, February 09, 2011

The Dip

"Will sparsity ever solve an actual problem? 3 days to make up my mind ..."
It seems to me that we are in some sort of a dip, as Seth Godin describes it. People ask whether compressed sensing has peaked, or some such nonsense. Here is Seth's summary of what the dip looks like:
....Every new project (or job, or hobby, or company) starts out exciting and fun. Then it gets harder and less fun, until it hits a low point-really hard, and not much fun at all. And then you find yourself asking if the goal is even worth the hassle. Maybe you're in a Dip-a temporary setback that will get better if you keep pushing. But maybe it's really a Cul-de-Sac, which will never get better, no matter how hard you try. What really sets superstars apart from everyone else is the ability to escape dead ends quickly, while staying focused and motivated when it really counts.
Winners quit fast, quit often, and quit without guilt-until they commit to beating the right Dip for the right reasons. In fact, winners seek out the Dip. They realize that the bigger the barrier, the bigger the reward for getting past it. If you can become number one in your niche, you'll get more than your fair share of profits, glory, and long-term security. Losers, on the other hand, fall into two basic traps. Either they fail to stick out the Dip-they get to the moment of truth and then give up-or they never even find the right Dip to conquer.
Some of you may feel that you are in that region of doubt. But let me give you an example of why I think it is nonsense. Take the case of hyperspectral imagers. Hyperspectral imagers are likely to drop in price by one to two orders of magnitude as a result of implementing a CS approach. David Brady thinks it is about one order of magnitude; I am of the opinion that it is two. The reason you have never heard of hyperspectral imaging, or that you cannot get to tinker with hyperspectral data or read a good paper on hyperspectral classification and related AI subjects, is that to do so you need to buy yourself a $100,000 camera. A good many companies offering hyperspectral cameras as a product do not list their prices on their websites; some provide only leasing information. Make that price tag $10K or $1K and now we are talking about appealing to a much larger section of the population, from hackers and makers to consumers. We are talking about creating a niche, although I can only guess it might be a bigger niche than 50MP DSLR camera aficionados, one that perfectly resembles a low-end disruption scenario as described by Clayton Christensen (from Wikipedia):

Christensen distinguishes between "low-end disruption" which targets customers who do not need the full performance valued by customers at the high end of the market and "new-market disruption" which targets customers who have needs that were previously unserved by existing incumbents....In low-end disruption, the disruptor is focused initially on serving the least profitable customer, who is happy with a good enough product. This type of customer is not willing to pay premium for enhancements in product functionality. Once the disruptor has gained foot hold in this customer segment, it seeks to improve its profit margin. To get higher profit margins, the disruptor needs to enter the segment where the customer is willing to pay a little more for higher quality. To ensure this quality in its product, the disruptor needs to innovate. The incumbent will not do much to retain its share in a not so profitable segment, and will move up-market and focus on its more attractive customers. After a number of such encounters, the incumbent is squeezed into smaller markets than it was previously serving. And then finally the disruptive technology meets the demands of the most profitable segment and drives the established company out of the market.
"New market disruption" occurs when a product fits a new or emerging market segment that is not being served by existing incumbents in the industry..[6]

The reason this technology is currently in the dip stems partly from the fact that it needs to be hardened. But where are the other niches, you ask? Like any new concept, some ideas will yield implementable technologies and some will die. But one thing is for sure: you cannot decide which will die until you go through the process of hardening them. This process is treacherous for anybody involved because, as Jerry Weinberg reminds us, the stakes are high. After you have invested all your time in a specific idea,

The thought that disaster is impossible often leads to an unthinkable disaster.

You cannot let something sink when you have spent so much capital on it. In that process, however, you could discover that it answers questions that are not being asked, or entirely different problems. For instance, Compressed Genotyping is clearly answering a question that is being asked at the individual level but is seldom asked at the group level. If the whole population were affected by Tay-Sachs, I am sure it would be a question the whole society would ask, and there would be clear incentives for solving it ... now. But as nobody expects the Spanish Inquisition, it is likely a bad bet to devise hypothetical test procedures for a future unknown disease that could decimate our civilization. And this leads me to Nick Trefethen's maxim:
If the answer is highly sensitive to perturbations, you have probably asked the wrong question.
And so when I hear about other dips, I think and wonder whether beating the diffraction limit with nothing, or solving crazy problems, could fit the bill of a problem worth fighting for. But then I am reminded that it doesn't really matter, because sparsity is a proxy for power laws, and if your sensor cannot take advantage of that, then maybe you have the wrong sensor or the wrong approach. Like Forrest, you should run...

Monday, February 07, 2011

CS: The Long Post of The Day

I don't usually make mention of patents, but this one caught my attention, especially as it pertains to the company involved: WIPO Assigns Patent To Research In Motion For "sensor-based Wireless Communication Systems Using Compressive Sampling"

Today, we have a slew of papers for this long post of the day, enjoy:

Compressible Priors for High-dimensional Statistics by Rémi Gribonval, Volkan Cevher, Mike E. Davies. The abstract reads:
We develop a principled way of identifying probability distributions whose independent and identically distributed (iid) realizations are compressible, i.e., can be approximated as sparse. We focus on the context of Gaussian random underdetermined linear regression (GULR) problems, where compressibility is known to ensure the success of estimators exploiting sparse regularization. We prove that many priors revolving around maximum a posteriori (MAP) interpretation of the $\ell^1$ sparse regularization estimator and its variants are in fact incompressible, in the limit of large problem sizes. To show this, we identify non-trivial undersampling regions in GULR where the simple least squares solution almost surely outperforms an oracle sparse solution, when the data is generated from a prior such as the Laplace distribution. We provide rules of thumb to characterize large families of compressible (respectively incompressible) priors based on their second and fourth moments. Generalized Gaussians and generalized Pareto distributions serve as running examples for concreteness.
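A small numpy experiment in the spirit of the abstract (my own illustrative stand-ins, not the authors' exact distributions or criterion): iid Laplace draws keep a large share of their energy outside their largest entries, while a heavy-tailed Pareto sample is well approximated by its few largest entries:

```python
import numpy as np

def rel_best_s_term(x, frac=0.1):
    # relative l2 error after keeping only the largest frac*n entries
    s = int(frac * x.size)
    mags = np.sort(np.abs(x))[::-1]
    return np.sqrt(np.sum(mags[s:] ** 2) / np.sum(mags ** 2))

rng = np.random.default_rng(5)
n = 100_000
laplace = rng.laplace(size=n)            # iid Laplace, the MAP "prior" behind l1
pareto = 1.0 + rng.pareto(1.0, size=n)   # heavy-tailed Pareto draws

err_laplace = rel_best_s_term(laplace)
err_pareto = rel_best_s_term(pareto)
print("Laplace:", round(err_laplace, 3), " Pareto:", round(err_pareto, 3))
```

The Laplace sample is, in this sense, a poor candidate for sparse approximation, which is the intuition behind the paper's incompressibility result for the l1-MAP prior.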

Compressed-Sensing-Based Clutter Metric by  Cui Yang, Jie Wu, Qian Li, and Qi Zhang. The abstract reads:
The background clutter has been becoming one of the most important factors affecting the target acquisition performance of electro-optical imaging systems. A novel clutter metric based on sparse representation is proposed in this paper. Based on the sparse representation, the similarity vector is defined to describe the similarity between the background and the target in the feature domain, which is a typical feature of the background clutter. This newly proposed metric is applied to the search_2 dataset and the experiment results show that its prediction correlates well with the detection probability of observers.

A collaborative convex framework for factoring a data matrix X into a non-negative product AS, with a sparse coefficient matrix S, is proposed. We restrict the columns of the dictionary matrix A to coincide with certain columns of the data matrix X, thereby guaranteeing a physically meaningful dictionary and dimensionality reduction. We use l1,1 regularization to select the dictionary from the data and show this leads to an exact convex relaxation of l0 in the case of distinct noise-free data. We also show how to relax the restriction-to-X constraint by initializing an alternating minimization approach with the solution of the convex model, obtaining a dictionary close to but not necessarily in X. We focus on applications of the proposed framework to hyperspectral endmember and abundance identification and also show an application to blind source separation of NMR data.
Total variation regularization for fMRI-based prediction of behaviour by Vincent Michel, Alexandre Gramfort, Gaël Varoquaux, Evelyn Eger, Bertrand Thirion. The abstract reads:
While medical imaging typically provides massive amounts of data, the extraction of relevant information for predictive diagnosis remains a difficult challenge. Functional MRI (fMRI) data, which provide an indirect measure of task-related or spontaneous neuronal activity, are classically analyzed in a mass-univariate procedure yielding statistical parametric maps. This analysis framework disregards some important principles of brain organization: population coding, distributed and overlapping representations. Multivariate pattern analysis, i.e., the prediction of behavioural variables from brain activation patterns, better captures this structure. To cope with the high dimensionality of the data, the learning method has to be regularized. However, the spatial structure of the image is not taken into account in standard regularization methods, so that the extracted features are often hard to interpret. More informative and interpretable results can be obtained with the l_1 norm of the image gradient, a.k.a. its Total Variation (TV), as regularization. We apply this method to fMRI data for the first time, and show that TV regularization is well suited to the purpose of brain mapping while being a powerful tool for brain decoding. Moreover, this article presents the first use of TV regularization for classification.
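For readers new to TV regularization, here is a tiny 1D sketch (not the authors' fMRI pipeline; the signal, penalty weight and step size are arbitrary): minimizing $\frac{1}{2}\|x-y\|^2 + \lambda \sum_i |x_{i+1}-x_i|$ by plain subgradient descent recovers piecewise-constant structure from noisy samples:

```python
# Minimal 1D TV-regularized denoising by subgradient descent (a sketch,
# not an efficient solver): objective 0.5*||x - y||^2 + lam * TV(x).

def sign(v):
    return (v > 0) - (v < 0)

def tv_denoise(y, lam=0.3, step=0.01, iters=5000):
    x = list(y)
    n = len(x)
    for _ in range(iters):
        g = [x[i] - y[i] for i in range(n)]  # gradient of the data term
        for i in range(n - 1):               # subgradient of the TV term
            s = sign(x[i + 1] - x[i])
            g[i] -= lam * s
            g[i + 1] += lam * s
        x = [x[i] - step * g[i] for i in range(n)]
    return x

# Noisy two-level signal: TV flattens each block but keeps the jump.
y = [0.1, -0.1, 0.05, 1.1, 0.9, 1.05]
x = tv_denoise(y)
print([round(v, 2) for v in x])
```

The within-block wiggles shrink while the single large jump survives, which is exactly the behaviour that makes TV attractive for spatially structured brain maps.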
Compressed sensing with preconditioning for sparse recovery with subsampled matrices of Slepian prolate functions by Laurent Gosse. The abstract reads:
Efficient recovery of smooth functions which are $s$-sparse with respect to the base of so-called Prolate Spheroidal Wave Functions from a small number of random sampling points is considered. The main ingredient in the design of both algorithms we propose here consists in establishing a uniform $L^\infty$ bound on the measurement ensembles which constitute the columns of the sensing matrix. Such a bound provides us with the Restricted Isometry Property for this rectangular random matrix, which leads to either the exact recovery property or the ``best $s$-term approximation'' of the original signal by means of the $\ell^1$ minimization program. The first algorithm considers only a restricted number of columns for which the $L^\infty$ bound holds as a consequence of the fact that the eigenvalues of Bergman's restriction operator are close to 1, whereas the second one allows for a wider system of PSWF by taking advantage of a preconditioning technique. Numerical examples are spread throughout the text to illustrate the results.

Sparsity Driven People Localization with a Heterogeneous Network of Cameras by Alexandre Alahi, Laurent Jacques, Yannick Boursier and Pierre Vandergheynst. The abstract reads:
This paper addresses the problem of localizing people in low and high density crowds with a network of heterogeneous cameras. The problem is recast as a linear inverse problem. It relies on deducing the discretized occupancy vector of people on the ground from the noisy binary silhouettes observed as foreground pixels in each camera. This inverse problem is regularized by imposing a sparse occupancy vector, i.e. one made of few non-zero elements, while a particular dictionary of silhouettes linearly maps these non-empty grid locations to the multiple silhouettes viewed by the camera network. The proposed framework is (i) generic to any scene of people, i.e. people located in low and high density crowds, (ii) scalable to any number of cameras and already working with a single camera, (iii) unconstrained with respect to the scene surface to be monitored, and (iv) versatile with respect to the cameras' geometry, e.g. planar or omnidirectional. Qualitative and quantitative results are presented on the APIDIS and the PETS 2009 Benchmark datasets. The proposed algorithm successfully detects people occluding each other given severely degraded extracted features, while outperforming state-of-the-art people localization techniques.
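To see the flavor of this formulation, here is a toy version of the occupancy inverse problem (the grid, silhouette dictionary and mask below are made up, not from the paper): each column of D is the binary silhouette a person standing at one of four ground cells would cast on a 6-pixel camera view, and with very few people the sparsest consistent occupancy can simply be found by exhaustive search over small supports:

```python
from itertools import combinations

# D: 6-pixel silhouettes for 4 hypothetical ground cells (one per column)
D = [[1, 0, 0, 0],
     [1, 1, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 1]]
b = [1, 1, 0, 0, 1, 1]  # observed foreground mask

def residual(support):
    # Hamming distance between the union of the chosen silhouettes and b
    pred = [max((D[p][c] for c in support), default=0) for p in range(len(D))]
    return sum(pr != ob for pr, ob in zip(pred, b))

def sparsest_occupancy(max_people=3):
    # exhaustive search: smallest support that explains the mask exactly
    for k in range(max_people + 1):
        for support in combinations(range(len(D[0])), k):
            if residual(support) == 0:
                return support
    return None

print(sparsest_occupancy())  # -> (0, 3)
```

The paper of course replaces this combinatorial search with a sparse convex/greedy recovery that scales to real grids and multiple cameras; the toy only illustrates the "sparse occupancy explains the silhouettes" model.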

Recent attention on performance analysis of single-user multiple-input multiple-output (MIMO) systems has been on understanding the impact of the spatial correlation model on ergodic capacity. In most of these works, it is assumed that the statistical degrees of freedom (DoF) in the channel can be captured by decomposing it along a suitable eigen-basis and that the transmitter has perfect knowledge of the statistical DoF. With an increased interest in large-antenna systems in state-of-the-art technologies, these implicit channel modeling assumptions in the literature have to be revisited. In particular, multi-antenna measurements have shown that large-antenna systems are sparse, where only a few DoF are dominant enough to contribute towards capacity. Thus, in this work, it is assumed that the transmitter can only afford to learn the dominant statistical DoF in the channel. The focus is on understanding ergodic capacity scaling laws in sparse channels. Unlike classical results, where linear capacity scaling is implicit, sparsity of MIMO channels coupled with a knowledge of only the dominant DoF is shown to result in a new paradigm of sub-linear capacity scaling that is consistent with experimental results and physical arguments. It is also shown that uniform-power signaling over all the antenna dimensions is wasteful and could result in a significant penalty over optimally adapting the antenna spacings in response to the sparsity level of the channel and transmit SNR.

This paper addresses the problem of inferring sparse causal networks modeled by multivariate auto-regressive (MAR) processes. Conditions are derived under which the Group Lasso (gLasso) procedure consistently estimates sparse network structure. The key condition involves a “false connection score.” In particular, we show that consistent recovery is possible even when the number of observations of the network is far less than the number of parameters describing the network, provided that the false connection score is less than 1. The false connection score is also demonstrated to be a useful metric of recovery in non-asymptotic regimes. The conditions suggest a modified gLasso procedure which tends to improve the false connection score and reduce the chances of reversing the direction of causal influence. Computational experiments and a real-network-based electrocorticogram (ECoG) simulation study demonstrate the effectiveness of the approach.

Abstract. This note demonstrates that it is possible to bound the expectation of an arbitrary norm of a random matrix drawn from the Stiefel manifold in terms of the expected norm of a standard Gaussian matrix with the same size. A related comparison holds for any convex function of a random matrix drawn from the Stiefel manifold. For certain norms, a reversed inequality is also valid.

Statistical Compressed Sensing of Gaussian Mixture Models by Guoshen Yu, Guillermo Sapiro. The abstract reads:
A novel framework of compressed sensing, namely statistical compressed sensing (SCS), that aims at efficiently sampling a collection of signals that follow a statistical distribution, and achieving accurate reconstruction on average, is introduced. SCS based on Gaussian models is investigated in depth. For signals that follow a single Gaussian model, with Gaussian or Bernoulli sensing matrices of O(k) measurements, considerably smaller than the O(k log(N/k)) required by conventional CS based on sparse models, where N is the signal dimension, and with an optimal decoder implemented via linear filtering, significantly faster than the pursuit decoders applied in conventional CS, the error of SCS is shown to be tightly upper bounded by a constant times the best k-term approximation error, with overwhelming probability. The failure probability is also significantly smaller than that of conventional sparsity-oriented CS. Stronger yet simpler results further show that for any sensing matrix, the error of Gaussian SCS is upper bounded by a constant times the best k-term approximation with probability one, and the bound constant can be efficiently calculated. For Gaussian mixture models (GMMs), which assume multiple Gaussian distributions with each signal following one of them with an unknown index, a piecewise linear estimator is introduced to decode SCS. The accuracy of model selection, at the heart of the piecewise linear decoder, is analyzed in terms of the properties of the Gaussian distributions and the number of sensing measurements. A maximum a posteriori expectation-maximization algorithm that iteratively estimates the Gaussian model parameters, selects the model for each signal, and decodes the signals is presented for GMM-based SCS. In real image sensing applications, GMM-based SCS is shown to lead to improved results compared to conventional CS, at a considerably lower computational cost.
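The linear-filtering decoder mentioned in the abstract is worth seeing concretely. Here is a sketch of the idea (not the authors' implementation; the prior covariance and sensing matrix are made up): for a zero-mean Gaussian prior with covariance S and noiseless measurements y = A x, the MAP/MMSE estimate is the single linear filter x_hat = S Aᵀ (A S Aᵀ)⁻¹ y:

```python
# Gaussian-model CS decoding as linear filtering (toy sizes, pure Python).

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(M, v):
    # Gaussian elimination with partial pivoting for a small square system
    n = len(M)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(aug[r][i]))
        aug[i], aug[p] = aug[p], aug[i]
        for r in range(i + 1, n):
            f = aug[r][i] / aug[i][i]
            aug[r] = [a - f * b for a, b in zip(aug[r], aug[i])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

S = [[4.0, 0, 0], [0, 1.0, 0], [0, 0, 0.25]]  # assumed prior covariance
A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]        # 2 measurements of a 3-dim signal
x_true = [2.0, -1.0, 0.5]
y = [sum(a * v for a, v in zip(row, x_true)) for row in A]

SAt = matmul(S, [list(c) for c in zip(*A)])   # S A^T   (3x2)
ASAt = matmul(A, SAt)                         # A S A^T (2x2)
w = solve(ASAt, y)                            # (A S A^T)^{-1} y
x_hat = [sum(SAt[i][j] * w[j] for j in range(2)) for i in range(3)]
print([round(v, 3) for v in x_hat])
```

The estimate reproduces the measurements exactly (A x_hat = y) while shrinking toward the prior, and the whole decode is one small linear solve rather than an iterative pursuit.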
Since its emergence, compressive sensing (CS) has attracted many researchers' attention. In CS, recovery algorithms play an important role. Basis pursuit (BP) and matching pursuit (MP) are two major classes of CS recovery algorithms. However, both BP and MP were originally designed for one-dimensional (1D) sparse signal recovery, while many practical signals are two-dimensional (2D), e.g. image, video, etc. To recover 2D sparse signals effectively, this paper develops the 2D orthogonal MP (2D-OMP) algorithm, which shares the advantages of low complexity and good performance. The 2D-OMP algorithm can be widely used in those scenarios involving 2D sparse signal processing, e.g. image/video compression, compressive imaging, etc.
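For readers who have not met OMP before, here is a minimal 1D sketch for intuition (2D-OMP applies the same greedy selection to a separable 2D dictionary; the 4x6 dictionary below is hypothetical, with unit-norm columns):

```python
# Minimal 1D orthogonal matching pursuit (OMP): greedily pick the atom
# most correlated with the residual, then least-squares refit on the
# selected support.

def solve2(G, r):
    # closed-form solve for the 1x1 / 2x2 systems this demo needs
    if len(G) == 1:
        return [r[0] / G[0][0]]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return [(r[0] * G[1][1] - r[1] * G[0][1]) / det,
            (r[1] * G[0][0] - r[0] * G[1][0]) / det]

def omp(Phi, y, k):
    """Greedily recover a k-sparse coefficient vector with y = Phi x."""
    m, n = len(Phi), len(Phi[0])
    support, resid = [], y[:]
    for _ in range(k):
        # pick the column most correlated with the current residual
        corr = [abs(sum(Phi[i][j] * resid[i] for i in range(m))) for j in range(n)]
        support.append(max(range(n), key=lambda j: corr[j]))
        # least-squares refit on the chosen support (normal equations)
        G = [[sum(Phi[i][a] * Phi[i][b] for i in range(m)) for b in support]
             for a in support]
        rhs = [sum(Phi[i][a] * y[i] for i in range(m)) for a in support]
        coef = solve2(G, rhs)
        resid = [y[i] - sum(c * Phi[i][j] for c, j in zip(coef, support))
                 for i in range(m)]
    x = [0.0] * n
    for c, j in zip(coef, support):
        x[j] = c
    return x

Phi = [[1, 0, 0, 0, 0.5, 0.5],
       [0, 1, 0, 0, 0.5, -0.5],
       [0, 0, 1, 0, 0.5, 0.5],
       [0, 0, 0, 1, 0.5, -0.5]]
y = [0.75, 1.55, 0.75, 0.75]     # y = 1.5 * column 4 + 0.8 * column 1
x = omp(Phi, y, 2)
print([round(v, 2) for v in x])  # -> [0.0, 0.8, 0.0, 0.0, 1.5, 0.0]
```

Two greedy steps recover the 2-sparse coefficients exactly; the 2D variant's payoff is that correlations and refits factor along rows and columns, cutting complexity for images.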
Recursive $\ell_{1,\infty}$ Group lasso by Yilun Chen, Alfred O. Hero III. The abstract reads:
We introduce a recursive adaptive group lasso algorithm for real-time penalized least squares prediction that produces a time sequence of optimal sparse predictor coefficient vectors. At each time index the proposed algorithm computes an exact update of the optimal $\ell_{1,\infty}$-penalized recursive least squares (RLS) predictor. Each update solves a convex but nondifferentiable optimization problem. We develop an online homotopy method to reduce the computational complexity. Numerical simulations demonstrate that the proposed algorithm outperforms the $\ell_1$ regularized RLS algorithm for a group sparse system identification problem and has lower implementation complexity than direct group lasso solvers.

Active Sensing and Learning by Rui Castro and Robert Nowak. The introduction reads:
Consider the problem of estimating a signal from noisy samples. The conventional approach (e.g., Shannon-Nyquist sampling) is to sample at many locations in a non-adaptive and more-or-less uniform manner. For example, in a digital camera we collect samples at locations on a square lattice (each sample being a pixel). Under certain scenarios though there is extra flexibility, and an alternative and more versatile approach is possible: choose the sample locations `on-the-fly', depending on the information collected up to that time. This is what we call adaptive sampling, as opposed to the conventional approach, referred to here as passive sampling. Although intuitively very appealing, adaptive sampling is seldom used because of the difficulty of design and analysis of such feedback techniques, especially in complex settings. The topic of adaptive sampling, or active learning as it is sometimes called, has attracted significant attention from various research communities, in particular in the fields of computer science and statistics. A large body of work exists proposing algorithmic ideas and methods [1, 2, 3, 4, 5], but unfortunately there are few performance guarantees for many of those methods. Furthermore, most of those results take place in very special or restricted scenarios (e.g., absence of noise or uncertainty, yielding perfect decisions). Under the adaptive sampling framework there are a few interesting theoretical results, some of which are presented here, namely the pioneering work of [6] regarding the estimation of step functions, which was later rediscovered in [7] using different algorithmic ideas and tools. Building on some of those ideas, the work in [8, 9, 10, 11] provides performance guarantees for function estimation under noisy conditions, for several function classes that are particularly relevant to signal processing and analysis.
In this chapter we provide an introduction to adaptive sampling techniques for signal estimation, both in parametric and non-parametric settings. Note that the scenarios we consider do not have the Markovian structure inherent to the Markov Decision Processes (MDPs) that are the topic of many chapters in this book, and that the `actions' (sample locations, or whether or not to collect a sample) do not affect the environment. Another major difference between the active learning problems considered and the MDPs is that, in the former, the set of possible actions/sample locations is generally uncountable.
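The step-function example the chapter mentions has a very simple noiseless caricature (the noisy versions analyzed in the works cited above are considerably more subtle): passive sampling of a step on {0, ..., n-1} needs on the order of n uniform samples to locate the jump, while adaptive sampling, i.e. bisection, needs only about log2(n):

```python
# Toy contrast between passive and adaptive sampling: locating the jump
# of a noiseless step function by bisection, counting samples taken.

def step(t, jump):
    return 0 if t < jump else 1

def adaptive_locate(n, jump):
    """Binary-search for the jump location; returns (location, samples)."""
    lo, hi, samples = 0, n - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2
        samples += 1
        if step(mid, jump) == 0:
            lo = mid + 1      # jump is to the right of mid
        else:
            hi = mid          # jump is at mid or to its left
    return lo, samples

n, jump = 1024, 417
loc, used = adaptive_locate(n, jump)
print(loc, used)  # exact location found with ~log2(n) samples
```

Each sample's location depends on what was already observed, which is precisely the feedback structure that makes adaptive sampling powerful and hard to analyze once noise enters.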

We consider spatially coupled code ensembles. A particular instance is the class of convolutional LDPC ensembles. It was recently shown that, for transmission over the memoryless binary erasure channel, this coupling increases the belief propagation threshold of the ensemble to the maximum a-posteriori threshold of the underlying component ensemble. This paved the way for a new class of capacity-achieving low-density parity check codes. It was also shown empirically that the same threshold saturation occurs when we consider transmission over general binary input memoryless channels. In this work, we report on empirical evidence which suggests that the same phenomenon also occurs when transmission takes place over a class of channels with memory. This is confirmed both by simulations as well as by computing EXIT curves.
Gelfand and Kolmogorov numbers of embeddings in weighted function spaces by Shun Zhang, Gensun Fang. The abstract reads:
In this paper we study the Gelfand and Kolmogorov numbers of compact embeddings between weighted function spaces of Besov and Triebel-Lizorkin type with polynomial weights. The sharp estimates are determined in the non-limiting case where the quasi-Banach setting is included.
Widths of embeddings in weighted function spaces by Shun Zhang, Gensun Fang. The abstract reads:
We mainly consider the asymptotic behaviour of the approximation, Gelfand and Kolmogorov numbers of the compact embeddings between weighted Besov spaces with a general class of weights. We determine the exact estimates in almost all nonlimiting cases where the quasi-Banach setting is included. At the end we present complete results on related widths in the case of polynomial weights with small perturbations, in particular the exact estimates for the case $\alpha = d/{p^*}>0$ therein.

We present an uncertainty relation for the representation of signals in two different general (possibly redundant or incomplete) signal sets. This uncertainty relation is relevant for the analysis of signals containing two distinct features each of which can be described sparsely in a suitable general signal set. Furthermore, the new uncertainty relation is shown to lead to improved sparsity thresholds for recovery of signals that are sparse in general dictionaries. Specifically, our results improve on the well-known $(1+1/d)/2$-threshold for dictionaries with coherence $d$ by up to a factor of two. Furthermore, we provide probabilistic recovery guarantees for pairs of general dictionaries that also allow us to understand which parts of a general dictionary one needs to randomize over to "weed out" the sparsity patterns that prohibit breaking the square-root bottleneck.
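The coherence $d$ in that threshold is easy to compute for any concrete dictionary. Here is a small sketch on a made-up dictionary (an identity basis plus one mixed atom), showing the mutual coherence and the classical $(1+1/d)/2$ threshold the paper improves on:

```python
import math

def coherence(cols):
    # normalize the columns, then take the largest absolute off-diagonal
    # inner product between distinct atoms
    unit = [[v / math.sqrt(sum(u * u for u in c)) for v in c] for c in cols]
    return max(abs(sum(a * b for a, b in zip(unit[i], unit[j])))
               for i in range(len(unit)) for j in range(i + 1, len(unit)))

# hypothetical toy dictionary: 3x3 identity plus one flat atom
cols = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
d = coherence(cols)
threshold = (1 + 1 / d) / 2
print(round(d, 4), round(threshold, 4))
```

For this dictionary $d = 1/\sqrt{3}$, and lowering the effective coherence (or randomizing over parts of the dictionary, as the abstract suggests) is what pushes recovery guarantees past this square-root-type bottleneck.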

Credit: NASA/JPL/University of Arizona/USGS, Gully with Bright Deposits, Mars.