Friday, October 29, 2010

CS: CS-ECG, ParallelCamp, Une Competition, UPS Delivers Optimal Phase Diagram in High Dimensional Variable Selection, Diffuse Optical Tomography, Exploiting Statistical Dependencies

I was at another meeting last week, the Parallel Camp  a specific instance of a Barcamp, a networking event of sorts on a specific subject. Here the meeting was about parallelism and mostly GPUs. To get some conversation going, I made this presentation entitled: Parallel Computing for Compressive Sensing: What is it Good For ?. The public was pretty sparse but focused as I was competing with a much larger session. One of the fascinating things was that I kept on talking about sensors and how CS could be affecting them and how,  I  believe, parallel  computing fitted right into this new paradigm. Yet some of the people in the audience eventually asked about what a sensor was. My take from this is that sensors are always taken for granted: i.e. nobody wants to be Superman and have X-ray vision  anymore:-) because everybody is convinced that people working for CSI can do anything. On the other hand, the power level of GPUs sure makes the issue of reducing power at the sensor level a little quaint issue. Of note, Eric Mahé of MassiveRand, one of the organizers of the meeting, pitched a GPU based solution that spits out random numbers that can never be reproduced. Other discussions included some of the folks involved in Finance and the need to get good Random Numbers out of current technology (an issue Christian Robert just talked about recently on his blog). Most of the answers on the subject were pretty much summarized in the comment of that blog entry. Also I did not realize this but a GPU can currently spit out 6 Gbit of random numbers per second for a memory on the card reaching 6 GB. The PCI Express link between the CPU and the GPU seems to really be the main bottleneck as far as I could tell. The viewpoint of Compressive Sensing tells me that in order to use GPUs at their full extent, maybe we ought to be looking at how the 6Gb/s of RNG can be used on the GPU or on peripherals like the CRT/LCDs that have direct access to this flood of data. On a different note, you may notice in the presentation one of my new subject of interest...

Emily Allstot sent me the following:

I am an undergraduate Electrical Engineering student at the University of Washington, doing research related involving compressed sensing. I also have been a fan of your blog for a while.

For once, I actually have some interesting things to share!

Our department has invited Emmanuel Candes to give a colloquium on December 7, 2010, and anyone in the Seattle area who is interested can come. Videos will also be posted afterward, at the link.

Also, next week is the BioCAS 2010 conference in Paphos, Cyprus. I will be presenting my very first publication during the BioSignal Processing lecture session. The paper is titled, "Compressive Sampling of ECG Bio-Signals: Quantization Noise and Sparsity Considerations" by Emily Allstot, Andrew Y. Chen, Anna M.R. Dixon, Daibashish Gangopadhyay, and David J. Allstot (Dept. of Electrical Engineering, Univ. of Washington). This paper is part of the groundwork for integrated circuit implementations of CS that are being designed by the PhD students in our group.

This is the abstract:

"Compressed sensing (CS) is an emerging signal processing paradigm that enables the sub-Nyquist processing of sparse signals; i.e., signals with significant redundancy. Electrocardiogram (ECG) signals show significant time-domain sparsity that can be exploited using CS techniques to reduce energy consumption in an adaptive data acquisition scheme. A measurement matrix of random values is central to CS computation. Signal-to-quantization noise ratio (SQNR) results with ECG signals show that 5- and 6-bit Gaussian random coefficients are sufficient for compression factors up to 6X and from 8X-16X, respectively, whereas 6-bit uniform random coefficients are needed for 2X-16X compression ratios. "
Thanks Emily. I look forward to see how some of these encodings (quantized Gaussian random) perform compared to the expander graph approach of the EPFL folks.

In a different direction,  Matthijs den Besten is organizing a new "Compressive Sesnsing" competition not unlike that the one Matlab ran a while back. It's in French. and called "Le grand concours de programmation MATLAB". There are prizes associated with winning the big prize or any of the intermediary prizes. Bonne Chance!

Here is a new paper investigating the Donoho-Tanner phase transition from a statistics viewpoint with a new algorithm: UPS Delivers Optimal Phase Diagram in High Dimensional Variable Selection by Pengsheng Ji, Jiashun Jin. The abstract reads:
Consider linear regression in the so-called regime of p much larger than n. We propose the UPS as a new variable selection method. This is a Screen and Clean procedure [Wasserman and Roeder (2009)], in which we screen with the Univariate thresholding, and clean with the Penalized MLE. In many situations, the UPS possesses two important properties: Sure Screening and Separable After Screening (SAS). These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. We measure the performance of variable selection procedure by the Hamming distance. In many situations, we find that the UPS achieves the optimal rate of convergence, and also yields an optimal partition of the so-called phase diagram. In the two-dimensional phase space calibrated by the signal sparsity and signal strength, there is a three-phase diagram shared by many choices of design matrices. In the first phase, it is possible to recover all signals. In the second phase, exact recovery is impossible, but it is possible to recover most of the signals. In the third phase, successful variable selection is impossible. The UPS is able to recover most of the signals as long as successful variable selection is possible. The lasso and the subset selection are well-known approaches to variable selection. However, somewhat surprisingly, there are regions in the phase space where neither of them is rate optimal, even in very simple cases. The lasso is non-optimal for it is too loose on false positives, and the subset selection is non-optimal for it is too harsh on true positive clusters.
The figures are using a different set of variables than the ones we are accustomed to in Donoho-Tanner phase transition, but I found the following Rosetta stone description enlightening (yes statisticians speak Greek to me, not that there is anything wrong with speaking Greek!). From the text:
....The lasso and the subset selection (also known as the L1- and L0-penalization methods, respectively) are well-known approaches to variable selection. However, somewhat surprisingly, there are regions in the phase space where neither the lasso nor the subset selection is rate optimal, even for very simple. The lasso is non-optimal because it is too loose in filtering out fake signals (i.e. noise that is highly correlated with a signal), and the subset selection is non-optimal because it tends to kill one or more signals when signals appear in pairs, triplets, etc..
Similarly, Sergey points me to this arxiv preprint which made a passing reference to CS: Theory and Applications of Robust Optimization by Dimitris Bertsimas, David B. Brown, Constantine Caramanis. The abstract reads:
In this paper we survey the primary research, both theoretical and applied, in the area of Robust Optimization (RO). Our focus is on the computational attractiveness of RO approaches, as well as the modeling power and broad applicability of the methodology. In addition to surveying prominent theoretical results of RO, we also present some recent results linking RO to adaptable models for multi-stage decision-making problems. Finally, we highlight applications of RO across a wide spectrum of domains, including finance, statistics, learning, and various areas of engineering.
Reading the paper yields to this other paper I had mentioned back in April:
which makes a statement about Robust Linear Regression which in our world translates into multiplicative noise. More Rosetta Stone moments....In the meatime, you might also be interested in the NIPS 2010 Workshop, entitled Robust Statistical learning (robustml):

Checking the Google and Arixv, I finally got the following two papers:

Diffuse optical tomography (DOT) allows tomographic (3D), non-invasive reconstructions of tissue optical properties for biomedical applications. Severe under-sampling is a common problem in DOT which leads to image artifacts. A large number of measurements is needed in order to minimize these artifacts. In this work, we introduce a compressed sensing (CS) framework for DOT which enables improved reconstructions with under-sampled data. The CS framework uses a sparsifying basis, ℓ1-regularization and random sampling to reduce the number of measurements that are needed to achieve a certain accuracy. We demonstrate the utility of the CS framework using numerical simulations. The CS results show improved DOT results in comparison to “traditional” linear reconstruction methods based on singular-value decomposition (SVD) with ℓ2-regularization and with regular and random sampling. Furthermore, CS is shown to be more robust against the reduction of measurements in comparison to the other methods. Potential benefits and shortcomings of the CS approach in the context of DOT are discussed.

Signal modeling lies at the core of numerous signal and image processing applications. A recent approach that has drawn considerable attention is sparse representation modeling, in which the signal is assumed to be generated as a combination of a few atoms from a given dictionary. In this work we consider a Bayesian setting and go beyond the classic assumption of independence between the atoms. The main goal of this paper is to introduce a statistical model that takes such dependencies into account and show how this model can be used for sparse signal recovery. We follow the suggestion of two recent works and assume that the sparsity pattern is modeled by a Boltzmann machine, a commonly used graphical model. We show that for general dependency models, exact MAP estimation of the sparse representation becomes computationally complex. To simplify the computations, we propose a greedy approximation for the MAP estimator. We then consider a special case where exact MAP is feasible, by assuming that the dictionary is unitary and the dependency model corresponds to a certain sparse graph. Exploiting this structure, we develop an efficient message-passing algorithm that recovers the underlying signal. The effectiveness of our developed pursuit methods is demonstrated on synthetic signals, where we compare the denoising performance to that of previous recovery methods that do not exploit the statistical dependencies. Finally, when the model parameters defining the underlying graph are unknown, we suggest an algorithm that learns these parameters directly from the data, leading to an iterative scheme for adaptive sparse signal recovery.

No comments: