Monday, February 22, 2010

CS: This week's longest entry.


Here is this week's long post. I mentioned to you that I am really interested in a compressive sensing EEG system, and it looks like I am not the only one. The latest case in point is Compressive sampling of EEG signals with finite rate of innovation by Kok-Kiong Poh and Pina Marziliano. The abstract reads:
Analyses of biomedical signals such as electroencephalogram and subsequent diagnoses can only be done effectively on long term recordings. Furthermore, the morphology of these signals must be preserved in the recordings. Therefore, a reliable, accurate and efficient compression and reconstruction technique is necessary to store and retrieve such data. Currently, electroencephalographic signals are obtained at Nyquist rate or higher, which introduces 'extras' - redundancies and irrelevancies - in the signals. Existing compression methods then remove these 'extras', thereby achieving compression. In this article, we propose a compression scheme based on a sampling theory developed for signals with a finite rate of innovation. A signal acquisition scheme which compresses electroencephalographic signals during acquisition is described. The signals can then be effectively represented by a small set of Fourier coefficients corresponding to the signals' rate of innovation. Using the new sampling theory, we can reconstruct the original signals using this small set of Fourier coefficients. Our scheme will be presented by firstly modelling the electroencephalographic signals as signals with finite rate of innovation and then sampling them at their rate of innovation. We examined our method on 3 sets of electroencephalographic signals comprising 72 hours of recording and present results based on factors commonly used in compression literature such as compression ratio, percent root difference and root mean square error. Metrics which are of greater significance to biomedical signals such as the cross correlation, maximum error and the peak amplitude related error criteria are also measured to study the morphological relations between the original and reconstructed electroencephalographic signals. It is observed that the proposed method achieves results comparable to that of wavelet compression methods, in terms of low reconstruction errors while preserving the morphological information of the electroencephalographic signals. More importantly, it introduces a new framework to acquire electroencephalographic signals at their rate of innovation, thus entailing a less costly low-rate sampling device that does not waste precious computational resources.
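For readers who have not played with finite-rate-of-innovation sampling, the core reconstruction step is essentially Prony's annihilating-filter trick: K Diracs are recovered from about 2K+1 Fourier coefficients. Here is a minimal Python sketch of that step on synthetic data; it only illustrates the machinery, not the authors' EEG pipeline, and all sizes and parameters are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic finite-rate-of-innovation signal: K Diracs on [0, tau).
tau, K = 1.0, 3
t_true = np.sort(rng.uniform(0, tau, K))       # unknown locations
a_true = rng.uniform(1.0, 2.0, K)              # unknown amplitudes

# The 2K+1 Fourier-series coefficients X[m], m = -K..K, play the role of the
# "small set of Fourier coefficients" mentioned in the abstract.
m = np.arange(-K, K + 1)
u_true = np.exp(-2j * np.pi * t_true / tau)
X = (a_true[None, :] * u_true[None, :] ** m[:, None]).sum(axis=1)

# Annihilating filter (Prony): find h such that sum_i h[i] X[n-i] = 0.
T = np.array([[X[(r - c) + K] for c in range(K + 1)] for r in range(K + 1)])
h = np.linalg.svd(T)[2][-1].conj()             # null-space vector of the Toeplitz system

# The filter's roots are u_k = exp(-2j*pi*t_k/tau), which give the locations;
# the amplitudes then follow from a linear (Vandermonde) system.
u_est = np.roots(h)
t_est = (-np.angle(u_est) * tau / (2 * np.pi)) % tau
a_est = np.linalg.lstsq(u_est[None, :] ** m[:, None], X, rcond=None)[0]

order = np.argsort(t_est)
print("true locations      :", np.round(t_true, 4))
print("estimated locations :", np.round(t_est[order], 4))
print("estimated amplitudes:", np.round(a_est[order].real, 4))
```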



Compressive nonstationary spectral estimation using parsimonious random sampling of the ambiguity function by Alexander Jung, Georg Tauböck, and Franz Hlawatsch. The abstract reads:
We propose a compressive estimator for the discrete Rihaczek spectrum (RS) of a time-frequency sparse, underspread, nonstationary random process. The new estimator uses a compressed sensing technique to achieve a reduction of the number of measurements. The measurements are randomly located samples of the ambiguity function of the observed signal. We provide a bound on the mean-square estimation error and demonstrate the performance of the estimator by means of simulation results. The proposed RS estimator can also be used for estimating the Wigner-Ville spectrum (WVS) since for an underspread process the RS and WVS are almost equal.
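Since the ambiguity function and the time-frequency spectrum form a 2D Fourier pair, the flavor of this estimator can be conveyed with a generic toy: recover a sparse 2D array from randomly located samples of its 2D DFT with a simple iterative soft-thresholding loop. This is only a stand-in for the authors' RS estimator; the sizes, sampling rate and threshold below are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64
# Toy "time-frequency sparse" target: a few active (time, frequency) cells.
S = np.zeros((N, N), dtype=complex)
S.flat[rng.choice(N * N, 8, replace=False)] = rng.standard_normal(8) + 1j * rng.standard_normal(8)

# Measurements: randomly located samples of the 2D DFT of S (standing in for
# randomly located samples of the ambiguity function).
mask = np.zeros((N, N), dtype=bool)
mask.flat[rng.choice(N * N, N * N // 8, replace=False)] = True
F = lambda z: np.fft.fft2(z, norm="ortho")
Fh = lambda z: np.fft.ifft2(z, norm="ortho")
y = F(S)[mask]

# Iterative soft thresholding for  min_x 0.5*||P F x - y||^2 + lam*||x||_1.
lam, x = 1e-3, np.zeros_like(S)
for _ in range(300):
    r = np.zeros((N, N), dtype=complex)
    r[mask] = F(x)[mask] - y                 # residual on the sampled locations
    x = x - Fh(r)                            # gradient step (operator norm <= 1)
    mag = np.abs(x)
    x = np.where(mag > lam, (1 - lam / np.maximum(mag, 1e-12)) * x, 0)

print("relative error:", np.linalg.norm(x - S) / np.linalg.norm(S))
```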

Information-theoretic limits on sparse signal recovery: Dense versus sparse measurement matrices by Wei Wang, Martin J. Wainwright, Kannan Ramchandran. The abstract of the technical report reads:
We study the information-theoretic limits of exactly recovering the support of a sparse signal using noisy projections defined by various classes of measurement matrices. Our analysis is high-dimensional in nature, in which the number of observations n, the ambient signal dimension p, and the signal sparsity k are all allowed to tend to infinity in a general manner. This paper makes two novel contributions. First, we provide sharper necessary conditions for exact support recovery using general (non-Gaussian) dense measurement matrices. Combined with previously known sufficient conditions, this result yields sharp characterizations of when the optimal decoder can recover a signal for various scalings of the sparsity k and sample size n, including the important special case of linear sparsity (k = Θ(p)) using a linear scaling of observations (n = Θ(p)). Our second contribution is to prove necessary conditions on the number of observations n required for asymptotically reliable recovery using a class of γ-sparsified measurement matrices, where the measurement sparsity γ(n, p, k) ∈ (0, 1] corresponds to the fraction of non-zero entries per row. Our analysis allows general scaling of the quadruplet (n, p, k, γ), and reveals three different regimes, corresponding to whether measurement sparsity has no effect, a minor effect, or a dramatic effect on the information-theoretic limits of the subset recovery problem.
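The paper is information-theoretic, but it is easy to get a feel for what a γ-sparsified measurement matrix looks like and to run a small support-recovery experiment. The toy below uses a simple OMP decoder rather than the optimal decoder analyzed in the paper; all parameter values are mine.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k, gamma = 128, 512, 8, 0.1       # observations, dimension, sparsity, row density

# gamma-sparsified Gaussian measurement matrix: each entry is nonzero with
# probability gamma, rescaled so columns have roughly unit norm on average.
nonzero = rng.random((n, p)) < gamma
A = nonzero * rng.standard_normal((n, p)) / np.sqrt(gamma * n)

support = rng.choice(p, k, replace=False)
x = np.zeros(p); x[support] = rng.choice([-1.0, 1.0], k)
y = A @ x + 0.05 * rng.standard_normal(n)       # noisy projections

# Simple decoder (orthogonal matching pursuit) to check empirical support recovery.
S, r = [], y.copy()
for _ in range(k):
    S.append(int(np.argmax(np.abs(A.T @ r))))
    coef = np.linalg.lstsq(A[:, S], y, rcond=None)[0]
    r = y - A[:, S] @ coef

print("true support     :", sorted(support.tolist()))
print("recovered support:", sorted(S))
```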

Fast Bayesian Matching Pursuit: Model Uncertainty and Parameter Estimation for Sparse Linear Models by Philipp Schniter, Lee Potter, and Justin Ziniel. The abstract reads:
A low-complexity recursive procedure is presented for model selection and minimum mean squared error (MMSE) estimation in linear regression. Emphasis is given to the case of a sparse parameter vector and fewer observations than unknown parameters. A Gaussian mixture is chosen as the prior on the unknown parameter vector. The algorithm returns both a set of high posterior probability models and an approximate MMSE estimate of the parameter vector. Exact ratios of posterior probabilities serve to reveal potential ambiguity among multiple candidate solutions that are ambiguous due to observation noise or correlation among columns in the regressor matrix. Algorithm complexity is O(MNK), with M observations, N coefficients, and K nonzero coefficients. For the case that hyperparameters are unknown, an approximate maximum likelihood estimator is proposed based on the generalized expectation-maximization algorithm. Numerical simulations demonstrate estimation performance and illustrate the distinctions between MMSE estimation and maximum a posteriori probability model selection.
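To make the MMSE-versus-MAP distinction concrete, here is a brute-force toy with a spike-and-slab Gaussian prior on a problem small enough to enumerate every support; FBMP's contribution is precisely to avoid this enumeration with a fast recursive search. Parameter names and values are illustrative assumptions, not the paper's.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
N, M, sig_x, sig_w, p1 = 8, 5, 1.0, 0.1, 0.25    # tiny sizes so we can enumerate supports

A = rng.standard_normal((M, N)) / np.sqrt(M)
x = np.zeros(N); x[rng.choice(N, 2, replace=False)] = sig_x * rng.standard_normal(2)
y = A @ x + sig_w * rng.standard_normal(M)

# Spike-and-slab prior: each coefficient is active w.p. p1; active ones are N(0, sig_x^2).
# For a support S, the evidence is y | S ~ N(0, sig_x^2 A_S A_S^T + sig_w^2 I).
supports = [s for r in range(N + 1) for s in combinations(range(N), r)]
logpost, cond_means = [], []
for S in supports:
    AS = A[:, list(S)]
    C = sig_x**2 * AS @ AS.T + sig_w**2 * np.eye(M)
    ll = -0.5 * (y @ np.linalg.solve(C, y) + np.linalg.slogdet(C)[1])
    logpost.append(ll + len(S) * np.log(p1) + (N - len(S)) * np.log(1 - p1))
    xm = np.zeros(N)
    if S:                                        # conditional mean E[x | y, S]
        xm[list(S)] = sig_x**2 * AS.T @ np.linalg.solve(C, y)
    cond_means.append(xm)

logpost = np.asarray(logpost)
w = np.exp(logpost - logpost.max()); w /= w.sum()   # posterior model probabilities
x_mmse = sum(wi * xi for wi, xi in zip(w, cond_means))
x_map = cond_means[int(np.argmax(w))]               # estimate under the MAP model only

print("MAP model support:", supports[int(np.argmax(w))])
print("||x_mmse - x_map|| =", np.linalg.norm(x_mmse - x_map))
```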

Video rate spectral imaging using a coded aperture snapshot spectral imager by Ashwin Wagadarikar, Nikos Pitsianis, Xiaobai Sun, and David Brady. The abstract reads:
We have previously reported on coded aperture snapshot spectral imagers (CASSI) that can capture a full frame spectral image in a snapshot. Here we describe the use of CASSI for spectral imaging of a dynamic scene at video rate. We describe significant advances in the design of the optical system, system calibration procedures and reconstruction method. The new optical system uses a double Amici prism to achieve an in-line, direct view configuration, resulting in a substantial improvement in image quality. We describe NeAREst, an algorithm for estimating the instantaneous three-dimensional spatio-spectral data cube from CASSI’s two-dimensional array of encoded and compressed measurements. We utilize CASSI’s snapshot ability to demonstrate a spectral image video of multi-colored candles with live flames captured at 30 frames per second.
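For those wondering what a CASSI snapshot actually encodes, a rough single-disperser forward model is: code the data cube with the aperture pattern, shear each spectral band by one pixel per wavelength, and integrate over wavelength on the detector. A hedged sketch of that forward map and its adjoint (the building block of any iterative reconstruction) follows; the calibrated optical model and the NeAREst algorithm of the paper are of course more involved, and the sizes below are made up.

```python
import numpy as np

rng = np.random.default_rng(4)
ny, nx, L = 32, 32, 8                 # spatial size and number of spectral bands
cube = rng.random((ny, nx, L))        # toy spatio-spectral data cube f(y, x, lambda)
code = (rng.random((ny, nx)) < 0.5).astype(float)   # binary coded aperture

def cassi_forward(cube, code):
    """Toy single-shot measurement: code the cube, shear each band by one
    pixel per wavelength along x, and integrate over wavelength."""
    ny, nx, L = cube.shape
    g = np.zeros((ny, nx + L - 1))
    for l in range(L):
        g[:, l:l + nx] += code * cube[:, :, l]
    return g

def cassi_adjoint(g, code, L):
    """Adjoint of the toy forward map, useful for iterative reconstruction."""
    ny, nxL = g.shape
    nx = nxL - L + 1
    cube = np.zeros((ny, nx, L))
    for l in range(L):
        cube[:, :, l] = code * g[:, l:l + nx]
    return cube

g = cassi_forward(cube, code)         # one 2D snapshot encodes the whole cube
print(g.shape)                        # (32, 39): ny x (nx + L - 1)
```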



From Wotao Yin, we have some new slides about Bregman methods.
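As a reminder of what these methods look like in practice, here is a back-of-the-envelope sketch of the linearized Bregman iteration for basis pursuit (one flavor of these methods). It is only an illustration, not Wotao Yin's code; the step size, mu and iteration count are rough assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, k = 60, 200, 6
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n); x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

# Linearized Bregman for basis pursuit, min ||x||_1 s.t. Ax = b (approached as
# mu grows); shrink is soft thresholding, delta keeps the iteration stable.
shrink = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
mu = 10.0
delta = 1.0 / np.linalg.norm(A, 2) ** 2          # delta * ||A||^2 <= 1 for stability
v, x = np.zeros(n), np.zeros(n)
for _ in range(5000):
    v += A.T @ (b - A @ x)
    x = delta * shrink(v, mu)

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```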

Randomized matrix sparsification has proven to be a fruitful technique for producing faster algorithms in applications ranging from graph partitioning to semidefinite programming. In the decade or so of research into this technique, the focus has been--with few exceptions--on ensuring the quality of approximation in the spectral and Frobenius norms. For certain graph algorithms, however, the $(\infty,1)$ norm may be a more natural measure of performance. This paper addresses the problem of approximating a real matrix A by a sparse random matrix X with respect to several norms. It provides the first results on approximation error in the $(\infty, 1)$ and $(\infty, 2)$ norms, and it uses a result of Latala to study approximation error in the spectral norm. These bounds hold for random sparsification schemes which ensure that the entries of X are independent and average to the corresponding entries of A. Optimality of the $(\infty, 1)$ and $(\infty,2)$ error estimates is established. Concentration results for the three norms hold when the entries of X are uniformly bounded. The spectral error bound is used to predict the performance of several sparsification and quantization schemes that have appeared in the literature; the results are competitive with the performance guarantees given by earlier scheme-specific analyses.
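The simplest scheme covered by this kind of analysis keeps each entry independently with some probability p and rescales it by 1/p, so that the entries of the sparsified matrix are independent and average to the corresponding entries of the original. A small sketch (the matrix and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((200, 200))

def sparsify(A, p, rng):
    """Keep each entry independently with probability p and rescale by 1/p,
    so that the entries of X are independent and E[X] = A."""
    keep = rng.random(A.shape) < p
    return np.where(keep, A / p, 0.0)

X = sparsify(A, p=0.1, rng=rng)
print("fraction of nonzeros      :", np.mean(X != 0))
print("spectral error ||A - X||_2:", np.linalg.norm(A - X, 2))
print("for reference  ||A||_2    :", np.linalg.norm(A, 2))
```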


Algorithmic linear dimension reduction in the ell-1 norm for sparse vectors (and attendant commentary), by Anna Gilbert, Martin Strauss, Joel Tropp, Roman Vershynin. The abstract reads:

We can recover approximately a sparse signal with limited noise, i.e., a vector of length d with at least d − m zeros or near-zeros, using little more than m log(d) nonadaptive linear measurements rather than the d measurements needed to recover an arbitrary signal of length d. Several research communities are interested in techniques for measuring and recovering such signals and a variety of approaches have been proposed. We focus on two important properties of such algorithms.
• Uniformity. A single measurement matrix should work simultaneously for all signals.
• Computational Efficiency. The time to recover such an m-sparse signal should be close to the obvious lower bound, m log(d/m).
To date, algorithms for signal recovery that provide a uniform measurement matrix with approximately the optimal number of measurements, such as first proposed by Donoho and his collaborators, and, separately, by Candès and Tao, are based on linear programming and require time poly(d) instead of m polylog(d). On the other hand, fast decoding algorithms to date from the Theoretical Computer Science and Database communities fail with probability at least 1/poly(d), whereas we need failure probability no more than around 1/d^m to achieve a uniform failure guarantee. This paper develops a new method for recovering m-sparse signals that is simultaneously uniform and quick. We present a reconstruction algorithm whose run time, O(m log²(m) log²(d)), is sublinear in the length d of the signal. The reconstruction error is within a logarithmic factor (in m) of the optimal m-term approximation error in ℓ1. In particular, the algorithm recovers m-sparse signals perfectly and noisy signals are recovered with polylogarithmic distortion. Our algorithm makes O(m log²(d)) measurements, which is within a logarithmic factor of optimal. We also present a small-space implementation of the algorithm. These sketching techniques and the corresponding reconstruction algorithms provide an algorithmic dimension reduction in the ℓ1 norm. In particular, vectors of support m in dimension d can be linearly embedded into O(m log² d) dimensions with polylogarithmic distortion. We can reconstruct a vector from its low-dimensional sketch in time O(m log²(m) log²(d)). Furthermore, this reconstruction is stable and robust under small perturbations.


poolMC: Smart pooling of mRNA samples in microarray experiments (Supp. Material here) by Raghu Kainkaryam, Angela Bruex, Anna Gilbert, Peter Woolf and John Schiefelbein. The abstract reads:
Background: Typically, pooling of mRNA samples in microarray experiments implies mixing mRNA from several biological-replicate samples before hybridization onto a microarray chip. Here we describe an alternative smart pooling strategy in which different samples, not necessarily biological replicates, are pooled in an information theoretic efficient way. Further, each sample is tested on multiple chips, but always in pools made up of different samples. The end goal is to exploit the compressibility of microarray data to reduce the number of chips used and increase the robustness to noise in measurements.
Results: A theoretical framework to perform smart pooling of mRNA samples in microarray experiments was established and the software implementation of the pooling and decoding algorithms was developed in MATLAB. A proof-of-concept smart pooled experiment was performed using validated biological samples on commercially available gene chips.
Conclusions: The theoretical developments and experimental demonstration in this paper provide a useful starting point to investigate smart pooling of mRNA samples in microarray experiments. Important conditions for its successful implementation include linearity of measurements, sparsity in data, and large experiment size.
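The recipe is the usual compressed sensing one transplanted to chips: each chip measures a pool (a weighted sum) of several samples, and the expression of each gene across samples is decoded from fewer chips than samples by exploiting sparsity. Below is a hedged toy with a random pooling design and a generic nonnegative ℓ1 decoder standing in for the authors' design and decoding algorithms; every size and parameter is made up.

```python
import numpy as np

rng = np.random.default_rng(7)
n_samples, n_chips, n_genes = 40, 15, 300

# Pooling design: each chip (row) hybridizes an equal-parts pool of a few samples.
Phi = (rng.random((n_chips, n_samples)) < 0.2).astype(float)
Phi /= np.maximum(Phi.sum(axis=1, keepdims=True), 1.0)

# Toy expression matrix: each gene varies strongly in only a few samples (sparsity).
X = np.zeros((n_samples, n_genes))
for g in range(n_genes):
    s = rng.choice(n_samples, 3, replace=False)
    X[s, g] = rng.uniform(1.0, 5.0, 3)

Y = Phi @ X                                     # pooled chip measurements

def decode(phi, y, lam=1e-3, iters=1500):
    """Nonnegative iterative soft thresholding: a generic l1 decoder."""
    L = np.linalg.norm(phi, 2) ** 2
    x = np.zeros(phi.shape[1])
    for _ in range(iters):
        x = np.maximum(x - phi.T @ (phi @ x - y) / L - lam / L, 0.0)
    return x

X_hat = np.column_stack([decode(Phi, Y[:, g]) for g in range(n_genes)])
print("relative decoding error:", np.linalg.norm(X_hat - X) / np.linalg.norm(X))
```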

Sparse optimization with least-squares constraints by Michael Friedlander. The abstract reads:
The use of convex optimization for the recovery of sparse signals from incomplete or compressed data is now common practice. Motivated by the success of basis pursuit in recovering sparse vectors, new formulations have been proposed that take advantage of different types of sparsity. In this paper we propose an efficient algorithm for solving a general class of sparsifying formulations. For several common types of sparsity we provide applications, along with details on how to apply the algorithm, and experimental results.
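The prototypical member of this class of formulations is basis pursuit denoise: minimize ||x||_1 subject to ||Ax - b||_2 ≤ σ. The snippet below merely states that formulation with a generic convex solver (cvxpy) to show what is being solved; it is not the specialized algorithm proposed in the paper, and the sizes and σ are assumptions.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(8)
m, n, k, sigma = 80, 256, 10, 0.1
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n); x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ x0 + 0.01 * rng.standard_normal(m)

# Basis pursuit denoise: minimize ||x||_1 subject to ||Ax - b||_2 <= sigma.
x = cp.Variable(n)
problem = cp.Problem(cp.Minimize(cp.norm1(x)), [cp.norm(A @ x - b, 2) <= sigma])
problem.solve()
print("nonzeros in solution:", int(np.sum(np.abs(x.value) > 1e-3)))
```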

Fast L1-Minimization Algorithms and An Application in Robust Face Recognition: A Review by Allen Yang, Arvind Ganesh, Shankar Sastry and Yi Ma. The abstract reads:
L1-minimization solves the minimum L1-norm solution to an underdetermined linear system y=Ax. It has recently received much attention, mainly motivated by the new compressive sensing theory that shows that under certain conditions an L1-minimization solution is also the sparsest solution to that system. Although classical solutions to L1-minimization have been well studied in the past, including primal-dual interior-point methods and orthogonal matching pursuit, they suffer from either expensive computational cost or insufficient estimation accuracy in many real-world, large-scale applications. In the past five years, many new algorithms have been proposed. We provide a comprehensive review of five representative approaches, namely, gradient projection, homotopy, iterative shrinkage-thresholding, proximal gradient, and alternating direction. The repository is intended to fill in a gap in the existing literature to systematically benchmark the performance of these algorithms using a consistent experimental setting. In addition, the experiment will be focused on the application of robust face recognition, where a sparse representation framework has recently been developed to recover human identities from facial images that may be affected by illumination, occlusion, and facial disguise. The paper also provides useful guidelines to practitioners working in similar fields.
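Of the five families reviewed, iterative shrinkage-thresholding is the easiest to write down: a gradient step on the least-squares term followed by soft thresholding. A minimal sketch (not the benchmarked implementations of the review; the problem sizes, λ and step size are mine):

```python
import numpy as np

rng = np.random.default_rng(9)
m, n, k = 100, 400, 12
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n); x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true + 0.01 * rng.standard_normal(m)

# ISTA for  min_x 0.5*||Ax - y||_2^2 + lam*||x||_1
lam = 0.05
L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the smooth part
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
x = np.zeros(n)
for _ in range(500):
    x = soft(x - A.T @ (A @ x - y) / L, lam / L)

print("estimated support size:", int(np.sum(np.abs(x) > 1e-3)))
print("relative error        :", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```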


There is a new workshop coming up at the Intractability center:


“Barriers in Computational Complexity II” workshop


August 30 to September 3, 2010

Continuing the program that started with the first Barriers in Computational Complexity workshop in 2009, this workshop will focus on identifying and circumventing barriers that are preventing progress in several sub-fields of Computational Complexity.

This workshop will focus on five sub-fields that were not covered in the 2009 workshop: approximation algorithms and hardness of approximation, quantum computation, cryptography, communication and information in complexity and (geo)metric algorithms.

Organizing committee:
Michael Saks (chair), Scott Aaronson, Sanjeev Arora, Paul Beame, Moses Charikar, Shafi Goldwasser, Piotr Indyk, Yuval Ishai, Robert Tarjan, Umesh Vazirani

Tentative schedule of topics:

Monday, August 30
Approximation Algorithms and Hardness of Approximation
Organizer: Moses Charikar
Tuesday, August 31
Quantum computation
Organizers: Scott Aaronson and Umesh Vazirani
Wednesday, September 1
Recent Advances and Challenges in Cryptography
Organizers: Shafi Goldwasser and Yuval Ishai
Thursday, September 2
Communication and information in complexity
Organizer: Paul Beame
Friday, September 3
(Geo)metric algorithms
Organizer: Piotr Indyk

And finally, we have two thesis offers. They were both posted in French, and I could not find an English translation for them (I am not sure what that means), so here they are in English:

Thesis offer: Compressed sensing for handling large multimedia databases

Audiovisual and multimedia content generates voluminous data streams (audio, video, text, other associated data, etc.). Handling large databases of such content requires the development of efficient detection, classification and learning techniques in order to: segment the streams into homogeneous sequences; annotate them according to the words used, the language, the identity of the speaker, and more generally the type of content; and index them to facilitate queries and access to the data, etc. In order to develop the next generation of content-based search tools, it becomes critical to reduce the complexity of the tasks associated with handling voluminous databases of such content, which requires a concise description of the data that preserves the information of interest, together with learning techniques capable of working directly on the compressed data.

An emerging approach to data compression is the new paradigm of "compressed sensing", which in particular makes it possible to acquire data by sampling well below the Shannon sampling rate, provided the data are compressible in a certain sense. The principle consists in exploiting the fact that the data to be processed, although high-dimensional, can be described concisely with a small set of parameters. Concretely, the data are well approximated by linear combinations of a few basis vectors called atoms, chosen from a highly redundant family called a dictionary [1]. These are known as sparse approximations. In recent years, important progress has been made in understanding the statistical foundations of algorithms that compute quasi-optimal sparse approximations. Under controlled conditions, performance guarantees are available for explicit algorithms of limited complexity, notably approaches based on convex optimization. Compressed sensing is a particularly promising emerging application of these ideas: it reduces the dimension of the data by random projection into a low-dimensional space, while preserving in the resulting projection the essence of the information contained in the original data. The validity of this approach follows from deep theoretical results on concentration of measure and on large random matrices.

The objective of this thesis is to explore, both numerically and theoretically, the potential of compressed sensing for processing large databases. In particular, the work will consist in proposing, implementing and evaluating learning techniques [2] operating on a strongly compressed version of a large data corpus, the goal being to recover the essential characteristics of the corpus, such as the statistical distribution of the data composing it. The envisioned approach will combine experiments on low-dimensional synthetic data, the implementation of optimization algorithms in Matlab and then in C/C++, the development of a theoretical analysis of the problem, and the evaluation of the approach on large corpora of audiovisual data.

Bibliography:
[1] S. Mallat, Wavelet Tour of Signal Processing, 3rd edition: The Sparse Way. Academic Press, 2008.
[2] B. Schölkopf and A.J. Smola. Learning with Kernels. MIT Press, 2002.

Contact: Rémi Gribonval



Thesis offer: Sparse modeling of brain activity for brain imaging and brain-computer interfaces


Brain imaging techniques, which measure the spatial and temporal dynamics of brain activity, aim at understanding how the brain works, with fundamental applications (understanding cognitive mechanisms) as well as medical applications (observing, diagnosing and treating diseases such as epilepsy) and technological applications (the development of brain-computer interfaces for gaming or for assisting the disabled). The acquisition modalities available today do not provide images that are well resolved both spatially and temporally, and some of them require acquisition equipment that is extremely cumbersome and costly to operate.

Recent studies, including the thesis of Alexandre Gramfort [1], have shown the potential of sparse modeling techniques to improve resolution by solving high-dimensional inverse problems from data acquired with low-cost EEG equipment. However, beyond the algorithmic complexity of the required optimization techniques, which remains high, an important obstacle to the practical use of this kind of technique is its sensitivity to the choice of a sparse model, associated with a "dictionary", that is suited to the spatio-temporal dynamics one seeks to reconstruct while still allowing fast computations.

The objective of this thesis is to tackle these obstacles in order to propose, implement and experimentally validate high-resolution brain imaging techniques. In particular, the work will consist in developing techniques offering a good resolution/complexity trade-off for localizing sources of activity in the brain from EEG recordings, building on recent sparse inversion algorithms whose main complexity bottlenecks will be identified and removed. The automatic learning of dictionaries [4] for sparse modeling will also be investigated, along with its impact on the resolution of the resulting brain images.

The work will take place within the METISS team at IRISA, which has developed accurate source localization and separation techniques [5] in an audio context (sound sources corresponding to different speakers or musical instruments contributing to a recording) and has recognized expertise in sparse modeling. The acquisition and processing of EEG data will be carried out in collaboration with the BUNRAKU team at IRISA, which has the equipment and the experience in brain-computer interfaces.

Bibliography:
[1] Alexandre Gramfort, Mapping, timing and tracking cortical activations with MEG and EEG: Methods and application to human vision, PhD thesis, ENST, 2009, http://tel.archives-ouvertes.fr/tel-00426852/fr/
[2] S. Mallat, Wavelet Tour of Signal Processing, 3rd edition: The Sparse Way. Academic Press, 2008.
[3] K. Kreutz-Delgado et al., Dictionary learning algorithms for sparse representation, Neural Comput., 2003, Vol. 15, No. 2, pp. 349-396.
[4] K. Schnass and R. Gribonval, Dictionary Identification - Sparse Matrix-Factorisation via L1-Minimisation, arXiv preprint 0904.4774, April 2009.
[5] S. Arberet, R. Gribonval, and F. Bimbot, A Robust Method to Count and Locate Audio Sources in a Multichannel Underdetermined Mixture, to appear in IEEE Trans. Signal Processing, 2010.

Contact: Rémi Gribonval
