Wednesday, September 01, 2010

CS: Almost Lossless Analog Compression, Optimal Subspace Clustering Models, Sharp oracle inequalities for the prediction of a high-dimensional matrix, and more.

I found the following papers a little bit by accident:

In Shannon theory, lossless source coding deals with the optimal compression of discrete sources. Compressed sensing is a lossless coding strategy for analog sources by means of multiplication by real-valued matrices. In this paper we study almost lossless analog compression for analog memoryless sources in an information-theoretic framework, in which the compressor or decompressor is constrained by various regularity conditions, in particular linearity of the compressor and Lipschitz continuity of the decompressor. The fundamental limit is shown to be the information dimension proposed by Rényi in 1959.

The following three preprints just showed up on Arxiv:
Given a set of vectors $\F=\{f_1,\dots,f_m\}$ in a Hilbert space $\HH$, and given a family $\CC$ of closed subspaces of $\HH$, the {\it subspace clustering problem} consists in finding a union of subspaces in $\CC$ that best approximates (models) the data $\F$. This problem has applications and connections to many areas of mathematics, computer science and engineering such as the Generalized Principle Component Analysis (GPCA), learning theory, compressed sensing, and sampling with finite rate of innovation. In this paper, we characterize families of subspaces $\CC$ for which such a best approximation exists. In finite dimensions the characterization is in terms of the convex hull of an augmented set $\CC^+$. In infinite dimensions however, the characterization is in terms of a new but related notion of contact hull. As an application, the existence of best approximations from $\pi(G)$-invariant families $\CC$ of unitary representations of abelian groups is derived.

We observe $(X_i,Y_i)_{i=1}^n$ where the $Y_i$'s are real valued outputs and the $X_i$'s are $m\times T$ matrices. We observe a new entry $X$ and we want to predict the output $Y$ associated with it. We focus on the high-dimensional setting, where $m T \gg n$. This includes the matrix completion problem with noise, as well as other problems. We consider linear prediction procedures based on different penalizations, involving a mixture of several norms: the nuclear norm, the Frobenius norm and the $\ell_1$-norm. For these procedures, we prove sharp oracle inequalities, using a statistical learning theory point of view. A surprising fact in our results is that the rates of convergence do not depend on $m$ and $T$ directly. The analysis is conducted without the usually considered incoherency condition on the unknown matrix or restricted isometry condition on the sampling operator. Moreover, our results are the first to give for this problem an analysis of penalization (such nuclear norm penalization) as a regularization algorithm: our oracle inequalities prove that these procedures have a prediction accuracy close to the deterministic oracle one, given that the reguralization parameters are well-chosen.
Here is a paper that we don't see everyday in the CS literature:

Rapid Simultaneous Mapping of Total and Myelin Water Content, T1 and T2* in Multiple Sclerosis by Volker Arhelger, Felix Kasten, Detlef Gliedstein, Marie-Sofie Lafontaine, Vyara Tonkova, Dietrich Holz, Andreas Böer, Jochen Schenk, Heiko Neeb. The abstract reads:
Quantitative magnetic resonance imaging might provide a more specific insight into disease process, progression and therapeutic response of multiple sclerosis. We present an extension of a previously published approach for the simultaneous mapping of brain T1, T2* and total water content. In addition to those three parameters, the method presented in the current work allows for the measurement of myelin bound water content, a surrogate marker of tissue myelination. Myelin water was measured based on its distinct relaxation with reduced T2*, resulting in a multiexponential decay signal. However, only 10 points could be acquired on the relaxation curve within a maximum echo time of less than 40ms as the quantitative protocol has been adapted previously for fast acquisitions with whole brain coverage. The sparse sampling required an adaption of the optimisation approach with additional constraints necessary in order to obtain reliable results. Therefore, the corresponding pool fractions were determined using linear optimisation instead of the standard nonnegative least squares (NNLS). The whole approach including the proper choice of constraints was optimised and validated in simulation studies. Furthermore, the independent determination of total water content allows for an absolute quantification of myelin water content, resulting in a more reliable measurement in oedemateous tissue. Whole brain maps of T1, T2*, total and myelin water content were acquired in 12 patients suffering from multiple sclerosis with mild disease grade. With an acquisition time of 10 minutes only, the presented multidimensional quantitative MRI protocol provides an interesting option for the clinical assessment and monitoring of multiple sclerosis, even on a routine basis.

Of related interest, we also have:

Spontaneous brain activity, as observed in functional neuroimaging, has been shown to display reproducible structure that expresses brain architecture and carries markers of brain pathologies. An important view of modern neuroscience is that such large-scale structure of coherent activity reflects modularity properties of brain connectivity graphs. However, to date, there has been no demonstration that the limited and noisy data available in spontaneous activity observations could be used to learn full-brain probabilistic models that generalize to new data. Learning such models entails two main challenges: i) modeling full brain connectivity is a difficult estimation problem that faces the curse of dimensionality and ii) variability between subjects, coupled with the variability of functional signals between experimental runs, makes the use of multiple datasets challenging. We describe subject-level brain functional connectivity structure as a multivariate Gaussian process and introduce a new strategy to estimate it from group data, by imposing a common structure on the graphical model in the population. We show that individual models learned from functional Magnetic Resonance Imaging (fMRI) data using this population prior generalize better to unseen data than models based on alternative regularization schemes. To our knowledge, this is the first report of a cross-validated model of spontaneous brain activity. Finally, we use the estimated graphical model to explore the large-scale characteristics of functional architecture and show for the first time that known cognitive networks appear as the integrated communities of functional connectivity graph.

Finally, here is the presentation slides of Mark Iwen for his defense of a while back. And there is an International Workshop on Approximation Techniques in Data Analysis on September 2-3, 2010 at University of Alberta, Edmonton, Canada.

Morning Session, September 2, 2010 Chairman: Rong-Qing Jia
9:00–9:50: Qianshun Chang
A Robust and Fast Combination Algorithms of Split Bregman Method for Deblurring and Denoising
9:50–10:40: Jianzhong Wang
Construction of Local Nonlinear Kernel in Image Restoration
10:40–11:30: Jianfeng Cai
Singular Value Thresholding Algorithms for Low-rank Matrix Completion
Afternoon Session, September 2, 2010 Chairman: Jianzhong Wang
2:00–2:50: Bin Han
Nonhomogeneous Wavelet Systems and Frequency-based Framelets
2:50–3:40: Jun Xian
Local Sampling and Reconstruction in Shift-invariant Spaces and Their Applications in Spline Subspaces
Morning Session, September 3, 2010 Chairman: Bin Han
9:00–9:50: Rong-Qing Jia
Relaxation Methods for Image Denoising Based on Difference Schemes
9:50–10:40: Song Li
Some Results of `p Minimization Problems for 0 p 1 in Compressive Sampling

No comments: