Tuesday, July 08, 2008

CS: MMDS papers, a CS page, A2I Converters, CS in Space.

I just stumbled upon the MMDS 2008. Workshop on Algorithms for Modern Massive Data Sets that took place at Stanford University on June 25–28, 2008. All the abstracts are located here. All the presentations are here. Among the ones that caught my interest:

Piotr Indyk has a presentation entitled Sparse recovery using sparse random matrices/ Or: Fast and Effective Linear Compression where he presents a joint work with: Radu Berinde, Anna Gilbert, Piotr Indyk, Howard Karloff, Milan Ruzic and Martin Strauss that was partially shown in Anna Gilbert's presentation at the L1 meeting at Texas A&M). The interesting new part of this presentation is the recent result with Milan Ruzic where it is shown that if a measurement matrix follows RIP-1, then OMP converges. The previous results was on LP-decoding only working. The abstract reads:

Over the recent years, a new *linear* method for compressing high-dimensional data (e.g., images) has been discovered. For any high-dimensional vector x, its *sketch* is equal to Ax, where A is an m x n matrix (possibly chosen at random). Although typically the sketch length m is much smaller than the number of dimensions n, the sketch contains enough information to recover an *approximation* to x. At the same time, the linearity of the sketching method is very convenient for many applications, such as data stream computing and compressed sensing.

The major sketching approaches can be classified as either combinatorial (using sparse sketching matrices) or geometric (using dense sketching matrices). They achieve different trade-offs, notably between the compression rate and the running time. Thus, it is desirable to understand the connections between them, with the ultimate goal of obtaining the "best of both worlds" solution.

In this talk we show that, in a sense, the combinatorial and geometric approaches are based on different manifestations of the same phenomenon. This connection will enable us to obtain several novel algorithms and constructions, which inherit advantages of sparse matrices, such as lower sketching and recovery times.

Also, Anna Gilbert had a presentation entitled Combinatorial Group Testing in Signal Recovery. The abstract reads:

Traditionally, group testing is a design problem. The goal is to construct an optimally efficient set of tests of items such that the test results contains enough information to determine a small subset of items of interest. It has its roots in the statistics community and was originally designed for the Selective Service to find and to remove men with syphilis from the draft. It appears in many forms, including coin-weighing problems, experimental designs, and public health. We are interested in both the design of tests and the design of an efficient algorithm that works with the tests to determine the group of interest. Many of the same techniques that are useful for designing tests are also used to solve algorithmic problems in analyzing and in recovering statistical quantities from streaming data. I will discuss some of these methods and briefly discuss several recent applications in high throughput drug screening.
Joint work with Radu Berinde, Piotr Indyk, Howard Karloff, Martin Strauss, Raghu Kainkaryam and Peter Woolf.

By the way, it's not a rat, it's a Chef.

Tong Zhang, An Adaptive Forward/Backward Greedy Algorithm for Learning Sparse Representations (the technical report is here: Forward-Backward Greedy Algorithm for Learning Sparse Representations).

Consider linear least squares regression where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with non-zero coefficients and reconstructing the target function from noisy observations. This problem is NP-hard. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever necessary. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory.

Nir Ailon, Efficient Dimension Reduction. The abstract reads:
The Johnson-Lindenstrauss dimension reduction idea using random projections was discovered in the early 80's. Since then many "computer science friendly" versions were published, offering constant factor but no big-Oh improvements in the runtime. Two years ago Ailon and Chazelle showed a nontrivial algorithm with the first asymptotic improvement, and suggested the question: What is the exact complexity of J-L computation from d dimensions to k dimensions? An O(d log d) upper bound is implied by A-C for k up to d^{1/3} (in the L2 to L2) case. In this talk I will show how to achieve this bound for k up to d^{1/2} combining techniques from the theories of error correcting codes and probability in Banach spaces. This is based on joint work with Edo Liberty.

Yoram Singer, Efficient Projection Algorithms for Learning Sparse Representations from High Dimensional Data.
Many machine learning tasks can be cast as constrained optimization problems. The talk focuses on efficient algorithms for learning tasks which are cast as optimization problems subject to L1 and hyper-box constraints. The end result are typically sparse and accurate models. We start with an overview of existing projection algorithms onto the simplex. We then describe a linear time projection for dense input spaces. Last, we describe a new efficient projection algorithm for very high dimensional spaces. We demonstrate the merits of the algorithm in experiments with
large scale image and text classification.

Kenneth Clarkson, Tighter Bounds for Random Projections of Manifolds (this is the report). The abstract reads:
The Johnson-Lindenstrauss random projection lemma gives a simple way to reduce the dimensionality of a set of points while approximately preserving their pairwise Euclidean distances. The most direct application of the lemma applies to a finite set of points, but recent work has extended the technique to affine subspaces, smooth manifolds, and sets of bounded doubling dimension; in each case, a projection to a sufficiently large dimension k implies that all pairwise distances are approximately preserved with high probability. Here the case of random projection of a smooth manifold (submanifold of R^m) is considered, and a previous analysis is sharpened, giving an upper bound for k that depends on the surface area, total absolute curvature, and a few other quantities associated with the manifold, and in particular does not depend on the ambient dimension m or the reach of the manifold.

Sanjoy Dasgupta, Random Projection Trees and Low Dimensional Manifolds. The abstract reads:
The curse of dimensionality has traditionally been the bane of nonparametric statistics (histograms, kernel density estimation, nearest neighbor search, and so on), as reflected in running times and convergence rates that are exponentially bad in the dimension. This problem is all the more pressing as data sets get increasingly high dimensional. Recently the field has been rejuvenated substantially, in part by the realization that a lot of real-world data which appears high-dimensional in fact has low "intrinsic" dimension in the sense of lying close to a low-dimensional manifold. In the past few years, there has been a huge interest in learning such manifolds from data, and then using the learned structure to transform the data into a lower dimensional space where standard statistical methods generically work better.

I'll exhibit a way to benefit from intrinsic low dimensionality without having to go to the trouble of explicitly learning its fine structure. Specifically, I'll present a simple variant of the ubiquitous k-d tree (a spatial data structure widely used in machine learning and statistics) that is provably adaptive to low dimensional structure. We call this a "random projection tree" (RP tree).

Along the way, I'll discuss different notions of intrinsic dimension -- motivated by manifolds, by local statistics, and by analysis on metric spaces -- and relate them. I'll then prove that RP trees require resources that depend only on these dimensions rather than the dimension of the space in which the data happens to be situated. This is work with Yoav Freund (UC San Diego).

Lars Kai Hansen, Generalization in High-Dimensional Matrix Factorization. The abstract reads:
While the generalization performance of high-dimensional principal component analysis is quite well understood, matrix factorizations like independent component analysis, non-negative matrix factorization, and clustering are less investigated for generalizability. I will review theoretical results for PCA and heuristics used to improve PCA test performance, and discuss extensions to high-dimensional ICA, NMF, and clustering.

Holly Jin (at LinkedIn !), Exploring Sparse NonNegative Matrix Factorization. The abstract reads:
We explore the use of basis pursuit denoising (BPDN) for sparse nonnegative matrix factorization (sparse NMF). A matrix A is approximated by low-rank factors UDV', where U and V are sparse with unit-norm columns, and D is diagonal. We use an active-set BPDN solver with warm starts to compute the rows of U and V in turn. (Friedlander and Hatz have developed a similar formulation for both matrices and tensors.) We present computational results and discuss the benefits of sparse NMF for some real matrix applications. This is joint work with Michael Saunders.

Compressed Counting and Stable Random Projections by Ping Li. The abstract reads:
The method of stable random projections has become a popular tool for dimension reduction, in particular, for efficiently computing pairwise distances in massive high-dimensional data (including dynamic streaming data) matrices, with many applications in data mining and machine learning such as clustering, nearest neighbors, kernel methods etc.. Closely related to stable random projections, Compressed Counting (CC) is recently developed to efficiently compute Lp frequency moments of a single dynamic data stream. CC exhibits a dramatic improvement over stable random projections when p is about 1. Applications of CC include estimating entropy moments of data streams and statistical parameter estimations in dynamic data using low memory.

Ronald Coifman, Diffusion Geometries and Harmonic Analysis on Data Sets. The abstract reads:
We discuss the emergence of self organization of data, either through eigenvectors of affinity operators, or equivalently through a multiscale ontology. We illustrate these ideas on images and audio files as well as molecular dynamics.

Thomas Blumensath and Mike Davies have set up a larger web page on Compressed Sensing and their attendant research in that field.

This one is a little old but it just came out on my radar screen and it is not on the Rice page yet. It features an A2I paper by Sami Kirolos, Tamer Ragheb, Jason Laska, Marco F. Duarte, Yehia Massoud and Richard Baraniuk entitled Practical Issues in Implementing Analog-to-Information Converters. The abstract reads:

The stability and programmability of digital signal processing systems has motivated engineers to move the analog-to-digital conversion (ADC) process closer and closer to the front end of many signal processing systems in order to perform as much processing as possible in the digital domain. Unfortunately, many important applications, including radar and communication systems, involve wideband signals that seriously stress modern ADCs; sampling these signals above the Nyquist rate is in some cases challenging and in others impossible. While wideband signals by definition have a large bandwidth, often the amount of information they carry per second is much lower; that is, they are compressible in some sense. The first contribution of this paper is a new framework for wideband signal acquisition purpose-built for compressible signals that enables sub-Nyquist data acquisition via an analog-to-information converter (AIC). The framework is based on the recently developed theory of compressive sensitng in which a small number of non-adaptive, randomized measurements are sufficient to reconstruct compressible signals. The second contribution of this paper is an AIC implementation design and study of the tradeoffs and nonidealities introduced by real hardware. The goal is to identify and optimize the parameters that dominate the overall system performance.
Finally, CS in Space

The eighth annual NASA Earth Science Technology Conference (ESTC2008) was held June 24-26. 2008, at the University of Maryland University College and showcased a wide array of technology research and development related to NASA's Earth science endeavors. The papers are here. Of particular interest is this presentation:

Novel Distributed Wavelet Transforms and Routing Algorithms for Efficient Data Gathering in Sensor Webs. Antonio Ortega, G. Shen S. Lee S.W. Lee S. Pattem A. Tu B. Krishnamachari, M. Cheng S. Dolinar A. Kiely M. Klimesh, H. Xie
on page 13, there is : Compressed Sensing for Sensor Networks.

The GRETSI conference occured a year ago. This is one of the papers (in French with an English abstract) entitled Quelques Applications du Compressed Sensing en Astronomie, David Mary and Olivier Michel. The abstract reads:
We investigate in this communication how recently introduced “ Compressed Sensing” methods can be applied to some important problems in observational Astronomy. The mathematical background is first outlined. Some examples are then described in stellar variability and in image reconstruction for space-based observations. We finally illustrate the interest of such techniques for direct imaging through stellar interferometry.

Nous proposons dans cette communication d'évaluer le potentiel des méthodes de « Compressed Sensing » récemment introduites, à travers leur application à quelques problèmes importants en astrophysique observationelle. Après avoir rapidement décrit les bases mathématiques de ces approches, des exemples sont développés dans la cadre de la variabilité stellaire, de la reconstruction d'images satellitaire et enfin dans le cadre plus prospectif des futures possibilités offertes par les grands projets d'interféromètres à multiples bases pour l'imagerie directe dans le plan de Fourier.