Saturday, July 30, 2011

New challenges for PET and Cone-Beam CT with hybrid pixels

While CT is covered in the compressive sensing literature, one of the most interesting aspect of CS is how it changes the parameter space especially if there is an improvement in the hardware. The following presentation shows if/how such improvement may be dealt with new techniques emerging out of compressive sensing:

Wednesday, July 27, 2011

Finding Structure with Randomness

The other day, Cable and I were looking for the PCA of a very large matrix. We compared the randomized approach to a direct one... it was not even funny. After four hours on one of the latest Mac, the direct SVD was still clogging all the processes while I was done in less than 20 seconds on a small PC. Today, the question is, when are we going to have a Random-LAPACK ?

Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast to O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.

For reminder:

Tuesday, July 26, 2011

The Duke Workshop on Sensing and Analysis of High-Dimensional Data (SAHD) starts today

The Duke Workshop on Sensing and Analysis of High-Dimensional Data (SAHD) should start in a few hours. Here is the list of abstracts:

Regularization with variance-mean mixtures, James ScottWe provide a data-augmentation scheme that unifies many common regularized estimators into a single class.  This leads to simple algorithms based on iterative least squares for fitting models involving arbitrary combinations of likelihood and penalty functions within the class.  The class itself is quite large: for example, it includes quantile regression, support vector machines, logistic and multinomial logistic regression, topic models, and autologistic models, along with the usual ridge regression, lasso, bridge, and Student-t models.  To arrive at this unified framework, we represent a wide class of objective functions as variance-mean mixtures of Gaussians involving both the likelihood and penalty functions.  This generalizes existing theory based solely on variance mixtures for the penalty function, and allows the theory of conditionally normal linear models to be brought to bear on a much wider class of models.  We focus on two possible choices of the mixing measures: the generalized inverse-Gaussian and Polya distributions, leading to the hyperbolic and Z distributions, respectively. We exploit this conditional normality to find sparse, regularized estimates using a new algorithm called tilted, iteratively re-weighted conjugate gradient.  This algorithm is efficient, easily parallelizable, and offers a block update in situations where previous algorithms necessitate coordinate-wise updates.  It also offers an exact MAP approach for estimating many models, including topic models, that have typically been fit using variational EM.
A SURE Estimator for a Heteroscedastic Hierarchical Model, Lawrence D. Brown. Hierarchical models are extensively studied and widely used in statistics and many other scientific areas. They provide an effective tool for combining information from similar resources and achieving partial pooling of inference. Since the seminal work by James and Stein (1961) and Stein (1962), shrinkage estimation has become one major focus for hierarchical models. For the homoscedastic normal model, it is well known that shrinkage estimators, especially the James-Stein estimator, have good risk properties. The heteroscedastic model, though more appropriate for practical applications, is less well studied and it is unclear what types of shrinkage estimators are superior in terms of risk properties. We propose in this paper a class of shrinkage estimators based on the Stein's unbiased estimate of risk (SURE). We establish an asymptotic optimality property for these estimators as the number of means p → ∞. We then extend our construction to create a class of semi-parametric shrinkage estimators and establish corresponding asymptotic optimality results. We emphasize that though the form of our SURE estimators is partially obtained through a normal-normal hierarchical model, their optimality properties do not heavily depend on such distribution assumptions. We apply the methods to two real data sets and obtain encouraging results. This is joint work with Samuel Kou and Xianchao Xie.
The Return of PCA: Statistical Compressed Sensing of GMM and its Applications to Inverse problems, Guillermo Sapiro. A novel framework of compressed sensing, namely statistical compressed sensing (SCS), that aims at efficiently sampling a collection of signals that follow a statistical distribution, and achieving accurate reconstruction on average, is introduced. SCS based on Gaussian models is investigated in depth. For signals that follow a single Gaussian model, with Gaussian or Bernoulli sensing matrices of O(k) measurements, considerably smaller than the O(k log(N/k)) required by conventional CS based on sparse models, where N is the signal dimension, and with an optimal decoder implemented via linear filtering, significantly faster than the pursuit decoders applied in conventional CS, the error of SCS is shown tightly upper bounded by a constant times the best k-term approximation error, with overwhelming probability. The failure probability is also significantly smaller than that of conventional sparsity-oriented CS. Stronger yet simpler results further show that for any sensing matrix, the error of Gaussian SCS is upper bounded by a constant times the best k-term approximation with probability one, and the bound constant can be efficiently calculated. For Gaussian mixture models (GMMs), that assume multiple Gaussian distributions and that each signal follows one of them with an unknown index, a piecewise linear estimator is introduced to decode SCS. The accuracy of model selection, at the heart of the piecewise linear decoder, is analyzed in terms of the properties of the Gaussian distributions and the number of sensing measurements. A maximum a posteriori expectation-maximization algorithm that iteratively estimates the Gaussian models parameters, the signals model selection, and decodes the signals, is presented for GMM-based SCS. In real image sensing applications, GMM-based SCS is shown to lead to improved results compared to conventional CS, at a considerably lower computational cost. Joint work with Guoshen Yu and Stephane Mallat.
Vector Diffusion Maps and the Connection Laplacian, Amit Singer. Motivated by problems in structural biology, specifically cryo-electron microscopy, we introduce vector diffusion maps (VDM), a new mathematical framework for organizing and analyzing high dimensional data sets, images and shapes. VDM is a mathematical and algorithmic generalization of diffusion maps and other non-linear dimensionality reduction methods, such as LLE, ISOMAP and Laplacian eigenmaps. While existing methods are either directly or indirectly related to the heat kernel for functions over the data, VDM is based on the heat kernel for vector fields. VDM provides tools for organizing complex data sets, embedding them in a low dimensional space and interpolating and regressing vector fields over the data. In particular, it equips the data with a metric, which we refer to as the vector diffusion distance. In the manifold learning setup, where the data set is distributed on (or near) a low dimensional manifold M^d embedded in R^p, we prove the relationship between VDM and the connection-Laplacian operator for vector fields over the manifold. Applications to structural biology (cryo-electron microscopy and NMR spectroscopy), computer vision and shape space analysis will be discussed. Joint work with Hau-tieng Wu.
Sparsity and Rank Minimization in Unions of Subspaces, Rene Vidal. We consider the problem of fitting a union of subspaces to a collection of data points drawn from the subspaces and corrupted by noise/outliers. We pose this problem as a rank minimization problem, where the goal is to decompose the corrupted data matrix as the sum of a clean dictionary plus a matrix of noise/outliers. By constraining the dictionary to be self-expressive, i.e., its elements can be written as a linear combination of other elements, we formulate this problem as one of minimizing the nuclear norm of the matrix of linear combinations. Our first contribution is to show that, with noisy data, this problem can be solved in closed form from the SVD of the data matrix. Remarkably, this is true for both one or more subspaces. A key difference with respect to existing methods is that our framework results in a polynomial thresholding of the singular values with minimal shrinkage. Indeed, a particular case of our framework in the case of a single subspace leads to classical PCA, which requires no shrinkage. In the case of multiple subspaces, our framework provides an affinity matrix that can be used to cluster the data according to the subspaces. A major difference with respect to existing methods is that, rather than using the corrupted data as a dictionary, we build this affinity from a clean dictionary. Moreover, both the clean dictionary and the affinity matrix can be computed in closed form. In the case of data corrupted by outliers, a closed-form solution appears elusive. We thus use an augmented Lagrangian optimization framework, which requires a combination of our proposed polynomial thresholding operator with the more traditional shrinkage-thresholding operator.
Alternating Direction Optimization: from Imaging Inverse Problems to Inference in Graphical Models, Mário A. T. Figueiredo. This talk will cover our recent work on the application of a class of techniques known as ADMM (alternating direction method of multipliers, which belongs to the family of augmented Lagrangian methods) to several imaging inverse problems under non-smooth sparsity-inducing regularization, namely: standard image restoration/reconstruction from linear observations (e.g., compressive sensing) with a Gaussian noise model, using either analysis or synthesis regularization and unconstrained or constrained optimization formulations; image restoration from Poissonian observations, also using analysis or synthesis regularization. We show that, in all these cases,  the proposed methods inherit the theoretic convergence guarantees of ADMM and achieve state-of-the-art performance in terms of speed. To further illustrate the flexibility of this class of methods, we show how it can be used to seamlessly and efficiently address other problems and formulations: hybrid analysis-synthesis regularization; group-sparsity regularization (with or without group overlap); maximum a posteriori (MAP) inference in probabilistic graphical models.
The Convex Geometry of Linear Inverse Problems, Ben Recht. Building on the success of generalizing compressed sensing to matrix completion, this talk discusses progress on further extending the catalog of objects and structures that can be recovered from partial information. I will focus on a suite of data analysis algorithms designed to decompose signals into sums of atomic signals from a simple---but not necessarily discrete---set. These algorithms are derived in a convex optimization framework that encompasses previous methods based on l1-norm minimization and nuclear norm minimization for recovering sparse vectors and low-rank matrices. I will provide sharp estimates of the number of generic measurements required for exact and robust recovery of a variety of structured models.  I will then detail several example applications and describe how to scale the corresponding inference algorithms to massive data sets.
Multitask Sparsity via Maximum Entropy Discrimination, Tony Jebara. A multitask learning framework is developed for discriminative classification and regression where multiple large-margin classifiers are estimated for different prediction problems. A maximum entropy discrimination (MED) framework is used to derive the multitask algorithm which involves only convex optimization problems that are straightforward to implement. Four multitask scenarios are described. The first produces support vector machines that learn a shared sparse feature selection over the input space. The second method produces multiple support vector machines that learn a shared conic kernel combination. The third multitask method produces a pooled classifier as well as adaptively specialized individual classifiers. Finally, graphical model structure estimation is posed as a multitask sparse classification problem. The maximum entropy optimization problems are implemented via a sequential quadratic programming method which leverages recent progress in fast SVM solvers. Convergence bounds are provided by bounding the MED sparsifying cost function with a quadratic function and ensuring only a constant factor runtime increase above standard independent SVM solvers. Results are shown on multitask data sets and favor multitask learning over single-task or tabula rasa methods. Time permitting, I will also discuss results where, instead of maximizing margin, we maximize relative margin and obtain significant accuracy improvements.
Matrix Sparse Coding, John Lafferty. We study the problem of multivariate regression where the data are naturally grouped, and a regression or covariance matrix is to be estimated for each group. We propose an approach in which a large dictionary of low rank matrices is estimated across groups, and a sparse linear combination of the dictionary elements is estimated to form a model within each group. This exploits the same intuition behind sparse coding that has been successfully developed in computer vision. We propose algorithms for matrix sparse coding and show that the high-dimensional statistical convergence rate improves with the strength of our assumptions on the shared structure of the groups in the underlying model. Joint work with Min Xu (CMU).
Principled Regularization for Matrix Factorization, Robert Bell. Regularized matrix factorization has proven to be a very effective tool for dealing with high dimensional data, particularly in building recommender systems.  L2 regularization is implemented by adding to the optimization criterion one of more terms that are proportional to the L2 norm of various sets of parameters.  Implementation requires specifying the structure of these regularization terms and values for regularization parameters used as multipliers.  We focus on the question of how many distinct regularization parameters should be used for a typical matrix factorization model.  Theory and a simulation experiment suggest using more regularization parameters than are often used in practice.  This is joint work with Suhrid Balakrishnan.
The CLASH Operator, Volkan Cevher. The least absolute shrinkage and selection operator (Lasso) for linear regression exploits the geometric interplay of the ell2-data error objective and the ell1-norm constraint to arbitrarily select sparse models. Guiding this uninformed selection process with sparsity models has been precisely the center of attention over the last decade in order to improve learning performance. To this end, we alter the selection process of Lasso in this talk to explicitly leverage combinatorial sparsity models (CSMs) via the combinatorial selection and least absolute shrinkage operator (CLASH). A highlight is the introduction of a new algorithmic definition of CSMs, which we dub as the Polynomial time Modular epsilon-Approximation Property (PMAP_epsilon). PMAP_epsilon enables us to determine the impact of approximate combinatorial projections within CLASH. We then provide concrete guidelines how to leverage sets with PMAP_epsilon within CLASH, and characterize CLASH’s estimation guarantees as a function of epsilon as well as the set restricted isometry constants of the regression matrix. Finally, we present experimental results using both simulated and real world data to demonstrate the effectiveness of CLASH.
Finding structure with randomness: Probabilistic algorithms for constructing low-rank matrix decompositions, Joel Tropp. Computer scientists have long known that randomness can be used to improve the performance of algorithms. A familiar application is the process of dimension reduction, in which a random map transports data from a high-dimensional space to a lower-dimensional space while approximately preserving some geometric properties. By operating with the compact representation of the data, it is theoretically possible to produce approximate solutions to certain large problems very efficiently. Recently, it has been observed that dimension reduction has powerful applications in numerical linear algebra and numerical analysis. This talk provides a high-level introduction to randomized methods for computing standard matrix approximations, and it summarizes a new analysis that offers (nearly) optimal bounds on the performance of these methods. In practice, the techniques are so effective that they compete with—or even outperform—classical algorithms. Since matrix approximations play a ubiquitous role in areas ranging from information processing to scientific computing, it seems certain that randomized algorithms will eventually supplant the standard methods in some application domains. Joint work with Gunnar Martinsson and Nathan Halko. The paper is available at
Fast First-Order and Alternating Direction Methods for Stable Principal Component Pursuit, Don Goldfarb with Necdet Aybat and Garud Iyengar. Given a matrix M that is the sum of a low rank matrix X and a sparse matrix Y, it is known that under certain conditions, X and Y can be recovered with very high probability by solving a so-called robust principal component pursuit problem.  When M has noise added to it, the recovery problem is called stable principal component pursuit. In this problem one minimizes the sum of the nuclear norm of X plus a scalar times the sum of the absolute values of the elements of Y, subject to the Frobenius norm of X+Y-M being less than some known noise level. We show how to apply Nesterov's optimal gradient and composite methods and a fast alternating linearization method to obtain epsilon-optimal solutions in O(1/epsilon) iterations, while keeping the cost of each iteration low.  We also propose a very efficient alternating direction augmented Lagrangian method, and prove its convergence.
Operator-Theoretic Modeling for Radar in the Presence of Doppler, Doug Cochran. Co-design of waveform and receiver processing is becoming increasingly important with the emergence of closed-loop waveform management, MIMO, and cooperating networked system concepts in radar. A common model of the radar scene in past work of this kind has been a linear time-invariant (LTI) operator with additive Gaussian noise that acts on each transmitted signal to produce the corresponding received signal.  This model is intrinsically ill-suited to dynamic scenes or moving radar platforms because it cannot account for Doppler.  This talk will discuss scene modelling based on Hilbert-Schmidt class (HS) operators on the space of finite-energy signals.  This category of models generalizes the LTI category in the sense that every LTI operator is also a HS operator, but the HS class includes operators that account for frequency shifts as well as time shifts, and are thus suitable for modelling radar scenes involving Doppler.  Every HS operator is uniquely expressible as a superposition of elementary time and frequency shift operators, thus providing a convenient interpretation of a scene in terms of these physically meaningful operations on the transmitted signal.  Application of this perspective to waveform design for target detection in noise and to optimal receiver processing for a given waveform to support target detection in clutter and noise are demonstrated. This is joint work with Stephen Howard and Bill Moran.
Wavelets and filter banks on graphs, Pierre Vandergheynst. In this talk I will first describe a simple construction of wavelets on graphs from spectral graph theory. Spectral graph wavelets are localized on the vertices of a (weighted) undirected graph and can be used to process data defined on networks for example. I will also describe an efficient algorithm to implement the transform and its adjoint which are the basic ingredients for general sparse recovery problems. Then I will describe our ongoing efforts to construct critically sampled filter banks generating wavelets on graphs. A neat connection with nodal theorems will allow us to introduce and study downsampling on graphs and a reduction technique will allow us to iterate the construction through scales. Interestingly, these techniques lead, for a class of graphs, to explicit perfect reconstruction equations that take the form of quadrature mirror conditions. 
Sequential Analysis in High-Dimensional Multiple Testing and Sparse Recovery, Robert Nowak. The study of large-scale networked systems is now a major focus in engineering and science. Deciding what, where, when and how to sample such systems are crucial issues when measurement or experimental resources are limited. Standard experimental methods are non-adaptive: the experimental design is fixed before measurements are taken. One can envision alternative strategies in which the sampling or experimental design is optimized sequentially as more information is gained from previously collected data. In this talk, I will discuss and motivate the need for sequential, adaptive methods with a variety of applications. Then I will illustrate the main ideas by focusing on high-dimensional multiple hypothesis testing and sparse recovery problems, and their applications in biology, cognitive radio and compressed sensing. For example, consider testing to decide which of n>1 genes are differentially expressed in a certain disease. Suppose each test takes the form H0: X ~ N(0,1) vs. H1: X ~ N(m,1), where N(m,1) is the Gaussian density with mean m and variance 1, and H1 represents the case of differential expression. The signal-to-noise ratio (SNR=m^2) must be sufficiently large in order to guarantee a small probability of error. Non-sequential methods, which use an equal number of samples per test, require that the SNR grows like log(n). This is simply because the squared magnitude of the largest of n independent N(0,1) noises is on the order of log(n). Non-sequential methods cannot overcome this curse of dimensionality. Sequential testing, however, is capable of breaking the curse by adaptively focusing experimental resources on certain components at the expense of others. I will discuss a simple sequential method, in the spirit of classic sequential probability ratio testing that only requires the SNR to grow like log(s), where s is the number of tests in which the data are distributed according to H1. In applications like gene testing, s is much much smaller than n and so the gains can be quite significant. For example, if s = log n, then the sequential method is reliable if the SNR is O(log log n), compared to the O(log n) requirement of non-sequential approaches. The distinction is even more dramatic when the tests involve certain one-sided distributions which arise, for example, in cognitive radio spectrum sensing. In this situation the gains can be doubly exponential in n.
Applications and Implications of Fast Methods for L1 Related Optimization, Stan Osher. New and newly revived fast methods for L1, total variation, tight frame based optimization have led to innumerable interesting applications. These include hyper spectral imaging, surveillance, learning theory, separation of dust from aerosols, retinex... I will discuss these methods, applications and future possibilities. This is joint work with many people.
Sparse and Smooth: An optimal convex relaxation for high-dimensional kernel regression, Martin Wainwright. The problem of non-parametric regression is well-known to suffer from a severe curse of dimensionality, in that the required sample size grows exponentially with the dimension $d$.  Consequently, the success of any successful learning procedure in high dimensions must exploit some kind of low-dimensional structure.  This talk focuses on non-parametric estimation within the family of sparse additive models, which consist of sums of univariate functions over $s$ unknown co-ordinates. We derive a simple and intuitive convex relaxation for estimating sparse additive models described by reproducing kernel Hilbert spaces. The method involves solving a second-order cone program (SOCP), and so is suitable for large-scale problems.  Working within a high-dimensional framework that allows both the dimension $d$ and sparsity $s$ to scale, we derive convergence rates that consist of two terms: a subset selection term that captures the difficulty of finding the unknown $s$-sized subset, and an estimation error that captures the difficulty of estimation over kernel classes.  Using information-theoretic methods, we derive matching lower bounds on the minimax risk, showing that the SOCP-based method is optimal. Based on joint work with Garvesh Raskutti and Bin Yu, UC Berkeley. Arxiv paper:
Nonconvex Splitting Algorithms and Video Decomposition, Rick Chartrand. There has been much recent work applying splitting algorithms to optimization problems designed to produce sparse solutions. In this talk, we'll look at extensions of these methods to the nonconvex case, motivated by results in compressive sensing showing that nonconvex optimization can recover signals from many fewer measurements than convex optimization. Our primary example will be the application of these methods to the decomposition of high-dimensional datasets, most notably video, into low-dimensional and sparse components.
Recovering Simple Signals, Anna Gilbert. The primary goal of compressed sensing and (non-adaptive) combinatorial group testing is to recover a sparse vector from an underdetermined set of linear equations. Both problems entail solving an undetermined linear system but they use different models of arithmetic, different models of randomness models for the matrix, and different guarantees upon the solution and the class of signals from which it is drawn. Lipton introduced a model for error correction where the channel is computationally bounded, subject to standard cryptographic assumptions, and produces the error vector x that must be found and then corrected. This has been extended to create more efficient schemes against polynomial and logspace bounded channels. Inspired by these results in error correction, we view compressed sensing and combinatorial group testing as an adversarial process, where Mallory the adversary produces the vector to be measured, with limited information about the matrix. We define a number of computationally bounded models for Mallory and show that there are significant gains (in the minimum number of measurements) to be had by relaxing the model from adversarial to computationally or information-theoretically bounded, and not too much (in some cases, nothing at all) is lost by assuming these models over oblivious or statistical models. We also show that differences in adversarial power give rise to different lower bounds for the number of measurements required to defeat such an adversary. We use lower bounds in communication complexity to show upper bounds for recovery against a streaming adversary. We also give a characterization of when a bounded adversary can produce a “bad” vector for matrices with the Restricted Isometry Property (RIP). Joint work with Brett Hemenway, Atri Rudra, Martin Strauss, and Mary Wootters
Architectures for compressive sampling of correlated signals, Justin Romberg. We will discuss several ways in which recent results on the recovery of low-rank matrices from partial observations can be applied to the problem of sampling ensembles of correlated signals.  We will present several architectures that use simple analog building blocks (vector-matrix multiply, modulators, filters, and ADCs) to implement different types of measurement schemes with "structured randomness".  These sampling schemes allow us to take advantage of the (a priori unknown) correlation structure of the ensemble by reducing the total number of observations required to reconstruct the collection of signals.  We will discuss scenarios that use an ADC for every channel, and those which multiplex the channels onto a single line which is sampled with a single ADC.
Scalable Topic Models, David Blei. Probabilistic topic modeling provides an important suite of tools for the unsupervised analysis of large collections of documents. Topic modeling algorithms can uncover the underlying themes of a collection and decompose its documents according to those themes. This analysis can then be used for tasks like corpus exploration, document search, and a variety of prediction problems. Traditionally, topic models are fitted with "batch" algorithms, iteratively analyzing each document of the collection and then updating the parameters that describe the model. In this talk, I will describe a faster alternative based on online variational Bayes (VB). Online VB uses online stochastic optimization on the variational objective function with a natural gradient step. With online VB, we can handily analyze massive document collections, including those arriving in a stream. We demonstrate this approach by fitting a Bayesian nonparametric topic model to 3.3 million articles from Wikipedia. We demonstrate that online VB finds topic models as good or better than those found with batch VB, and in a fraction of the time. (Joint work with Matt Hoffman, Chong Wang and Francis Bach)
Diffuse Interface Models on Graphs for Classification of High-Dimensional Data, Andrea Bertozzi.There are currently several communities working on algorithms for classification of high dimensional data.  I will discuss recent joint work with Arjuna Flenner that develops a class of variational algorithms combining recent ideas from spectral methods on graphs with nonlinear edge/region detection methods traditionally used in in the PDE-based imaging community. The algorithms are based on the Ginzburg-Landau functional which has classical PDE connections to total variation minimization. Convex-splitting algorithms allow us to quickly find minimizers of the proposed model and take advantage of fast spectral solvers of linear graph-theoretic problems.  I present diverse computational examples involving both basic clustering and semi-supervised learning for different applications. Case studies include feature identification in images, segmentation in social networks, and segmentation of shapes in high dimensional datasets.
On a new Internet Traffic Matrix (Completion) Problem, Walter Willinger. During the last 10 years, the problem of inferring the intra-domain traffic matrix of large networks (e.g., AT&T, Sprint) has morphed from an active research topic with important practical applications into a non-problem -- many of today's networks are now instrumented to measure the traffic matrix directly, without any need for inference or estimation.  However, it seems safe to predict that the next 10 years will see significant research activities on a new and more challenging traffic matrix problem; that is, inferring the Internet's inter-domain traffic matrix that describes the, say, hourly or daily traffic volumes exchanged between the various networks that define the Internet as a "network of networks." In this talk, I will discuss some of the challenges that arise when working with this matrix including the problem formulation itself, potential solution methods, and the all-important validation requirements.
Easy Does It: Dense and Sparse Spectral Estimation Algorithms that Are User Parameter Free, Jian Li. User parameter free algorithms, like the simple fast Fourier transform (FFT), are easy to use and are desirable for diverse practical applications. However, the data‐independent FFT algorithm suffers from poor resolution and high sidelobe level problems. Many parametric, nonparametric, and sparse (semi‐parametric) spectral estimation algorithms have been introduced in the literature. They possess super resolution and low sidelobe level properties. However, most of these algorithms are not user parameter free, making them difficult to use in practical applications. Many of these algorithms are sensitive to data model errors and/or require second‐order statistics of the measurement vector. We present herein an iterative adaptive approach (IAA) for dense spectral estimation and a sparse learning via iterative minimization (SLIM) algorithm for sparse spectral estimation. Both algorithms are user parameter free, with the dense algorithm more accurate and the sparse algorithm computationally more efficient.
Point process modeling for directed interaction networks, Patrick J. Wolfe. Network data often take the form of repeated interactions between senders and receivers tabulated over time. We introduced a framework to treat these directed interactions as a multivariate point process: a Cox multiplicative intensity model using covariates that depend on the history of the process. We are able to show consistency and asymptotic normality for the resulting partial-likelihood-based estimators under suitable regularity conditions, and describe an efficient procedure for fitting.  We also show how to explicitly treat multicast interactions--those involving a single sender but multiple receivers--in this framework.  A motivating data example shows the effects of reciprocation and group-level preferences on message sending behavior in a corporate e-mail network. (Joint work with Patrick Perry)
Sublinear Time, Measurement-Optimal, Sparse Recovery For All, Martin Strauss. An approximate sparse recovery system in ell_1 norm formally consists of parameters N, k, epsilon an m-by-N measurement matrix, Phi, and a decoding algorithm, D.  Given a vector, x, where x_k denotes the optimal k-term approximation to x, the system approximates x by hat_x = D(Phi.x), which must satisfy                                                       ||hat_x - x||_1  <=  (1+epsilon)||x - x_k||_1.Among the goals in designing such systems are minimizing m and the runtime of D.  We consider the ``forall'' model, in which a single matrix Phi is used for all signals x. All previous algorithms that use the optimal number O(k log(N/k)) of measurements require superlinear time Omega(N log(N/k)).  In this paper, we give the first algorithm for this problem that uses the optimum number of measurements (up to factors of epsilon) and runs insublinear time o(N) when k=o(N), assuming access to a data structure requiring space and preprocessing a bit more than N. Joint work with Ely Porat.
Multi-task Averaging, Maya Gupta. We investigate a similarity-based additive regularization for jointly estimating the means of T random variables from iid samples, which we refer to as multi-task averaging.  We show that this simple multi-task formulation has surprising theoretical properties, and broad applicability to real problems.
Smoothing Proximal Gradient Method for General Structured Sparse Regression, Eric Xing. In many modern problems across areas such as genomics, computer vision, and natural language process, one is interested in learning a Sparse Structured Input-Output Regression Model (SIORM), in which the input variables of the model such as variations on a human genome bear rich structure due to the genetic and functional dependences between entities in the genome; and the output variables such as the disease traits are also structured because of their interrelatedness. A SIORM can nicely capture rich structural properties in the data, but raises severe computational and theoretical challenge on consistent model identification. In this talk, I will present algorithms and theories for learning Sparse SIORMs of various kinds in high dimensional input/output space, with a fast and highly scalable optimization procedure, and strong statistical guarantees. In particular, we investigate a smoothing proximal gradient method, which can solve structured sparse regression problems with a smooth convex loss and a wide spectrum of structured-sparsity-inducing penalties. Our method achieves a convergence rate significantly faster than the standard first-order method, subgradient method, and is much more scalable than the most widely used interior-point method. I will demonstrate application of our approach to problems in large-scale genome association analysis and web image understanding. This is joint work with Xi Chen and Seyoung Kim. 
Image acquisition using sparse (pseudo)-random matrices, Piotr Indyk. I will present an overview of our work on sparse recovery/compressive sensing schemes that use *sparse binary* matrices. Such schemes provably offer good (near-optimal) compression rates and fast recovery times simultaneously. I will also discuss some of the existing  imaging architectures that naturally utilize such matrices, as well as their applications.
Shrinkage estimators for low rank matrices, Andrea Montanari. Suppose to collect noisy observations of a low-rank or approximately low-rank matrix. A simple approach to estimating the matrix is to take its singular value decomposition retain a few singular values, and scale them appropriately. I will show that this approach is optimal (up to a universally multiplicative constant) even if a large part of the entries are missing, within a `bounded entries' model. When studying the constant factor, I will show that substantial gain can be achieved by modifying the shringing of singular values. I will use recent results in random matrix theory to evaluate various shrinkage transformations.
Subspace Expanders and Low Rank Matrix Recovery, Babak Hassibi. The adjacency matrix of an expander graph has been proven useful as the measurement matrix in sparse signal recovery since 1) it maps support sets in the ambient space to large (expanded) support sets in the measurement space (thereby allowing sparse recovery) and 2) it is a sparse matrix which allows for efficient (often linear time) recovery algorithms. In this talk, we generalize the notion of expansion from support sets to subspaces and prove the existence of subspace expanders, i.e., linear mappings from R^(n_1 x n_1) to R^(m_1 x m_2) that "expand" the rank of low rank matrices. We argue that such mappings are suitable as the measuement operator for low rank matrix recovery problems and use them to develop fast recovery algorithms. We also mention several potential applications, such as quantum detection.
Spectral classification sensors: An adaptive approach, Mike Gehm. Frequently, the goal of spectroscopy or spectral imaging is not signal reconstruction, but rather classification of the input spectra against an existing spectral library. We have developed an adaptive approach that modifies the system measurement kernel in response to prior measurements and, as a result, achieves orders-of-magnitude faster time-to-classification than traditional methods. Design and optimization of the extremely high-dimensional measurement kernels is complicated by the need to simultaneously respect physical, system, and task constraints.
An Efficient Dictionary for Reconstruction of Sampled Multiband Signals, Michael B. Wakin. There remains a significant gap between the discrete, finite dimensional compressive sensing (CS) framework and the problem of acquiring a continuous-time signal. In this poster, we will discuss how sparse representations for multiband signals can be incorporated into the CS framework through the use of Discrete Prolate Spheroidal Sequences (DPSS’s). DPSS’s form a highly efficient basis for sampled bandlimited functions; by modulating and merging DPSS bases, one obtains a sparse representation for sampled multiband signals. We will discuss the use of DPSS bases for both signal recovery and the cancellation of strong narrowband interferers from compressive samples. Joint work with Mark A. Davenport.

Exploration & Representation of Data with Geometric Wavelets, Eric E Monson, Guangliang Chen, Rachael Brady & Mauro Maggioni. Geometric Wavelets is a new multi-scale data representation technique which is useful for a variety of applications such as data compression, interpretation and anomaly detection. We have developed an interactive visualization with multiple linked views to help users quickly explore data sets and understand this novel construction. Currently the interface is being used by applied mathematicians to view results and gain new insights, speeding methods development.
Sparse Modeling of Human Actions from Motion Imagery, Alexey Castrodad and Guillermo Sapiro.An efficient sparse modeling pipeline for the classification of human actions from video is developed. Spatio-temporal features that characterize changes in the image are first extracted. This is followed by the learning of a class-structured dictionary encoding the individual actions of interest. Classification is then based on reconstruction, where the label assigned to each video comes from the optimal sparse linear combination of the learned basis vectors (action primitives) representing the learned actions.  A low computational cost deep-layer model learning the interclass correlations of the data is added for increasing discriminative power. In spite of itssimplicity and low computational cost, the method outperforms previously reported results for virtually all standard datasets.
Stable Embeddings of Time Series Data, Han Lun Yap and Christopher J. Rozell. Takens' theorem remarkably established that a delay coordinate map (DCM) can be a one-to-one mapping of a low-dimensional attractor from the system state space. However, Takens' theorem is fragile as small imperfections can induce large errors in this attractor representation. Taking inspiration from compressed sensing, we extend Takens' result to establish deterministic, non-asymptotic sufficient conditions for a DCM to form a stable embedding in the restricted case of linear dynamical systems and observation functions.  Using tools from stable manifold embeddings, we further establish that by filtering the DCM output, we could reduce the number of measurements without unduly sacrificing the conditioning of the resulting stable embedding.
Concentration Inequalities and Isometry Properties for Compressive Block Diagonal Matrices, Han Lun Yap, Jae Young Park, Armin Eftekhari, Christopher Rozell, and Michael Wakin. In this poster, we survey our recent analysis of randomized, compressive block diagonal matrices. We present concentration of measure bounds which indicate that (unlike dense i.i.d. random matrices) the probability of norm preservation actually depends on the signal being measured and discuss implications of this fact in various compressive signal processing applications. We also present an RIP bound for block diagonal matrices and explain that in the best case—for signals that are sparse in certain bases—these matrices perform nearly as well as dense i.i.d. random matrices despite having many fewer nonzero entries.
Learning Sparse Codes for Hyperspectral Images, Adam Charles, Bruno Olshausen, and Christopher Rozell. The exploitation of low dimensional structures in data has dramatically improved our ability to solve highly underdetermined linear inverse problems. While early results leverage sparsity in analytic dictionaries (such as wavelets) additional improvements have since been observed by adapting a dictionary to a specific data type. In this work we demonstrate the advantage of learning a spectral dictionary for Hyperspectral imagery (HSI). Specifically, we show that a dictionary learning procedure can learn relevant material spectra which can be used to infer materials in a HSI scene. The resulting codes can be used to super-resolve HSI data from lower resolution measurements (i.e., multispectral imagery), as well as to reduce the complexity of supervised classifiers for ground materials.
Convergence Properties and Rate Analysis for Neural Networks for Sparse Approximation, Aurèle Balavoine, Justin Romberg, and Christopher Rozell. We present an analysis of the Locally Competitive Algorithm (LCA), a Hopfield-style neural network that solves sparse approximation problems. This class of problems plays a significant role in both theories of neural coding and applications in signal processing, and may benefit from neural network approaches that could be implemented in fast, low power analog architectures. However, traditional network analysis approaches are difficult because the objective functions are non-smooth. Specifically, we characterize the convergence properties of this system by showing that the LCA is globally convergent to a fixed point corresponding to the exact solution of the objective function, and (under some mild conditions) this solution is reached in finite time. Furthermore, we characterize the convergence rate of the system by showing that the LCA converges exponentially fast with an analytically bounded rate (that depends on the specifics of a given problem). We support our analysis with several illustrative simulations. In the specific case of Compressed Sensing, where the dictionary usually has low coherence and the l1-norm is used as the sparsity-inducing cost function, we show that it is possible to chose the threshold to achieve convergence in a small number of steps.
A Compressive Sensing LIDAR Architecture, Darryl Sale, Christopher Rozell, Justin Romberg, and Aaron Lanterman. In this work, we present a linear mode LIDAR sensor architecture based on the principles of Compressive Sensing. This conceptual architecture would produce high resolution range images without complicated scanning or detection electronics by exploiting the inherent sparsity of LIDAR reflections.  Its complementary optical paths and linear mode detectors transform time-varying LIDAR photon arrivals into a compressed measurement waveform corresponding to each row of the sensing matrix.  A small number of samples of these waveforms are used in standard sparse approximation algorithms to recover the scene in a format well-suited for parallel processing.  MATLAB simulation results based on DIRSIG scene models demonstrate excellent detector performance with substantially fewer measurements than required by conventional scanning LIDAR.
Principal Curvature Analysis: Dynamic Feature Selection By Moving Frames Optimization, Xiao Bian and Hamid Krim. We propose a novel geometrical method to extract dynamic features of functional data in high dimensions based on moving frames optimization. Structural invariant dynamic features are represented by the corresponding curvature vectors. The objective functions are designed to optimize on moving frames in order to make the curvature vectors in a simple form, which we call principal curvature vectors. Their solutions are in closed form and have advantages to applied in pattern classification and recognition.
High Dimensional Graphical Model Selection:  Tractable Graph Families and Latent Trees, Anima Anandkumar. Capturing complex interactions among a large set of variables is a challenging task. Probabilistic graphical models or Markov random fields provide a graph-based framework for capturing such dependencies.  Graph estimation is an important task, since it reveals important relationships among the variables. I will present a unified view of graph estimation and propose a simple local algorithm  for graph estimation using only low-order statistics of the data. We establish that the algorithm has consistent graph estimation with low sample complexity  for a class of graphical models satisfying certain structural and parameter criteria. We explicitly characterize these model classes and point out interesting relationships between the graph structure and the parameter regimes, required for tractable learning. Many graph families such as the classical Erdos-Renyi random graphs, random regular graphs, and the small-world graphs can be learnt efficiently under our framework. The second part of the work is motivated by the following question: can we discover hidden influences acting on the observed variables? We consider latent tree models  for capturing hidden relationships. We develop novel algorithms for learning the unknown high-dimensional latent tree structure. Our algorithm is amenable to efficient implementation of the Bayesian Information Criterion (BIC)  to tradeoff the number of hidden variables with the accuracy of the model fitting. Experiment on the S&P 100 financial data reveals sectorization of the companies and experiment on the newsgroups data automatically categorizes words into different topics.
Sparse Principal Components Analysis with Multiply Labeled Data, Kenneth D. Morton Jr., Leslie M. Collins, Peter A. Torrione. Standard dimensionality reduction techniques such as principal components analysis (PCA) and factor analysis model high dimensional data using a small number of high dimensional basis vectors and a corresponding set of low dimensional scores. In general, the basis (or set of factors) is determined so as to maximize the variance in the set of scores, minimize the projection error, or maximize the likelihood of the assumed model, Unfortunately, these techniques ignore information contained in any available supplementary data related to the observations (e.g., observation labels). Although techniques such as linear discriminant analysis (LDA) and partial least squares discriminant analysis (PLSDA) can be used to infer a discriminative basis that incorporates observation labels, these techniques do not learn a generative basis while incorporating this additional information. Based on the Bayesian sparse factor analysis formulation of Chen et al. (2011) this work develops multi-label PCA (MLPCA), a generative dimension reduction technique that can incorporate supplementary information and an online variational Bayesian learning algorithm is developed for the parameters. The model automatically incorporates information from multiple labels for each observation, and is capable of learning a set of factors for each individual class even if data from a particular class is only observed “mixed” with data from different classes. By way of example, the MLPCA model is utilized within a detection algorithm for identifying multiple chemical residues on multiple substrates where identification of the factors corresponding to individual residues is important, and data from each residue is always measured in a mixture with a substrate. It is demonstrated that a detection algorithm utilizing MLPCA provides improved robustness to variations in substrate compared to standard chemometric techniques for residue detection.
Joint Dynamic Sparse Representation for Classification, Haichao Zhang, Nasser M. Nasrabadi and Thomas S. Huang. We propose a joint dynamic sparse representation algorithm for classification with multiple measurements /observations. We formulate classification task as a multi-input joint sparse representation model and exploit the correlations among the multiple inputs for classification using a novel joint dynamic sparsity prior. The proposed joint dynamic sparsity prior promotes shared joint sparsity pattern among the multiple sparse representation vectors at class-level, while allowing distinct sparsity patterns at atom-level within each class to facilitate a flexible representation. The proposed method can handle both homogenous as well as heterogeneous data within the same framework.
Broadband Passive Spatial Spectrum Estimation in a Dynamic Environment, Jonathan Odom and Jeffrey Krolik. Estimation of the time-varying acoustic field is essential for situational awareness in passive sonar. Adaptive processing often assumes both the field is stationary and the array is fixed for multiple observation windows. Highly dynamic scenarios such as high bearing rate sources or underwater maneuvers severely limit the utilization of multiple snapshots. Several models are considered for time-varying fields, and an online broadband maximum-likelihood estimator is introduced that is solved with an EM algorithm using as few as one snapshot. The number of estimated parameters can be reduced for broadband data when information, such as shape, about the source temporal spectra is known. Cramér-Rao bound analysis is used to understand effects of temporal spectrum knowledge on broadband processing.  An example is given for the flat spectrum case to compare with conventional processing. The new spatial spectrum estimate leverages information across the temporal spectrum to reduce the impact of spatial grading lobes. Another feature of dynamic environments is array motion. Since underwater arrays are often subject to motion, the estimate must consider arbitrary, dynamic array shapes. Platforms such as autonomous underwater vehicles provide mobility but constrain the number of sensors. Exploiting maneuverable linear arrays with the new estimate allows for left-right disambiguation, more uniform angular resolution, and suppression of spatial grading lobes. Multi-source simulations are used to demonstrate the ability of a short, maneuvering array to reduce array backlobes as well as operate in the spatial grading lobe region. Detection performance of a weak high-bearing rate source in an interference dominated environment is evaluated for a flat spectrum.
Nonparametric Multivariate Convex Regression with Applications to Value Function Approximation, Lauren Hannah and David Dunson. We propose two new, nonparametric methods for multivariate regression subject to convexity or concavity constraints on the response function. Convexity constraints are common in economics, statistics, operations research and financial engineering, but there is currently no multivariate method that is computationally feasible for more than a few hundred observations. We introduce frequentist Convex Adaptive Partitioning (CAP) and Bayesian Multivariate Bayesian Convex Regression (MBCR), which both create a globally convex regression model from locally linear estimates fit on adaptively selected covariate partitions. Adaptive partitioning makes computation efficient even on large problems. We leverage the search procedure of CAP to create a computationally efficient reversible jump MCMC sampler for MBCR. We give strong consistency results for CAP in the univariate case and both weak and strong consistency results for MBCR in the general case. Convexity offers a few properties that can be exploited. First, it acts as a regularizer, making CAP and MBCR resistant to overfitting. Second, convex functions can be quickly minimized with commercial solvers. We CAP and MBCR to fit value function approximation for real-world sequential decision problems with convex value-to-go functions. By allowing efficient search over the decision space, CAP and MBCR allow us much larger problems than can currently be solved with state of the art methods. We show that MBCR produces much more robust policies than CAP through model averaging.
Efficient Gaussian Process Regression for Large Data Sets, Anjishnu Banerjee, David Dunson and Surya Tokdar. Gaussian processes (GPs) are widely used in nonparametric regression, classification and spatio-temporal modeling, motivated in part by a rich literature on theoretical properties. However, a well known drawback of GPs that limits their use is the expensive computation, typically O($n^3$) in performing the necessary matrix inversions with $n$ denoting the number of data points. In large data sets, data storage and processing also lead to computational bottlenecks and numerical stability of the estimates and predicted values degrades with $n$. To address these problems, a rich variety of methods have been proposed, with recent options including predictive processes in spatial data analysis and subset of regressors in machine learning. The underlying idea in these approaches is to use a subset of the data, leading to questions of sensitivity to the subset and limitations in estimating fine scale structure in regions that are not well covered by the subset. Motivated by the literature on compressive sensing, we propose an alternative random projection of all the data points onto a lower-dimensional subspace. We demonstrate the superiority of this approach from a theoretical perspective and through the use of simulated and real data examples.
Repulsive Mixtures, Francesca Petralia and David B. Dunson. Mixture models have become extremely used for density estimation, clustering and as a component in flexible hierarchical models. In using mixture models for clustering, identifiability problems arise if mixture components are not sufficiently well separated and the data for the different sub-populations contain substantial overlap.  Insufficiently separated components also creates problems in using mixture models for density estimation and robust modeling, as redundant components that are located close together can be introduced leading to an unnecessarily complex model as well as to various computational problems. Current practice in Bayesian mixture modeling generates the component-specific parameters from a common prior, which tends to favor components that are close together. As an alternative, we propose to generate mixture components from a repulsive process that favors placing components further apart. Efficient algorithms are proposed for fully Bayes and fast approximate posterior computation allowing uncertainty in the number of components. The methods are illustrated using simulated data and several applications.
Tensor factor models for high-dimensional unordered categorical data, Anirban Bhattacharya and David B. Dunson. Gaussian latent factor models are routinely used for modeling of dependence in continuous, binary and ordered categorical data. For unordered categorical variables, Euclidean latent factor models lead to challenging computation and complex modeling structures. We relate the problem of modeling high-dimensional categorical variables to sparse tensor decomposition and propose a class of simplex factor models. Using a Bayesian approach for computation and inferences, an efficient MCMC algorithm is proposed that scales well with increasing dimensions. We provide a unified framework for testing associations and interactions among groups of variables and evaluate the approach through simulation examples and applications to modeling dependence in nucleotide sequences.
Bayesian modeling of closed surfaces through tensor products, Debdeep Pati and David B. Dunson. Closed surfaces provide a useful model for 3d shapes, with the data typically consisting of a cloud of points in R^3. The existing literature on closed surface modeling focuses on frequentist point estimation methods that join surface patches along the edges, with surface patches created via Bezier surfaces or tensor products of B-splines. However, the resulting surfaces are not smooth along the edges and the geometric constraints required to join the surface patches lead to computational drawbacks. In this article, we develop a Bayesian model for closed surfaces based on tensor products of a cyclic basis resulting in infinitely smooth surface realizations. We impose sparsity on the control points through a double-shrinkage prior.  Theoretical properties of the support of our proposed prior are studied and it is shown that the posterior achieves the optimal rate of convergence under reasonable assumptions on the prior.  Several simulation examples are considered and the methods are applied to real data. We also extend our model for a single closed surface to a hierarchical model to accommodate multiple closed surfaces. 
Bayesian Kernel Mixtures for Counts, Antonio Canale and David B. Dunson. Although Bayesian nonparametric mixture models for continuous data are well developed, there is a limited literature on related approaches for count data.  A common strategy is to use a mixture of Poissons, which unfortunately is quite restrictive in not accounting for distributions having variance less than the mean.  Other approaches include mixing multinomials, which requires finite support, and using a Dirichlet process prior with a Poisson base measure, which does not allow smooth deviations from the Poisson.  As a broad class of alternative  models, we propose to use nonparametric mixtures of rounded continuous kernels.  An efficient Gibbs sampler is developed for posterior computation, and a simulation study is performed to assess performance. Focusing on the rounded Gaussian case, we generalize the modeling framework to account for multivariate count data, joint modeling with continuous and categorical variables, and other complications. The methods are illustrated through applications to marketing data. 
Bayesian Nonparametric Covariance Regression, Emily Fox and David B. Dunson. Although there is a rich literature on methods for allowing the variance in a univariate regression model to vary with predictors, time and other factors, relatively little has been done in the multivariate case. Our focus is on developing a class of nonparametric covariance regression models, which allow an unknown $p \times p$ covariance matrix to change flexibly with predictors. The proposed modeling framework induces a prior on a collection of covariance matrices indexed by predictors through priors for predictor-dependent loadings matrices in a factor model. In particular, the predictor-dependent loadings are characterized as a sparse combination of a collection of unknown dictionary functions (e.g, Gaussian process random functions). The induced covariance is then a regularized quadratic function of these dictionary elements. Our proposed framework leads to a highly-flexible, but computationally tractable formulation with simple conjugate posterior updates that can readily handle missing data. Theoretical properties are discussed and the methods are illustrated through simulations studies and an application to the Google Flu Trends data.
Bayes variable selection in semiparametric linear models, Suprateek Kundu and David B. Dunson.There is a rich literature proposing methods and establishing asymptotic properties of Bayesian variable selection methods for parametric models, with a particular focus on the normal linear regression model and an increasing emphasis on settings in which the number of candidate predictors (p) diverges with sample size (n). Our focus is on generalizing methods and asymptotic theory established for mixtures of g-priors to semiparametric linear regression models having unknown residual densities. Using a Dirichlet process location mixture for the residual density, we propose a semiparametric g-prior which incorporates an unknown matrix of cluster allocation indicators. For this class of priors, posterior computation can proceed via a straightforward stochastic search variable selection algorithm. In addition, Bayes factor consistency is shown to result under various cases including proper priors on g and p diverging at a rate less than n.
Flexible Scale Mixtures of Normals for Shrinkage Estimation, Artin Armagan, David Dunson and Merlise Clyde. In recent years, a rich variety of shrinkage priors have been proposed that have great promise in addressing massive regression problems. In general, these new priors can be expressed as scale mixtures of normals, but have more complex forms and better properties than traditional Cauchy and double exponential priors. We first propose a new class of normal scale mixtures through a novel generalized beta distribution that encompasses many interesting priors as special cases. This encompassing framework should prove useful in comparing competing priors, considering properties and revealing close connections. We then develop a class of variational Bayes approximations through the new hierarchy presented that will scale more efficiently to the types of truly massive data sets that are now encountered routinely.
An Empirical-Bayes Approach to Compressive Sensing via Approximate Message Passing, Jeremy Villa. Traditional approaches to compressive sensing (e.g., Lasso, basis-pursuit denoising) take a worst-case approach to signal recovery, and are able to give guarantees for the recovery of any deterministic signal.  Furthermore, recent work has connected these traditional approaches to minimax optimal estimation.  Although the minimax property is admirable from a robustness perspective, estimation performance can be improved when the signal prior is known to differ from the least-favorable one.  At the same time, care must be taken not to assume too much about the signal prior. In this work, we propose an empirical Bayesian approach to compressive sensing, where we assume an i.i.d Gaussian-Mixture signal model and an i.i.d Gaussian noise model, but make no assumptions on the parameters of the signal distribution and the noise distribution.  We then employ the EM algorithm to jointly learn the distributional parameters (in an ML sense) and recover the signal (in an MMSE sense), leveraging the recently proposed approximate message passing (AMP) algorithm for compressive sensing.
Two Proposals for Robust PCA using Semidefinite Programming, Michael McCoy and Joel A. Tropp.The performance of principal component analysis (PCA) suffers badly in the presence of outliers. This poster outlines two novel approaches for robust PCA based on programming. The first method, maximum mean absolute deviation rounding (MDR), seeks directions of large spread in the data while damping the effect of outliers. The second method produces a low-leverage decomposition (LLD) of the data that attempts to form a low-rank model for the data by separating out corrupted observations. This work also presents efficient computational methods for solving these SDPs. Numerical experiments con firm the value of these new techniques."
Fast level set estimation from projection measurements, Kalyani Krishnamurthy, Waheed Bajwa, Rebecca Willett and Robert Calderbank. Estimation of the level set of a function (i.e., regions where the function exceeds some value) is an important problem with applications in digital elevation maps, medical imaging, and astronomy. In many applications, however, the function of interest is acquired through indirect measurements, such as tomographic projections, coded-aperture measurements, or pseudo-random projections associated with compressed sensing. This poster describes a new methodology and associated theoretical analysis for rapid and accurate estimation of the level set from such projection measurements. The proposed method estimates the level set from projection measurements without an intermediate function reconstruction step, thereby leading to significantly faster computation. In addition, the coherence of the projection operator and McDiarmid’s inequality are used to characterize the estimator’s performance.
Estimation of Dynamic Social Network Structure via Online Convex Programming, Maxim Raginsky, Corinne Horn, and Rebecca Willett. Consider a dynamic social network composed of p agents, where the designation “dynamic” means that the network topology or, more broadly, the influence of each agent on the other agents, may evolve with time. We collect sequential observations of the agents’ actions, and would like to use them to infer something in real time about the structure of the network. From the modeling point of view, both the topology of the network and the strengths of the agents’ influence can be encoded in a time-varying sparse binary pairwise Markov random field (also known in statistical physics as the Ising model). Thus, our goal is to infer the parameters of such a model given the observed binary co-occurrences representing the agents’ actions. The offline (batch) version of this problem was recently treated by Banerjee et al. [1], Ravikumar et al. [2], Kolar and Xing [3], and Ho ̈fling and Tibshirani [4]. As noted in [1] and [2], sparsity regularization can play a critical role in accurately estimating network structure from a relatively limited amount of data; this effect has been noted both empirically and theoretically in an analysis of sample complexity of different methods. By contrast, we work in the online setting, i.e., when the observations are available sequentially, one at a time. Moreover, even if all the observations are available at once, online algorithms often offer a computationally feasible and robust alternative to batch inference [5]. A notable feature of our approach is that we do not assume that the dynamical evolution of the agents’ actions is, in fact, described by a time-varying Ising model; nor do we assume that sequential observations are conditionally independent. Rather, we treat the Ising model as a descriptive tool, which allows for easy interpretation of the observed network behavior in terms of a time-varying weighted graph, where the presence of a nonzero weight on an edge connecting a particular pair of agents indicates mutual influence, while the sign of the weight indicates the nature of the correlation between the agents’ actions (positive or negative). Our theoretical results show that we can use the sequential observations of the agents’ actions to construct a sequence of network structure estimates that is nearly as good as if all the observations had been available at once. Finally, we will demonstrate the efficacy of the proposed approach on a simulated data set generated from a time- varying binary Markov random field, as well as on the US Senate voting records.
Fast Imaging with Slow Cameras, Dikpal Reddy, Ashok Veeraraghavan, Ramesh Raskar, Rama Chellappa. Over the years, the spatial resolution of cameras has steadily increased but the temporal resolution has remained the same. In this talk, I will present my work on converting a regular slow camera into a faster one. We capture and accurately reconstruct fast events using our slower prototype camera by exploiting the temporal redundancy in videos. First, I will show how by fluttering the shutter during the exposure duration of a slow 25fps camera we can capture and reconstruct a fast periodic video at 2000fps. Next, I will present its generalization where we show that per-pixel modulation during exposure, in combination with brightness constancy constraints allows us to capture a broad class of motions at 200fps using a 25fps camera. In both these techniques we borrow ideas from compressive sensing theory for acquisition and recovery.
The Hierarchical Beta Process for Convolutional Factor Analysis and Deep Learning, Bo Chen, Gungor Polatkan, Guillermo Sapiro, David B. Dunson, Lawrence Carin. A convolutional factor-analysis model is developed, with the number of filters (factors) inferred via the beta process (BP) and hierarchical BP, for single-task and multi-task learning, respectively. The computation of the model parameters is implemented within a Bayesian setting, employing Gibbs sampling; we explicitly exploit the convolutional nature of the expansion to accelerate computations. The model is used in a multi-level (“deep”) analysis of general data, with specific results presented for image-processing data sets, e.g., classification.
Tree-Structured Infinite Sparse Factor Model, XianXing Zhang, David B. Dunson, Lawrence Carin. A tree-structured multiplicative gamma process (TMGP) is developed, for inferring the depth of a tree-based factor-analysis model. This new model is coupled with the nested Chinese restaurant process, to nonparametrically infer the depth and width (structure) of the tree. In addition to developing the model, theoretical properties of the TMGP are addressed, and a novel MCMC sampler is developed. The structure of the inferred tree is used to learn relationships between high-dimensional data, and the model is also ap- plied to compressive sensing and interpolation of incomplete images.
Dependent Hierarchical Beta Process for Image Interpolation and Denoising, Mingyaun Zhou, Hongxia Yang, Guillermo Sapiro, David Dunson and Lawrence Carin. A dependent hierarchical beta process (dHBP) is developed as a prior for  data that may be represented in terms of a sparse set of latent features, with covariate-dependent feature usage. The dHBP is applicable to general covariates and data models, imposing that signals with similar covariates are likely to be manifested in terms of similar features. Coupling the dHBP with the Bernoulli process, and upon marginalizing out the dHBP, the model may be interpreted as a covariate-dependent hierarchical Indian buffet process. As applications, we consider interpolation and denoising of an image, with covariates defined by the location of image patches within an image. Two types of noise models are considered: (i) typical white Gaussian noise; and (ii) spiky noise of arbitrary amplitude, distributed uniformly at random. In these examples, the features correspond to the atoms of a dictionary, learned based upon the data under test (without a priori training data). State-of-the-art performance is demonstrated, with efficient inference using hybrid Gibbs, Metropolis-Hastings and slice sampling.
On the Integration of Topic Modeling and Dictionary Learning, Lingbo Li, Guillermo Sapiro and Lawrence Carin. A new nonparametric Bayesian model is developed to integrate dictionary learning and topic model into a unified framework. The model is employed to analyze partially annotated images, with the dictionary learning performed directly on image patches. Efficient inference is performed with a Gibbs-slice sampler, and encouraging results are reported on widely used datasets.
Nonparametric Bayesian Modeling of Heterogeneous Space-Time Information Sources, Priyadip Ray, David Dunson and Lawrence Carin. We propose a nonparametric Gaussian process (GP) factor analysis model for space-time-varying data. The problem of spatial inhomogeneity is addressed via a novel kernel stick-breaking process (KSBP) based mixture of GPs. We also propose a joint factor analysis technique to simultaneously model multi-modal correlated spatio-temporal information sources. We discuss theoretical properties of the KSBP-GP factor model and develop an MCMC algorithm for posterior inference. The performance of the proposed approach is demonstrated on the analysis of multi-year unemployment rates at various geographical locations, with these data analyzed jointly with time-evolving stock prices of companies in the S & P 500.
Time-Evolving Modeling of Social Networks, Eric Wang, Jorge Silva, Rebecca Willett and Lawrence Carin. A statistical framework for modeling and prediction of binary matrices is presented. The method is applied to social network analysis, specifically the database of US Supreme Court rulings. It is shown that the ruling behavior of Supreme Court judges can be accurately modeled by using a small number of latent features whose values evolve with time. The learned model facilitates the discovery of inter–relationships between judges and of the gradual evolution of their stances over time. In addition, the analysis in this paper extends previous results by considering automatic estimation of the number of latent features and other model parameters, based on a nonparametric–Bayesian approach. Inference is efficiently performed using Gibbs sampling.
Audio source modeling based on structured sparsity and GMM using temporal constraints, Pablo Sprechmann. In this work we present a new model for representing single tone audio sources. This includes single pitch musical instruments as well as human voice. Dictionary learning and Non negative matrix factorization (as well as its probabilistic counterparts) have been widely used for performing source separation. For each sound source, these algorithms learn a dictionary of spectral vectors to best represent it. These approaches have shown very good results in a variety of situations. However, when handling single tone audio sources, there is no guarantee that the coded signal will have a single tone. Additionally, no temporal structure is usually used in the coding, which is a very important aspect of audio sources. We present a simple model that is based on structured sparsity and Gaussian Mixture Models (GMM). For each source, we jointly learn a GMM for representing the spectral representation as well as a Markov chain that describes the structure of changes between these gaussians. We interpret the source coding as an inverse problem adding additional harmonic constraints.
Non-Local Methods with Shape-Adaptive Patches (NLM-SAP), Charles Deledalle, Vincent Duval, Joseph Salmon. We propose an extension of the Non-Local Means (NL-Means) denoising algorithm. The idea is to replace the usual square patches used to compare pixel neighborhoods with various shapes that can take advantage of the local geometry of the image. We provide a fast algorithm to compute the NL-Means with arbitrary shapes thanks to the Fast Fourier Transform. We then consider local combinations of the estimators associated with various shapes by using Stein's Unbiased Risk Estimate (SURE). Experimental results show that this algorithm improve the standard NL-Means performance and is close to state-of-the-art methods, both in terms of visual quality and numerical results. Moreover, common visual artifacts usually observed by denoising with NL-Means are reduced or suppressed thanks to our approach.
Application of compressive sensing methods to image reconstruction in X-ray based tomography, Emil Y. Sidky and Xiaochuan Pan. The compressive sensing idea of exploiting gradient sparsity for reducing scanning effort in computed tomography (CT) has proved to be quite fruitful. A survey of results, by us and other researchers, obtained with actual CT scanner data will be shown. These results include reconstruction from sparse-view data, high resolution imaging with low intensity X-ray illumination, and time-resolved imaging.  As there is no theoretical result for linking image-gradient sparsity and CT sampling conditions yet, we will discuss our preliminary attempts at making this link with computer simulations.
Sparse Measurement Matrices for Sequential Adaptive Compressive Sensing, Jarvis Haupt, Richard Baraniuk, Rui Castro, and Robert Nowak. The theory of Distilled Sensing (DS) establishes that sequential adaptive sensing provides significant quantifiable performance improvements over non-adaptive sensing in noisy high dimensional testing problems. The DS idea was recently extended to highly undersampled settings typical in compressed sensing (CS), and it was shown that a Compressive Distilled Sensing (CDS) method can similarly result in performance gains in noisy CS problems.  However, the results obtained in the initial work on CDS were restricted to settings where the unknown sparse signals of interest have limited dynamic range (on the order of a constant).  In this work we describe a new adaptive compressive sensing approach that utilizes sequentially designed sparse measurement matrices. Our analysis of this new approach shows that similar performance improvements in adaptive compressive sensing can be achieved, but without restrictive assumptions on the dynamic range of the signal being acquired. 
Multiscale Analysis of Planar Arrangements, Mauro Maggioni and Guangliang Chen. Modeling high-dimensional data by multiple low-dimensional planes is an important problem in many applications such as computer vision and pattern recognition. In the most general setting where only coordinates of the data are given, the problem asks to determine the optimal model parameters (i.e. number of planes and their dimensions), estimate the model planes, and cluster the data accordingly. Though many algorithms have been proposed, most of them need to assume prior knowledge of the model parameters and thus address only the last two components of the problem. In this paper we propose an efficient algorithm based on multiscale SVD analysis and spectral methods to tackle the problem in full generality. We also demonstrate its state-of-the-art performance on both synthetic and real data, in tasks such as motion segmentation and a simple scenario for face recognition.