Rene Vidal
Given a set of data points, we consider the problem of finding a subset of data points, called representatives or exemplars, that efficiently represent the data collection. In our formulation, each data point is expressed as a linear combination of the exemplars, which are selected from the data via convex optimization. In general, we do not assume that the data are low-rank or distributed around cluster centers. However, when the data do come from a collection of low-rank models, we show that our method automatically selects a few representatives from each low-rank model. Our framework can be extended to detect and reject outliers in datasets, and to efficiently deal with new observations and large datasets. The proposed framework and theoretical foundations are illustrated with examples in video summarization and image classification using exemplars. This is joint work with Ehsan Elhamifar and Guillermo Sapiro.
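As a concrete illustration of the kind of convex program described above, here is a minimal sketch of a row-sparse self-representation formulation solved with cvxpy; it is a standard way of expressing this idea, not necessarily the authors' exact program, and the synthetic data and tolerances are arbitrary.
```python
# Minimal sketch: exemplar selection via a row-sparse self-representation,
# solved with cvxpy (illustrative formulation, arbitrary synthetic data).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))                        # 50 data points in R^20 (columns)

C = cp.Variable((50, 50))                                # coefficients: X ~ X C
objective = cp.Minimize(cp.sum(cp.norm(C, p=2, axis=1))) # few nonzero rows = few exemplars
constraints = [cp.norm(X - X @ C, "fro") <= 1e-3,        # every point represented by the exemplars
               cp.sum(C, axis=0) == 1]                   # affine combinations
cp.Problem(objective, constraints).solve()

exemplars = np.where(np.linalg.norm(C.value, axis=1) > 1e-3)[0]
print("selected exemplars:", exemplars)
```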
Jordan Boyd-Graber
Topic models based on latent Dirichlet allocation (LDA) assume a predefined vocabulary. This is reasonable in batch settings but not for streaming and online settings. To address this lacuna, we extend LDA by drawing topics from a Dirichlet process whose base distribution is a distribution over all strings rather than from a finite Dirichlet. We develop inference using online variational inference and, to consider only a finite number of words for each topic, propose heuristics to dynamically order, expand, and contract the set of words we consider in our vocabulary. We show that our model can successfully incorporate new words and that it performs better than topic models with finite vocabularies in evaluations of topic quality and classification performance.
Tomaso Poggio
The talk explores the theoretical consequences of a simple assumption: the computational goal of the feedforward path in the ventral stream, from V1 and V2 through V4 to IT, is to discount image transformations after learning them during development. The initial assumption is that a basic neural operation consists of dot products between input vectors and synaptic weights, which can be modified by learning. The theory shows that a multi-layer hierarchical architecture of dot-product modules can learn, in an unsupervised way, geometric transformations of images and then achieve the dual goals of invariance to global affine transformations and of robustness to deformations. These architectures learn in an unsupervised way to be automatically invariant to transformations of a new object, achieving the goal of recognition with one or very few labeled examples. The theory should apply, to a varying degree, to a range of hierarchical architectures such as HMAX, convolutional networks, and related feedforward models of the visual system, and should formally characterize some of their properties.
Joel Tropp
Recent empirical research indicates that many convex optimization problems with random constraints exhibit a phase transition as the number of constraints increases. For example, this phenomenon emerges in the l1 minimization method for identifying a sparse vector from random linear samples. Indeed, l1 minimization succeeds with high probability when the number of samples exceeds a threshold that depends on the sparsity level; otherwise, it fails with high probability. This talk summarizes a rigorous analysis that explains why phase transitions are ubiquitous in random convex optimization problems. It also describes tools for making reliable predictions about the quantitative aspects of the transition, including the location and the width of the transition region. These techniques apply to regularized linear inverse problems with random measurements, to demixing problems under a random incoherence model, and also to cone programs with random affine constraints.
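For readers who want to see the phenomenon directly, the following is an illustrative Monte Carlo sketch of the l1 phase transition described above (a toy cvxpy experiment with arbitrary parameters, not the talk's rigorous analysis).
```python
# Illustrative Monte Carlo: empirical success rate of l1 minimization (basis pursuit)
# as the number of random linear samples grows (cvxpy; parameters are arbitrary).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, s, trials = 60, 5, 20                             # dimension, sparsity, trials per m

for m in range(10, 41, 5):
    successes = 0
    for _ in range(trials):
        x0 = np.zeros(n)
        x0[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
        A = rng.standard_normal((m, n))
        x = cp.Variable(n)
        cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == A @ x0]).solve()
        successes += int(np.linalg.norm(x.value - x0) < 1e-4)
    print(f"m = {m:2d}: success rate {successes / trials:.2f}")
```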
Tony Jebara
Graphical models and Markov random fields are fundamental tools in machine learning with broad application in areas including computer vision, speech recognition and computational biology. They involve two canonical forms of inference: maximum a posteriori (MAP) estimation, where the most likely configuration is returned; and marginal inference, where the probability distribution over a subset of variables is returned. Both problems are NP-hard in general. I describe procedures for compiling MAP estimation and Bethe pseudo-marginal inference into a classical problem in combinatorics: finding the maximum weight stable set (MWSS) in a graph. Whenever a graph is perfect, the MWSS problem is solvable in polynomial time. By making connections to perfect graph theory, several graphical models are shown to compile into perfect graphs and therefore admit tractable inference.
Thomas Strohmer
The need to understand when and how well eigenvectors of matrices are localized arises in a variety of areas as diverse as Massive Data Analysis, Random Matrix Theory, and Condensed Matter Physics (Anderson localization). Yet, our understanding of the localization of eigenvectors is surprisingly limited, given the fact that more and more instances emerge where such localization has been observed empirically, and a thorough understanding of this phenomenon is expected to yield crucial insights. In this talk I will make a first step toward developing a comprehensive qualitative and quantitative mathematical framework for characterizing the localization behavior of eigenvectors. The approach combines tools from Harmonic Analysis, Banach Algebras, and Random Matrix Theory. I will then briefly discuss similar localization results for other matrix factorizations beyond the singular value decomposition, and conclude with some open problems.
Stan Osher
Sparsity and compressive sensing have had a tremendous impact in science, technology, medicine, imaging, machine learning and now, in solving multiscale problems in applied partial differential equations, developing sparse bases for elliptic eigenspaces and connections with viscosity solutions to Hamilton-Jacobi equations. l1 and related optimization solvers are a key tool in this area. The special nature of this functional allows for very fast solvers: l1 actually forgives and forgets errors in Bregman iterative methods. I will describe simple, fast algorithms and new applications ranging from sparse dynamics for PDE, new regularization paths for logistic regression and support vector machines to optimal data collection and hyperspectral image processing.
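As one example of the fast l1 solvers mentioned above, here is a minimal sketch of the linearized Bregman iteration for basis pursuit in its standard textbook form; the parameters and test problem are illustrative and not taken from the talk.
```python
# Minimal sketch of the linearized Bregman iteration for basis pursuit
# (min ||u||_1 s.t. Au = b), in its standard textbook form; parameters are illustrative.
import numpy as np

def soft_threshold(v, mu):
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)

def linearized_bregman(A, b, mu=10.0, iters=2000):
    delta = 1.0 / np.linalg.norm(A, 2) ** 2          # safe step size from the spectral norm
    u = np.zeros(A.shape[1])
    v = np.zeros(A.shape[1])
    for _ in range(iters):
        v += A.T @ (b - A @ u)                       # accumulate the residual (Bregman feedback)
        u = delta * soft_threshold(v, mu)            # shrinkage step
    return u

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x0 = np.zeros(100); x0[:5] = rng.standard_normal(5)
u = linearized_bregman(A, A @ x0)
print("recovery error:", np.linalg.norm(u - x0))
```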
Venkat Chandrasekaran
Modern massive datasets create a fundamental problem at the intersection of the computational and statistical sciences: how to provide guarantees on the quality of statistical inference given bounds on computational resources such as time or space. Our approach to this problem is to define a notion of “algorithmic weakening,” in which a hierarchy of algorithms is ordered by both computational efficiency and statistical efficiency, allowing the growing strength of the data at scale to be traded off against the need for sophisticated processing. We illustrate this approach in the setting of denoising problems, using convex relaxation as the core inferential tool. Hierarchies of convex relaxations have been widely used in theoretical computer science to yield tractable approximation algorithms to many computationally intractable tasks. In this talk we show how to endow such hierarchies with a statistical characterization and thereby obtain concrete tradeoffs relating algorithmic runtime to amount of data. Joint work with Michael Jordan.
Yuqian Zhang
Illumination variation remains a central challenge in object detection and recognition. Existing analyses of illumination variation typically pertain to convex, Lambertian objects, and guarantee quality of approximation in an average-case sense. We show that it is possible to build V(vertex)-description convex cone models with worst-case performance guarantees, for nonconvex Lambertian objects. Namely, a natural verification test based on the angle to the constructed cone is guaranteed to accept any image which is sufficiently well-approximated by an image of the object under some admissible lighting condition, and to reject any image that does not have a sufficiently good approximation. The cone models are generated by sampling point illuminations with sufficient density, which follows from a new perturbation bound for point images in the Lambertian model. As the number of point images required for guaranteed verification may be large, we introduce a new formulation for cone preserving dimensionality reduction, which leverages tools from sparse and low-rank decomposition to reduce the complexity, while controlling the approximation error with respect to the original cone.
Ryan Adams
There has been a surge of recent interest in the analysis of network data, focused on improving our understanding of the statistical properties of naturally-occurring graphs. Typically, it is assumed that one can observe edges in such graphs directly, but what do we do when we cannot observe the network directly? For example, can we infer functional connectivity in the brain from neural spike recordings, or inter-gang rivalries from police logs? In this talk, I will discuss how Bayesian modeling of time series interactions allows us to coherently discover latent networks even when the edges cannot be observed. By combining a generalization of the Hawkes process with recent developments in exchangeable random graph models, we are able to find networks of interaction in neural recordings, crime data, and financial time series. This is joint work with Scott Linderman.
Anna Gilbert
In structural health monitoring, wireless sensors are deployed on structures (e.g., bridges, buildings) to collect data to generate structural models, to measure or to detect damage over time or due to natural disasters, and to measure real-time acceleration. Usually, the data are transmitted to a central repository where they are analyzed. One predominant analysis task is that of modal analysis, the inference of modal parameters such as modal frequencies and mode shapes, as these features are used in damage detection algorithms. We analyze three different sampling schemes for data acquisition and modal analysis: (i) uniform time sampling, (ii) random time sampling, and (iii) uniform time sampling followed by random matrix multiplication. For each sampling method we provide sufficient conditions on the required sampling rate, the total sampling time span, and the total number of measurements in order to faithfully recover the underlying mode shape vectors. We also discuss several different algorithms for computing mode shape vectors under the different sampling schemes and we provide simulation results on both synthetic and real datasets to confirm our theoretical results. This is joint work with Jae Young Park (UMichigan) and Michael Wakin (Colorado School of Mines).
Martin J. Strauss
In Sparse Recovery, a challenger, Charlie, produces a matrix, Phi, and an adversary, Sigourney, produces a sparse, noisy signal, x. Charlie wins if he can recover x approximately from (a noisy version of) Phi.x. By analogy with channel coding, we can define the following versions of the game:
1. Shannon (oblivious): Sigourney produces x without seeing Phi.
2. Hamming (malicious): All-powerful Sigourney produces x from Phi.
3. Lipton (bounded): Sigourney sees Phi but has computational or other limitations.
It is known that, for Charlie to win against a malicious Sigourney, there must be a strong quantitative bound on the noise (roughly, the ell-1 norm of the noise must be bounded), whereas Charlie wins against an oblivious Sigourney even in the presence of more noise (ell-2 bounded). We show that, in various contexts, Charlie can tolerate higher, ell-2 noise while still letting Sigourney see Phi, provided Sigourney is computationally bounded. Based on work with Anna Gilbert, Brett Hemenway, Atri Rudra, and Mary Wootters. Hat tip to Moritz Hardt, Dick Lipton, and David Woodruff.
Karl Rohe
When estimating a sparse vector in the linear model, if the measurement matrix contains highly correlated columns, then the solution to the Lasso misaligns with the NP-hard L_0 minimization problem. This talk will discuss how preconditioning, a technique from numerical linear algebra, can decorrelate the measurement matrix in a way that circumvents the various conditions for consistency (e.g. RIP). Importantly, preconditioning can be applied after the data have been collected. As such, preconditioning is most relevant to scenarios in which physical constraints restrict the measurement matrix. This is joint work with Jinzhu Jia.
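A minimal sketch of preconditioning before running the lasso follows; the whitening-style preconditioner of Jia and Rohe (the Puffer transformation) is assumed here as one concrete instance, and the correlated design, noise level, and tuning parameter are purely illustrative.
```python
# Sketch: precondition the design matrix after data collection, then run the lasso.
# The whitening-style "Puffer" preconditioner of Jia and Rohe is assumed here; the
# correlated design and tuning parameter are purely illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 200
Z = rng.standard_normal((n, p))
X = Z + 0.9 * Z[:, [0]]                              # highly correlated columns
beta = np.zeros(p); beta[:3] = 2.0
y = X @ beta + 0.1 * rng.standard_normal(n)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
F = U @ np.diag(1.0 / d) @ U.T                       # preconditioner, applied after collection
X_tilde, y_tilde = F @ X, F @ y                      # F X = U V^T: all singular values equal one

fit = Lasso(alpha=0.05).fit(X_tilde, y_tilde)
print("estimated support:", np.nonzero(fit.coef_)[0])
```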
Karl Rohe
Transitivity describes the fact that friends of friends are likely to be friends. It is a recurring property in empirical networks and it is potentially a "local" manifestation of community structure. This is because people are tightly connected within communities. This poster will string together a triptych of theoretical results relating transitivity, sparsity, and fast algorithmic estimation/community detection. The first two results will examine some standard assumptions in the statistical literature on network inference. The last result will show that a fast algorithm has good estimation properties under a "local" Stochastic Blockmodel.
Rachel Ward
Sparse recovery guarantees in compressive sensing and related optimization problems often rely on strong incoherence assumptions which are hard to realize in practice. In this talk we discuss a finer notion of separation between bases, called the local coherence, and show that by sampling accordingly, sparse recovery guarantees extend to a broad class of sensing problems beyond incoherent systems. We discuss particular applications to MRI, polynomial interpolation, and matrix completion, and end by discussing several open problems.
Andrea Bertozzi
Geometric methods have revolutionized the field of image processing and image analysis. I will review some of these classical methods including image snakes, total variation minimization, image segmentation methods based on curve minimization, diffuse interface methods, and state-of-the-art fast computational algorithms that use ideas from compressive sensing. Recently some of these concepts have shown promise for problems in high dimensional data analysis and machine learning on graphs. I will briefly review the methods from imaging and then focus on the new methods for high dimensional data and network data.
Ronald DeVore
We introduce a greedy algorithm for recovering a manifold in a Banach space X by selecting snapshots of the manifold. This algorithm originated in Reduced Basis Methods for solving a parametric family of PDEs. We use a comparison with Kolmogorov widths to analyze the performance of the greedy algorithm.
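The following is a minimal finite-dimensional sketch of greedy snapshot selection; the talk's setting is a general Banach space, so the Euclidean proxy, toy parametric family, and names below are illustrative only.
```python
# Minimal Euclidean sketch of greedy snapshot selection for a parametric family
# (the talk's setting is a general Banach space; everything below is illustrative).
import numpy as np

def greedy_snapshots(M, n_snapshots):
    """M: columns are snapshots of the manifold; returns chosen indices and an orthonormal basis."""
    chosen, Q = [], np.zeros((M.shape[0], 0))
    for _ in range(n_snapshots):
        residual = M - Q @ (Q.T @ M)                 # projection error onto the current basis
        worst = int(np.argmax(np.linalg.norm(residual, axis=0)))
        chosen.append(worst)                         # add the worst-approximated snapshot
        q = residual[:, worst]
        Q = np.column_stack([Q, q / np.linalg.norm(q)])
    return chosen, Q

rng = np.random.default_rng(0)
t = rng.uniform(0, 1, 500)                           # parameter samples
M = np.vstack([np.cos(2 * np.pi * k * t) for k in range(8)])  # toy parametric family in R^8
idx, Q = greedy_snapshots(M, 4)
print("selected snapshots:", idx)
```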
David Lawlor
We present a framework for high-dimensional regression using the GMRA data structure. In analogy to a classical wavelet decomposition of function spaces, a GMRA is a tree-based decomposition of a data set into local linear projections. Moreover, for new points, GMRA admits a fast algorithm for computing the projection coefficients on the already-learned dictionary. Within each node of the tree one can also assign regression coefficients in any manner; here we study the simple case of weighted linear regression. We present some preliminary empirical results using galactic spectra from the Sloan Digital Sky Survey and indicate future research directions.
Raphy Coifman
We'll describe methodologies to obtain intrinsic, data-driven organizations for transposable arrays, and we'll show that dual metrics, such as earth mover's metrics, arise naturally in data learning or diffusion geometry settings.
Robert Nowak
Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. We propose a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex program that automatically selects similar features for related learning tasks. Error bounds are derived for SOSlasso and its consistency is established for squared error loss. In particular, SOSlasso is motivated by multi-subject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso. This is joint work with Chris Cox, Nikhil Rao, and Tim Rogers.
Barbara Engelhardt
Latent factor models are used to identify a low-dimensional linear subspace within high-dimensional data; often, additional structure is added to the model for identifiability and interpretability of the resulting factors. In the analysis of gene expression data, though, latent factors capturing confounding effects, such as sex, age, or batch of the samples, do not have the same structure as the latent factors representing groups of co-expressed genes: the confounding effects are captured well by dense factors, whereas the co-expressed genes are modeled by sparse factors. We use a three parameter beta (TPB) prior on the factor loading matrix in a `three-groups' setting, and furthermore add a two-component mixture for each factor, in order to jointly model sparse and dense factors. Our model shows strong performance both in simulated data, as quantified through novel robustness metrics of the latent subspace, and in real data, identifying co-expressed gene sets that are biologically meaningful.
Garvesh Raskutti
I consider the problem of learning a directed acyclic graph (DAG) model based on conditional independence testing. The most widely used approaches to this problem are variants of the PC algorithm. One of the drawbacks of the PC algorithm is that it requires the strong-faithfulness assumption, which has been shown to be restrictive, especially for graphs with cycles in the skeleton. In this paper, we propose an alternative method based on finding the permutation of the variables that yields the sparsest DAG. We prove that the constraints required for our sparsest permutation (SP) algorithm are strictly weaker than faithfulness and are necessary for any causal inference algorithm based on conditional independence testing. Through specific examples I show that the SP algorithm has better performance than the PC algorithm. In the Gaussian setting, we prove that our algorithm boils down to finding the permutation of the variables with the sparsest Cholesky decomposition of the inverse covariance matrix. Using this connection, I show that in the oracle setting, where the true covariance matrix is known, the SP algorithm is in fact equivalent to $\ell_0$-penalized maximum likelihood estimation. This is joint work with Caroline Uhler (Assistant Professor, IST Austria).
Natesh Pillai
In this talk we outline a theory for constructing shrinkage priors in high dimensional problems. These prior distributions are popular because the corresponding MCMC algorithms mix very quickly. However, nothing much is known about their statistical efficiency. We present some results in this direction and also give a new prior which is both statistically and computationally efficient. This is joint work with Anirban Bhattacharya, Debdeep Pati and David Dunson.
Amit Singer
Consider N points in d-dimension and M > 2 local coordinate systems that are related by unknown rigid transforms. For each point we are given (possibly noisy) measurements of its local coordinates in some of the coordinate systems. Alternatively, for each coordinate system we observe the coordinates of a subset of the points. We study the problem of estimating the global coordinates (up to a global rigid transform) of the points from such measurements. This problem comes up in distributed approaches to molecular conformation (determination of molecular structures using NMR) and sensor network localization, astrometry, and also in computer vision and graphics applications. We formulate a least-squares-based program, in which the variables are the global coordinates and the rigid transforms associated with each local coordinate system. While there exists a well-known closed form SVD-based solution when M = 2, for M > 2 the least-squares problem is non-convex and no closed-form solution is known. We show that the non-convex least-squares problem can be relaxed into a standard semidefinite program (SDP), and demonstrate how the global coordinates (and rigid transforms) can be efficiently computed from this SDP. Under specific measurement and noise models, we prove that the proposed convex relaxation exactly and stably recovers the global coordinates from the noisy local coordinates. Joint work with Kunal Chaudhury and Yuehaw Khoo.
Robert Nowak
Sampling from distributions to find the one with the largest mean arises in a broad range of applications, and it can be mathematically modeled as a multi-armed bandit problem in which each distribution is associated with an arm. This whiteboard presentation studies the sample complexity of identifying the best arm (largest mean) in a multi-armed bandit problem. Motivated by large-scale applications, we are especially interested in identifying situations where the total number of samples that are necessary and sufficient to find the best arm scales linearly with the number of arms. We present a single-parameter multi-armed bandit model that spans the range from linear to superlinear sample complexity. We also give a new algorithm for best arm identification, called PRISM, with linear sample complexity for a wide range of mean distributions. The algorithm, like most exploration procedures for multi-armed bandits, is adaptive in the sense that the next arms to sample are selected based on previous samples. We compare the sample complexity of adaptive procedures with simpler non-adaptive procedures using new lower bounds. For many problem instances, the increased sample complexity required by non-adaptive procedures is a polynomial factor of the number of arms.
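To illustrate what an adaptive procedure looks like in this setting, here is a sketch of simple successive elimination; this is not the PRISM algorithm from the talk, and the arm means, confidence radii, and failure probability are arbitrary.
```python
# Illustrative adaptive procedure (successive elimination) for best-arm identification;
# this is NOT the PRISM algorithm from the talk, just a simple baseline of the same flavor.
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.7, 0.5, 0.45, 0.4, 0.3])          # unknown Bernoulli arm means
K, delta = len(means), 0.05
counts, sums = np.zeros(K), np.zeros(K)
active, t = list(range(K)), 0

while len(active) > 1:
    t += 1
    for a in active:                                  # sample every arm still in play
        sums[a] += rng.random() < means[a]
        counts[a] += 1
    mu_hat = sums[active] / counts[active]
    rad = np.sqrt(np.log(4 * K * t ** 2 / delta) / (2 * t))   # confidence radius
    best = mu_hat.max()
    active = [a for a, m in zip(active, mu_hat) if m + rad >= best - rad]  # drop clear losers

print("best arm:", active[0], " total samples:", int(counts.sum()))
```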
Yi Yang
In compressive sensing (CS), the goal is to recover signals at a reduced sample rate compared to the classic Shannon-Nyquist rate. The quantization of CS measurements has been studied recently, and it has been shown that accurate and stable signal acquisition is possible even when each measurement is quantized to only a single bit. Many algorithms have been proposed for 1-bit compressive sensing; they work well when there is no noise in the measurements, but their performance worsens when the measurements contain many sign flips. We propose a robust method for recovering signals from 1-bit measurements using adaptive outlier pursuit. This method detects the positions where sign flips happen and recovers the signals using the “correct” measurements. Numerical experiments show the accuracy of sign flip detection and the high recovery performance of our algorithm compared with other algorithms.
Joshua Vogelstein
Two-sample tests are an important class of problems in statistics, with abundant applications ranging from astrophysics to zoology. However, much of the previous art assumes the data samples live in finite dimensional Euclidean space. Here, we consider a foray into two-sample testing when the objects live in a non-Euclidean space, in particular, for graph-valued observations. Via embedding each graph into Euclidean space, and then learning a manifold along which they reside, we demonstrate the existence of a test such that, for a given confidence level α, we obtain power β > α. Simulations and real data applications demonstrate the pragmatic utility of our approach even for very large graphs.
Michael Gehm
Recently, a multi-institutional team led by Duke University demonstrated a new approach to ultra-large-scale image acquisition. In this talk I will discuss the motivations that led to the development of this approach and the enabling technologies, as well as how such a system connects to (and can benefit from) the ideas central to this workshop.
Jianbo Yang and Xin Yuan
A real-time "leave-bag-behind" detection algorithm is proposed based on the Robust Principal Component Analysis (RPCA). RPCA methods decompose a data matrix into low-rank and sparse components. Given a video sequence, the low-rank component consists of a near-stationary background and the sparse component characterizes a kinetic foreground. By exploiting the variance of the background and foreground in real-time, we detect the "leave-bag-behind" events and provide warnings. The algorithm is verified by real data.
Rajarshi Guhaniyogi
We propose a novel algorithm for fast online Bayesian inference, motivated by streaming data with a large number of features. Conditional Density Filtering (C-DF) is designed as a powerful adaptation of Gibbs sampling for parameter learning and online prediction. By propagating conditional sufficient statistics of the observed data in time, C-DF yields an online approximation to the joint posterior distribution over model parameters. It does so by iteratively updating the parameters based on approximations to their conditional posterior distributions. The algorithm is illustrated through an application to Bayesian supervised compressed regression; C-DF demonstrates excellent predictive performance and feature selection, with valid measures of uncertainty. This is joint work with Shaan Qamar and David B. Dunson.
Jinyoung Kim
The subthalamic nucleus (STN) within the sub-cortical region of the basal ganglia is a crucial targeting structure for deep brain stimulation (DBS) surgery, in particular for alleviating Parkinson’s disease symptoms. Volumetric segmentation of such a small and complex structure, which is elusive in clinical MRI protocols, is therefore a prerequisite for reliable DBS targeting. While direct visualization and localization of the STN is facilitated with advanced high-field 7T MR imaging, such high fields are not always clinically available. In this study, we focus on the automatic shape prediction of the STN, exploiting the spatial dependency of the STN on its adjacent structures as predictors, some of them easier to localize with standard clinical procedures. Variation modes of the STN and its predictors on five high-quality training sets obtained from 7T MR imaging are first captured using a statistical shape model. We then exploit the partial least squares regression (PLSR) method to induce the spatial relationship between the STN and its predictors. Experimental results demonstrate that the spatial dependency of the STN on its predictors contributes to directly predicting the STN with accurate position, dimension, and orientation on 7T MR and on clinical 1.5T MR imaging.
Jiang Li
Manifold learning performs dimensionality reduction by identifying low-dimensional structures (manifolds) embedded in a high-dimensional space. Many algorithms involve an eigenvector or singular value decomposition (SVD) procedure on a similarity matrix of size n by n, where n denotes the number of data samples, making them not scalable to big data. A way to overcome large data set sizes is to create a manifold with a subset of the original data while embedding the rest into the manifold skeleton. The adequate number of neighbors varies and depends on the geometry of the manifold: points with too few neighbors may not be able to capture the intrinsic manifold geometry, while too many neighbors will cause a short circuit in the manifold. To overcome these problems, we introduce a novel adaptive neighbor selection approach using L1-norm-based optimization. We show that this neighborhood selection can be useful in creating a more robust manifold for MRI brain tumor data. Joint work with Loc Tran, Kevin Hu, and Jihong Wang.
Stas Minsker
We describe a universal method which can be used to construct estimators with tight concentration around the true parameter of interest. In particular, we propose new robust estimators of the multivariate mean and covariance matrix of a heavy-tailed distribution.
Liming Wang
A new generalized Bregman divergence is developed, and utilized to unify vector Poisson and Gaussian channel models, from the perspective of the gradient of mutual information. The gradient is with respect to the measurement matrix in a compressive-sensing setting, and mutual information is considered for signal recovery and classification. The generalized Bregman divergence induces numerous matrix-valued metrics, which are discussed, as are other properties of the new divergence. The generalized Bregman divergence is also utilized to connect the relative entropy and mismatched minimum mean square error. We also present applications based on the derived theoretical results.
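For reference, the classical Bregman divergence that the talk generalizes is the following (the matrix-valued generalization itself is not reproduced here, and the notation is generic):
```latex
% Classical Bregman divergence generated by a strictly convex, differentiable phi;
% the talk's generalized, matrix-valued divergence extends this construction.
D_\varphi(x, y) \;=\; \varphi(x) \;-\; \varphi(y) \;-\; \bigl\langle \nabla \varphi(y),\, x - y \bigr\rangle .
```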
Ryan Adams
Bayesian approaches to data analysis are often conceptually appealing, but they can be computationally challenging to implement. One of the most general and powerful tools for manipulating such models is Markov chain Monte Carlo (MCMC), in which samples from complicated posterior distributions can be generated by simulation of a Markov transition operator. Unfortunately, MCMC is a sequential procedure and so it has not generally been able to benefit from a computational landscape that is moving towards parallelism. In this talk, I will discuss some recent work I have been doing to develop general-purpose MCMC algorithms that can take advantage of hundreds or thousands of cores.
Takanori Watanabe
Schizophrenia is a neurodevelopmental disorder with extensive evidence of neural disconnection. There is substantial interest in developing machine-based methods that reliably distinguish schizophrenic subjects from healthy controls using functional neuroimaging data. Toward this end, we apply support vector machines with embedded feature selection to 126 resting state scans from the Center for Biomedical Research Excellence sample. In particular, we propose a variable splitting scheme that converts an unconstrained optimization problem to an equivalent constrained problem, which can be solved efficiently using the alternating direction method of multipliers. We demonstrate that the inner sub-problems associated with the algorithm can be solved exactly and non-iteratively using fast Fourier transform (FFT) on the augmented parameter vector.
Sumanta Basu
We consider the problem of estimation, model selection and forecasting in high dimensional vector autoregressive (VAR) processes. High dimensional VAR models are a natural choice in the analysis of economic and financial data, where one is typically interested in learning the nature of the relationships among a large number of variables from time series data. We propose an l1-regularized regression framework for estimating the model coefficients. Assuming the order of the VAR process to be known, we establish estimation and model selection consistency of penalized least squares and maximum likelihood estimates, leading to improved prediction accuracy over standard least squares estimates. We demonstrate the performance of the proposed methodology using an extensive set of simulation studies and a real data application from financial econometrics.
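A minimal sketch of the basic idea, per-equation l1-regularized (lasso) estimation of a VAR(1) model, is given below; the transition matrix, noise level, and penalty are arbitrary, and the talk treats general orders together with the accompanying theory.
```python
# Minimal sketch: per-equation l1-regularized (lasso) estimation of a VAR(1) model.
# Illustrative only; the transition matrix, noise level, and penalty are arbitrary.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p, T = 20, 200
A_true = 0.5 * np.eye(p)                             # sparse, stable transition matrix
X = np.zeros((T, p))
for t in range(1, T):
    X[t] = X[t - 1] @ A_true.T + 0.1 * rng.standard_normal(p)

Y, Z = X[1:], X[:-1]                                 # responses and lagged predictors
A_hat = np.vstack([Lasso(alpha=0.01).fit(Z, Y[:, j]).coef_ for j in range(p)])
print("relative estimation error:", np.linalg.norm(A_hat - A_true) / np.linalg.norm(A_true))
```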
Reema Al-Aifari
In Computerized Tomography a 2D or 3D object is reconstructed from projection data (Radon transform data) from multiple directions. When the X-ray beams are sufficiently wide to fully embrace the object and when the beams from a sufficiently dense set of directions around the object can be used, this problem and its solution are well understood. When the data are more limited the image reconstruction problem becomes much more challenging, leading to configurations where only a subregion of the object is illuminated from all angles. In this presentation we consider a limited data problem in 2D Computerized Tomography which gives rise to a restriction of the Hilbert transform as an operator T from L2(b,d) to L2(a,c) for real numbers a < b < c < d. We present the framework of tomographic reconstruction from limited data and the method of differentiated back-projection (DBP) which leads to the operator T. Reconstruction from the DBP method requires recovering a family of 1D functions f supported on compact intervals [b,d] from their Hilbert transforms measured on intervals [a,c] that might only overlap, but not cover, [b,d]. We relate the operator T to a self-adjoint two-interval Sturm-Liouville problem, which allows us to explore the spectrum of T*T. We find that T is not compact, but its inversion is nevertheless ill-posed. With these results, we then address the question of the rate of convergence of the singular values, which we find by performing an asymptotic analysis on the eigenfunctions of the related Sturm-Liouville problem for large eigenvalues. We conclude by illustrating the properties obtained for the SVD of T numerically.
Quentin Berthet
In the context of sparse principal component detection, we bring evidence towards the existence of a statistical price to pay for computational efficiency. We measure the performance of a test by the smallest signal strength that it can detect and we propose a computationally efficient method based on semidefinite programming. We also prove that the statistical performance of this test cannot be strictly improved by any computationally efficient method. Our results can be viewed as complexity theoretic lower bounds conditionally on the assumptions that some instances of the planted clique problem cannot be solved in randomized polynomial time.
Jordan Hashemi
Research on autism spectrum disorder (ASD) suggests behavioral markers can be observed late in the first year of life. Many of these studies involve extensive frame-by-frame video observation and analysis of a child’s natural behavior. Although non-intrusive, these methods are extremely time-intensive and require a high level of observer training; thus, they are impractical for clinical and large population research purposes. This work aims at helping clinicians and general practitioners accomplish this early detection/measurement task automatically. We focus on providing low-cost computer vision tools to measure and identify ASD behavioral markers based on components of the Autism Observation Scale for Infants (AOSI). In particular, we develop algorithms to measure critical AOSI tasks and activities that assess visual attention by tracking facial features. Using the tracked facial features, we estimate the participants' head motions during the specific tasks and activities. We show results demonstrating that the proposed system provides insightful knowledge to augment the clinician’s behavioral observations obtained from real in-clinic assessments.
Mariano Tepper
We address the problem of multiple parametric model estimation from data corrupted by noise and outliers. Our main contribution is to provide a novel formulation for this problem. Inscribed in the RANSAC paradigm, the proposed formulation characterizes multiple model estimation as a bi-clustering problem, obtaining a conceptually simple and at the same time descriptively rich modeling. For demonstrating the performance of the proposed framework, we also present a simple modification of the standard sparse SVD bi-clustering algorithm, taking into account both the binary nature of the problem and the possibility of models sharing data points. We also show that by using statistical testing, we can simplify the input of the bi-clustering algorithm, rendering the pattern-discovery process easier and faster. The experimental results, ranging from geometric structure estimation to image correspondence recovery, show that the proposed formulation is extremely suitable for efficient multiple model estimation.
Volkan Cevher
We propose an algorithmic framework for convex minimization problems of a composite function with two terms: a self-concordant function and a possibly nonsmooth regularization term. Our method is a new proximal Newton algorithm that features a local quadratic convergence rate. As a specific instance of our framework, we consider the sparse inverse covariance matrix estimation in graph learning problems. Via a careful dual formulation and a novel analytic step-size selection procedure, our approach for graph learning avoids Cholesky decompositions and matrix inversions in its iteration making it attractive for parallel and distributed implementations.
Howard Bondell
For high-dimensional data, selection of predictors for regression is a challenging problem. Methods such as sure screening, forward selection, or penalization are commonly used. Instead, Bayesian variable selection methods place prior distributions over model space, along with priors on the parameters, or equivalently, a mixture prior with mass at zero for the parameters in the full model. Since exhaustive enumeration is not feasible, posterior model probabilities are often obtained via long MCMC runs. The chosen model can depend heavily on various choices for priors and also posterior thresholds. Alternatively, we propose a conjugate prior only on the full model parameters and to use sparse solutions within posterior credible regions to perform selection. These posterior credible regions often have closed form representations, and it is shown that these sparse solutions can be computed via existing algorithms. The approach is shown to outperform common methods in the high-dimensional setting, particularly under correlation. By searching for a sparse solution within a joint credible region, consistent model selection is established. Furthermore, it is shown that the simple use of marginal credible intervals can give consistent selection up to the case where the dimension grows exponentially in the sample size.
Marco Duarte
Existing approaches to compressive sensing of frequency-sparse signals focus on signal recovery rather than spectral estimation. Furthermore, the recovery performance is limited by the coherence of the required sparsity dictionaries and by the discretization of the frequency parameter space. In this paper, we introduce a greedy recovery algorithm that leverages a band-exclusion function and a polar interpolation function to address these two issues in spectral compressive sensing. Our algorithm is geared towards line spectral estimation from compressive measurements and outperforms most existing approaches in fidelity and tolerance to noise. This is joint work with Karsten Fyhn (Aalborg University) and Hamid Dadkhahi.
Qiang Qiu
We propose a low-rank transformation-learning framework for recognition and clustering. Many high-dimensional data, such as face images and motion sequences, lie in a union of low-dimensional subspaces. However, low-dimensional intrinsic structures are often violated for real-world observations, as they can be corrupted by errors or deviate from ideal models. We propose to address this by learning a linear transformation on subspaces using matrix rank, via its convex surrogate nuclear norm, as the optimization criteria. The learned linear transformation restores a low-rank structure for data from the same subspace, and, at the same time, forces a high-rank structure for data from different subspaces. In this way, we reduce variations within the subspaces, and increase separations between the subspaces for more accurate recognition or clustering. Joint work with Guillermo Sapiro.
Xin Jiang
Compressed sensing is a useful paradigm for signal reconstruction with limited measurements. Past theoretical results suggest that as the number of sensors increases, the signal reconstruction error decays at a reasonable rate. However, many of the past theoretical results do not account for the physical constraints of real-world systems (e.g., signal-dependent noise, intensity constraints, etc.). Recent work provides upper bounds and simulation results for the performance of photon-limited compressed sensing under Poisson noise. These results suggest that in this photon-limited Poisson noise setting, reconstruction error does not decrease and may increase as the number of sensors increases. However, there exist no lower bounds on reconstruction error to ensure that the rates cannot be improved. In this work, we supplement the previous upper bounds with sharp minimax lower bounds for a more general setting that includes the photon-limited case of interest. Our lower bounds confirm that, as the previous upper bounds and simulations suggest, the reconstruction error does not decrease as the number of sensors increases. This is joint work with Garvesh Raskutti and Rebecca Willett.
Peng Guan
We consider an online (real-time) control problem that involves an agent performing a discrete-time random walk over a finite state space. The agent's action at each time step is to specify the probability distribution for the next state given the current state. Following the set-up of Todorov, the state-action cost at each time step is a sum of a nonnegative state cost and a control cost given by the Kullback-Leibler divergence between the agent's next-state distribution and that determined by some fixed passive dynamics. The online aspect of the problem is due to the fact that the state cost functions are generated by a dynamic environment, and the agent learns the current state cost only after having selected its action. We give an explicit construction of a computationally efficient strategy that has small regret (i.e., the expected difference between its actual total cost and the smallest cost attainable using noncausal knowledge of the state costs) under mild regularity conditions, and demonstrate the performance of the proposed strategy on a simulated target tracking problem. This is joint work with Maxim Raginsky and Rebecca Willett.
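In symbols, the per-step cost in this Todorov-style setup reads as follows (generic notation, not taken from the paper):
```latex
% Per-step cost in the Todorov-style setup described above: a nonnegative state cost f_t
% plus a KL control cost measured against the fixed passive dynamics p(.|x).
% (Generic notation, not taken from the paper.)
c_t\bigl(x,\, u(\cdot \mid x)\bigr) \;=\; f_t(x) \;+\; D_{\mathrm{KL}}\bigl(u(\cdot \mid x)\,\big\|\,p(\cdot \mid x)\bigr).
```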
Joan Bruna
Scattering Representations are a structured family of deep convolutional networks providing invariance to a given group of transformations and Lipschitz continuity with respect to deformations. They are constructed by cascading complex wavelet decompositions and modulus operators, followed by averaging kernels. We extend this construction to general datasets by defining a diffusion process, which extracts smooth components, and a demodulation process, which recovers high frequencies and maps them to slower features. The resulting cascade has the architecture of a deep network, where filters can be learnt in a non-supervised or supervised fashion, and non-linearities are given by $l_2$ pooling operators. Whereas diffusion operators are linear and can be learnt with local component analysis, the demodulation step requires learning a nonlinear map. We show that group sparsity and slow feature learning provide a valid mechanism to obtain such representations.
Justin Romberg
Sparse signal recovery often involves solving an L1-regularized optimization problem. Most of the existing algorithms focus on the static setting, where the goal is to recover a fixed signal from a fixed system of equations. This talk will have two parts. In the first, we present a collection of homotopy-based algorithms that dynamically update the solution of the underlying L1 problem as the system changes. The sparse Kalman filter solves an L1-regularized Kalman filter equation for a time-varying signal that follows a linear dynamical system. Our proposed algorithm sequentially updates the solution as new measurements are added and old measurements are removed from the system. In the second part of the talk, we will discuss a continuous-time "algorithm" (i.e., a set of coupled nonlinear differential equations) for solving a class of sparsity-regularized least-squares problems. We characterize the convergence properties of this neural-net type system, with a special emphasis on the case when the final solution is indeed sparse. This is joint work with M. Salman Asif, Aurele Balavoine, and Chris Rozell.
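For the second part, a minimal Euler-discretized sketch of a continuous-time system of this kind is given below, written in the standard locally-competitive-algorithm (LCA) form; the talk's exact differential equations may differ, and all parameters are illustrative.
```python
# Euler simulation of an LCA-style continuous-time system for sparsity-regularized
# least squares (standard locally-competitive-algorithm form; the talk's exact
# differential equations may differ, and all parameters below are illustrative).
import numpy as np

def soft(u, lam):
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

rng = np.random.default_rng(0)
n, m, lam, tau, dt = 100, 40, 0.1, 1.0, 0.05
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n); x0[:5] = 1.0
y = Phi @ x0

u = np.zeros(n)                                       # internal state of the network
for _ in range(4000):
    a = soft(u, lam)                                  # sparse output (thresholded state)
    du = Phi.T @ y - u - (Phi.T @ (Phi @ a) - a)      # leaky integration plus competition
    u += (dt / tau) * du
print("recovered support:", np.nonzero(soft(u, lam))[0])
```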
Justin Romberg
We consider the problem of recovering two unknown vectors, w and x, of length L from their circular convolution. We make the structural assumption that the two vectors are members of known subspaces, one with dimension N and the other with dimension K. Although the observed convolution is nonlinear in both w and x, it is linear in the rank-1 matrix formed by their outer product wx∗. This observation allows us to recast the deconvolution problem as a low-rank matrix recovery problem from linear measurements, whose natural convex relaxation is a nuclear norm minimization program. This gives us both a new algorithm for blind deconvolution and an avenue for adapting results from the theory of low-rank matrix recovery to analyze when this algorithm is effective.
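The lifting described above, in symbols (standard notation; A denotes the linear measurement map induced by the circular convolution and the two known subspaces):
```latex
% The lifting described above, in symbols: the circular convolution is linear in the
% rank-one matrix X = w x^*, so blind deconvolution becomes low-rank matrix recovery
% and its nuclear-norm relaxation (A denotes the induced linear measurement map).
y \;=\; w \circledast x \;=\; \mathcal{A}\bigl(w x^{*}\bigr),
\qquad
\widehat{X} \;=\; \arg\min_{X}\; \|X\|_{*}
\quad \text{subject to} \quad \mathcal{A}(X) = y .
```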
Brandon Osting
We study the dependence of the statistical ranking problem on the available pairwise data and propose a framework for optimally querying new data to maximally increase the Fisher information of the ranking.
Volkan Cevher
Group-based sparsity models have proven instrumental in linear regression problems for recovering signals from much fewer measurements than standard compressive sensing. The main promise of these models is the recovery of "interpretable" signals along with the identification of their constituent groups. To this end, we establish a combinatorial framework for group-model selection problems and highlight the underlying tractability issues revolving around such notions of interpretability when the regression matrix is simply the identity operator. We show that, in general, claims of correctly identifying the groups with convex relaxations would lead to polynomial time solution algorithms for a well-known NP-hard problem, called the weighted maximum cover problem. Instead, leveraging a graph-based understanding of group models, we describe group structures which enable correct model identification in polynomial time via dynamic programming. We also show that group structures that lead to totally unimodular constraints have tractable discrete as well as convex relaxations. Finally, we study the Pareto frontier of budgeted group-sparse approximations for the tree-based sparsity model and illustrate identification and computation trade-offs between our framework and the existing convex relaxations.
Ben Recht
We often model signals from the physical world with continuous parameterizations. Unfortunately, continuous models pose problems for the tools of sparse approximation, and popular discrete approximations are fraught with theoretical and algorithmic issues. In this talk, I will propose a general convex-optimization framework, called atomic-norm denoising, that obviates discretization and gridding by generalizing sparse approximation to continuous dictionaries. As an extended example that highlights the salient features of the atomic-norm framework, I will highlight the problem of estimating the frequencies and phases of a mixture of complex exponentials from noisy, incomplete data. I will demonstrate that atomic-norm denoising outperforms state-of-the-art spectral and sparse-approximation methods in both theory and practice. I will then close with a discussion of additional applications of the atomic-norm framework including deconvolution, deblurring, and system identification. Joint work with Badri Bhaskar, Parikshit Shah, and Gongguo Tang.
Cun Mu
Recovering a low-rank tensor from incomplete information is a recurring problem in signal processing and machine learning. The most popular convex relaxation of this problem minimizes the sum of the nuclear norms of the unfoldings of the tensor. We show that this approach can be substantially suboptimal: reliably recovering a $K$-way tensor of length $n$ and Tucker rank $r$ from Gaussian measurements requires $\Omega( r n^{K-1} )$ observations. In contrast, a certain (intractable) nonconvex formulation needs only $O(r^K + nrK)$ observations. We introduce a very simple, new convex relaxation, which partially bridges this gap. Our new formulation succeeds with $O(r^{\lfloor K/2 \rfloor}n^{\lceil K/2 \rceil})$ observations. While these results pertain to Gaussian measurements, simulations strongly suggest that the new norm also outperforms the sum of nuclear norms for tensor completion from a random subset of entries. Our lower bounds for the sum-of-nuclear-norm model follow from a new result on simultaneously structured models, which may be of independent interest for matrix and vector recovery problems.
Matt Nokleby
Motivated by applications in high-dimensional signal processing, we derive fundamental limits on the performance of compressive linear classifiers over Gaussian mixture models. By analogy with Shannon theory, we define the classification capacity, which quantifies the maximum number of classes that can be discriminated with low probability of error, and the diversity-discrimination tradeoff, which quantifies the tradeoff between the number of classes and the probability of classification error. We identify a duality between communications over non-coherent multiple-antenna channels and classification of Gaussian mixtures, which allows us to leverage existing tools to characterize the classification performance in the regime of high signal-to-noise ratio. In particular, we show that the capacity and diversity-discrimination tradeoff are optimized by choosing classes that correspond to low-dimensional subspaces drawn from the appropriate Grassmann manifold, with performance scaling as a function of the dimensionality. These results have implications for the design of dictionary learning algorithms.
Andrew Harms
The Random Demodulator (RD) is a recently proposed architecture for sub-Nyquist sampling of sparse, bandlimited signals that leverages ideas from Compressed Sensing. An integral piece of the RD is a random, bipolar waveform that modulates the input signal prior to sampling. For the RD, this waveform is generated from a statistically independent sequence. We propose a modification to the RD, called the Knowledge Enhanced Random Demodulator, that uses constrained random sequences with statistically correlated entries. Such sequences have a characteristic power spectrum. We show that the theoretical reconstruction guarantees of the Knowledge Enhanced RD, and by extension the RD, are related to the power spectrum of the random sequence. The advantages of using constrained random sequences are a reduction in the rate of transitions in the waveform, through the use of run-length limited sequences, and the ability to tune the Knowledge Enhanced RD to particular input signals. In regard to this last point, we show improved reconstruction performance when the power spectrum of the random sequence matches the prior distribution of energy over the tones of the input signal. This is joint work with Waheed Bajwa and Robert Calderbank.
Huiyi Hu
We present a computationally efficient algorithm utilizing a fully or semi nonlocal graph Laplacian for solving a wide range of learning problems in data clustering and image processing. Motivated by diffuse interface models based on the Ginzburg-Landau functional, we propose an adaptation of the classic Merriman-Bence-Osher (MBO) scheme for graph-based methods and also make use of fast numerical solvers for finding eigenvalues and eigenvectors of the graph Laplacian. We present various computational examples to demonstrate the performance of our model, which is successful on images with texture and repetitive structure due to its nonlocal nature. Another line of work presented here is the detection of cohesive groups of nodes ("communities") in network science. One popular approach is to maximize a quality function known as "modularity" to achieve some sort of optimal clustering of nodes. We interpret the modularity function from a novel perspective: we reformulate modularity optimization as a minimization problem of an energy functional that consists of a total variation term and an L2 balance term. By employing numerical techniques from image processing and L1 compressive sensing, such as convex splitting and the MBO scheme, we develop a variational algorithm for the minimization problem. We present our computational results using both synthetic benchmark networks and real data.
Dan Sussman
The eigen-decomposition of an adjacency matrix for a graph provides a method to represent each vertex as a point in Euclidean space which makes the graph amenable to subsequent inference and analysis. In this talk we will present a recent theorem and conjecture that state that, under the random dot product model for random graphs, the components of the appropriately scaled eigenvectors corresponding to the largest eigenvalues of the adjacency matrix are asymptotically distributed according to a specified mixture of normal distributions. The proof relies on the power method and concentration inequalities for the eigenvectors and works when the dimension of the latent positions is 1. Due to empirical evidence, we also conjecture that the theorem holds in higher dimensions.
Laura Balzano
The GROUSE (Grassmannian Rank-One Update Subspace Estimation) algorithm for subspace estimation with missing data implements incremental gradient descent in order to minimize a data-fit cost function over all subspaces of a given dimension. Because the minimization is over all subspaces (i.e., constrained to the Grassmannian), it is non-convex. However, in experiments GROUSE enjoys repeated success when the true data are actually low-rank or when there is a small amount of noise. In our pursuit of a deeper understanding, this poster first presents results on the local convergence of GROUSE. Within a local region of the global solution, GROUSE converges linearly to the optimal point at a rate dependent on the fraction of observed entries. We then relate the GROUSE algorithm to the Singular Value Decomposition, another algorithm which minimizes a non-convex cost (again, a data-fit cost function over all possible subspaces) but has provable polynomial-time guarantees in seeking the global minimum.
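A sketch of the GROUSE rank-one subspace update with missing data, plus a streaming usage example, is shown below; the update follows the published rule, while the step-size schedule, data model, and dimensions are illustrative.
```python
# Sketch of the GROUSE rank-one subspace update with missing data, plus a streaming
# usage example (the update follows the published rule; step size and data are illustrative).
import numpy as np

def grouse_step(U, v, omega, eta=0.2):
    """U: n x d orthonormal basis; v: full data vector; omega: observed indices."""
    w = np.linalg.lstsq(U[omega], v[omega], rcond=None)[0]    # best fit on observed entries
    p = U @ w                                                 # prediction on all coordinates
    r = np.zeros(U.shape[0]); r[omega] = v[omega] - p[omega]  # zero-padded residual
    p_norm, r_norm, w_norm = map(np.linalg.norm, (p, r, w))
    if r_norm < 1e-12 or w_norm < 1e-12:
        return U
    theta = eta * r_norm * p_norm                             # step along the geodesic
    step = (np.cos(theta) - 1) * p / p_norm + np.sin(theta) * r / r_norm
    return U + np.outer(step, w / w_norm)                     # rank-one rotation of the basis

rng = np.random.default_rng(0)
n, d = 100, 3
U_true, _ = np.linalg.qr(rng.standard_normal((n, d)))
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
for _ in range(2000):                                         # stream partially observed vectors
    v = U_true @ rng.standard_normal(d)
    omega = rng.choice(n, size=30, replace=False)
    U = grouse_step(U, v, omega)
print("subspace error:", np.linalg.norm(U_true @ U_true.T - U @ U.T))
```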
Francesco Renna
An $s$-sparse, $n$-dimensional signal drawn from a Gaussian mixture can be recovered reliably with just $s+1$ random projections in the low-noise regime, and reconstruction is performed via the closed-form conditional mean estimator. Reconstruction accuracy is evaluated in terms of the minimum mean-squared error (MMSE), and the minimum number of measurements needed to drive the MMSE to zero (the phase transition) is characterized to within one measurement, for both random and designed projection kernels. Joint work with R. Calderbank, L. Carin and M. R. D. Rodrigues.
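For reference, the standard closed-form conditional mean (MMSE) estimator for a Gaussian-mixture prior under noisy linear measurements takes the following form (generic notation, not taken from the paper):
```latex
% Standard closed-form conditional mean (MMSE) estimator for a Gaussian-mixture prior
% x ~ sum_k pi_k N(mu_k, Sigma_k) under y = Phi x + n, n ~ N(0, sigma^2 I)
% (generic notation, not taken from the paper).
\widehat{x}(y) \;=\; \sum_{k} p(k \mid y)\,
  \Bigl[\mu_k + \Sigma_k \Phi^{\mathsf T}\bigl(\Phi \Sigma_k \Phi^{\mathsf T} + \sigma^2 I\bigr)^{-1}\bigl(y - \Phi \mu_k\bigr)\Bigr],
\qquad
p(k \mid y) \;\propto\; \pi_k\, \mathcal{N}\bigl(y;\, \Phi\mu_k,\ \Phi \Sigma_k \Phi^{\mathsf T} + \sigma^2 I\bigr).
```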
Yanting Ma
We present a two-part reconstruction framework for signal recovery in compressed sensing (CS). Our framework combines the advantages of two different algorithms and allows us to accelerate the reconstruction procedure without compromising the reconstruction quality. To illustrate how this combination can work, we describe a Noisy-Sudocodes algorithm, which extends our original noiseless Sudocodes algorithm to perform fast reconstruction in the presence of noise. We apply Noisy-Sudocodes in a 1-bit CS setting by utilizing a modified 1-bit quantizer in Part 1 and a 1-bit CS reconstruction algorithm in Part 2. Promising numerical results indicate that our algorithm offers both a reduction in run-time and improvement in reconstruction quality.
Sinan Gunturk
We will discuss the performance and applicability of some of the readily available and newly proposed quantization strategies in the setting of redundant measurements (frames) as well as compressive measurements subject to sparsity assumptions. An important distinction to be made in this talk will involve noise shaping quantizers as opposed to memoryless ones, where the meaning of noise shaping will be redefined.
Swayambhoo Jain and Akshay Soni
This work considers an estimation task in compressive sensing, where the goal is to estimate an unknown signal from compressive measurements that are corrupted by additive pre-measurement noise (interference, or “clutter”) as well as post-measurement noise, in the specific setting where some (perhaps limited) prior knowledge on the signal, interference, and noise is available. The specific aim here is to devise a strategy for incorporating this prior information into the design of an appropriate compressive measurement strategy. Here, the prior information is interpreted as statistics of a prior distribution on the relevant quantities, and an approach based on Bayesian Experimental Design is proposed. Experimental results on synthetic data demonstrate that the proposed approach outperforms traditional random compressive measurement designs, which are agnostic to the prior information, as well as several other knowledge-enhanced sensing matrix designs based on more heuristic notions.
Farhad Pourkamali Anaraki
Compressive sensing allows us to recover signals that are sparse in some linear basis from a smaller number of measurements than traditionally required. However, it has been shown that many classes of images or video can be more efficiently modeled as lying on a nonlinear manifold, and hence described as a non-linear function of a few underlying parameters. Recently, there has been growing interest in using these manifold models to reduce the required number of compressive sensing measurements. However, the complexity of manifold models has been an obstacle to their use in efficient data acquisition. In this paper, we introduce a new algorithm for applying manifold models in compressive sensing using kernel methods. Our proposed algorithm, kernel compressive sensing (KCS), is the kernel version of classical compressive sensing. It uses dictionary learning in the feature space to build an efficient model for one or more signal manifolds. It then formulates the problem of recovering the signal's coordinates in the manifold representation as an underdetermined linear inverse problem, as in traditional compressive sensing. Standard compressive sensing recovery methods can thus be used to recover these coordinates, avoiding additional computational complexity. We present experimental results demonstrating the efficiency and efficacy of this algorithm in manifold-based compressive sensing.
Yue Lu
In this whiteboard presentation, we consider the problem of estimating a parameter by using a finite collection of different statistical experiments. Based on the observations made so far, we will adaptively select the next experiment that provides the most “information” about the unknown parameter. Summarizing the past information with finite memory, we represent such adaptive sensing schemes as finite-state parametric Markov chains and establish a formula linking the asymptotic performance of the parameter estimation to the steady-state distributions of the Markov chains. Consequently, finding optimal adaptive strategies can be reformulated as the problem of designing a (continuous) family of Markov chains with prescribed steady-state distributions. We propose a quantitative design criterion for optimal sensing policies based on minimax ratio regret, and present numerical solutions to the associated optimization problem. Finally, we demonstrate one application of this adaptive sensing scheme in high-speed single-photon imaging.
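Since the asymptotic analysis runs through the steady-state distributions of the finite-state chains, a small numpy utility for computing such a distribution is sketched below. It is a generic helper under the usual ergodicity assumptions, not the design procedure described in the presentation.

```python
import numpy as np

def steady_state(P):
    """Steady-state distribution of a finite-state Markov chain with row-stochastic
    transition matrix P, obtained from the left eigenvector for eigenvalue 1.
    Generic utility sketch; assumes the chain has a unique stationary distribution."""
    vals, vecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(vals - 1.0))   # eigenvalue closest to 1
    pi = np.real(vecs[:, idx])
    return pi / pi.sum()                  # normalize to a probability vector
```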
Pablo Sprechmann
In recent years, a lot of attention has been devoted to learning binary hash functions for efficient nearest neighbor search. In particular, similarity-preserving hashing techniques have proven to be very efficient tools in various problems related to large-scale information retrieval. However, existing hashing techniques suffer from an intrinsic trade-off between performance and computational complexity. While longer hash codes allow for lower false positive rates, it is very difficult to increase the embedding dimensionality without incurring very high false negative rates or prohibitive computational costs. In this work, we propose a way to overcome this limitation by enforcing the hash codes to be sparse. Such sparse high-dimensional codes enjoy the low false positive rates typical of long hashes, while keeping the false negative rates similar to those of a shorter dense hashing scheme with an equal number of degrees of freedom. Sparsity also plays a crucial role in keeping the computational complexity of retrieval low, even when looking for neighbors within a large Hamming radius. A tailored feed-forward neural network is used for the hashing function. Experimental evaluation involving visual and multi-modal data shows the benefits of the proposed method, producing, in certain settings, up to an order of magnitude better performance than current methods. This is joint work with Jonathan Masci, Alex Bronstein, Michael Bronstein, and Guillermo Sapiro.
Galen Reeves
Recent work on Approximate Message Passing algorithms in compressed sensing focuses on `ideal' algorithms which at each iteration face a subproblem of recovering an unknown sparse signal in Gaussian white noise. The noise level in each subproblem changes from iteration to iteration in a way that depends on the underlying signal (which we don't know!). For such algorithms to be used in practice, it seems we need an estimator that achieves the MMSE when the noise level is unknown. In this paper we solve this problem using convex optimization, Stein Unbiased Risk Estimates and Huber Splines.
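To illustrate the SURE ingredient, the sketch below selects a soft-threshold level by minimizing Stein's Unbiased Risk Estimate under a known noise level. This is the standard textbook use of SURE; the paper's actual contribution, handling an unknown noise level via convex optimization and Huber splines, is not reproduced here.

```python
import numpy as np

def sure_soft_threshold(y, sigma, thresholds):
    """Choose a soft-threshold level for y = x + N(0, sigma^2 I) by minimizing
    Stein's Unbiased Risk Estimate.  Illustrative sketch assuming sigma is known."""
    n = len(y)
    risks = []
    for t in thresholds:
        risk = (-n * sigma**2
                + np.sum(np.minimum(np.abs(y), t) ** 2)   # ||eta_t(y) - y||^2
                + 2 * sigma**2 * np.sum(np.abs(y) > t))   # 2 sigma^2 * divergence
        risks.append(risk)
    t_best = thresholds[int(np.argmin(risks))]
    x_hat = np.sign(y) * np.maximum(np.abs(y) - t_best, 0.0)
    return x_hat, t_best
```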
Deanna Needell
We present two distinct pursuit-like algorithms that we call the "superset method" and "partial inversion" for recovery of sparse vectors from consecutive Fourier measurements in the super-resolution regime. The methods rely on subspace identification that utilizes properties of the Fourier transform, and removal steps to estimate the solution's support. The superset method is always successful in the noiseless regime (unlike L1-minimization) and generalizes to higher dimensions (unlike the matrix pencil method). Relative robustness to noise is demonstrated numerically for both methods.
Vladimir Koltchinskii
We consider a version of noisy matrix completion in which a symmetric kernel is defined on the vertex set of a weighted graph and the goal is to estimate the kernel based on its i.i.d. noisy observations at randomly picked pairs of vertices. The underlying assumption is that the kernel is low rank and, at the same time, “smooth” on the graph. The degree of smoothness is characterized by discrete Sobolev norms defined in terms of the graph Laplacian.
We obtain minimax lower bounds (in the classes of symmetric kernels of a given rank and given smoothness) on the error (measured by the Frobenius norm) of an arbitrary estimator of the target kernel and develop estimation methods for which such bounds are attained up to logarithmic factors. The results show that the error rates become “nonparametric” when the number of vertices of the graph is very large, and that they depend on both the rank of the target kernel and its smoothness.
Jarvis Haupt
Recent advances in compressive sensing (CS) have established that high-dimensional signals possessing a sparse representation in some basis or dictionary can often be recovered from relatively few measurements. Consequently, CS strategies have been proposed and examined in a number of application domains where sensing resource efficiency is of primary importance. In this work we examine a type of anomaly detection problem that can be viewed as a generalization of the model selection or support recovery task in CS. Our aim here is to identify the locations of a nominally small number of outliers in a large collection of data (which may be scalar or multivariate) from a small number of observations of the form of linear combinations of subsets of the data elements. While traditional CS techniques are empowered by data sparsity, our approach here utilizes a generalized notion of sparsity that we describe as "saliency," a term chosen to embody the notion that the outliers are deemed so merely by virtue of their deviation from the characteristics exhibited by the bulk of the data. We propose a general approach called Compressive Saliency Sensing (CSS), comprising a randomized linear sensing strategy and an associated computationally efficient inference procedure, and establish that, under a few relatively mild assumptions on the underlying data being acquired, the CSS procedure accurately identifies the locations of k data outliers in a collection of n items from m=O(k log n) linear measurements. We discuss the implications of our main result in several stylized problems, including traditional sparse support recovery, identification of outliers in k-simple signals (defined as nominally binary vectors having k entries strictly in (0,1)), and an application motivated by computer vision tasks where our approach can be used to identify "saliency maps" from only a few compressive measurements (i.e., without requiring storage or processing of the full, potentially high-dimensional, image).
Radu Balan
In this talk we present stability results regarding sets of vectors that allow reconstruction of a vector, up to a constant phase factor, from the sequence of its frame coefficient magnitudes. In particular, we show that if there is a set of m vectors in C^n that allows such a reconstruction, then the set of frames with this property is open. In this context we shall comment on the recent 4n-4 construction of such a set by B. Bodmann, and on its stability.
Alan Qi
In this talk, I will present our work on Bayesian learning for high dimensional data (p≫n) and data streams (n≫p). For the first part, I will describe how we incorporate the eigenstructure of the data and network constraints to select correlated variables, with applications to the study of Alzheimer's disease and cancer pathway analysis. I will also discuss the connection of our work to eigenvector localization. For the second part, I will present our work on the virtual vector machine (VVM). Instead of using only the current data point or a minibatch to conduct online learning, VVM adaptively updates a compact representation of the data stream to maintain an approximate posterior distribution not limited to the exponential family. Experimental results demonstrate its improved performance over online SVMs and online Gaussian process classification approaches.
Yan Kaganovsky
We study sampling strategies and reconstruction algorithms for compressive X-ray computerized tomography (CT). Generally, the problem can be stated as the reconstruction of a 2D image from measurements of its line integrals, with the image representing attenuation of x-ray radiation along those lines per unit length. The data is collected in a real medical CT system where the object under test is illuminated by a divergent fan beam emanating from a point source that moves along a circle to different view angles. We study the feasibility of compressive measurements where we strongly under-sample the object of interest, either by using fewer view angles or by blocking parts of the fan beam for each view. This is motivated by the fact that under-sampling in medical CT corresponds to less radiation to patients. In addition, blocking parts of the fan beam allows us to measure X-ray scatter, which is utilized to improve the noise model and therefore improve reconstruction quality, and can also be used to infer spectral properties of the object, which are important for explosive detection. We utilize Bayesian compressive sensing (BCS) algorithms that exploit the sparsity of the imaged object in the wavelet basis. More sophisticated priors that make use of the tree structure of the wavelet decomposition are also investigated. Comparison to standard algorithms used in CT shows the advantages of the proposed algorithms in terms of image quality. We also investigate the inference of the noise variance from measurements by exploiting the specific physical model. Interestingly, in this problem the noise variance has a non-linear dependence on the attenuation (which is unknown). In x-ray CT, the noise variance depends on the intensity of the source and can fluctuate from scan to scan, but measuring the noise level requires changing the detectors and is not done before every scan. The noise level inference offered by BCS can therefore improve performance. BCS also allows adaptive measurements, in which one sequentially chooses which parts of the beam to block when moving the source to the next view angle, or even which view angle to proceed to. Finally, we consider the application of BCS to large-scale problems and investigate GPU implementations. Joint work with L. Carin.
Shannon Hughes
Solving inverse problems in signal processing often involves making prior assumptions about the signal being reconstructed. Here the appropriateness of the chosen model greatly determines the quality of the final result. Recently, it has been proposed to model images by representing them in terms of smaller patches, each arising from an underlying manifold. This model has been shown to be surprisingly effective in tasks such as denoising, inpainting, and superresolution. However, applying this model to whole images can be fraught with the difficulty of finding the intersection of many presumably non-linear manifolds in high-dimensional space, which precludes much of its potential use. We propose an efficient method of solving this problem using a kernel methods variant of the projection onto convex sets algorithm to quickly find the intersection of many manifolds while learning their non-linear structure. Indeed the final solution can even be expressed in closed form. We foresee our method allowing a patch-based regularization to be applied across a wide variety of inverse problems, including compressive sensing, inpainting, deconvolution, etc. Indeed, we show a proof of concept of our approach on an image denoising problem, where it compares very favorably with state-of-the-art denoising techniques.
Alex Mrozack, Kalyani Krishnamurthy and David Brady
Dispersive imagers use frequencies to index multiple spatial modes for antenna based imaging. This frequency swept nature allows for a compact imager with limited mechanical hardware relative to standard millimeter-wave imaging techniques. The drawback of this system is that the frequency coupling of the spatial modes couples the range and cross-range information together, resulting in the measurement of projections of multiple speckle realizations of the diffuse object of interest. The speckle realizations are modeled to be drawn from a complex Gaussian distribution with zero mean and variance proportional to the underlying scattering density. The proposed work exploits the correlations among speckle realizations interrogated by successive frequency measurements to recover the underlying object scattering density.
Vincent Lyzinski
There are no efficient algorithms known for graph matching (even deciding whether two graphs are isomorphic is notoriously of unknown complexity), and therefore graph matching will not directly, and by itself, provide for efficient graph alignment. However, we prove (under mild conditions) that if there are known seeds (i.e., a partial alignment), then a logarithmic number of such known seeds is necessary and sufficient for an efficient linear assignment problem formulation, "CNS", to almost always give a correct alignment. Efficient Frank-Wolfe methodology has been promulgated in the literature for obtaining approximate graph matchings, and we demonstrate in this paper that the natural extension of Frank-Wolfe methodology to incorporate seeds inherently includes a linear assignment problem step that has the CNS formulation directly embedded in it. We then illustrate via simulation experiments that, when there are few seeds, Frank-Wolfe methodology can perform substantially better than CNS alone; indeed, the Frank-Wolfe methodology essentially incorporates CNS as well as a non-seed-based approach.
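A minimal sketch of a seed-based linear assignment step of this flavor is shown below: non-seed vertex pairs are scored by their agreement with the known seed correspondence, and the assignment is solved with the Hungarian method. It illustrates the general idea only and is not the exact CNS formulation or the seeded Frank-Wolfe procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def seeded_match(A, B, seeds_a, seeds_b):
    """Match the non-seed vertices of adjacency matrices A and B using only
    their connections to the known seed correspondence seeds_a[i] <-> seeds_b[i].
    Illustrative sketch of a seed-based linear assignment step."""
    rest_a = [v for v in range(A.shape[0]) if v not in set(seeds_a)]
    rest_b = [v for v in range(B.shape[0]) if v not in set(seeds_b)]
    # sim[i, j] counts seeds adjacent to both rest_a[i] in A and rest_b[j] in B
    sim = A[np.ix_(rest_a, seeds_a)] @ B[np.ix_(rest_b, seeds_b)].T
    row, col = linear_sum_assignment(-sim)   # maximize total seed agreement
    return {rest_a[i]: rest_b[j] for i, j in zip(row, col)}
```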
Jianing Shi
We present a video compressive sensing framework to accelerate the image acquisition process of dynamic MRI. The key idea is to construct a compact representation of the spatio-temporal data and perform computation within the motion manifold. We consider a specific type of motion manifold, given by linear dynamical systems, and couple it with video compressive sensing. Given compressive measurements, the state sequence is first estimated using system identification. We then reconstruct the video on its motion manifold using a joint structured sparsity assumption. We demonstrate the performance of our approach for video compressive sensing, with applications to dynamic MRI. We also investigate the impact of various sampling strategies.
Ahmed Kirmani, Dongeek Shin, Andrea Colaco, and Vivek K Goyal
Active optical systems use narrowband sources and optical filtering along with highly sensitive single-photon counting detectors to improve signal detection rates and suppress ambient light. Despite these optoelectronic enhancements, image formation invariably requires detection of multiple photons for each image pixel, with statistical averaging to enhance signal and mitigate noise. We introduce first-photon imaging, a method for acquiring both range and scene intensity using only the first detected photon for each pixel location. In our laser ranging experiment, we demonstrate millimeter-accurate, sub-pulsewidth range resolution and sixteen-gray-level reflectance imaging at high ambient light levels using only first-photon arrivals. Our image formation method is based on exploiting spatio-temporal statistical correlations between photon arrivals to reject noisy ambient-light photons and accurately estimate scene range and reflectance without sacrificing spatial resolution. The high photon efficiency we achieve translates to the design of low-power active imagers capable of operating in high-noise environments.
Dongeek Shin, Ahmed Kirmani, and Vivek K Goyal
Multiplexed imaging is a powerful mechanism for achieving high signal-to-noise ratio (SNR) in the presence of signal-independent additive noise. However, for imaging in the presence of only signal-dependent shot noise, multiplexing has been shown to significantly degrade SNR. Hence, multiplexing to increase SNR in the presence of Poisson noise is normally thought to be infeasible. We present an exception to this view by demonstrating a multiplexing advantage when the scene parameters are non-negative valued and are observed through a low-rate Poisson channel.
Yuejie Chi
The paper studies the problem of recovering a spectrally sparse object from a small number of time domain samples. Specifically, the object of interest with ambient dimension n is assumed to be a mixture of r complex sinusoids, while the underlying frequencies can assume any continuous values in the unit disk. Conventional compressed sensing paradigms suffer from the basis mismatch issue when imposing a discrete dictionary on the Fourier representation. To address this problem, we develop a novel nonparametric algorithm, called Enhanced Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging the data into a low-rank enhanced form with multi-fold Hankel structure whose rank is upper bounded by r, and then attempts recovery via nuclear norm minimization. Under mild incoherence conditions, EMaC allows perfect recovery as soon as the number of samples exceeds the order of r log^2 n, and is robust against bounded noise. Even if a constant proportion of the samples is corrupted with arbitrary magnitudes, EMaC can still allow accurate recovery if the number of samples exceeds the order of r log^3 n. Along the way, our results demonstrate that accurate completion of a low-rank multi-fold Hankel matrix is possible when the number of observed entries is proportional to the information-theoretic limit (up to a logarithmic gap) under mild conditions. The performance of our algorithm and its applicability to super-resolution are further demonstrated by numerical experiments.
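A one-dimensional toy version of the enhanced-matrix idea can be written in a few lines of cvxpy: arrange the partially observed samples into a Hankel matrix and minimize its nuclear norm subject to data consistency. This is only a sketch under simplifying assumptions (single-fold Hankel structure, noiseless equality constraints, a default pencil parameter); EMaC itself uses a multi-fold Hankel enhancement and handles noise.

```python
import numpy as np
import cvxpy as cp

def hankel_completion_1d(y_obs, obs_idx, n, pencil=None):
    """Complete a spectrally sparse length-n signal from samples y_obs observed
    at indices obs_idx by minimizing the nuclear norm of its Hankel matrix.
    1-D toy sketch; the pencil parameter and the noiseless constraint are
    illustrative assumptions."""
    k = pencil if pencil is not None else n // 2
    x = cp.Variable(n, complex=True)
    # Hankel arrangement: row j holds x[j], x[j+1], ..., x[j+k-1]
    H = cp.vstack([x[j:j + k] for j in range(n - k + 1)])
    prob = cp.Problem(cp.Minimize(cp.normNuc(H)),
                      [x[obs_idx] == y_obs])
    prob.solve()
    return x.value
```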
Gautam Dasarathy
We consider the problem of recovering an unknown sparse p x p matrix X from an m x m matrix Y=AXB^T, where A and B are known m x p matrices with m << p. The main result shows that there exist constructions of the "sketching" matrices A and B so that even if X has O(p) non-zeros, it can be recovered exactly and efficiently using a convex program as long as these non-zeros are not concentrated in any single row/column of X. Furthermore, it suffices for the size of Y (the sketch dimension) to scale as m = O(sqrt(# nonzeros in X) x log p). The results also show that the recovery is robust and stable, in the sense that if X is equal to a sparse matrix plus a perturbation, then the convex program we propose produces an approximation with accuracy proportional to the size of the perturbation. Unlike traditional results on sparse recovery, where the sensing matrix produces independent measurements, our sensing operator is highly constrained (it assumes a tensor product structure). Therefore, proving recovery guarantees requires non-standard techniques. Indeed, our approach relies on a novel result concerning tensor products of bipartite graphs, which may be of independent interest. This problem is motivated by the following application, among others. Consider a p x n data matrix D, consisting of n observations of p variables. Assume that the correlation matrix X:=DD^T is (approximately) sparse in the sense that each of the p variables is significantly correlated with only a few others. Our results show that these significant correlations can be detected even if we have access to only a sketch of the data S=AD with A in R^{m x p}. Joint work with Robert Nowak.
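The recovery step itself is a standard convex program; a minimal cvxpy sketch is given below, using entrywise l1 minimization subject to the sketch constraint. The specific constructions of the sketching matrices A and B that drive the guarantees are not reproduced here, and the noiseless equality constraint is an illustrative assumption.

```python
import numpy as np
import cvxpy as cp

def recover_from_sketch(Y, A, B):
    """Recover a sparse p x p matrix X from the sketch Y = A X B^T via
    entrywise l1 minimization.  Sketch of the convex recovery step only;
    assumes the exact, noiseless model."""
    p = A.shape[1]
    X = cp.Variable((p, p))
    prob = cp.Problem(cp.Minimize(cp.sum(cp.abs(X))),   # entrywise l1 norm
                      [A @ X @ B.T == Y])
    prob.solve()
    return X.value
```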
Tony Jebara
The partition function plays a key role in probabilistic modeling including conditional random fields, graphical models, and maximum likelihood estimation. To optimize partition functions of log-linear models, we introduce a quadratic variational upper bound. This inequality facilitates majorization methods: optimization of complicated functions through the iterative solution of simpler sub-problems. Such bounds remain efficient to compute even when the partition function involves a graphical model (with small tree-width) or in latent likelihood settings. For large-scale problems, low-rank versions of the bound are provided and outperform LBFGS as well as first-order methods. Several learning applications are shown and reduce to fast and convergent update rules. Experimental results show advantages over state-of-the-art optimization methods. We also propose a stochastic version of bound majorization which competes well against stochastic gradient descent (across any of its variations and tunings). It converges in fewer iterations, reduces computation time and finds better parameter estimates. The proposed method bridges first- and second-order stochastic optimization methods by maintaining linear computational complexity (with respect to dimensionality) while exploiting second order information about the pseudo-global curvature of the objective function.
Andrew Holmgren
Traditional X-ray computed tomography (CT) detects the integrated attenuation along an X-ray's path through an object. In order to disambiguate the possible attenuation contributions from points along a single ray, multiple rays must pass through the object in a diversity of positions and angles -- typically hundreds to thousands of measurements must be made. As an alternative to traditional CT, we use a coded aperture to modulate an object's Compton and coherent X-ray scatter. The modulated scatter signal can be processed to reconstruct both the scatter density and the angular distribution of scatter, which gives information about both the object's spatial distribution and the object's material composition, in a single measurement. The ability to recover scatter information in a snapshot enables localized low dose X-ray tomography or tomographic X-ray video.
Patrick Llull, Sally Gewalt, Dan Schulz, Steve Feller, Daniel Marks, and David J. Brady
Sophisticated conventional imagers efficiently focus using lenslet arrays or radar-based measurements. We consider the problem of passive autofocus for many parallel relay microcameras in the absence of such technologies. Issues such as optimal focus metric, robustness to motion, and operating bandwidth are discussed.
Patrick Llull, Xin Yuan, Xuejun Liao, Jianbo Yang, Lawrence Carin, Guillermo Sapiro, and David J. Brady
We present an experimental prototype that utilizes mechanical motion of a passive coding element to significantly increase a camera’s temporal resolution with little additional power overhead. We demonstrate 330fps color videos reconstructed from 30fps raw measurements.
Joel A. Greenberg, Kalyani Krishnamurthy, David Brady
Realizing fast volumetric imaging of the molecular structure of an object is important for a variety of applications, ranging from medicine to security. Traditional approaches based on the measurement of scattered x-rays require the acquisition of many images and extensive filtering of the incident or scattered photons, which makes these systems slow and inappropriate for use in their intended applications. In contrast, we demonstrate fast, snapshot molecular imaging by using a coded aperture to modulate spatially the scattered x-rays and an energy-sensitive detector to measure the x-rays. We present a theoretical analysis and experimental data validating this approach in both a compressive and non-compressive regime.
Andrew Thompson
Group coherence of matrices plays an important role in analyzing the performance of group compressed sensing recovery algorithms (Bajwa/Mixon,2012). In this paper, we characterize two group coherence metrics: worst-case and average group coherence. First, we establish a fundamental lower bound on worst-case group coherence. We then present a deterministic matrix construction based upon Kronecker products that obtains this lower bound. We also characterize the worst-case group coherence of random subspaces. Finally, we present a flipping algorithm that can improve the average group coherence of a matrix, while maintaining the worst-case group coherence of the original matrix. We provide numerical examples which demonstrate that our proposed deterministic matrix construction performs well in group compressed sensing. Joint work with Yao Xie and Robert Calderbank.
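For reference, the sketch below computes one common version of the two metrics: spectral norms of the cross-Gram blocks for equal-size column groups, with the columns assumed unit-norm. The normalization conventions in the paper may differ, so this is an illustration of the quantities rather than the paper's exact definitions.

```python
import numpy as np

def group_coherences(A, groups):
    """Worst-case and average group coherence of a matrix whose columns are
    partitioned into equal-size groups (lists of column indices).
    Illustrative sketch; normalizations vary across the literature."""
    blocks = [A[:, g] for g in groups]
    G = len(blocks)
    worst, avg = 0.0, 0.0
    for i in range(G):
        acc = np.zeros((blocks[i].shape[1], blocks[i].shape[1]))
        for j in range(G):
            if j == i:
                continue
            cross = blocks[i].T @ blocks[j]               # cross-Gram block
            worst = max(worst, np.linalg.norm(cross, 2))  # largest singular value
            acc += cross
        avg = max(avg, np.linalg.norm(acc / (G - 1), 2))  # averaged cross-Gram blocks
    return worst, avg
```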
Nikhil Rao
Standard compressive sensing results state that to exactly recover an s-sparse signal in R^p, one requires O(s · log p) measurements. While this bound is extremely useful in practice, often real-world signals are not only sparse, but also exhibit structure in the sparsity pattern. We focus on group-structured patterns in this paper. Under this model, groups of signal coefficients are active (or inactive) together. The groups are predefined, but the particular set of groups that are active (i.e., in the signal support) must be learned from measurements. We show that exploiting knowledge of groups can further reduce the number of measurements required for exact signal recovery, and derive universal bounds for the number of measurements needed. The bound is universal in the sense that it only depends on the number of groups under consideration, and not the particulars of the groups (e.g., compositions, sizes, extents, overlaps, etc.). Experiments show that our result holds for a variety of overlapping group configurations.
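A minimal cvxpy sketch of recovery that exploits known group structure is shown below, using an l1/l2 group penalty for non-overlapping groups. The paper's universal measurement bounds and its overlapping-group experiments go well beyond this toy program, which is included only to make the "exploiting knowledge of groups" step concrete.

```python
import numpy as np
import cvxpy as cp

def group_sparse_recover(y, A, groups):
    """Recover a group-sparse signal from y = A x by minimizing a sum of
    group l2 norms subject to the measurements.  Sketch for non-overlapping
    groups under a noiseless model; illustrative only."""
    x = cp.Variable(A.shape[1])
    penalty = sum(cp.norm(x[g], 2) for g in groups)   # l1/l2 group penalty
    prob = cp.Problem(cp.Minimize(penalty), [A @ x == y])
    prob.solve()
    return x.value
```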
Tal Ben Yakar
Polyphonic Automatic Music Transcription is considered an open problem, and the state-of-the-art solutions are far from the level of precision required in many applications. We propose a trainable sparse model for automatic polyphonic music transcription, which incorporates several successful approaches into a unified optimization framework. Our model combines unsupervised synthesis models similar to latent component analysis and nonnegative factorization with metric learning techniques that allow supervised discriminative learning. We develop efficient stochastic gradient training schemes allowing unsupervised, semi-, and fully supervised training of the model, as well as its adaptation to test data. We show an efficient fixed-complexity and fixed-latency approximation that can replace iterative minimization algorithms in time-critical applications. Experimental evaluation on synthetic and real data shows promising initial results.
McKell Carter
Functional magnetic resonance imaging (fMRI) datasets consist of ~30k volume pixels collected in multiple sessions of several hundred time points. Reaching a conclusion about a structure-function relationship between the brain and a cognitive process requires comparing data collected during a task that engages the process of interest to data collected during a task that differs only in the process of interest. Statistical methods used in the analysis of fMRI data have largely consisted of massively parallel univariate tests. These tests have proven extremely useful in characterizing the neural bases of psychological processes but often lead to claims that a cognitive function has been localized to a specific brain structure, a conclusion that requires a comparison between multiple structures. We have previously shown that orthogonal distance regression of whole-brain data (distributional analysis) – based on combinatorial methods for identifying independent information carried in each brain region – can effectively isolate complex brain functions. Here, we used distributional analysis of fMRI data to identify common factors, such as arousal and engagement, that drive the broad networks identified in two common tasks. We conclude that, in the examined cases, a cognitive process common to many brain regions can explain most structure-function conclusions. Models incorporating these common effects can be used to isolate brain function related to the desired process.
R. McKell Carter (1,2), Jeff J. MacInnes (1,3), Amy Winecoff (1,2,3), R. Alison Adcock (1,3,4), Scott A. Huettel (1,2,3). 1 - Center for Cognitive Neuroscience, Duke. 2 - Brain Imaging and Analysis Center, Duke. 3 - Department of Psychology and Neuroscience, Duke. 4 - Department of Psychiatry and Behavioral Sciences, Duke