Nuit Blanche: ICML 2012 accepted papers

Tuesday, May 15, 2012

ICML 2012 accepted papers

The list of ICML 2012 papers accepted is here. Here is the list of the interesting ones, if you have link to the papers please mention it in the comment section or send e-mail. Thanks.

[Thank you Emile Richard and Ziyuan Lin ]

Estimation of Simultaneously Sparse and Low Rank Matrices

Pierre-André Savalle, Emile Richard, Nicolas Vayatis

– Accepted

Abstract: The paper introduces a penalized matrix estimation procedure aiming at solutions which are sparse and low-rank at the same time. Such structures arise in the context of social networks or protein interactions where underlying graphs have adjacency matrices which are block-diagonal in the appropriate basis. We introduce a convex mixed penalty which involves ℓ_1-norm and trace norm simultaneously. We obtain an oracle inequality which indicates how the two effects interact according to the nature of the target matrix. We bound generalization error in the link prediction problem. We also develop proximal descent strategies to solve the the optimization problem efficiently and evaluate performance on synthetic and real data sets.

Improved Nystrom Low-rank Decomposition with Priors

Kai Zhang, Liang Lan, Jun Liu, andreas Rauber

– Accepted

Abstract: Low-rank matrix decomposition has gained great popularity recently in scaling up kernel methods to large amount of data. However, some limitations could prevent them from working effectively in certain domains. For example, many existing approaches are intrinsically unsupervised, which does not incorporate side information (e.g., class labels) to produce task specific decomposition; also, they typically work “transductively” and the factorization does not generalize to new samples, causing lot of inconvenience for updated environment. To solve these problems, in this paper we propose an “inductive”-flavored method for low-rank kernel decomposition with priors or side information. We achieve this by generalizing the Nystróm method in a novel way. On the one hand, our approach employs a highly flexible, nonparametric structure that allows us to generalize the low-rank factors to arbitrarily new samples; on the other hand, it has linear time and space complexities, which can be orders of magnitudes faster than existing approaches and renders great efficiency in learning a low-rank kernel decomposition. Empirical results demonstrate the efficacy and efficiency of the proposed method.

Stability of matrix factorization for collaborative filtering

Yu-Xiang Wang, Huan Xu

– Accepted

Abstract: We study the stability vis a vis adversarial noise of matrix factorization algorithm for matrix completion. In particular, our results include: (I) we bound the gap between the solution matrix of the factorization method and the ground truth in terms of root mean square error; (II) we treat the matrix factorization as a subspace fitting problem and analyze the difference between the solution subspace and the ground truth; (III) we analyze the prediction error of individual users based on the subspace stability. We apply these results to the problem of collaborative filtering under manipulator attack, which leads to useful insights and guidelines for collaborative filtering system design.

Learning Efficient Structured Sparse Models

Alex Bronstein, Pablo Sprechmann, Guillermo Sapiro

– Accepted

Abstract: We present a comprehensive framework for structured sparse coding and modeling extending the recent ideas of using learnable fast regressors to approximate exact sparse codes. For this purpose, we develop a novel block-coordinate proximal splitting method for the iterative solution of hierarchical sparse coding problems, and show an efficient feed forward architecture derived from its iteration. This architecture faithfully approximates the exact structured sparse codes with a fraction of the complexity of the standard optimization methods. We also show that by using different training objective functions, learnable sparse encoders are no longer restricted to be mere approximants of the exact sparse code for a pre-given dictionary, as in earlier formulations, but can be rather used as full-featured sparse encoders or even modelers. A simple implementation shows several orders of magnitude speedup compared to the state-of-the-art at minimal performance degradation, making the proposed framework suitable for real time and large-scale applications.

Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization

Haim Avron, Satyen Kale, Shiva Kasiviswanathan, Vikas Sindhwani

– Accepted

Abstract: We describe novel subgradient methods for a broad class of matrix optimization problems involving nuclear norm regularization. Unlike existing approaches, our method executes very cheap iterations by combining low-rank stochastic subgradients with efficient incremental SVD updates, made possible by highly optimized and parallelizable dense linear algebra operations on small matrices. Our practical algorithms always maintain a low-rank factorization of iterates that can be conveniently held in memory and efficiently multiplied to generate predictions in matrix completion settings. Empirical comparisons confirm that our approach is highly competitive with several recently proposed state-of-the-art solvers for such problems.

Sparse Support Vector Infinite Push

Alain Rakotomamonjy

– Accepted

Abstract: In this paper, we address the problem of embedded feature selection for ranking on top of the list problems. We pose this problem as a regularized empirical risk minimization with p-norm push loss function (p=∞) and sparsity inducing regularizers. We leverage the issues related to this challenging optimization problem by considering an alternating direction method of multipliers algorithm which is built upon proximal operators of the loss function and the regularizer. Our main technical contribution is thus to provide a numerical scheme for computing the infinite push loss function proximal operator. Experimental results on toy, DNA microarray and BCI problems show how our novel algorithm compares favorably to competitors for ranking on top while using fewer variables in the scoring function.

A Dantzig Selector Approach to Temporal Difference Learning

Matthieu Geist, Bruno Scherrer, Alessandro Lazaric, Mohammad Ghavamzadeh

– Accepted

Abstract: LSTD is one of the most popular reinforcement learning algorithms for value function approximation. Whenever the number of samples is larger than the number of features, LSTD must be paired with some form of regularization. In particular, L1-regularization methods tends to perform feature selection by promoting sparsity and thus they are particularly suited in high–dimensional problems. Nonetheless, since LSTD is not a simple regression algorithm but it solves a fixed–point problem, the integration with L1-regularization is not straightforward and it might come with some drawbacks (see e.g., the P-matrix assumption for LASSO-TD). In this paper we introduce a novel algorithm obtained by integrating LSTD with the Dantzig Selector. In particular, we investigate the performance of the algorithm and its relationship with existing regularized approaches, showing how it overcomes some of the drawbacks of existing solutions.

Scaling Up Coordinate Descent Algorithms for Large ℓ_1 Regularization Problems

Chad Scherrer, Mahantesh Halappanavar, Ambuj Tewari, David Haglin

– Accepted

Abstract: We present a generic framework for parallel coordinate descent (CD) algorithms that has as special cases the original sequential algorithms of Cyclic CD and Stochastic CD, as well as the recent parallel Shotgun algorithm of Bradley et al. We introduce two novel parallel algorithms that are also special cases—Thread-Greedy CD and Coloring-Based CD—and give performance measurements for an OpenMP implementation of these.

Is margin preserved after random projection?

Qinfeng Shi, Chunhua Shen, Rhys Hill, Anton van den Hengel

– Accepted

Abstract: Random projections have been applied in many machine learning algorithms. However, whether margin is preserved after random projection is non-trivial and not well studied. In this paper we analyse margin distortion after random projection, and give the conditions of margin preservation for binary classification problems. We also extend our analysis to margin for multiclass problems, and provide theoretical bounds on multiclass margin on the projected data.

Evaluating Bayesian and L1 Approaches for Sparse Unsupervised Learning

Shakir Mohamed, Katherine Heller, Zoubin Ghahramani

– Accepted

Abstract: The use of L_1 regularisation for sparse learning has generated immense research interest, with many successful applications in diverse areas such as signal acquisition, image coding, genomics and collaborative filtering. While existing work highlights the many advantages of L_1 methods, in this paper we find that L_1 regularisation often dramatically under-performs in terms of predictive performance when compared with other methods for inferring sparsity. We focus on unsupervised latent variable models, and develop L_1 minimising factor models, Bayesian variants of “L_1”, and Bayesian models with a strongerL_0-like sparsity induced through spike-and-slab distributions. These spike-and-slab Bayesian factor models encourage sparsity while accounting for uncertainty in a principled manner, and avoid unnecessary shrinkage of non-zero values. We demonstrate on a number of data sets that in practice spike-and-slab Bayesian methods outperform L_1 minimisation, even on a computational budget. We thus highlight the need to re-assess the wide use of L_1 methods in sparsity-reliant applications, particularly when we care about generalising to previously unseen data, and provide an alternative that, over many varying conditions, provides improved generalisation performance.

LPQP for MAP: Putting LP Solvers to Better Use

Patrick Pletscher, Sharon Wulff

– Accepted

Abstract: MAP inference for general energy functions remains a challenging problem. While most efforts are channeled towards improving the linear programming (LP) based relaxation, this work is motivated by the quadratic programming (QP) relaxation. We propose a novel MAP relaxation that penalizes the Kullback-Leibler divergence between the LP pairwise auxiliary variables, and QP equivalent terms given by the product of the unaries. We develop two efficient algorithms based on variants of this relaxation. The algorithms minimize the non-convex objective using belief propagation and dual decomposition as building blocks. Experiments on synthetic and real-world data show that the solutions returned by our algorithms substantially improve over the LP relaxation.

Clustering by Low-Rank Doubly Stochastic Matrix Decomposition

Zhirong Yang, Erkki Oja

– Accepted

Abstract: Clustering analysis by nonnegative low-rank approximations has achieved remarkable progress in the past decade. However, most approximation approaches in this direction are still restricted to matrix factorization. We propose a new low-rank learning method to improve the clustering performance, which is beyond matrix factorization. The approximation is based on a two-step bipartite random walk through virtual cluster nodes, where the approximation is formed by only cluster assigning probabilities. Minimizing the approximation error measured by Kullback-Leibler divergence is equivalent to maximizing the likelihood of a discriminative model, which endows our method with a solid probabilistic interpretation. The optimization is implemented by a relaxed Majorization-Minimization algorithm that is advantageous in finding good local minima. Furthermore, we point out that the regularized algorithm with Dirichlet prior only serves as initialization. Experimental results show that the new method has strong performance in clustering purity for various datasets, especially for large-scale manifold data.

Fast classification using sparse decision DAGs

Robert Busa-Fekete, Djalel Benbouzid, Balazs Kegl

– Accepted

Abstract: In this paper we propose an algorithm that builds sparse decision DAGs (directed acyclic graphs) out of a list of base classifiers provided by an external learning method such as AdaBoost. The basic idea is to cast the DAG design task as a Markov decision process. Each instance can decide to use or to skip each base classifier, based on the current state of the classifier being built. The result is a sparse decision DAG where the base classifiers are selected in a data-dependent way. The method has a single hyperparameter with a clear semantics of controlling the accuracy/speed trade-off. The algorithm is competitive with state-of-the-art cascade detectors on three object-detection benchmarks, and it clearly outperforms them in the regime of low number of base classifiers. Unlike cascades, it is also readily applicable for multi-class classification. Using the multi-class setup, we show on a benchmark web page ranking data set that we can significantly improve the decision speed without harming the performance of the ranker.

A Combinatorial Algebraic Approach for the Identifiability of Low-Rank Matrix Completion

Franz Király, Ryota Tomioka

– Accepted

Abstract: In this paper, we review the problem of matrix completion and expose its intimate relations with algebraic geometry and combinatorial graph theory. We present the first necessary and sufficient combinatorial conditions for matrices of arbitrary rank to be identifiable from a set of matrix entries, yielding theoretical constraints and new algorithms for the problem of matrix completion. We conclude with algorithmically evaluating the tightness of the given conditions and algorithms for practically relevant matrix sizes, showing that the algebraic-combinatoric approach can lead to improvements over state-of-the-art matrix completion methods.

Greedy Algorithms for Sparse Reinforcement Learning

Christopher Painter-Wakefield, Ronald Parr

– Accepted

Abstract: Feature selection and regularization are becoming increasingly prominent tools in the efforts of the reinforcement learning (RL) community to expand the reach and applicability of RL. One approach to the problem of feature selection is to impose a sparsity-inducing form of regularization on the learning method. Recent work on L_1 regularization has adapted techniques from the supervised learning literature for use with RL. Another approach that has received renewed attention in the supervised learning community is that of using a simple algorithm that greedily adds new features. Such algorithms have many of the good properties of the L_1 regularization methods, while also being extremely efficient and, in some cases, allowing theoretical guarantees on recovery of the true form of a sparse target function from sampled data. This paper considers variants of orthogonal matching pursuit (OMP) applied to reinforcement learning. The resulting algorithms are analyzed and compared experimentally with existing L_1 regularized approaches. We demonstrate that perhaps the most natural scenario in which one might hope to achieve sparse recovery fails; however, one variant, OMP-BRM, provides promising theoretical guarantees under certain assumptions on the feature dictionary. Another variant, OMP-TD, empirically outperforms prior methods both in approximation accuracy and efficiency on several benchmark problems.

Semi-supervised Metric Learning Paradigm with Hyper-Sparsity

Gang Niu, Bo Dai, Makoto Yamada, Masashi Sugiyama

– Accepted

Abstract: We propose a general information-theoretic approach called Seraph for semi-supervised metric learning that does not rely upon the manifold assumption. It maximizes/minimizes entropies of labeled/unlabeled data pairs in the supervised/unsupervised part, which allows these two parts to be integrated in a natural and meaningful way. Furthermore, it is equipped with the hyper-sparsity: Given a certain probabilistic model parameterized by the learned metric, the posterior distribution and the resultant projection matrix are both `sparse'. Consequently, the metric learned by Seraph possesses high discriminability even under a noisy environment. The optimization problem of Seraph can be solved efficiently and stably by an EM-like scheme, where the E-Step is analytical and the M-Step is convex and `smooth'. Experiments demonstrate that Seraph compares favorably with many well-known metric learning methods.

Fast approximation of matrix coherence and statistical leverage

Michael Mahoney, Petros Drineas, Malik Magdon-Ismail, David Woodruff

– Accepted

Abstract: The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities have been of interest in recently-popular problems such as matrix completion and Nystrom-based low-rank matrix approximation; in large-scale statistical data analysis applications more generally; and since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n × d matrix A, with n >> d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs in O(nd log n) time, as opposed to the O(nd2) time required by the naıve algorithm that involves computing an orthogonal basis for the range of A. This resolves an open question from (Drineas et al., 2006b) and (Mohri & Talwalkar, 2011); and our result leads to immediate improvements in coreset-based L2-regression, the estimation of the coherence of a matrix, and several related low-rank matrix problems. Interestingly, to achieve our result we judiciously apply random projections on both sides of A.

A Hybrid Algorithm for Convex Semidefinite Optimization

Soeren Laue

– Accepted

Abstract: We present a hybrid algorithm for optimizing a convex, smooth function over the cone of positive semidefinite matrices. Our algorithm converges to the global optimal solution and can be used to solve general large-scale semidefinite programs and hence can be readily applied to a variety of machine learning problems. We show experimental results on three machine learning problems (matrix completion, metric learning, and sparse PCA) . Our approach outperforms state-of-the-art algorithms.

A Complete Analysis of the l_1,p Group-Lasso

Julia Vogt, Volker Roth

– Accepted

Abstract: The Group-Lasso is a well-known tool for joint regularization in machine learning methods. While the l_{1,2} and the l_{1,∞} version have been studied in detail and efficient algorithms exist, there are still open questions regarding other l_{1,p} variants. We characterize conditions for solutions of the l_{1,p} Group-Lasso for all p-norms with 1 <= p <= ∞, and we present a unified active set algorithm. For all p-norms, a highly efficient projected gradient algorithm is presented. This new algorithm enables us to compare the prediction performance of many variants of the Group-Lasso in a multi-task learning setting, where the aim is to solve many learning problems in parallel which are coupled via the Group-Lasso constraint. We conduct large-scale experiments on synthetic data and on two real-world data sets. In accordance with theoretical characterizations of the different norms we observe that the weak-coupling norms with p between 1.5 and 2 consistently outperform the strong-coupling norms with p >> 2.

Convergence Rates of Biased Stochastic Optimization for Learning Sparse Ising Models

Jean Honorio

– Accepted

Abstract: We study the convergence rate of stochastic optimization of exact (NP-hard) objectives, for which only biased estimates of the gradient are available. We motivate this problem in the context of learning the structure and parameters of Ising models. We first provide a convergence-rate analysis of deterministic errors for forward-backward splitting (FBS). We then extend our analysis to biased stochastic errors, by first characterizing a family of samplers and providing a high probability bound that allows understanding not only FBS, but also proximal gradient (PG) methods. We derive some interesting conclusions: FBS requires only a logarithmically increasing number of random samples in order to converge (although at a very low rate); the required number of random samples is the same for the deterministic and the biased stochastic setting for FBS and basic PG; accelerated PG is not guaranteed to converge in the biased stochastic setting.

Sparse-GEV: Sparse Latent Space Model for Multivariate Extreme Value Time Serie Modeling

Yan Liu, Taha Bahadori, Hongfei Li

– Accepted

Abstract: In many applications of time series models, such as climate analysis or social media analysis, we are often interested in extreme events, such as heatwave, wind gust, and burst of topics. These time series data usually exhibit a heavy-tailed distribution rather than a normal distribution. This poses great challenges to existing approaches due to the significantly different assumptions on the data distributions and the lack of sufficient past data on extreme events. In this paper, we propose the sparse-GEV model, a latent state model based on the theory of extreme value modeling to automatically learn sparse temporal dependence and make predictions. Our model is theoretically significant because it is among the first models to learnsparse temporal dependencies between multivariate extreme value time series. We demonstrate the superior performance of our algorithm compared with state-of-art methods, including Granger causality, copula approach, and transfer entropy, on one synthetic dataset, one climate dataset and one Twitter dataset.

A Proximal-Gradient Homotopy Method for the L1-Regularized Least-Squares Problem

Lin Xiao, Tong Zhang

– Accepted

Abstract: We consider the ℓ_1-regularized least-squares problem for sparse recovery and compressed sensing. Since the objective function is not strongly convex, standard proximal gradient methods only achieve sublinear convergence. We propose a homotopy continuation strategy, which employs a proximal gradient method to solve the problem with a sequence of decreasing regularization parameters. It is shown that under common assumptions in compressed sensing, the proposed method ensures that all iterates along the homotopy solution path are sparse, and the objective function is effectively strongly convex along the solution path. This observation allows us to obtain a global geometric convergence rate for the procedure. Empirical results are presented to support our theoretical analysis.

An Efficient Approach to Sparse Linear Discriminant Analysis

Luis Francisco Sánchez Merchante, Yves Grandvalet, Gérrad Govaert

– Accepted

Abstract: We present the Group-Lasso Optimal Scoring Solver (GLOSS), a novel approach to the formulation and the resolution of sparse Linear Discriminant Analysis (LDA). Our formulation, which is based on penalized Optimal Scoring, preserves an exact equivalence with sparse LDA, contrary to the multi-class approaches based on the regression of class indicator that have been proposed so far. Additionally, the versatility of the implementation allows to impose some particular structure on the within-class covariance matrix. Computationally, the optimization algorithm considers two nested convex problems, the main one being a linear regression regularized by a quadratic penalty implementing the group-lasso penalty. As group-Lasso selects the same features in all discriminant directions, it generates extremely parsimonious models without compromising the prediction performances. Moreover, the resulting sparse discriminant directions are amenable to low-dimensional representations. Our algorithm is highly efficient for medium to large number of variables, and is thus particularly well suited to the analysis of gene expression data.

A convex relaxation for weakly supervised classifiers

Armand Joulin, Francis Bach

– Accepted

Abstract: This paper introduces a general multi-class approach to weakly supervised classification. Inferring the labels and learning the parameters of the model is usually done jointly through a block-coordinate descent algorithm such as expectation-maximization (EM), which may lead to local minima. To avoid this problem, we propose a cost function based on a convex relaxation of the soft-max loss. We then propose an algorithm specifically designed to efficiently solve the corresponding semidefinite program (SDP). Empirically, our method compares favorably to standard ones on different datasets for multiple instance learning and semi-supervised learning as well as on clustering tasks.

Small-sample brain mapping: sparse recovery on spatially correlated designs with randomization and clustering

Gael Varoquaux, Alexandre Gramfort, Bertrand Thirion

– Accepted

Abstract: Functional neuroimaging can measure the brain’s response to an external stimulus. It is used to perform brain mapping: identifying from these observations the brain regions involved. This problem can be cast into a linear supervised learning task where the neuroimaging data are used as predictors for the stimulus. Brain mapping is then seen as a support recovery problem. On functional MRI (fMRI) data, this problem is particularly challenging as i) the number of samples is small due to limited acquisition time and ii) the variables are strongly correlated. We propose to overcome these difficulties using sparse regression models over new variables obtained by clustering of the original variables. The use of randomization techniques, e.g. bootstrap samples, and hierarchical clustering of the variables improves the recovery properties of sparse methods. We demonstrate the benefit of our approach on an extensive simulation study as well as two publicly available fMRI datasets.

Regularizers versus Losses for Nonlinear Dimensionality Reduction: A Factored View with New Convex Relaxations

James Neufeld, Yaoliang Yu, Xinhua Zhang, Ryan Kiros, Dale Schuurmans

– Accepted

Abstract: We demonstrate that almost all non-parametric dimensionality reduction methods can be expressed by a simple procedure: regularized loss minimization plus singular value truncation. By distinguishing the role of the loss and regularizer in such a process, we recover a factored perspective that reveals some gaps in the current literature. Beyond identifying a useful new loss for manifold unfolding, a key contribution is to derive new convex regularizers that combine distance maximization with rank reduction. These regularizers can be applied to any loss.

Efficient Euclidean Projections onto the Intersection of Norm Balls

Wei YU, Hao Su, Li Fei-Fei

– Accepted

Abstract: Using sparse-inducing norms to learn robust models has received increasing attention from many fields for its attractive properties. Projection-based methods have been widely applied to learning tasks constrained by such norms. As a key building block of these methods, an efficient operator for Euclidean projection onto the intersection of ℓ_1 and ℓ_{1,q} norm balls (q=2 or ∞) is proposed in this paper. We prove that the projection can be reduced to finding the root of an auxiliary function which is piecewise smooth and monotonic. Hence, a bisection algorithm is sufficient to solve the problem. We show that the time complexity of our solution is O(n+g log g) for q=2 and O(n log n) for q=∞, where n is the dimensionality of the vector to be projected and g is the number of disjoint groups; we confirm this complexity by experimentation. Empirical study reveals that our method achieves significantly better performance than classical methods in terms of running time and memory usage. We further show that embedded with our efficient projection operator, projection-based algorithms can solve regression problems with composite norm constraints more efficiently than other methods and give superior accuracy.

Clustering using Max-norm Constrained Optimization

Ali Jalali, Nathan Srebro

– Accepted

Abstract: We suggest using the max-norm as a convex surrogate constraint for clustering. We show how this yields a better exact cluster recovery guarantee than previously suggested nuclear-norm relaxation, and study the effectiveness of our method, and other related convex relaxations, compared to other approaches.

Randomized Smoothing for (Parallel) Stochastic Optimization

John Duchi, Martin Wainwright, Peter Bartlett

– Not for proceedings

Abstract: By combining randomized smoothing techniques with accelerated gradient methods, we obtain convergence rates for stochastic optimization procedures, both in expectation and with high probability, that have optimal dependence on the variance of the gradient estimates. To the best of our knowledge, these are the first variance-based convergence guarantees for non-smooth optimization. A combination of our techniques with recent work on decentralized optimization yields order-optimal parallel stochastic optimization algorithms. We give applications of our results to several statistical machine learning problems, providing experimental results demonstrating the effectiveness of our algorithms.

Group Sparse Additive Models

Junming Yin, Xi Chen, eric xing

– Accepted

Abstract: We consider the problem of sparse variable selection in nonparametric additive models, with the prior knowledge of the structure among the covariates to encourage those variables within a group to be selected jointly. Previous works either study the group sparsity in the parametric setting (e.g., group lasso), or address the variable selection problem in the nonparametric setting without exploiting the structural information (e.g., sparse additive models (SpAM)). In this paper, we present a new method, called group sparse additive models (GroupSpAM), which can handle group sparsity in nonparametric additive models. We generalize the l1/l2 norm to Hilbert spaces as the sparsity-inducing penalty in GroupSpAM. Moreover, we derive a novel thresholding condition for identifying the functional sparsity at the group level, and propose an efficient block coordinate descent algorithm for constructing the estimate. We demonstrate by simulation that GroupSpAM substantially outperforms competing methods in terms of support recovery and prediction accuracy in additive models, and also conduct a comparative experiment on a real breast cancer dataset.

Online Alternating Direction Method

Huahua Wang, Arindam Banerjee

– Accepted

Abstract: Online optimization has emerged as powerful tool in large scale optimization. In this paper, we introduce efficient online algorithms based on the alternating directions method (ADM). We introduce a new proof technique for ADM in the batch setting, which yields a linear rate of convergence of ADM and forms the basis of regret analysis in the online setting. We consider two scenarios in the online setting, based on whether the solution needs to lie in the feasible set or not. In both settings, we establish regret bounds for both the objective function as well as constraint violation for general and strongly convex functions. Preliminary results are presented to illustrate the performance of the proposed algorithms.

Robust PCA in High-dimension: A Deterministic Approach

Jiashi Feng, Huan Xu, Shuicheng Yan

– Accepted

Abstract: We consider principal component analysis for contaminated data-set in the high dimensional regime, where the number of observations is comparable or more than the dimensionality of each observation. We propose a *deterministic* high-dimensional robust PCA algorithm which inherits all theoretical properties of its *randomized* counterpart, i.e., it is tractable, robust to contaminated points, easily kernelizable, asymptotic consistent and achieves maximal robustness - a breakdown point of 50%. More importantly, the proposed method exhibits significantly better computational efficiency, which makes it suitable for large-scale real applications.

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

5 comments:

Emile Richard said...: Here is a link to our paper :
http://www.cmla.ens-cachan.fr/fileadmin/Membres/richard/pdfs/SPLR.pdf; Tuesday, May 15, 2012 at 12:20:00 PM CDT
Chad Scherrer said...: Here is a link to our "Scaling Up Coordinate Descent Paper":
http://www.cs.utexas.edu/~ambuj/research/scherrer12scaling.pdf; Monday, May 21, 2012 at 11:36:00 PM CDT
Anonymous said...: Here is a link for the paper Efficient Euclidean Projections onto the Intersection of Norm Balls
http://i.cs.hku.hk/~wyu/Adams_Wei_Yu_Homepage_files/Efficient_Euclidean_Projections_onto_the_Intersection_of_Norm_Balls.pdf; Sunday, May 27, 2012 at 2:05:00 AM CDT
Yu-Xiang Wang said...: Here is a link for the paper Stability of Matrix Factorization for Collaborative Filtering:
http://serve.me.nus.edu.sg/mpewangy/docs/MF_stability-ICML.pdf
Supplementary material:
http://serve.me.nus.edu.sg/mpewangy/docs/MF_stability-ICML-sup.pdf; Tuesday, May 29, 2012 at 9:24:00 PM CDT
Yu-Xiang Wang said...: Here is a link for the paper Stability of Matrix Factorization for Collaborative Filtering:
http://serve.me.nus.edu.sg/mpewangy/docs/MF_stability-ICML.pdf
Supplementary material:
http://serve.me.nus.edu.sg/mpewangy/docs/MF_stability-ICML-sup.pdf; Tuesday, May 29, 2012 at 9:26:00 PM CDT