
## Monday, July 07, 2014

### Randomized Numerical Linear Algebra for Large Scale Data Analysis ( libSkylark: Sketching based Matrix computations for Machine Learning )

While reading up on some RandNLA papers, I stumbled upon the Randomized Numerical Linear Algebra for Large Scale Data Analysis page, which has been added to the collection of Highly Technical Reference Pages. Here is what the main page says:

The Sketching Linear Algebra Kernel is a library for matrix computations suitable for general statistical data analysis and optimization applications.
Many tasks in machine learning and statistics ultimately end up being problems involving matrices: whether you're matching lenders and loans in the microfinance space, or finding the key players in the bitcoin market, or inferring where tweets came from, you'll want to have a toolkit for low-rank matrix approximation, least-squares and robust regression, eigenvector analysis, CUR and non-negative matrix factorizations, and other matrix computations.
Sketching is a way to compress matrices that preserves key matrix properties; it can be used to speed up many matrix computations. Sketching takes a given matrix A and produces a sketch matrix B that has fewer rows and/or columns than A. For a good sketch B, if we solve a problem with input B, the solution will also be pretty good for input A. For some problems, sketches can also be used to get faster ways to find high-precision solutions to the original problem. In other cases, sketches can be used to summarize the data by identifying the most important rows or columns.
A simple example of sketching is just sampling the rows (and/or columns) of the matrix, where each row (and/or column) is equally likely to be sampled. This uniform sampling is quick and easy, but doesn't always yield good sketches; however, there are sophisticated sampling methods that do yield good sketches.
The goal of this project is to build a sketching-based open-source software stack for NLA and its applications, as shown:

- Applications: Matrix Completion; Nonlinear RLS, SVM, PCA; Robust Regression; other applications
- Python: Python-based data analytics scripting layer
- PythonBinding: C++ to Python bindings
- NLA: Numerical Linear Algebra primitives (least-squares regression, low-rank approximation, randomized estimators)
- Sketch: Sketching kernels (JL, FJL, Gaussian, Sign, Sparse Embedding)
- Third-Party Libraries: MPI, Elemental, BLAS, CombBLAS, FFTW, Boost

Here are the recent papers from this RandNLA effort:

#### Random Laplace Feature Maps for Semigroup Kernels on Histograms

Jiyan Yang, Vikas Sindhwani, Quanfu Fan, Haim Avron, Michael Mahoney
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014

#### Efficient Dimensionality Reduction for Canonical Correlation Analysis

Haim Avron, Christos Boutsidis, Sivan Toledo, Anastasios Zouzias
SIAM Journal on Scientific Computing, to appear, 2014
Preliminary version appeared in the Proceedings of the 30th International Conference on Machine Learning (ICML), 2013

#### Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels

Jiyan Yang*, Vikas Sindhwani*, Haim Avron*, Michael Mahoney
Proceedings of the 31st International Conference on Machine Learning (ICML), 2014
(*) Equal contributors.

#### Kernel Methods Match Deep Neural Networks on TIMIT

Po-Sen Huang, Haim Avron, Tara Sainath, Vikas Sindhwani, Bhuvana Ramabhadran
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014
Best Student Paper Award

#### Optimal CUR Matrix Decompositions

Christos Boutsidis, David Woodruff
ACM Symposium on Theory of Computing (STOC), 2014


#### Faster SVD-truncated Regularized Least-squares

Christos Boutsidis, Malik Magdon-Ismail
Technical Report, 2014

#### A Note on Sparse Least-squares Regression

C. Boutsidis and M. Magdon-Ismail
Information Processing Letters, to appear, 2014

#### Approximate Spectral Clustering via Randomized Sketching

A. Gittens, A. Kambadur, C. Boutsidis.
Technical Report, updated Feb 15, 2014

#### Revisiting Asynchronous Linear Solvers: Provable Convergence Rate Through Randomization

Haim Avron, Alex Druinsky, Anshul Gupta
Proceedings of the 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2014

#### Random Projections for Linear Support Vector Machines

S. Paul, C. Boutsidis, M. Magdon-Ismail, P. Drineas
ACM Transactions on Knowledge Discovery from Data, to appear, 2014

Also of interest is libSkylark: Sketching based Matrix computations for Machine Learning on GitHub. From the page:

Introduction
The Sketching based Matrix computations for Machine Learning is a library for matrix computations suitable for general statistical data analysis and optimization applications.
Many tasks in machine learning and statistics ultimately end up being problems involving matrices: whether you're finding the key players in the bitcoin market, or inferring where tweets came from, or figuring out what's in sewage, you'll want to have a toolkit for least-squares and robust regression, eigenvector analysis, non-negative matrix factorization, and other matrix computations.

Image Credit: NASA/JPL-Caltech. This image was taken by Navcam: Left B (NAV_LEFT_B) onboard NASA's Mars rover Curiosity on Sol 679 (2014-07-04 20:21:29 UTC).

Join the CompressiveSensing subreddit or the Google+ Community and post there!
Liked this entry? Subscribe to Nuit Blanche's feed; there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle, and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on LinkedIn.