## Friday, December 19, 2014

### Video: A Statistical Model for Tensor Principal Component Analysis

A Statistical Model for Tensor Principal Component Analysis
I will discuss the Principal Component Analysis problem for large tensors of arbitrary order k under a single-spike (or rank-one plus noise) model. On the one hand, using information theory, and recent results in probability theory I will provide necessary and sufficient conditions under  which the principal component can be estimated using unbounded computational resources. It turns out that this is possible as soon as the signal-to-noise ratio \beta becomes larger than C\sqrt{k\log k} (and in particular \beta can remain bounded has the problem dimensions increase). On the other hand, I will analyze several polynomial-time estimation algorithms, based on various approaches:
1. Tensor matricization and spectral analysis,
2. Semidefinite relaxations,
3. Power iteration.
I will show that, unless the signal-to-noise ratio diverges in the system dimensions, none of these approaches succeeds. This is possibly related to a fundamental limitation of polynomial estimators for this problem. While complexity theory suggests that intractability holds from a worst case point of view, no analogous result has been proved  under statistical models of the data.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Thursday, December 18, 2014

### Video and Slides: Random Embeddings, Matrix-valued Kernels and Deep Learning

Vikas Sindhwani

The recent dramatic success of Deep Neural Networks (DNNs) in many applications highlights the statistical benefits of marrying near-nonparametric models with large datasets, using efficient optimization algorithms running in distributed computing environments. In the 1990's, Kernel methods became the toolset of choice for a wide variety of machine learning problems due to their theoretical appeal and algorithmic roots in convex optimization, replacing neural nets in many settings. So what changed between then and the modern deep learning revolution? Perhaps the advent of "big data" or perhaps the notion of "depth" or perhaps better DNN training algorithms, or all of the above. Or perhaps also that the development of kernel methods has somewhat lagged behind in terms of scalable training techniques, effective mechanisms for kernel learning and parallel implementations.
I will describe new efforts to resolve scalability challenges of kernel methods, for both scalar and multivariate prediction settings, allowing them to be trained on big data using a combination of randomized data embeddings, Quasi-Monte Carlo (QMC) acceleration, distributed convex optimization and input-output kernel learning. I will report that on classic speech recognition and computer vision datasets, randomized kernel methods and deep neural networks turn out to have essentially identical performance. Curiously, though, randomized kernel methods begin to look a bit like neural networks, but with a clear mathematical basis for their architecture. Conversely, invariant kernel learning and matrix-valued kernels may offer a new way to construct deeper architectures. This talk will describe recent research results and personal perspectives on points of synergy between these fields.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Wednesday, December 17, 2014

### Learning with Fredholm Kernels - implementation -

Here is a way to do semi-surpevised learning with kernels (and even with Random Features at the very end)

Learning with Fredholm Kernels by Qichao Que Mikhail Belkin and Yusu Wang.
In this paper we propose a framework for supervised and semi-supervised learning based on reformulating the learning problem as a regularized Fredholm integral equation. Our approach fits naturally into the kernel framework and can be interpreted as constructing new data-dependent kernels, which we call Fredholm kernels. We proceed to discuss the “noise assumption” for semi-supervised learning and provide both theoretical and experimental evidences that Fredholm kernels can effectively utilize unlabeled data under the noise assumption. We demonstrate that methods based on Fredholm learning show very competitive performance inthe standard semi-supervised learning setting
The implementation is located at: https://github.com/queqichao/FredholmLearning

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### More Oil and Compressive Sensing Connections

Felix Herrmann followed through to yesterday's "Another Donoho-Tao Moment ?" which itself was a followup to Sunday Morning Insight: The Stuff of Discovery and Hamming's time: Scientific Discovery Enabled by Compressive Sensing and related fields. Here what he has to say:

Hi Igor,
I agree with your assessment that the application of CS to seismic data acquisition may not qualify as a major scientific discovery but it will almost certainly lead to major scientific breakthroughs in crustal and mantle seismology as soon as my colleagues in those areas get wind of this innovation. Having said this it may be worth mentioning that randomized subsampling techniques are also having a major impact on carrying out large scale inversions in exploration seismology. For instance, a major seismic contractor company has been able to render full-waveform inversion (the seismic term for inverse problems involving PDE solves) into a commercially viable service by virtue of the fact that we were able to reduce the computational costs of these methods 5—7 fold using batching techniques. These references
were instrumental in motivating the oil & gas industry into adapting this batching technology into their business.
An area that is closer in spirit to Compressive Sensing is seismic imaging. Contrary to full-waveform inversion, seismic imaging entails the inversion of extremely large-scale tall systems of equations that can be made computationally viable using randomized sampling techniques in combination with sparsity promotion. While heuristic in nature, this technique
is able to approximately invert tall systems at the cost of roughly one pass through the data (read only one application of the full adjoint). It is very interesting to see that this heuristic technique has, at least at the conceptual level, connections with approximate message passing (see section 6.5 of “Graphical Models Concepts in Compressed Sensing”) and recent work on Kaczmarz: “A sparse Kaczmarz solver and a linearized Bregman method for online compressed sensing”.
While these developments may by themselves not qualify as major “scientific breakthroughs”, it is clear that these developments are having a major impact on the field of exploration seismology where inversions as opposed to applications of “single adjoints” are now possible that were until very recently computationally unfeasible. For further information on the application of Compressive Sensing to seismic data acquisition and wave-equation based inversion, I would like to point your readers to our website, mind map, and a recent overview article
Thanks again for your efforts promoting Compressive Sensing to the wider research community.
Kind regards,
Felix J. Herrmann
UBC—Seismic Laboratory for Imaging and Modelling
Thanks

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Tuesday, December 16, 2014

### Another Donoho-Tao Moment ?

We always hear about the need to justify mathematics as regards to applied work. Remember this "Donoho-Tao moment"? (from this 2008 newletter)
[Mark Green's] favorite moment of the program came when the NSF panel – which was simultaneously conducting its first site visit – asked: Is this interdisciplinary work? Participant David Donoho (statistics, Stanford), another major contributor to the genesis of compressed sensing, reportedly exclaimed, “You’ve got Terry Tao talking to geoscientists, what do you want?

Here is one that looks like another instance of it: Chuck Mosher, a Geoscience Fellow at ConocoPhillips mentioned the following in the LinkedIn thread related to scientific discovery using compressive sensing. While this is not a scientific discovery per se, it does fall in the discovery area and technology improvement:

Igor,

I've recently started following Nuit Blanche, and find your site to be a great reference for compressive sensing topics.

I believe I have a possible example of CS leading to "discoveries" in my field: seismic data acquisition, processing, and imaging. I have worked with Rich Baraniuk at Rice University on many topics over the past 20 years or so. Rich recognized that seismic data acquisition might be a good candidate for compressive sensing innovation, and began coaching me in this direction starting around 2008 when CS was gathering steam. In 2010, I started a formal research project at my company (ConocoPhillips) to investigate applications of CS to petroleum exploration and production. In 2012, we acquired our first field trials using CS designs for seismic data acquistion, and finally in 2014 we deployed our first full scale CS based system for seismic data acquistion. We call our framework Compressive Seismic Imaging, or CSI. Gotta have a good acronym ;-)

Seismic data acquisition for a single geologic prospect can cost anywhere between $10-$100MM USD, and involves the use of the largest moving objects ever created by man (6 x 18 km sensor arrays with upwards of 64,000 channels). Compressive Sensing allows us to increase what we call "acquisition efficiency" by a factor of 2 or more in each coordinate direction that is employed. For seismic data acquisition, we have 4 coordinate directions (source x, source y , receiver x, receiver y), so efficiencies on the order of 2**4 or 16x are possible. In addition, CS can be used to enable the use of multiple simultaneous sources, providing another factor of 4 or so. Just as in parallel computing, a significant portion of these projects is "serial", so the efficiencies might only have a 2-10x impact on cost. You do the math - impacting cost with a factor of 2 on a $100 million dollar project will get your attention. Capital programs for seismic data acquistion exceed$1 billion dollars for many of the large oil companies, so there is a lot of upside for CS in our business. We have only started to get this technology into use, but we expect that in a few years the seismic acquistion business will fully embrace CSI.

Here is a link to a feature article on our CSI program in a popular news magazine for the exploration geophysics industry, "The Leading Edge", published by the Society of Exploration Geophysicists:

Best regards,
Chuck Mosher
Geoscience Fellow, ConocoPhillips

So this is the applied part. What about the pure math connection ? Dustin Mixon recently wrote about Alexander Grothendieck's influence on his work:
(I say that his is not my field of study, and yet I have still seen his influence. For example, the Grothendieck inequality is a beautiful result in functional analysis that he proved as a graduate student before changing fields, and it has since found applications in hardness of approximation. Also, his development of Grothendieck groups provided a starting point for K-theory, which is the source of the best known lower bound for the 4M-4 conjecture.)

see also Phase Retrieval from masked Fourier transforms. In summary, you've got one of Grothendieck's theory providing bounds on the number of sensors and enabling a billion dollar industry, what do you want ?

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Learning Multidimensional Fourier Series With Tensor Trains - implementation -

Ah ! Tensor Trains versus Random Features (and an implementaiton in Julia)

Learning Multidimensional Fourier Series With Tensor Trains by Sander Wahls, Visa Koivunen, H. Vincent Poor and Michel Verhaegen
How to learn a function from observations of inputs and noisy outputs is a fundamental problem in machine learning. Often, an approximation of the desired function is found by minimizing a risk functional over some function space. The space of candidate functions should contain good approximations of the true function, but it should also be such that the minimization of the risk functional is computationally feasible. In this paper, finite multidimensional Fourier series are used as candidate functions. Their impressive approximative capabilities are illustrated by showing that Gaussian-kernel estimators can be approximated arbitrarily well over any compact set of bandwidths with a fixed number of Fourier coefficients. However, the solution of the associated risk minimization problem is computationally feasible only if the dimension d of the inputs is small because the number of required Fourier coefficients grows exponentially with d. This problem is addressed by using the tensor train format to model the tensor of Fourier coefficients under a low-rank constraint. An algorithm for least-squares regression is derived and the potential of this approach is illustrated in numerical experiments. The computational complexity of the algorithm grows only linearly both with the number of observations N and the input dimension d, making it feasible also for large-scale problems.
Sander also let me know that:
Hi Igor,

....
We also provide provide a Julia implementation of the algorithm:

https://bitbucket.org/wahls/mdfourier

....

Best, Sander

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Sunday, December 14, 2014

### Sunday Morning Insight: The Stuff of Discovery

In machine learning, they call the connundrum "exploitation versus exploration", in other circles we talk about "improving stuff versus discovery". There are many ways discovery can be defined. For instance in Crossing into P territory, we noted that a new kind of sensor (a genome sequencer) could enable experimentations in polynomial time thereby clearly expanding the possibilities to do real discovery. While both the PacBio and Oxford Nanopore technologies have just been made available before the summer, they are already changing the nature of discovery in that field [1]. Before, people would wonder how one could put pieces of DNA together, now that this complexity is mostly gone the new norm is now becoming: since genomes can be assembled easily, what sort of discovery can be done with a collection of genomes.
As I've said before, we have had this exact same explosion unraveling in compressive sensing ten years ago. What happened since ? Many polynomial time algorithms were developed with the emphasis of being faster than the previous ones and soon enough more complex data structures began to be exploited by the algorithms. There is really no reason to believe why this should not happen in genome sequencing: We are going to have many algorithms that do alignement using either PacBio or Oxford Nanopore technologies.

But in compressive sensing, something else happened: We began to discover a few things. This past Friday ( Hamming's time: Scientific Discovery Enabled by Compressive Sensing and related fields ) I provided two candidates. One candidate used the structure of the problem and its limitation to make predictions, while the other used the new paradigm to show how to nix a theory. Here are two other: An inference based on sparsity priors [2] (as featured in Catching an aha moment with compressive sensing ) and another one [3] about finding a needle in a haystack in an exponential families of solutions featured in ( Cluster expansion made easy with Bayesian compressive sensing ).

In actuality, those four examples fall into two categories: One category is where one uses the new prior as a way to find that needle in an exponential haystack while the other category uses empirical complexity bounds to reduce the phase space of what is feasible.

Either exploit the newfound capability or reduce the exploration horizon: different sides of the same discovery coin. Both are important.

References:
[1] Genomic sequencing: Recent tweets, papers, blog posts and attendant comment on that blog post:
Widespread polycistronic transcripts in mushroom-forming fungi revealed by single-molecule long-read mRNA sequencing by Sean Gordon, Elizabeth Tseng, Asaf Salamov, Jiwei Zhang, Xiandong Meng, Zhiying Zhao, Dongwan Don Kang, Jason Underwood, Igor V Grigoriev, Melania Figueroa, Jonathan S Schilling, Feng Chen, Zhong Wang

MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island by Philip M Ashton, Satheesh Nair, Tim Dallman, Salvatore Rubino, Wolfgang Rabsch, Solomon Mwaigwisya, John Wain & Justin O'Grady
And a comment following this article: USB-sized DNA sequencer is error-prone, but still useful

Hi! Thanks for the write up! Getting spoken about on Ars Technica is definitely crossed something off my bucket list (I'm first author on the paper discussed).

I would just like to say a few things about the MinION/Oxford Nanopore:

1) While the error rate we observed is high compared to e.g. Illumina, it is comparable to PacBio (the main high throughput, long read tech).

2) You say 'the great promise of nanopore sequencing has been very difficult to match in practice'. However, I don't really think that is true. What ONT have done is amazing!

2a) First of all, the form factor is revolutionary. I'm not sure what your definition of a USB product is, but mine would be 'something where the only connection is a USB connection'. The MinION meets this.

2b) Rather than limiting the MinION device to a small number of elite institutes, they sent it to hundreds of 'normal' people. This is a brave move that speaks to the confidence they have in their technology. We had a positive experience with it, some people probably less so, others more so. This approach to letting everyone have a crack is surely one to applaud?

2c) The technology is just fantastic - single molecule sequencing using a biological pore! Think about how hard that must be to engineer! In a way that can be shipped to and used by hundreds of non-specialists! I should say that Illumina and PacBio also have awesome devices/technologies, but this one is newer ;-)

3) A slight technical issue, but the short reads weren't used to correct the long reads. The long reads were used to join contigs made using the short reads.

4) Another slight technical issue, the Illumina technology with bias is specifically the Nextera protocol. This has been known since this technology was developed by Jay Shendure's lab.

Thanks again for writing us up!

[2] Direct inference of protein–DNA interactions using compressed sensing methods by Mohammed AlQuraishi, and Harley H. McAdams (featured in Catching an aha moment with compressive sensing )

[3] Lance J. Nelson*, Vidvuds Ozolins, C. Shane Reese, Fei Zhou, Gus L. W. Hart, "Cluster expansion made easy with Bayesian compressive sensing," Phys. Rev. B 88, 155105 (Oct. 2013). [pdf] and Lance J. Nelson*, Gus L. W. Hart, Fei Zhou, and Vidvuds Ozolins, "Compressive sensing as a paradigm for building physics models," Phys. Rev. B 87 035125 (2013). [pdf] featured in ( Cluster expansion made easy with Bayesian compressive sensing )

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Saturday, December 13, 2014

### NIPS2014 Poster papers

While I listed the proceedings of NIPS recently, it did not seem to include posters at workshops that are taking place today (and took place yesterday). Here are the ones I could find from the workshop pages, enjoy !

## Deep Learning and Representation Learning Workshop: NIPS 2014

OPT2014 Optimization for Machine Learning

## Modern Nonparametrics 3: Automating the Learning Pipeline

1. The Randomized Causation Coefficient. (talk)
David Lopez-Paz, Krikamol Muandet, Benjamin Recht.
[paper]
2. Influence Functions for Nonparametric Estimation. (spotlight, poster)
Kirthevasan Kandasamy, Akshay Krishnamurthy, Barnabás Póczos, Larry Wasserman, James M. Robins.
[paper]
3. Are You Still Tuning Hyperparameters? Parameter-free Model Selection and Learning. (talk)
Francesco Orabona.
[paper]
4. Uncertainty Propagation in Gaussian Process Pipelines. (spotlight, poster)
Andreas C. Damianou, Neil D. Lawrence.
[paper, poster]
5. Kernel non-parametric tests of relative dependency. (spotlight, poster)
Wacha Bounliphone, Arthur Gretton, Matthew Blaschko.
[paper, spotlight, poster]
6. Generalized Product of Experts for Automatic and Principled Fusion of Gaussian Process Predictions. (spotlight, poster)
Yanshuai Cao, David J. Fleet.
[paper]
7. Computable Learning via Metric Measure Theory. (spotlight, poster)
Arijit Das, Achim Tresch.
[paper]
8. Nonparametric Maximum Margin Similarity for Semi-Supervised Learning. (spotlight, poster)
Yingzhen Yang, Xinqi Chu, Zhangyang Wang, Thomas S. Huang.
[paper]
9. Learning with Deep Trees.  (spotlight, poster)
Giulia DeSalvo, Mehryar Mohri, Umar Syed.
[paper, spotlight, poster]
10. Theoretical Foundations for Learning Kernels in Supervised Kernel PCA. (spotlight, poster)
Mehryar Mohri, Afshin Rostamizadeh, Dmitry Storcheus.
[paper]
11. Kernel Selection in Support Vector Machines Using Gram-Matrix Properties. (spotlight, poster)
Roberto Valerio, Ricardo Vilalta.
[paper, spotlight (pdf, pptx), poster (pdf, pptx)]

## Optimal transport and machine learning

 8:30 - 9:10 Crash-Course in Optimal Transport (Organizers) 9:10 - 10:00 Piotr Indyk Geometric representations of the Earth-Mover Distance Coffee Break 10:30 - 11:20 Alexander Barvinok Non-negative matrices with prescribed row and column sums (slides) 11:20 - 11:40 Zheng, Pestilli, Rokem Quantifying error in estimates of human brain fiber directions using the EMD (paper) 11:40 - 12:00 Courty, Flamary, Rakotomamonjy, Tuia Optimal transport for Domain adaptation Lunch Break 14:40 - 15:00 Flamary, Courty, Rakotomamonjy, Tuia Optimal transport with Laplacian regularization 15:00 - 15:20 Rabin, Papadakis Non-convex relaxation of optimal transport for color transfer (paper) 15:20 - 15:40 Lescornel, Loubès Estimation of deformations between distributions with the Wasserstein distance (paper) 15:40 - 16:30 Adam Oberman Numerical methods for the optimal transportation problem Coffee Break 17:00 - 17:50 Robert McCann Optimal transport: old and new 17:50 - 18:30 Panel Discussion

## Invited Talks

#### Volkan Cevher: A totally unimodular view of structured sparsity

We describe a simple framework for structured sparse recovery based on convex optimization. We show that many interesting structured sparsity models can be naturally represented by linear matrix inequalities on the support of the unknown parameters, where the constraint matrix has a totally unimodular (TU) structure. For such structured models, tight convex relaxations can be obtained in polynomial time via linear programming. Our modeling framework unifies the prevalent structured sparsity norms in the literature, introduces new interesting ones, and renders their tightness and tractability arguments transparent.
Based on
http://infoscience.epfl.ch/record/202767?ln=en http://infoscience.epfl.ch/record/184981?ln=en
Web:
http://lions.epfl.ch/publications

#### Tom McCormick: A survey of minimizing submodular functions on lattices

We usually talk about submodular functions on the subsets of a finite set, the so-called Boolean lattice, and there are many applications of them. There has been a lot of progress made in the last thirty years in understanding the complexity of, and algorithms for, submodular function minimization (SFM) on the Boolean lattice.
But there are other applications where more general notions of submodularity are important. Suppose that we take the usual definition of submodularity and replace the notion of intersection and union of subsets by the "meet" and "join" in a finite lattice. For example, we could take a rectangular "box" of integer points in n-space, with meet being component-wise min, and join being component-wise max. Then we would like to generalize the results about SFM from the Boolean lattice to more general lattices.
There has been a lot of work on such questions in the last ten years, and this talk will survey some of this work. We start with the case where the lattice is distributive. Here Birkhoff's Theorem allows a clever reduction from SFM on a distributive lattice to ordinary SFM on the Boolean lattice. We then consider lattices that are "oriented" or "signed" generalizations of the Boolean lattice, leading to "bisubmodularity", where we do have good characterizations and algorithms. We also briefly cover recent efforts to further generalize bisubmodularity to "k-submodularity", and to product lattices of "diamonds".
Finally we come back to lattices of integer points in n-space. We demonstrate that submodularity by itself is not enough to allow for efficient minimization, even if we also assume coordinate-wise convexity. This leads to considering concepts from "discrete convexity" such as "L-natural convexity", where we do gain enough structure to allow for efficient minimization.

#### Yaron Singer: Adaptive seeding: a new framework for stochastic submodular optimization

In this talk we will introduce a new framework for stochastic optimization called adaptive seeding. The framework was originally designed to enable substantial improvements to influence maximization by leveraging a remarkable structural phenomenon in social networks known as the "friendship paradox" (or "your friends have more friends than you"). At a high level, adaptive seeding is the task of making choices in the present that lead to favorable outcomes in the future, and may be of independent interest to those curious about stochastic optimization, submodular maximization, and machine learning. In the talk we will give a taste to some of the problems that arise in this rich domain. We will discuss key algorithmic ideas, as well as empirical and experimental results.

#### Sebastian Nowozin: Trade-offs in Structured Prediction

Structured learning and prediction problems often turn out to be challenging computationally. Yet, there are different sources of structure and it pays to examine them more closely. One source is a fixed or given or physical constraint on feasible decisions, for example when an actual physical system is being controlled. Another source is the assumption that by modelling the structured domain through latent variables and constraints provides statistical advantages because a small number of parameters can now describe the target domain more accurately and we can learn more effectively from fewer samples by exposing this structure. A third source of combinatorial structure is often the loss function that is used; a structured loss invariably leads to a structured decision problem. When designing our models we are often free to select trade-offs in terms of model structure, capacity, number of parameters, and difficulty of inference; in this talk I will argue that by thinking about the above sources of combinatorial structure we can make more sensible trade-offs and I will illustrate these using example applications from computer vision.

#### Dhruv Batra: Submodular meets Structured: Finding Diverse Subsets in Exponentially-Large Structured Item Sets

Perception problems are hard and notoriously ambiguous. Robust prediction methods in Computer Vision and Natural Language Processing often search for a diverse set of high-quality candidate solutions or proposals. In structured prediction problems, this becomes a daunting task, as the solution space (image labelings, sentence parses, etc.) is exponentially large.
We study greedy algorithms for finding a diverse subset of solutions in structured-output spaces by drawing new connections between submodular functions over combinatorial item sets and High-Order Potentials (HOPs) studied for graphical models. Specifically, we show that when marginal gains of submodular diversity functions allow structured representations, this enables efficient (sub-linear time) approximate maximization by reducing the greedy augmentation step to inference in a factor graph with appropriately constructed HOPs.
Joint work with Adarsh Prasad (UT-Austin) and Stefanie Jegelka (UC Berkeley).

## Representation and Learning Methods for Complex Outputs

08:30--09:15 Invited Talk: Honglak Lee.
Multimodal Deep Learning with Implicit Output Representations
09:15--10:00 Invited Talk: Francesco Dinuzzo.
Output Kernel Learning with Structural Constraints
10:00--10:30 Coffee Break

Session 2
10:30--11:15 Invited Talk: Hal Daume III.
Structured Latent Representations in NLP
11:15--11:30 Contributed Talk: Corinna Cortes, Vitaly Kuznetsov and Mehryar Mohri.
On-line Learning Approach to Ensemble Methods for Structured Prediction
11:30--12:00 Spotlights
Hanchen Xiong, Sandor Szedmak and Justus Piater.
Implicit Learning of Simpler Output Kernels for Multi-Label Prediction
Fabio Massimo Zanzotto and Lorenzo Ferrone.
Output Distributed Convolution Structure Learning for Syntactic Analysis of Natural Language
François Laviolette, Emilie Morvant, Liva Ralaivola and Jean-Francis Roy.
On Generalizing the C-Bound to the Multiclass and Multilabel Settings
Hongyu Guo, Xiaodan Zhu and Martin Renqiang Min.
A Deep Learning Model for Structured Outputs with High-order Interaction
Thomas Unterthiner, Andreas Mayr, Günter Klambauer, Marvin Steijaert, Hugo Ceulemans, Jörg Wenger and Sepp Hochreiter.
Deep Learning for Drug Target Prediction
12:00--14:00 Lunch Break
14:00--15:00 Poster Session

Session 3
15:00--15:45 Invited Talk: Jia Deng.
Learning Visual Models with a Knowledge Graph
15:45--16:00 Contributed Talk: Calvin Murdock and Fernando De La Torre.
Semantic Component Analysis
16:00--16:15 Contributed Talk: Jiaqian Yu and Matthew Blaschko.
Lova ́sz Hinge for Learning Submodular Losses
16:15--16:30 Invited Talk: Rich Sutton.
Representation Learning in Reinforcement Learning
16:30--17:00 Coffee Break

Session 4
17:00--17:45 Invited Talk: Noah Smith.
Loss Functions and Regularization for Less-than-Supervised NLP
17:45--18:00 Contributed Talk: Hichem Sahbi.
Finite State Machines for Structured Scene Decoding
18:00--18:15 Contributed Talk: Luke Vilnis, Nikos Karampatziakis and Paul Mineiro.
Generalized Eigenvectors for Large Multiclass Problems

# Distributed Machine Learning and Matrix Computations

Session 1
========
08:15-08:30 Introduction, Reza Zadeh
08:30-09:00 Ameet Talwalkar, MLbase: Simplified Distributed Machine Learning
09:00-09:30 David Woodruff, Principal Component Analysis and Higher Correlations for Distributed Data
09:30-10:00 Virginia Smith, Communication-Efficient Distributed Dual Coordinate Ascent

10:00-10:30 Coffee Break

Session 2
========
10:30-11:30 Jeff Dean (Keynote), Techniques for Training Neural Networks Quickly
11:30-12:00 Reza Zadeh, Distributing the Singular Value Decomposition with Spark
12:00-12:30 Jure Leskovec, In-memory graph analytics

12:30-14:30 Lunch Break

Session 3
========
14:30-15:00 Carlos Guestrin, SFrame and SGraph: Scalable, Out-of-Core, Unified Tabular and Graph Processing
15:00-15:30 Inderjit Dhillon, NOMAD: A Distributed Framework for Latent Variable Models
15:30-16:00 Ankur Dave, GraphX: Graph Processing in a Distributed Dataflow Framework
16:00-16:30 Jeremy Freeman, Large-scale decompositions of brain activity

Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning
Minjie Wang, Tianjun Xiao, Jianpeng Li, Jiaxing Zhang, Chuntao Hong, Zheng Zhang
Maxios: Large Scale Nonnegative Matrix Factorization for Collaborative Filtering
Simon Shaolei Du, Boyi Chen, Yilin Liu, Lei Li
Factorbird - a Parameter Server Approach to Distributed Matrix Factorization
Sebastian Schelter, Venu Satuluri, Reza Zadeh
Improved Algorithms for Distributed Boosting
Jeff Cooper, Lev Reyzin
Parallel and Distributed Inference in Coupled Tensor Factorization Models supplementary
Umut Simsekli, Beyza Ermis, Ali Taylan Cemgil, Figen Oztoprak, S. Ilker Birbil
Dogwild! — Distributed Hogwild for CPU and GPU
Cyprien Noel, Simon Osindero
Generalized Low Rank Models
Madeleine Udell, Corinne Horn, Reza Zadeh, Stephen Boyd
Elastic Distributed Bayesian Collaborative Filtering
Alex Beutel, Markus Weimer, Tom Minka, Yordan Zaykov, Vijay Narayanan
LOCO: Distributing Ridge Regression with Random Projections
Brian McWilliams, Christina Heinze, Nicolai Meinshausen, Gabriel Krummenacher, Hastagiri P. Vanchinathan
Logistic Matrix Factorization for Implicit Feedback Data
Christopher C. Johnson
Tighter Low-rank Approximation via Sampling the Leveraged Element
Srinadh Bhojanapalli, Prateek Jain, Sujay Sanghavi

Accepted Papers

## Representation and Learning Methods for Complex Outputs

Flexible Transfer Learning under Support and Model Shift.
Xuezhi Wang and Jeff Schneider

Daniel Hernández-Lobato, José Miguel Hernández-Lobato and Zoubin Ghahramani.

Song Liu, Taiji suzuki and Masashi Sugiyama

Thomas Unterthiner, Andreas Mayr,  Günter Klambauer, Marvin Steijaert, Jörg Wenger, Hugo Ceulemans and Sepp Hochreiter

Pascal Germain, Amaury Habrard, François Laviolette and Emilie Morvant

Anant Raj, Vinary Namboodiri and Tinne Tuytelaars

Michael Goetz, Christian Weber, Bram Stieltjes and Klaus Maier-Hein

Active Nearest Neighbors in Changing environments
Christopher Berlind, Ruth Urner

Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette and Mario Marchand

Sigurd Spieckermann, Steffen Udluft and Thomas Runkler

Daniele calndirello, Alessandro Lazaric and Marcello Restelli

Marta Soare, Ouais Alsharif, Alsessandro Lazaric and Joelle Pineau

Piyush Rai, Wenzhao Lian and Lawrence Carin

Yujia Li, Kevin Swersky and Richard Zernel
Vitaly Kuznetsov and Mehryar Mohri

## 4th Workshop on Automated Knowledge Base Construction (AKBC) 2014

• Contributed Talks
• Arvind Neelakantan, Benjamin Roth and Andrew Mccallum. Knowledge Base Completion using Compositional Vector Space Models PDF
• Peter Clark, Niranjan Balasubramanian, Sumithra Bhakthavatsalam, Kevin Humphreys, Jesse Kinkead, Ashish Sabharwal, and Oyvind Tafjord. Automatic Construction of Inference-Supporting Knowledge Bases PDF
• Ari Kobren, Thomas Logan, Siddarth Sampangi and Andrew McCallum. Domain Specific Knowledge Base Construction via Crowdsourcing PDF
• Posters
• Adrian Benton, Jay Deyoung, Adam Teichert, Mark Dredze, Benjamin Van Durme, Stephen Mayhew and Max Thomas. Faster (and Better) Entity Linking with Cascades PDF
• Ning Gao, Douglas Oard and Mark Dredze. A Test Collection for Email Entity Linking PDF
• Derry Tanti Wijaya, Ndapandula Nakashole and Tom Mitchell. Mining and Organizing a Resource of State-changing Verbs PDF
• Jakob Huber, Christian Meilicke and Heiner Stuckenschmidt. Applying Markov Logic for Debugging Probabilistic Temporal Knowledge Bases PDF
• Ramanathan Guha. Correlating Entities PDF
• Tomasz Tylenda, Sarath Kumar Kondreddi and Gerhard Weikum. Spotting Knowledge Base Facts in Web Texts PDF
• Bhavana Dalvi and William Cohen. Multi-view Exploratory Learning for AKBC Problems PDF
• Alexander G. Ororbia Ii, Jian Wu and C. Lee Giles. CiteSeerX: Intelligent Information Extraction and Knowledge Creation from Web-Based Data PDF
• Adam Grycner, Gerhard Weikum, Jay Pujara, James Foulds and Lise Getoor. A Unified Probabilistic Approach for Semantic Clustering of Relational Phrases PDF
• Ndapandula Nakashole and Tom M. Mitchell. Micro Reading with Priors: Towards Second Generation Machine Readers PDF
• Edouard Grave. Weakly supervised named entity classification PDF
• Lucas Sterckx, Thomas Demeester, Johannes Deleu and Chris Develder. Using Semantic Clustering and Active Learning for Noise Reduction in Distant Supervision PDF
• Francis Ferraro, Max Thomas, Matthew Gormley, Travis Wolfe, Craig Harman and Benjamin Van Durme. Concretely Annotated Corpora PDF
• Luis Galárraga and Fabian M. Suchanek. Towards a Numerical Rule Mining Language PDF
• Bishan Yang and Claire Cardie. Improving on Recall Errors for Coreference Resolution PDF
• Chandra Sekhar Bhagavatula, Thanapon Noraset and Doug Downey. TextJoiner: On-demand Information Extraction with Multi-Pattern Queries PDF
• Benjamin Roth, Emma Strubell, Katherine Silverstein and Andrew Mccallum. Minimally Supervised Event Argument Extraction using Universal Schema PDF
• Mathias Niepert and Sameer Singh. Out of Many, One: Unifying Web-Extracted Knowledge Bases PDF
• Laura Dietz, Michael Schuhmacher and Simone Paolo Ponzetto. Queripidia: Query-specific Wikipedia Construction PDF
• Jay Pujara and Lise Getoor. Building Dynamic Knowledge Graphs PDF

### NIPS2014 - Out of the Box: Robustness in High Dimension

Session 1
========
8:30-9:15 Pradeep Ravikumar, "Dirty Statistical Models"
9:15-10:00 Donald Goldfarb,
"Low-rank Matrix and Tensor Recovery: Theory and Algorithms"

Session 2
========
10:30-11:15 Alessandro Chiuso, "Robustness issues in Kernel Tuning: SURE vs. Marginal Likelihood"
11:15-12:00 Po-Ling Loh, "Local optima of nonconvex M-estimators"

Session 3 (contributed talks)
========
15:00-15:20 Vassilis Kalofolias, "Matrix Completion on Graphs"
15:20-15:40 Aleksandr Aravkin, "Learning sparse models using general robust losses"
15:40-16:00 Stephen Becker, "Robust Compressed Least Squares Regression"
16:00-16:20 Ahmet Iscen, "Exploiting high dimensionality for similarity search"

Session 4
========
17:00-17:45 Rina Foygel Barber, "Controlling the False Discovery Rate via Knockoffs" (joint with Emmanuel Candes)
18:45-18:30 Noureddine El Karoui, "On high-dimensional robust regression and inefficiency of maximum likelihood methods"
Modern ML + NLP 2014

Learning Semantics 2014

Machine Reasoning & Artificial Intelligence
• 08:30a Pedro DomingosUniversity of Washington, Symmetry-Based Semantic Parsing
• 08:50a Tomas Mikolov, Facebook, Challenges in Development of Machine Intelligence
• 09:10a Luke Zettlemoyer, University of Washington, Semantic Parsing for Knowledge Extraction
• 09:30a Panel Discussion
Contributed Posters
• 10:00a Contributed Posters, Coffee Break
Natural Language Processing & Semantics from Text Corpora
Afternoon

Personal Assistants, Dialog Systems, and Question Answering
• 03:00p Susan Hendrich, Microsoft Cortana
• 03:20p Ashutosh SaxenaCornell, Tell Me Dave: Context-Sensitive Grounding of Natural Language into Robotic Tasks
• 03:40p Jason Weston, Facebook, Memory Networks
• 04:00p Panel Discussion
Contributed Posters
• 04:30p Contributed Posters, Coffee Break
Reasoning from Visual Scenes
• 05:00p Alyosha Efros, UC Berkeley, Towards The Visual Memex
• 05:20p Jeffrey Siskind, Purdue University, Learning to Ground Sentences in Video
• 05:40p Larry ZitnickMicrosoft Research, Forget Reality: Learning from Visual Abstraction
• 06:00p Panel Discussion

Contributed Posters

Morning Session (10:00p-10:30p)

Afternoon Session (4:10p-5:00p)

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.