Nuit Blanche: AlexSmola

Showing posts with label AlexSmola. Show all posts

Thursday, May 31, 2018

McKernel: A Library for Approximate Kernel Expansions in Log-linear Time - implementation -

Woohoo ! following up on a previous post, Joachim lets me know of the release of an implementation:

Hi Igor,
The library is now up. The name changed to McKernel. Thanks for your interest.
https://github.com/curto2/mckernel https://arxiv.org/pdf/1702.08159
Cheers,
Curtó

Thanks !

McKernel: A Library for Approximate Kernel Expansions in Log-linear Time by Joachim D. Curtó, Irene C. Zarza, Feng Yang, Alexander J. Smola, Fernando De La Torre, Chong-Wah Ngo, Luc Van Gool

Kernel Methods Next Generation (KMNG) introduces a framework to use kernel approximates in the mini-batch setting with SGD Optimizer as an alternative to Deep Learning. McKernel is a C++ library for KMNG ML Large-scale. It contains a CPU optimized implementation of the Fastfood algorithm that allows the computation of approximated kernel expansions in log-linear time. The algorithm requires to compute the product of Walsh Hadamard Transform (WHT) matrices. A cache friendly SIMD Fast Walsh Hadamard Transform (FWHT) that achieves compelling speed and outperforms current state-of-the-art methods has been developed. McKernel allows to obtain non-linear classification combining Fastfood and a linear classifier.

Implementation is here: https://github.com/curto2/mckernel

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Friday, July 29, 2016

Stochastic Frank-Wolfe Methods for Nonconvex Optimization

All the previous blog entries around the Frank-Wolfe can be found under this Frank-Wolfe tag.

Stochastic Frank-Wolfe Methods for Nonconvex Optimization by Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

We study Frank-Wolfe methods for nonconvex stochastic and finite-sum optimization problems. Frank-Wolfe methods (in the convex case) have gained tremendous recent interest in machine learning and optimization communities due to their projection-free property and their ability to exploit structured constraints. However, our understanding of these algorithms in the nonconvex setting is fairly limited. In this paper, we propose nonconvex stochastic Frank-Wolfe methods and analyze their convergence properties. For objective functions that decompose into a finite-sum, we leverage ideas from variance reduction techniques for convex optimization to obtain new variance reduced nonconvex Frank-Wolfe methods that have provably faster convergence than the classical Frank-Wolfe method. Finally, we show that the faster convergence rates of our variance reduced methods also translate into improved convergence rates for the stochastic setting.

h/t Atlas Wang

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !

Thursday, August 13, 2015

Cuckoo Linear Algebra

Cuckoo Linear Algebra by Li Zhou, David G. Andersen , Mu Li , Alexander J. Smola

In this paper we present a novel data structure for sparse vectors based on Cuckoo hashing. It is highly memory efficient and allows for random access at near dense vector level rates. This allows us to solve sparse l₁ programming problems exactly and without preprocessing at a cost that is identical to dense linear algebra both in terms of memory and speed. Our approach provides a feasible alternative to the hash kernel and it excels whenever exact solutions are required, such as for feature selection.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !

Tuesday, June 16, 2015

Fast and Guaranteed Tensor Decomposition via Sketching

More goodies in the tensor realm, and according to Anima, an implementation will be shortly available. Here is the paper: Fast and Guaranteed Tensor Decomposition via Sketching by Yining Wang, Hsiao-Yu Tung, Alexander Smola, Animashree Anandkumar

Tensor CANDECOMP/PARAFAC (CP) decomposition has wide applications in statistical learning of latent variable models and in data mining. In this paper, we propose fast and randomized tensor CP decomposition algorithms based on sketching. We build on the idea of count sketches, but introduce many novel ideas which are unique to tensors. We develop novel methods for randomized computation of tensor contractions via FFTs, without explicitly forming the tensors. Such tensor contractions are encountered in decomposition methods such as tensor power iterations and alternating least squares. We also design novel colliding hashes for symmetric tensors to further save time in computing the sketches. We then combine these sketching ideas with existing whitening and tensor power iterative techniques to obtain the fastest algorithm on both sparse and dense tensors. The quality of approximation under our method does not depend on properties such as sparsity, uniformity of elements, etc. We apply the method for topic modeling and obtain competitive results.

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Saturday, September 20, 2014

Saturday Morning Videos: Random Functions for Dependence and Component Analysis, Randomized Nonlinear Component Analysis and the Automatic Statistician

Random Functions for Dependence and Component Analysis by David Lopez-Paz. The slides are here.

ICML: Randomized Nonlinear Component Analysis by David Lopez-Paz; Suvrit Sra; Alex Smola; Zoubin Ghahramani; Bernhard Schoelkopf

Joint Talk: Automatic Statistician - the big picture & Automatic construction and natural language description of nonparametric regression models by Zoubin Ghahramani and James LLoyd. The slides are here.

Other talks and videos of the First MSR-MLG Joint Meeting organized by David Lopez-Paz, Sebastian Nowozin, Zoubin Ghahramani at Microsoft Research Cambridge, 2014

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Wednesday, September 10, 2014

Randomized Nonlinear Component Analysis - implementation -

Following up on those Saturday Morning Videos: Some ICML 2014 presentations, here is: Randomized Nonlinear Component Analysis by David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, Bernhard Schölkopf

Classical methods such as Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are ubiquitous in statistics. However, these techniques are only able to reveal linear relationships in data. Although nonlinear variants of PCA and CCA have been proposed, these are computationally prohibitive in the large scale.

In a separate strand of recent research, randomized methods have been proposed to construct features that help reveal nonlinear patterns in data. For basic tasks such as regression or classification, random features exhibit little or no loss in performance, while achieving drastic savings in computational requirements.

In this paper we leverage randomness to design scalable new variants of nonlinear PCA and CCA; our ideas extend to key multivariate analysis tools such as spectral clustering or LDA. We demonstrate our algorithms through experiments on real-world data, on which we compare against the state-of-the-art. A simple R implementation of the presented algorithms is provided.

The implementation is here.

Let me note, something we pointed out earlier on Nuit Blanche:

It is of special interest that randomized algorithms are in many cases more robust than their deterministic analogues (Mahoney, 2011) because of the implicit regularization induced by randomness.

Indeed the seminal paper by Mike Mahoney was very clear on the advantages of randomization. Re-reading the introduction makes it plainly clear and is the basis for RandNLA (Randomized Numerical Linear Algebra)

Randomized algorithms for matrices and data

Michael W. Mahoney

(Submitted on 29 Apr 2011 (v1), last revised 15 Nov 2011 (this version, v3))

Randomized algorithms for very large matrix problems have received a great deal of attention in recent years. Much of this work was motivated by problems in large-scale data analysis, and this work was performed by individuals from many different research communities. This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis. An emphasis will be placed on a few simple core ideas that underlie not only recent theoretical advances but also the usefulness of these tools in large-scale data applications. Crucial in this context is the connection with the concept of statistical leverage. This concept has long been used in statistical regression diagnostics to identify outliers; and it has recently proved crucial in the development of improved worst-case matrix algorithms that are also amenable to high-quality numerical implementation and that are useful to domain scientists. Randomized methods solve problems such as the linear least-squares problem and the low-rank matrix approximation problem by constructing and operating on a randomized sketch of the input matrix. Depending on the specifics of the situation, when compared with the best previously-existing deterministic algorithms, the resulting randomized algorithms have worst-case running time that is asymptotically faster; their numerical implementations are faster in terms of clock-time; or they can be implemented in parallel computing environments where existing numerical algorithms fail to run at all. Numerous examples illustrating these observations will be described in detail.

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Saturday, September 06, 2014

Saturday Morning Videos: Some ICML 2014 presentations.

There goes your saturday morning, here are some videos from the ICML meeting that might be of interest to the readers of Nuit Blanche:

Robust Principal Component Analysis with Complex Noise Authors: Qian Zhao; Deyu Meng; Zongben Xu; Wangmeng Zuo; Lei Zhang
Randomized Nonlinear Component Analysis Authors: David Lopez-Paz; Suvrit Sra; Alex Smola; Zoubin Ghahramani; Bernhard Schoelkopf
Discriminative Features via Generalized Eigenvectors Authors: Nikos Karampatziakis; Paul Mineiro
Square Deal: Lower Bounds and Improved Relaxations for Tensor Recovery Authors: Cun Mu; Bo Huang; John Wright; Donald Goldfarb
Coherent Matrix Completion Authors: Yudong Chen; Srinadh Bhojanapalli; Sujay Sanghavi; Rachel Ward
Universal Matrix Completion Authors: Srinadh Bhojanapalli; Prateek Jain
Exponential Family Matrix Completion under Structural Constraints Authors: Suriya Gunasekar; Pradeep Ravikumar; Joydeep Ghosh
An Adaptive Accelerated Proximal Gradient Method and its Homotopy Continuation for Sparse Optimization Authors: Qihang Lin; Lin Xiao
A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data Authors: Jinfeng Yi; Lijun Zhang; Jun Wang; Rong Jin; Anil Jain
Robust Distance Metric Learning via Simultaneous L1-Norm Minimization and Maximization Authors: Hua Wang; Feiping Nie; Heng Huang
Fast Stochastic Alternating Direction Method of Multipliers Authors: Wenliang Zhong; James Kwok
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits Authors: Alekh Agarwal; Daniel Hsu; Satyen Kale; John Langford; Lihong Li; Robert Schapire
Narrowing the Gap: Random Forests In Theory and In Practice Authors: Misha Denil; David Matheson; Nando De Freitas
Marginalized Denoising Auto-encoders for Nonlinear Representations Authors: Minmin Chen; Kilian Weinberger; Fei Sha; Yoshua Bengio
Deep Generative Stochastic Networks Trainable by Backprop Authors: Yoshua Bengio; Eric Laufer; Guillaume Alain; Jason Yosinski
Stochastic Inference for Scalable Probabilistic Modeling of Binary Matrices Authors: Jose Miguel Hernandez-Lobato; Neil Houlsby; Zoubin Ghahramani
Cold-start Active Learning with Robust Ordinal Matrix Factorization Authors: Neil Houlsby; Jose Miguel Hernandez-Lobato; Zoubin Ghahramani
Probabilistic Matrix Factorization with Non-random Missing Data Authors: Jose Miguel Hernandez-Lobato; Neil Houlsby; Zoubin Ghahramani
A Deep Semi-NMF Model for Learning Hidden Representations Authors: George Trigeorgis; Konstantinos Bousmalis; Stefanos Zafeiriou; Bjoern Schuller
Structured Low-Rank Matrix Factorization: Optimality, Algorithm, and Applications to Image ProcessingAuthors: Benjamin Haeffele; Eric Young; Rene Vidal
An Information Geometry of Statistical Manifold Learning Authors: Ke Sun; St
Geodesic Distance Function Learning via Heat Flow on Vector Fields Authors: Binbin Lin; Ji Yang; Xiaofei He; Jieping Ye
On learning to localize objects with minimal supervisionAuthors: Hyun Oh Song; Ross Girshick; Stefanie Jegelka; Julien Mairal; Zaid Harchaoui; Trevor Darrell
Active Detection via Adaptive Submodularity Authors: Yuxin Chen; Hiroaki Shioi; Cesar Fuentes Montesinos; Lian Pin Koh; Serge Wich; Andreas Krause
Rank-One Matrix Pursuit for Matrix Completion Authors: Zheng Wang; Ming-Jun Lai; Zhaosong Lu; Wei Fan; Hasan Davulcu; Jieping Ye
Nuclear Norm Minimization via Active Subspace Selection Authors: Cho-Jui Hsieh; Peder Olsen
Riemannian Pursuit for Big Matrix Recovery Authors: Mingkui Tan; Ivor W. Tsang; Li Wang; Bart Vandereycken; Sinno Jialin Pan
Multiresolution Matrix Factorization Authors: Risi Kondor; Nedelina Teneva; Vikas Garg
Coding for Random Projections Authors: Ping Li; Michael Mitzenmacher; Anshumali Shrivastava
Nearest Neighbors Using Compact Sparse Codes Authors: Anoop Cherian
Composite Quantization for Approximate Nearest Neighbor Search Authors: Ting Zhang; Chao Du; Jingdong Wang
Gradient Hard Thresholding Pursuit for Sparsity-Constrained Optimization Authors: Xiaotong Yuan; Ping Li; Tong Zhang
Forward-Backward Greedy Algorithms for General Convex Smooth Functions over A Cardinality Constraint Authors: Ji Liu; Jieping Ye; Ryohei Fujimaki
Efficient Algorithms for Robust One-bit Compressive Sensing Authors: Lijun Zhang; Jinfeng Yi; Rong Jin
Nonlinear Information-Theoretic Compressive Measurement Design Authors: Liming Wang; Abolfazl Razi; Miguel Rodrigues; Robert Calderbank; Lawrence Carin
Elementary Estimators for High-Dimensional Linear Regression Authors: Eunho Yang; Aurelie Lozano; Pradeep Ravikumar
Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional SettingAuthors: Yudong Chen; Jiaming Xu
Kernel Mean Estimation and Stein Effect Authors: Krikamol Muandet; Kenji Fukumizu; Bharath Sriperumbudur; Arthur Gretton; Bernhard Schoelkopf
Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels Authors: Jiyan Yang; Vikas Sindhwani; Haim Avron; Michael Mahoney
A Unifying View of Representer Theorems Authors: Andreas Argyriou; Francesco Dinuzzo
Provable Bounds for Learning Some Deep Representations Authors: Sanjeev Arora; Aditya Bhaskara; Rong Ge; Tengyu Ma
K-means recovers ICA filters when independent components are sparse Authors: Alon Vinnikov; Shai Shalev-Shwartz
Anti-differentiating approximation algorithms:A case study with min-cuts, spectral, and flow Authors: David Gleich; Michael Mahoney
Nonnegative Sparse PCA with Provable Guarantees Authors: Megasthenis Asteris; Dimitris Papailiopoulos; Alexandros Dimakis
Finding Dense Subgraphs via Low-Rank Bilinear Optimization Authors: Dimitris Papailiopoulos; Ioannis Mitliagkas; Alexandros Dimakis; Constantine Caramanis
Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-thresholding Algorithm Authors: Hadi Daneshmand; Manuel Gomez-Rodriguez; Le Song; Bernhard Schoelkopf
Sparse Reinforcement Learning via Convex OptimizationAuthors: Zhiwei Qin; Weichang Li; Firdaus Janoos
Elementary Estimators for Sparse Covariance Matrices and other Structured MomentsAuthors: Eunho Yang; Aurelie Lozano; Pradeep Ravikumar
Robust Inverse Covariance Estimation under Noisy Measurements Authors: Jun-Kun Wang; Shou-de Lin
An Analysis of State-Relevance Weights and Sampling Distributions on L1-Regularized Approximate Linear Programming Approximation Accuracy Authors: Gavin Taylor; Connor Geer; David Piekut
Compact Random Feature Maps Authors: Raffay Hamid; Ying Xiao; Alex Gittens; Dennis Decoste
Margins, Kernels and Non-linear Smoothed Perceptrons Authors: Aaditya Ramdas; Javier Pe
Understanding Protein Dynamics with L1-Regularized Reversible Hidden Markov Models Authors: Robert McGibbon; Bharath Ramsundar; Mohammad Sultan; Gert Kiss; Vijay Pande

N00228947.jpg was taken on September 03, 2014 and received on Earth September 04, 2014. The camera was pointing toward TITAN at approximately 2,484,614 miles (3,998,598 kilometers) away, and the image was taken using the CL1 and UV3 filters.
Image Credit: NASA/JPL/Space Science Institute
Join the CompressiveSensing subreddit or the Google+ Community and post there !

Monday, August 18, 2014

Fastfood: Approximate Kernel Expansions in Loglinear Time - The Paper -

Compressive sensing is not the only place where multiplying a vector or a matrix with a Gaussian matrix is a big deal for large scale problems (see the recent Random Matrices Are Too Damn Large ! and Another Comment on " Random Matrices Are Too Damn Large !"). If you recall this is also a problem for Random Kitchen Sinks, a randomized version of AdaBoost (a connection with compressive sensing is mentioned here). There, the training set in Machine Learning is used as a dictionary in order to learn a function. Those dictionaries are, however too large and the authors of the paper resort to a fast random projections to learn the function faster.

We had the talk for over a year now, we now have the attendant paper. There is actually more in the paper than what was shown in the presentation earlier: Fastfood: Approximate Kernel Expansions in Loglinear Time by Quoc Viet Le, Tamas Sarlos, Alexander Johannes Smola

Despite their successes, what makes kernel methods difficult to use in many large scale problems is the fact that storing and computing the decision function is typically expensive, especially at prediction time. In this paper, we overcome this difficulty by proposing Fastfood, an approximation that accelerates such computation significantly. Key to Fastfood is the observation that Hadamard matrices, when combined with diagonal Gaussian matrices, exhibit properties similar to dense Gaussian random matrices. Yet unlike the latter, Hadamard and diagonal matrices are inexpensive to multiply and store. These two matrices can be used in lieu of Gaussian matrices in Random Kitchen Sinks proposed by Rahimi and Recht (2009) and thereby speeding up the computation for a large range of kernel functions. Specifically, Fastfood requires O(n log d) time and O(n) storage to compute n non-linear basis functions in d dimensions, a significant improvement from O(nd) computation and storage, without sacrificing accuracy.

Our method applies to any translation invariant and any dot-product kernel, such as the popular RBF kernels and polynomial kernels. We prove that the approximation is unbiased and has low variance. Experiments show that we achieve similar accuracy to full kernel expansions and Random Kitchen Sinks while being 100x faster and using 1000x less memory. These improvements, especially in terms of memory usage, make kernel methods more practical for applications that have large training sets and/or require real-time prediction.

[1] Fastfood | Approximating Kernel Expansions in Loglinear Time, Quoc Le,Tam, Alex Smola

[2] Fast Functions via Randomized Algorithms: Linear Regression with Random Projections

Video: Fast Food: Approximating Kernel Expansion in Loglinear Time

[3] Video: Random Kitchen Sinks, Fast Food and other randomized kernel evaluations

[4] Fast Functions via Randomized Algorithms: Fastfood versus Random Kitchen Sinks,

[5] Compressed Sensing: Random Features for Large-Scale Kernel Machines

[6] Uniform Approximation of Functions with Random Bases, Ali Rahimi and Benjamin Recht

[7] The Summer of the Deeper Kernels

[8] Nystrom Method vs Random Fourier Features:: A Theoretical and Empirical Comparison Tianbao Yang, Yu-Feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou

[9 Pruning random features with correlated kitchen sinks -poster- Brian McWilliams and David Balduzzi

[10] Learning Fastfood Feature Transforms for Scalable Neural Networks, Micol Marchetti-Bowick, Willie Neiswanger

[11] About t'em Random Projections in Random Forests

[12] How Close is Compressive Sensing to Random Features with Random Kitchen Sinks?

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Tuesday, July 22, 2014

Context Aware Recommendation Systems ( Lei Tang, Xavier Amatriain)

Much like the presentation by Lei Tang (Wallmart Labs) on Adaptive User Segmentation for Recommendation at last year's GraphLab 2013 (see Slides (pdf) here and video here). Xavier Amatriain, of Netflix, made a presentation of what we should be expecting in terms of recommendation. The idea here is that most of this work cannot be static otherwise your customers just won't be responsive to it. Here are his slides and the attendant videos from the Machine Learning Summer School organized in Pittsburgh 2014 by Alex Smola. I note the focus put on matrix and tensor factorizations and the persistent reference to blog posts. It's a new world...more on that later.

Recommender Systems (Machine Learning Summer School 2014 @ CMU) from Xavier Amatriain

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Monday, July 21, 2014

Video Stream: GraphLab Conference 2014

We mentioned it before, the GraphLab conference is on and it is streamed live here. The program is here (the Twitter tag seems to be #GraphLabConf )

Day 1: Monday, July 21, 2014General Admission. Registration opens at 8:00am.Session

1: Data Product Pipeline in Practice

9:00am Prof. Carlos Guestrin Co-Founder & CEO, GraphLab Keynote: GraphLab Strategy, Vision and Practice
10:10am Baldo Faieta Social Computing Lead, Adobe Systems Algorithms for Creatives Talent Search using GraphLab
10:30am Amit Moran Chief Data Scientist, Crosswise Customer Spotlight: Crosswise
10:40am Coffee Break (20 mins)

Session 2: Data Science

11:00am Alice Zheng Director of Data Science, GraphLab Machine Learning Toolkits in GraphLab Create
11:20am Karthik Ramachandran, Erick Tryzelaar Lab41 Dendrite large scale graph analytics
11:40am Tao Ye Sr. Scientist, Pandora Internet Radio Large scale music recommendation @ Pandora
12:00pm Prof. Alex Smola CMU and Google Scaling Distributed Machine Learning with the Parameter Server
12:20pm Jonathan Dinu Co-Founder, Zipfian Academy Customer Spotlight: Zipfian Academy
12:30pm Lunch (70 mins)

Session 3: Data Engineering

1:40pm Yucheng Low Co-Founder & Chief Architect, GraphLab Scalable Data Structures: SFrame & SGraph
2:00pm Prof. Joe Hellerstein Co-Founder & CEO, Trifacta Data, DSLs and Transformation: Research and Practice
2:20pm Reynold Xin Co-Founder, Databricks Unified Data Pipeline in Apache Spark
2:40pm Wes McKinney Founder & CEO, DataPad Fast Medium Data Analytics at Scale
3:00pm Coffee Break (20 mins)

Session 4: Deployment

3:20pm Rajat Arya Senior Software Engineer, GraphLab Deployment with GraphLab Create
3:40pm Milind Bhandarkar Chief Scientist, Pivotal The Zoo Expands: Labrador ♥ Elephant thanks to Hamster
4:00pm Prof. Vahab Mirrokni Google Research ASYMP: Fault-tolerant Graph Mining via ASYnchronous Message Passing
4:20pm Josh Wills Director of Data Science, Cloudera What Comes After The Star Schema?
4:40pm Dr. Markus Weimer Microsoft Research REEF: Towards a Big Data stdlib
Session 5: Networking and Demos (5:00-7:00pm)

Day 2: Tuesday, July 22, 2014Training Admission. Registration opens at 8:00am.

GraphLab Create Hands-on Training

The goal of the day is to teach participants how to build a machine learning system at scale from prototype to production using GraphLab Create. A laptop is required to participate.

9:30am Alice Zheng Director of Data Science, GraphLab Introduction
9:45am Yucheng Low Co-Founder & Chief Architect, GraphLab Prepping Data for Analysis: Using GraphLab Create Data Structures and GraphLab Canvas
10:30am Coffee Break (15 mins)
10:45am Srikrishna Sridhar Data Scientist, GraphLab Supervised Learning: Regression and Classification
11:15am Brian Kent Data Scientist, GraphLab Unsupervised Learning: Clustering, Nearest Neighbors, Graph Analysis
11:45am Hands-on Training Exercises and Lunch
1:45pm Chris Dubois Data Scientist, GraphLab Recommender Systems and Text Analysis
2:15pm Coffee Break (15 mins)
2:30pm Rajat Arya Sr. Software Engineer, GraphLab Deployment
3:15pm Hands-on Training Exercises
4:00pm Danny Bickson Co-Founder & Data Scientist, GraphLab Practical Data Science Tips
4:45pm Alice Zheng Director of Data Science, GraphLab Closing Remarks

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Saturday, July 12, 2014

Saturday Morning Videos: Machine Learning Summer School Pittsburgh 2014, Muthu Muthukrishnan

Alex Smola has a channel for all the videos that are being made at the Machine Learning Summer School Pittsburgh 2014. It is here. Lectures include (or will include ) Deepak Agarwal, Xavier Amatriain, Anima Anandkumar, Dan Feldman, Nando de Freitas, Zico Kolter, Quoc Le, Mu Li, Tom Mitchell, Muthu Muthukrishnan, Alex Smola, Markus Weimer, Andrew Wilson.

Of note, Muthu was a recent speaker to our Paris/Europe Wide Machine Learning Meetup. RandNLA and Sketching/streaming are coming to ML fast and our meetup was at the forefront, Woohoo ! Here are the very interesting videos of Muthu, enjoy !

Muthu Muthukrishnan Lecture 1 Muthu Muthukrishnan Lecture 2+3

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Tuesday, January 14, 2014

How Close is Compressive Sensing to Random Features with Random Kitchen Sinks?

I don't know but here is how the authors of [1] describe the Random KitcHen Sinks (RKS) for FastFood which are the non adaptive version of approaches like XNV. RKS seems to be a play on words for Reproducing Kernel Hilbert Spaces that one can use to approximate the identity (i.e. the reproducing property). From [1]

Random Kitchen Sinks (Rahimi & Recht, 2007;2008)1, the algorithm that our algorithm is based on, approximates the function f by means of multiplying the input with a Gaussian random matrix, followed by the application of a nonlinearity. If the expansion dimension is n and the input dimension is d (i.e., the Gaussian matrix is n x d), it requires O(nd) time and memory to evaluate the decision function f. For large problems with sample size mxn, this is typically much faster than the aforementioned \kernel trick" because the computation is independent of the size of the training set. Experiments also show that this approximation method achieves accuracy comparable to RBF kernels while offering significant speedup.

Potentially Interesting Reading

[1] Fastfood | Approximating Kernel Expansions in Loglinear Time, Quoc Le, Tam, Alex Smola

[2] Fast Functions via Randomized Algorithms: Linear Regression with Random Projections

Video: Fast Food: Approximating Kernel Expansion in Loglinear Time

[3] Video: Random Kitchen Sinks, Fast Food and other randomized kernel evaluations

[4] Fast Functions via Randomized Algorithms: Fastfood versus Random Kitchen Sinks,

[5] Compressed Sensing: Random Features for Large-Scale Kernel Machines

[6] Uniform Approximation of Functions with Random Bases, Ali Rahimi and Benjamin Recht
[7]The Summer of the Deeper Kernels

[8] Nystrom Method vs Random Fourier Features:: A Theoretical and Empirical Comparison Tianbao Yang, Yu-Feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou

[9 Pruning random features with correlated kitchen sinks -poster- Brian McWilliams and David Balduzzi
[11] XNV: Correlated random features for fast semi-supervised learning - implementation -[10] Learning Fastfood Feature Transforms for Scalable Neural Networks, Micol Marchetti-Bowick, Willie Neiswanger

Deep neural networks are ﬂexible models that are able to learn complex nonlinear functions of data. The goal of this project is to build a shallow neural network that has the same representational power as a deep network by learning an extra nonlinear feature transformation at each node. To apply these transformations, we borrow techniques from the area of scalable, approximate kernel methods. In particular, we use the Fastfood method introduced by Le at al. in [1], which allows an approximate feature map for a transition-invariate kernel to be computed in log-linear time. Our method learns an optimal Fastfood feature expansion at each node while simultaneously optimizing the weight parameters of the neural network. We demonstrate our method on multiple datasets and show that it has better classiﬁcation performance than neural networks with similar architectures.

Image Credit: NASA/JPL/Space Science Institute

Full-Res: W00086165.jpg
W00086165.jpg was taken on January 12, 2014 and received on Earth January 12, 2014. The camera was pointing toward SATURN at approximately 1,457,373 miles (2,345,415 kilometers) away, and the image was taken using the MT3 and CL2 filters.

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Thursday, December 05, 2013

#NIPS2013 papers, workshops ...

So #NIPS2013 is starting today with a set of tutorials, and a set of workshops listed below. Two words first, if you are prude or at work don't go watch the photos on Twitter (on your desktop) for the #NIPS2013 hashtag just yet! Second, for those of you in Paris next week, we'll have our 6th ML meetup. Third Andrej Karpathy has a nicer way of viewing the NIPS proceedings. It is here.

Without further due:

Here are a few papers I found interesting but the whole electronic proceeding is here (the whole pdf is here):

Faster Ridge Regression via the Subsampled Randomized Hadamard Transform Paramveer Dhillon, Dean P. Foster, Yichao Lu, Lyle Ungar
Dropout Training as Adaptive Regularization Percy Liang, Stefan Wager, Sida Wang
Generalized Denoising Auto-Encoders as Generative Models Pascal Vincent, Yoshua Bengio, Guillaume Alain, Li Yao
Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors Yi Ma, Di Wang, Xiaoqin Zhang, Zhengyuan Zhou
Bayesian optimization explains human active search Laurent Itti, Ali Borji
When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements Richard Baraniuk, Divyanshu Vats
Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching Guillermo Sapiro, Marcelo Fiori, Pablo Muse, Pablo Sprechmann, Joshua T. Vogelstein
New Subsampling Algorithms for Fast Least Squares Regression Paramveer Dhillon, Dean P. Foster, Yichao Lu, Lyle Ungar
Understanding variable importances in forests of randomized trees Pierre Geurts, Gilles Louppe, Antonio Sutera, Louis Wehenkel
Blind Calibration in Compressed Sensing using Message Passing Algorithms Francesco Caltagirone, Florent Krzakala, Christophe Schulke, Lenka Zdeborova
Neural representation of action sequences: how far can a simple snippet-matching model take us? Thomas Serre, Tomaso Poggio, David Sheinberg, Jedediah M. Singer, Cheston Tan
Beyond Pairwise: Provably Fast Algorithms for Approximate k-Way Similarity Search Anshumali Shrivastava, Ping Li
Learning Multi-level Sparse Representations Fred A. Hamprecht, Ferran Diego Andilla
Wavelets on Graphs via Deep Learning Leonidas Guibas, Raif Rustamov
Designed Measurements for Vector Count Data Lawrence Carin, Liming Wang, Robert Calderbank, David Carlson, Miguel Rodrigues, David Wilcox
Near-Optimal Entrywise Sampling for Data Matrices Dimitris Achlioptas, Zohar S. Karnin, Edo Liberty
Approximate Dynamic Programming Finally Performs Well in the Game of Tetris Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon

The Randomized Dependence Coefficient Philipp Hennig, David Lopez-Paz, Bernhard Schölkopf
Provable Subspace Clustering: When LRR meets SSC Chenlei Leng, Huan Xu, Yu-Xiang Wang
Generalized Random Utility Models with Multiple Types David C. Parkes, Hossein Azari Soufiani, Hansheng Diao, Zhenyu Lai
Polar Operators for Structured Sparse Estimation Dale Schuurmans, Xinhua Zhang, Yao-Liang Yu
On Decomposing the Proximal Map Yao-Liang Yu
More data speeds up training time in learning halfspaces over sparse vectors Amit Daniely, Nati Linial, Shai Shalev-Shwartz
Causal Inference on Time Series using Restricted Structural Equation Models Dominik Janzing, Jonas Peters, Bernhard Schölkopf
Deep Fisher Networks for Large-Scale Image Classification Andrea Vedaldi, Andrew Zisserman, Karen Simonyan
Sparse Additive Text Models with Low Rank Background Lei Shi
Variance Reduction for Stochastic Gradient Optimization Chong Wang, Xi Chen, Alex Smola, Eric Xing
Training and Analysing Deep Recurrent Neural Networks Benjamin Schrauwen, Michiel Hermans
Decision Jungles: Compact and Rich Models for Classification Antonio Criminisi, John Winn, Pushmeet Kohli, Sebastian Nowozin, Toby Sharp, Jamie Shotton
Actor-Critic Algorithms for Risk-Sensitive MDPs Mohammad Ghavamzadeh, Prashanth L.A.
One-shot learning and big data with n=2 Dean P. Foster, Lee H. Dicker
Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression Miguel Lazaro-Gredilla, Michalis Titsias
Optimal Neural Population Codes for High-dimensional Stimulus Variables Alan Stocker, Daniel Lee, Zhuo Wang
Accelerating Stochastic Gradient Descent using Predictive Variance Reduction Rie Johnson, Tong Zhang
Using multiple samples to learn mixture models Jason Lee, Rich Caruana, Ran Gilad-Bachrach
Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition Tzu-Kuo Huang, Jeff Schneider
Accelerated Mini-Batch Stochastic Dual Coordinate Ascent Shai Shalev-Shwartz, Tong Zhang
Online Robust PCA via Stochastic Optimization Huan Xu, Shuicheng Yan, Jiashi Feng
A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks Junming Yin, Qirong Ho, Eric Xing
Correlated random features for fast semi-supervised learning David Balduzzi, Joachim Buhmann, Brian McWilliams
Better Approximation and Faster Algorithm Using the Proximal Average Yao-Liang Yu
Rapid Distance-Based Outlier Detection via Sampling Karsten Borgwardt, Mahito Sugiyama
Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima Martin J. Wainwright, Po-Ling Loh
Auditing: Active Learning with Outcome-Dependent Query Costs Nati Srebro, Sivan Sabato, Anand D. Sarwate
A message-passing algorithm for multi-agent trajectory planning Jonathan S. Yedidia, Javier Alonso-Mora, Jose Bento, Nate Derbinsky
Learning Stochastic Feedforward Neural Networks Ruslan Salakhutdinov, Yichuan Tang
Inferring neural population dynamics from multiple partial recordings of the same neural circuit Lars Buesing, Henry Dalgleish, Michael Hausser, Jakob Macke, Adam M. Packer, Noah Pettit, Srini Turaga
Multi-Prediction Deep Boltzmann Machines Ian Goodfellow, Yoshua Bengio, Aaron Courville, Mehdi Mirza
Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation Carsten Rother, Philip Torr, Vibhav Vineet
Learning Trajectory Preferences for Manipulators via Iterative Improvement Ashutosh Saxena, Thorsten Joachims, Ashesh Jain, Brian Wojcik
Large Scale Distributed Sparse Precision Estimation Arindam Banerjee, Huahua Wang, Inderjit Dhillon, Cho-Jui Hsieh, Pradeep Ravikumar
On Algorithms for Sparse Multi-factor NMF Siwei Lyu, Xin Wang
Dirty Statistical Models Eunho Yang, Pradeep Ravikumar
Structured Learning via Logistic Regression Justin Domke
Reinforcement Learning in Robust Markov Decision Processes Huan Xu, Shie Mannor, Shiau Hong Lim
On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization Ke Hou, Zhi-Quan Luo, Anthony Man-Cho So, Zirui Zhou
Recurrent networks of coupled Winner-Take-All oscillators for solving constraint satisfaction problems Giacomo Indiveri, Hesham Mostafa, Lorenz. K. Mueller
Latent Structured Active Learning Alex Schwing, Raquel Urtasun, Wenjie Luo
A Gang of Bandits Claudio Gentile, Nicolò Cesa-Bianchi, Giovanni Zappella
Learning Feature Selection Dependencies in Multi-task Learning Daniel Hernández-Lobato, José Miguel Hernández-Lobato
Online PCA for Contaminated Data Huan Xu, Shie Mannor, Shuicheng Yan, Jiashi Feng
Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) Eric Moulines, Francis Bach
Efficient Algorithm for Privately Releasing Smooth Queries Liwei Wang, Kai Fan, Ziteng Wang, Jiaqi Zhang
Unsupervised Spectral Learning of Finite State Transducers Ariadna Quattoni, Xavier Carreras, Raphael Bailly
Learning a Deep Compact Image Representation for Visual Tracking Dit-Yan Yeung, Naiyan Wang
Robust Data-Driven Dynamic Programming Grani Adiwena Hanasusanto, Daniel Kuhn
Low-Rank Matrix and Tensor Completion via Adaptive Sampling Aarti Singh, Akshay Krishnamurthy
Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms François Caron, Marie Chavent, Adrien Todeschini
Distributed Exploration in Multi-Armed Bandits Tomer Koren, Eshcar Hillel, Zohar S. Karnin, Ronny Lempel, Oren Somekh
The Pareto Regret Frontier Wouter M. Koolen
Direct 0-1 Loss Minimization and Margin Maximization with Boosting Ming Tan, Shaojun Wang, Tian Xia, Shaodan Zhai
Regret based Robust Solutions for Uncertain Markov Decision Processes Yossiri Adulyasak, Asrar Ahmed, Patrick Jaillet, Pradeep VarakanthamSupervised Sparse Analysis and Synthesis Operators Guillermo Sapiro, Tal Ben Yakar, Alexander M. Bronstein, Roee Litman, Pablo Sprechmann
Low-rank matrix reconstruction and clustering via approximate message passing Toshiyuki Tanaka, Ryosuke Matsushita
Reasoning With Neural Tensor Networks for Knowledge Base Completion Christopher D. Manning, Richard Socher, Danqi Chen, Andrew Ng
Zero-Shot Learning Through Cross-Modal Transfer Christopher D. Manning, Richard Socher, Milind Ganjoo, Andrew Ng
Estimating LASSO Risk and Noise Level Andrea Montanari, Mohsen Bayati, Murat A. Erdogdu
Learning Adaptive Value of Information for Structured Prediction Ben Taskar, David J. Weiss
Efficient Online Inference for Bayesian Nonparametric Relational Models Prem Gopalan, David Blei, Dae Il Kim, Erik Sudderth
Approximate inference in latent Gaussian-Markov models from continuous time observations Botond Cseke, Guido Sanguinetti, Manfred Opper
Linear Convergence with Condition Number Independent Access of Full Gradients Mehrdad Mahdavi, Rong Jin, Lijun Zhang
Robust Spatial Filtering with Beta Divergence Klaus-Robert Müller, Motoaki Kawanabe, Duncan Blythe, Wojciech Samek
Convex Relaxations for Permutation Problems Rodolphe Jenatton, Francis Bach, Alexandre D'Aspremont, Fajwel Fogel
High-Dimensional Gaussian Process Bandits Andreas Krause, Volkan Cevher, Josip Djolonga
A memory frontier for complex synapses Surya Ganguli, Subhaneil Lahiri
A Comparative Framework for Preconditioned Lasso Algorithms Fabian L. Wauthier, Nebojsa Jojic, Michael Jordan
Lasso Screening Rules via Dual Polytope Projection Jiayu Zhou, Jieping Ye, Peter Wonka, Jie Wang
Efficient Optimization for Sparse Gaussian Process Regression Aaron Hertzmann, Marcus A. Brubaker, Yanshuai Cao, David Fleet
Lexical and Hierarchical Topic Regression Jordan Boyd-Graber, Viet-An Nguyen, Philip Resnik
Stochastic Convex Optimization with Multiple Objectives Mehrdad Mahdavi, Rong Jin, Tianbao Yang
A Kernel Test for Three-Variable Interactions Arthur Gretton, Dino Sejdinovic, Wicher Bergsma
Robust Transfer Principal Component Analysis with Rank Constraints Yuhong Guo
Online Learning with Switching Costs and Other Adaptive Adversaries Ofer Dekel, Ohad Shamir, Nicolò Cesa-Bianchi
Learning Prices for Repeated Auctions with Strategic Buyers Afshin Rostamizadeh, Umar Syed, Kareem Amin
Probabilistic Principal Geodesic Analysis P.T. Fletcher, Miaomiao Zhang
Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models Adel Javanmard, Andrea Montanari
Learning with Noisy Labels Ambuj Tewari, Inderjit Dhillon, Nagarajan Natarajan, Pradeep Ravikumar
Tracking Time-varying Graphical Structure David Danks, Erich Kummerfeld
Online Learning with Costly Features and Labels András György, Russell Greiner, Gabor Bartok, Csaba Szepesvari, Navid Zolghadr
Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions Liam Paninski, Eftychios A. Pnevmatikakis
A Novel Two-Step Method for Cross Language Representation Learning Yuhong Guo, Min Xiao
Statistical Active Learning Algorithms Maria-Florina Balcan, Vitaly Feldman
Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits Liam Paninski, Brooks Paige, Ari Pakman, Ben Shababo
Reflection methods for user-friendly submodular optimization Stefanie Jegelka, Suvrit Sra, Francis Bach
Unsupervised Structure Learning of Stochastic And-Or Grammars Maria Pavlovskaia, Kewei Tu, Song-Chun Zhu
Convex Tensor Decomposition via Structured Schatten Norm Regularization Ryota Tomioka, Taiji Suzuki
Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs Yoshua Bengio, Yann Dauphin
A Deep Architecture for Matching Short Texts Hang Li, Zhengdong Lu
Reservoir Boosting : Between Online and Offline Ensemble Learning Leonidas Lefakis, François Fleuret
Multiclass Total Variation Clustering David Uminsky, Thomas Laurent, Xavier Bresson, James von Brecht
Approximate Inference in Continuous Determinantal Processes Ben Taskar, Raja Hafiz Affandi, Emily Fox
Global Solver and Its Efficient Approximation for Variational Bayesian Low-rank Subspace Clustering Ichiro Takeuchi, Masashi Sugiyama, Shinichi Nakajima, S. Derin Babacan, Akiko Takeda
Thompson Sampling for 1-Dimensional Exponential Family Bandits Emilie Kaufmann, Nathaniel Korda, Remi Munos
It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals Christoph Lippert, Oliver Stegle, Karsten Borgwardt, Barbara Rakitsch
Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses Ambuj Tewari, Harish G. Ramaswamy, Shivani Agarwal
Inverse Density as an Inverse Problem: the Fredholm Equation Approach Mikhail Belkin, Qichao Que
Robust Image Denoising with Multi-Column Deep Neural Networks Honglak Lee, Forest Agostinelli, Michael R. Anderson
EDML for Learning Parameters in Directed and Undirected Graphical Models Adnan Darwiche, Arthur Choi, Khaled Refaat
Similarity Component Analysis Fei Sha, Soravit Changpinyo, Kuan Liu
Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs Tejas D. Kulkarni, Vikash Mansinghka, Yura N. Perov, Josh Tenenbaum
Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation Martin J. Wainwright, John Duchi, Michael Jordan
Firing rate predictions in optimal balanced networks Christian K. Machens, Sophie Denève, David G. Barrett
Manifold-based Similarity Adaptation for Label Propagation Masayuki Karasuyama, Hiroshi Mamitsuka
Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty David Wipf, Haichao Zhang
Learning to Prune in Metric and Non-Metric Spaces Leonid Boytsov, Bilegsaikhan Naidan
Online learning in episodic Markovian decision processes by relative entropy policy search Gergely Neu, Alexander Zimin
Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result Paul Wagner
Bayesian Hierarchical Community Discovery Charles Blundell, Yee Whye Teh
From Bandits to Experts: A Tale of Domination and Independence Claudio Gentile, Noga Alon, Yishay Mansour, Nicolò Cesa-Bianchi
Predictive PAC Learning and Process Decompositions Cosma Shalizi, Aryeh Kontorovitch
Pass-efficient unsupervised feature selection Crystal Maung, Haim Schweitzer
Solving inverse problem of Markov chain with partial observations Takayuki Osogami, Tetsuro Morimura, Tsuyoshi Ide
Mapping paradigm ontologies to and from the brain Gael Varoquaux, Yannick Schwartz, Bertrand Thirion
Noise-Enhanced Associative Memories Lav R. Varshney, Amin Karbasi, Amir Hesam Salavati, Amin Shokrollahi
Exact and Stable Recovery of Pairwise Interaction Tensors Shouyuan Chen, Irwin King, Michael R. Lyu, Zenglin Xu
Perfect Associative Learning with Spike-Timing-Dependent Plasticity Christian Albers, Klaus Pawelzik, Maren Westkott
On Poisson Graphical Models Eunho Yang, Zhandong Liu, Genevera I. Allen, Pradeep Ravikumar
Streaming Variational Bayes Andre Wibisono, Nicholas Boyd, Tamara Broderick, Michael Jordan, Ashia C. Wilson
Gaussian Process Conditional Copulas with Applications to Financial Time Series Daniel Hernández-Lobato, José Miguel Hernández-Lobato, James R. Lloyd
Extracting regions of interest from biological images with convolutional sparse block coding Maneesh Sahani, Marius Pachitariu, Henry Dalgleish, Michael Hausser, Adam M. Packer, Noah Pettit
DESPOT: Online POMDP Planning with Regularization Nan Ye, Wee Sun Lee, David Hsu, Adhiraj Somani
Matrix Completion From any Given Set of Observations Troy Lee, Adi Shraibman
Regression-tree Tuning in a Streaming Setting Francesco Orabona, Samory Kpotufe
Multiscale Dictionary Learning for Estimating Conditional Distributions Francesca Petralia, David Dunson, Joshua T. Vogelstein
Stochastic Optimization of PCA with Capped MSG Nati Srebro, Raman Arora, Andy Cotter
Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies Joshua T. Abbott, Joseph Austerweil, Trevor Darrell, Thomas Griffiths, Yangqing Jia
Robust Bloom Filters for Large MultiLabel Classification Tasks Nicolas Usunier, Patrick Gallinari, Thierry Artières, Moustapha M. Cisse
Top-Down Regularization of Deep Belief Networks Matthieu Cord, Hanlin Goh, Joo-Hwee Lim, Nicolas Thome
Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions Joseph Keshet, Tamir Hazan, Tommi Jaakkola, Subhransu Maji
Machine Teaching for Bayesian Learners in the Exponential Family Xiaojin Zhu
Scoring Workers in Crowdsourcing: How Many Control Questions are Enough? Alex Ihler, Mark Steyvers, Qiang Liu
Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths Cristian Sminchisescu, Stefan Mathe
Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model Fang Han, Han Liu
Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation Jörg Hendrik Kappes, Bogdan Savchynskyy, Christoph Schnörr, Paul Swoboda
Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic Aarti Singh, Akshay Krishnamurthy, James L. Sharpnack
Demixing odors - fast inference in olfaction Alexandre Pouget, Jeff Beck, Agnieszka Grabska-Barwinska, Peter Latham
Learning Multiple Models via Regularized Weighting Huan Xu, Shie Mannor, Daniel Vainsencher
When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured SparsityAnima Anandkumar, Sham M. Kakade, Daniel Hsu, Majid Janzamin
Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space Wee Sun Lee, Xinhua Zhang, Yee Whye Teh
Distributed Submodular Maximization: Identifying Representative Elements in Massive Data Andreas Krause, Amin Karbasi, Baharan Mirzasoleiman, Rik Sarkar
Adaptive Market Making via Online Learning Satyen Kale, Jacob Abernethy
On the Sample Complexity of Subspace Learning Lorenzo Rosasco, Guillermo D. Canas, Alessandro Rudi
Embed and Project: Discrete Sampling with Universal Hashing Ashish Sabharwal, Bart Selman, Carla P. Gomes, Stefano Ermon
Discriminative Transfer Learning with Tree-based Priors Nitish Srivastava, Ruslan Salakhutdinov
DeViSE: A Deep Visual-Semantic Embedding Model Andrea Frome, Samy Bengio, Greg S. Corrado, Jeff Dean, Tomas Mikolov, Marc'Aurelio Ranzato, Jon Shlens
Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation Aarti Singh, Larry Wasserman, Martin Azizyan
Predicting Parameters in Deep Learning Nando de Freitas, Misha Denil, Laurent Dinh, Marc'Aurelio Ranzato, Babak Shakibi
Estimating the Unseen: Improved Estimators for Entropy and other Properties Gregory Valiant, Paul Valiant
What do row and column marginals reveal about your dataset? John Byers, Behzad Golshan, Evimaria Terzi
RNADE: The real-valued neural autoregressive density-estimator Hugo Larochelle, Iain Murray, Benigno Uria
Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards Thomas Bonald, Alexandre Proutiere
Reconciling priors'' & "priors" without prejudice? Remi Gribonval, Pierre Machart
Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis Timothy T. Rogers, Christopher Cox, Rob Nowak, Nikhil Rao
Sensor Selection in High-Dimensional Gaussian Trees with Nuisances Jonathan P. How, Daniel S. Levine
Sequential Transfer in Multi-armed Bandit with Finite Set of Models Alessandro Lazaric, Emma Brunskill, Mohammad Gheshlaghi azar
Buy-in-Bulk Active Learning Liu Yang, Jaime Carbonell
Contrastive Learning Using Spectral Methods David C. Parkes, Ryan P. Adams, Daniel Hsu, James Y. Zou
Sparse Inverse Covariance Estimation with Calibration Han Liu, Tuo Zhao
Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization Julien Mairal
Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Marco Cuturi
Speedup Matrix Completion with Side Information: Application to Multi-Label Learning Rong Jin, Miao Xu, Zhi-Hua Zhou
Compete to Compute Jürgen Schmidhuber, Faustino Gomez, Sohrob Kazerounian, Jonathan Masci, Rupesh K. Srivastava
Information-theoretic lower bounds for distributed statistical estimation with communication constraints Martin J. Wainwright, Yuchen Zhang, John Duchi, Michael Jordan
Projected Natural Actor-Critic Philip S. Thomas, Sridhar Mahadevan, William C. Dabney, Stephen Giguere
How to Hedge an Option Against an Adversary: Black-Scholes Pricing is Minimax Optimal Andre Wibisono, Jacob Abernethy, Peter Bartlett, Rafael Frongillo
Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests David Sontag, Yonatan Halpern, Yacine Jernite
Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion Franz Kiraly, Louis Theran
Learning the Local Statistics of Optical Flow Yair Weiss, Dan Rosenbaum, Daniel Zoran
Aggregating Optimistic Planning Trees for Solving Markov Decision Processes Raphael Fonteneau, Gunnar Kedenburg, Remi Munos
Robust learning of low-dimensional dynamics from large neural ensembles Liam Paninski, David Pfau, Eftychios A. Pnevmatikakis
Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising Min Xu, Tie-Yan Liu, Tao Qin
Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization Greg Mori, Leonid Sigal, Michalis Raptis, Nataliya Shapovalova
A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables Seyoung Kim, Jing Xiang
The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram
Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints Jeff A. Bilmes, Rishabh K. Iyer
Synthesizing Robust Plans under Incomplete Domain Models Minh Do, Subbarao Kambhampati, Tuan A. Nguyen
Symbolic Opportunistic Policy Iteration for Factored-Action MDPs Prasad Tadepalli, Roni Khardon, Alan Fern, Aswin Raghavan
One-shot learning by inverting a compositional causal process Ruslan Salakhutdinov, Brenden M. Lake, Josh Tenenbaum
Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators Michel Besserve, Nikos K. Logothetis, Bernhard Schölkopf
Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis Mikhail Belkin, Luis Rademacher, James R. Voss
Deep Neural Networks for Object Detection Dumitru Erhan, Christian Szegedy, Alexander Toshev
Geometric optimisation on positive definite matrices for elliptically contoured distributions Suvrit Sra, Reshad Hosseini
Sign Cauchy Projections and Chi-Square Kernel Ping Li, John Hopcroft, Gennady Samorodnitsk
Relevance Topic Model for Unstructured Social Group Activity Recognition Yongzhen Huang, Tieniu Tan, Liang Wang, Fang Zhao
k-Prototype Learning for 3D Rigid Structures Jinhui Xu, Ronald Berezney, Hu Ding
Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting Angela J. Yu, Shunan Zhang
Probabilistic Movement Primitives Gerhard Neumann, Christian Daniel, Alexandros Paraschos, January Peters
Policy Shaping: Integrating Human Feedback with Reinforcement Learning Shane Griffith, Charles Isbell, Jonathan Scholz, Kaushik Subramanian, Andrea L. Thomaz
Multilinear Dynamical Systems for Tensor Time Series Lei Li, Mark Rogers, Stuart Russell
Deep content-based music recommendation Benjamin Schrauwen, Sander Dieleman, Aaron van den Oord
A Stability-based Validation Procedure for Differentially Private Machine Learning Kamalika Chaudhuri, Staal A. Vinterbo
Capacity of strong attractor patterns to model behavioural and cognitive prototypes Abbas Edalat
Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA Jing Lei, Vincent Q. Vu, Juhee Cho, Karl Rohe
Cluster Trees on Manifolds Aarti Singh, Alessandro Rinaldo, Larry Wasserman, Sivaraman Balakrishnan, Srivatsan Narayanan
Bayesian inference for low rank spatiotemporal neural receptive fields Mijung Park, Jonathan Pillow
Adaptive Submodular Maximization in Bandit Setting Branislav Kveton, Brian Eriksson, Victor Gabillon, S. Muthukrishnan, Zheng Wen
Analyzing Hogwild Parallel Gaussian Gibbs Sampling Matthew Johnson, James Saunderson, Alan Willsky
Minimax Optimal Algorithms for Unconstrained Linear Optimization Jacob Abernethy, Brendan McMahan
(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings Abhradeep Guha Thakurta, Adam Smith
Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions Jeff A. Bilmes, Stefanie Jegelka, Rishabh K. Iyer
Learning Kernels Using Local Rademacher Complexity Corinna Cortes, Marius Kloft, Mehryar Mohri
Annealing between distributions by averaging moments Ruslan Salakhutdinov, Roger B. Grosse, Chris J. Maddison
Optimizing Instructional Policies Harold Pashler, Michael Mozer, William J. Huggins, Robert Lindsey
Translating Embeddings for Modeling Multi-relational Data Antoine Bordes, Jason Weston, Nicolas Usunier, Alberto Garcia-Duran, Oksana Yakhnenko
Phase Retrieval using Alternating Minimization Sujay Sanghavi, Prateek Jain, Praneeth Netrapalli
Real-Time Inference for a Gamma Process Model of Neural Spiking Lawrence Carin, Vinayak Rao, David Carlson, Joshua T. Vogelstein
Understanding Dropout Pierre Baldi, Peter J. Sadowski
The Power of Asymmetry in Binary Hashing Nati Srebro, Ruslan Salakhutdinov, Yury Makarychev, Behnam Neyshabur, Payman Yadollahpour
Estimation, Optimization, and Parallelism when Data is Sparse John Duchi, Michael Jordan, Brendan McMahan
A multi-agent control framework for co-adaptation in brain-computer interfaces Liam Paninski, Tony Jebara, Roy Fox, Josh S. Merel
Modeling Overlapping Communities with Node Popularities Chong Wang, Prem Gopalan, David Blei
Learning from Limited Demonstrations Doina Precup, Joelle Pineau, Amir massoud Farahmand, Beomjoon Kim
Memory Limited, Streaming PCA Constantine Caramanis, Prateek Jain, Ioannis Mitliagkas
An Approximate, Efficient LP Solver for LP Rounding Christopher Re, Ji Liu, Stephen Wright, Victor Bittorf, Srikrishna Sridhar, Ce Zhang
Bayesian inference as iterated random functions with applications to sequential inference in graphical models Xuanlong Nguyen, Arash Amini
Compressive Feature Learning Trevor Hastie, John C. Mitchell, Hristo S. Paskov, Robert West
Moment-based Uniform Deviation Bounds for k-means and Friends Sanjoy Dasgupta, Matus Telgarsky
Fast Template Evaluation with Vector Quantization David Forsyth, Mohammad Amin Sadeghi
Context-sensitive active sensing in humans Angela J. Yu, Sheeraz Ahmad, He Huang
A New Convex Relaxation for Tensor Completion Massimiliano Pontil, Bernardino Romera-Paredes
Variational Planning for Graph-based MDPs Alex Ihler, Qiang Cheng, Qiang Liu, Feng Chen
Convex Two-Layer Modeling Dale Schuurmans, Xinhua Zhang, Özlem Aslan, Hao Cheng
Sketching Structured Matrices for Faster Nonlinear Regression Vikas Sindhwani, Haim Avron, David Woodruff
(More) Efficient Reinforcement Learning via Posterior Sampling Ian Osband, Dan Russo, Benjamin Van Roy
Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition Adel Javanmard, Andrea Montanari
Efficient Exploration and Value Function Generalization in Deterministic Systems Benjamin Van Roy, Zheng Wen
Bellman Error Based Feature Generation using Random Projections on Sparse Spaces Doina Precup, Joelle Pineau, Amir massoud Farahmand, Yuri Grinberg, Mahdi Milani Fard
Learning invariant representations and applications to face verification Joel Z. Leibo, Tomaso Poggio, Qianli Liao
Optimization, Learning, and Games with Predictable Sequences Karthik Sridharan, Sasha Rakhlin
Adaptivity to Local Smoothness and Dimension in Kernel Regression Samory Kpotufe, Vikas Garg
Adaptive dropout for training deep neural networks Jimmy Ba, Brendan Frey
Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream Charles Cadieu, James J. DiCarlo, Ha Hong, Daniel L. Yamins
Distributed Representations of Words and Phrases and their Compositionality Ilya Sutskever, Kai Chen, Greg S. Corrado, Jeff Dean, Tomas Mikolov
Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel Tai Qin, Karl Rohe
Analyzing the Harmonic Structure in Graph-Based Learning Zhenguo Li, Shih-Fu Chang, Xiao-Ming Wu
Recurrent linear models of simultaneously-recorded neural populations Biljana Petreska, Maneesh Sahani, Marius Pachitariu
BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables Inderjit Dhillon, Cho-Jui Hsieh, Russell Poldrack, Pradeep Ravikumar, Matyas A. Sustik
The Fast Convergence of Incremental PCA Sanjoy Dasgupta, Yoav Freund, Akshay Balsubramani
Multisensory Encoding, Decoding, and Identification Aurel A. Lazar, Yevgeniy Slutskiy
Optimal integration of visual speed across different spatiotemporal frequency channels Alan Stocker, Matjaz Jogan
Matrix factorization with binary components Martin Slawski, Matthias Hein, Pavlo Lutsik
Learning to Pass Expectation Propagation Messages Daniel Tarlow, John Winn, Nicolas Heess
Robust Low Rank Kernel Embeddings of Multivariate Distributions Le Song, Bo Dai

Several posters from the workshops are listed below:

CONTRIBUTED TALKS

Mu Li, Li Zhou, Zichao Yang, Aaron Li, Fei Xia, David Andersen and Alexander Smola.
Parameter Server for Distributed Machine Learning
We propose a parameter server framework to solve distributed machine learning problems. Both data and workload are distributed into client nodes, while server nodes maintain globally shared parameters, which are represented as sparse vectors and matrices. The framework manages asynchronous data communications between clients and servers. Flexible consistency models, elastic scalability and fault tolerance are supported by this framework. We present algorithms and theoretical analysis for challenging nonconvex and nonsmooth problems. To demonstrate the scalability of the proposed framework, we show experimental results on real data with billions of parameters.
PDF

Yarin Gal and Zoubin Ghahramani.
Pitfalls in the use of Parallel Inference for the Dirichlet Process
Recent work done by Lovell, Adams, and Mansingka [2012] and Williamson, Dubey, and Xing [2013] has suggested an alternative parametrisation for the Dirichlet process in order to derive non-approximate parallel MCMC inference for it. This approach to parallelisation has been picked-up and implemented in several different fields [Chahuneau et al., 2013, Pan et al., 2013]. In this paper we show that the approach suggested is impractical due to an extremely unbalanced distribution of the data. We characterise the requirements of efficient parallel inference for the Dirichlet process and show that the proposed inference fails most of these conditions (while approximate approaches often satisfy most of them). We present both theoretical and experimental evidence of this, analysing the load balance for the inference showing that it is independent of the size of the dataset and the number of nodes available in the parallel implementation, and end with preliminary suggestions of alternative paths of research for efficient non-approximate parallel inference for the Dirichlet process.
PDF

Yingyu Liang, Maria-Florina Balcan and Vandana Kanchanapally.
Distributed PCA and k-Means Clustering
This paper proposes a distributed PCA algorithm, with the theoretical guarantee that any good approximation solution on the projected data for k-means clustering is also a good approximation on the original data, while the projected dimension required is independent of the original dimension. When combined with the distributed coreset-based clustering approach in [3], this leads to an algorithm in which the number of vectors communicated is independent of the size and the dimension of the original data. Our experiment results demonstrate the effectiveness of the algorithm.
PDF

POSTERS

Julien-Charles Lévesque, Christian Gagné and Robert Sabourin.
Ensembles of Budgeted Kernel Support Vector Machines for Parallel Large Scale Learning
In this work, we propose to combine multiple budgeted kernel support vector machines (SVMs) trained with stochastic gradient descent (SGD) in order to exploit large databases and parallel computing resources. The variance induced by budget restrictions of the kernel SVMs is reduced through the averaging of predictions, resulting in greater generalization performance. The variance of the trainings results in a diversity of predictions, which can help explain the better performance. Finally, the proposed method is intrinsically parallel, which means that parallel computing resources can be exploited in a straightforward manner.
PDF

Zhen Qin, Vaclav Petricek, Nikos Karampatziakis, Lihong Li and John Langford.
Efficient Online Bootstrapping for Large Scale Learning
Bootstrapping is a useful technique for estimating the uncertainty of a predictor, for example, confidence intervals for prediction. It is typically used on small to moderate sized datasets, due to its high computation cost. This work describes a highly scalable online bootstrapping strategy, implemented inside Vowpal Wabbit, that is several times faster than traditional strategies. Our experiments indicate that, in addition to providing a black box-like method for estimating uncertainty, our implementation of online bootstrapping may also help to train models with better prediction performance due to model averaging.
PDF

Arun Kumar, Nikos Karampatziakis, Paul Mineiro, Markus Weimer and Vijay Narayanan.
Distributed and Scalable PCA in the Cloud
Principal Component Analysis (CA) is a popular technique with many applications. Recent randomized PCA algorithms scale to large datasets but face a bottleneck when the number of features is also large. We propose to mitigate this issue using a composition of structured and unstructured randomness within a randomized PCA algorithm. Initial experiments using a large graph dataset from Twitter show promising results. We demonstrate the scalability of our algorithm by implementing it both on Hadoop, and a more flexible platform named REEF.
PDF

Nedim Lipka.
Towards Distributed Reinforcement Learning for Digital Marketing with Spark
A variety of problems in digital marketing can be modeled as Markov decision processes and solved by dynamic programming with the goal of calculating the policy that maximizes the expected discounted reward. Algorithms, such as policy iteration, require a state transition and a reward model, which can be estimated based on a given data set. In this paper, we compare the execution times for estimating the transition function in a map-reduce fashion if the data set becomes large in terms of the number of records and features. Therefore, we create different-sized Spark and Hadoop clusters in the Amazon cloud computing environment. The in-memory clustering system Spark is outperforming Hadoop and runs up to 71% faster. Furthermore, we study the execution times of policy iteration running on Spark clusters and show the execution time reduction gained by increasing the number of instances in the cluster.
PDF

Tuukka Ruotsalo, Jaakko Peltonen, Manuel J. A. Eugster, Dorota Glowacka, Giulio Jacucci, Aki Reijonen and Samuel Kaski.
Lost in Publications? How to Find Your Way in 50 Million Scientific Documents
Researchers must navigate big data. Current scientific knowledge includes 50 million published articles. How can a system help a researcher find relevant documents in her field? We introduce IntentRadar, an interactive search user interface and search engine that anticipates userâ™s search intents by estimating them form userâ™s interaction with the interface. The estimated intents are visualized on a radial layout that organizes potential intents as directions in the information space. The intent radar assists users to direct their search by allowing feedback to be targeted on keywords that represent the potential intents. Users can provide feedback by manipulating the position of the keywords on the radar. The system then learns and visualizes improved estimates and corresponding documents. IntentRadar has been shown to significantly improve usersâ™ task performance and the quality of retrieved information without compromising task execution time.
PDF

Michael Kane and Bryan Lewis.
cnidaria: A Generative Communication Approach to Scalable, Distributed Learning
This paper presents a scalable, software framework that facilitates large-scale learning and numerical computing. Unlike existing MapReduce frameworks our design is not limited to embarrassingly parallel computing challenges. The framework sits on top of existing storage infrastructures and results of a computation may left out on the cluster (a reduce step is not required). Unlike existing distributed numerical frameworks the proposed framework is elastic and works with both dense and sparse data representations. This generality is achieved through a generative communication scheme whose expressions are either consumed by the distributed computing environment or used to move data, in a peer-to-peer (P2P) fashion, between nodes in a cluster/cloud. This approach integrates advances in the both cloud computing and the distributed numerical computing community and can be applied to a general class of learning challenges.
PDF

Anshumali Shrivastava and Ping Li.
Beyond Pairwise: Provably Fast Algorithms for Approximate k-Way Similarity Search
We go beyond the notion of pairwise similarity and look into search problems with k-way similarity functions. In this paper, we focus on problems related to 3-way Jaccard similarity. We show that approximate R3way similarity search problems admit fast algorithms with provable guarantees, analogous to the pairwise case. Our analysis and speedup guarantees naturally extend to k-way resemblance. In the process, we extend traditional framework of locality sensitive hashing (LSH) to handle higher-order similarities, which could be of independent theoretical interest. The applicability of R3way search is shown on the Google Sets application as well as in an application for improving retrieval quality.
PDF

Wei Dai, Jinliang Wei, Xun Zheng, Jin Kyu Kim, Seunghak Lee, Junming Yin, Qirong Ho and Eric Xing.
Petuum: A System for Iterative-Convergent Distributed ML
A major bottleneck to applying advanced ML programs at industrial scales is the migration of an academic implementation, often specialized for a small, wellcontrolled computer platform such as desktop PCs and small lab-clusters, to a big, less predicable platform such as a corporate cluster or the cloud. This poses enormous challenges: how does one train huge models with billions of parameters on massive data, especially when substantial expertise is required to handle many low-level systems issues? We propose a new architecture of systems components that systematically addresses these challenges, thus providing a generalpurpose distributed platform for Big Machine Learning. Our architecture specifically exploits the fact that many ML programs are fundamentally loss function minimization problems, and that their iterative-convergent nature presents many unique opportunities to minimize loss, such as via dynamic variable scheduling and error-bounded consistency models for synchronization. Thus, we treat data, parameter and variable blocks as computing units to be dynamically scheduled and updated in an error-bounded manner, with the goal of minimizing the loss function as quickly as possible.
PDF

Haiqin Yang, Junjie Hu, Michael Lyu and Irwin King.
Online Imbalanced Learning with Kernels
Imbalanced learning, or learning from imbalanced data, is a challenging problem in both academy and industry. Nowadays, the streaming imbalanced data become popular and trigger the volume, velocity, and variety issues of learning from these data. To tackle these issues, online learning algorithms are proposed to learn a linear classifier via maximizing the AUC score. However, the developed linear classifiers ignore the learning power of kernels. In this paper, we therefore propose online imbalanced learning with kernels (OILK) to exploit the non-linearity and heterogeneity embedded in the imbalanced data. Different from previously proposed work, we optimize the AUC score to learn a non-linear representation via the kernel trick. To relieve the computational and storing cost, we also investigate different buffer update policies, including first-in-first-out (FIFO) and reservoir sampling (RS), to maintain a fixed budgeted buffer on the number of support vectors. We demonstrate the properties of our proposed OILK through detailed experiments.
PDF

Alex Beutel, Abhimanu Kumar, Evangelos Papalexakis, Partha Pratim Talukdar, Christos Faloutsos and Eric Xing.
FLEXIFACT: Scalable Flexible Factorization of Coupled Tensors on Hadoop
Given multiple data sets of relational data that share a number of dimensions, how can we efficiently decompose our data into the latent factors? Factorization of a single matrix or tensor has attracted much attention, as, e.g., in the Netflix challenge, with users rating movies. However, we often have additional, side, information, like, e.g., demographic data about the users, in the Netflix example above. Incorporating the additional information leads to the coupled factorization problem. So far, it has been solved for relatively small datasets. We provide a distributed, scalable method for decomposing matrices, tensors, and coupled data sets through stochastic gradient descent on a variety of objective functions. We offer the following contributions: (1) Versatility: Our algorithm can perform matrix, tensor, and coupled factorization, with flexible objective functions including the Frobenius norm, Frobenius norm with an l1 induced sparsity, and non-negative factorization. (2) Scalability: FLEXIFACT scales to unprecedented sizes in both the data and model, with up to billions of parameters. FLEXIFACT runs on standard Hadoop. (3) Convergence proofs showing that FLEXIFACT converges on the variety of objective functions, even with projections.
PDF

Faraz Makari Manshadi and Rainer Gemulla.
A Distributed Approximation Algorithm for Mixed Packing-Covering Linear Programs
Mixed packing-covering linear programs capture a simple but expressive subclass of linear programs. They commonly arise as linear programming relaxations of a number important combinatorial problems, including various network design and generalized matching problems. In this paper, we propose an efficient distributed approximation algorithm for solving mixed packing-covering problems which requires a poly-logarithmic number of passes over the input. Our algorithm is well-suited for parallel processing on GPUs, in shared-memory architectures, or on small clusters of commodity nodes. We report results of a case study for generalized bipartite matching problems.
PDF

Artem Sokolov and Stefan Riezler.
Task-driven Greedy Learning of Feature Hashing Functions
Randomly hashing multiple features into one aggregated feature is routinely used in largescale machine learning tasks to both increase speed and decrease memory requirements, with little or no sacrifice in performance. In this paper we investigate whether using a learned (instead of a random) hashing function improves performance. We show experimentally that with increasing difference between the dimensionalities of the input space and the hashed space, learning hashes is increasingly useful compared to random hashing.
PDF

Ahmed Elgohary, Ahmed Farahat, Mohamed Kamel and Fakhri Karray.
Approximate Nearest Centroid Embedding for Kernel $k$-Means
This paper proposes an efficient embedding method for scaling kernel k-means on cloud infrastructures. The embedding method allows for approximating the computation of the nearest centroid to each data instance and, accordingly, it eliminates the quadratic space and time complexities of the cluster assignment step in the kernel k-means algorithm. We show that the proposed embedding method is effective under memory and computing power constraints, and that it achieves better clustering performance compared to other approximations of the kernel kmeans algorithm.
PDF

Yisheng Liao, Alex Rubinsteyn, Russell Power and Jinyang Li.
Learning Random Forests on the GPU
Random Forests are a popular and powerful machine learning technique, with several fast multi-core CPU implementations. Since many other machine learning methods have seen impressive speedups from GPU implementations, applying GPU acceleration to random forests seems like a natural fit. Previous attempts to use GPUs have relied on coarse-grained task parallelism and have yielded inconclusive or unsatisfying results. We introduce CudaTree, a GPU Random Forest implementation which adaptively switches between data and task parallelism. We show that, for larger datasets, this algorithm is faster than highly tuned multi-core CPU implementations.
PDF

Shravan Narayanamurthy, Markus Weimer, Dhruv Mahajan, Tyson Condie, Sundararajan Sellamanickam and S. Sathiya Keerthi.
Towards Resource-Elastic Machine Learning

PDF

Ignacio Arnaldo, Kalyan Veeramachaneni and Una-May O'Reilly.
Building Multiclass Nonlinear Classifiers with GPUs
The adoption of multiclass classification strategies that train independent binary classifiers becomes challenging when the goal is to retrieve nonlinear models from large datasets and the process requires several passes through the data. In such scenario, the combined use of a search and score algorithm and GPUs allows to obtain binary classifiers in a reduced time. We demonstrate our approach by training a ten class classifier over more than 400K exemplars following the exhaustive Error Correcting Output Code strategy that decomposes into 511 binary problems.
PDF

John Canny and Huasha Zhao.
BIDMach: Large-scale Learning with Zero Memory Allocation
This paper describes recent work on the BIDMach toolkit for large-scale machine learning. BIDMach has demonstrated single-node performance that exceeds that of published cluster systems for many common machine-learning task. BIDMach makes full use of both CPU and GPU acceleration (through a sister library BIDMat), and requires only modest hardware (commodity GPUs). One of the challenges of reaching this level of performance is the allocation barrier. While it is simple and expedient to allocate and recycle matrix (or graph) objects in expressions, this approach is too slow to match the arithmetic throughput possible on either GPUs or CPUs. In this paper we describe a caching approach that allows code with complex matrix (graph) expressions to run at massive scale, i.e. multi-terabyte data, with zero memory allocation after initial start-up. We present a number of new benchmarks that leverage this approach.
PDF

Shohei Hido, Satoshi Oda and Seiya Tokui.
Jubatus: An Open Source Platform for Distributed Online Machine Learning
Distributed computing is essential for handling very large datasets. Online learning is also promising for learning from rapid data streams. However, it is still an unresolved problem how to combine them for scalable learning and prediction on big data streams. We propose a general computational framework called loose model sharing for online and distributed machine learning. The key is to share only models rather than data between distributed servers. We also introduce Jubatus, an open source software platform based on the framework. Finally, we describe the details of implementing classifier and nearest neighbor algorithms, and discuss our experimental evaluations.
PDF

Accepted Papers

Gagan Goel, Afshin Nikzad, Adish Singla; Matching Workers Expertise with Tasks: Incentives in Heterogeneous Crowdsourcing Markets.

Hossein Azari Soufiani, William Z. Chen, David C. Parkes, Lirong Xia; Generalized Method-of-Moments for Rank Aggregation.

Genevieve Patterson, Grant Van Horn, Serge Belongie, Pietro Perona, James Hays; Bootstrapping Fine-Grained Classifiers: Active Learning with a Crowd in the Loop.

Chien-Ju Ho, Aleksandrs Slivkins, Jennifer Wortman Vaughan; Adaptive Contract Design for Crowdsourcing.

Paul Ruvolo, Jacob Whitehill, Javier R. Movellan; Exploiting Commonality and Interaction Effects in Crowdsourcing Tasks Using Latent Factor Models.

Ashwinkumar Badanidiyuru, Robert Kleinberg, Aleksandrs Slivkins; Bandits with Knapsacks: Dynamic procurement for crowdsourcing.

Nicole Immorlica, Greg Stoddard, Vasilis Syrgkanis; Social Status and the Design of Optimal Badges.

Adish Singla, Ilija Bogunovic, Gábor Bartók, Amin Karbasi, Andreas Krause; On Actively Teaching the Crowd to Classify.

Poster presentations

Sparse Combinatorial Autoencoders (ID 2)
Karthik Narayan, Pieter Abbeel
Grounded Compositional Semantics for Finding and Describing Images with Sentences (ID 4)
Richard Socher, Quoc Le, Christopher Manning, Andrew Ng
Curriculum Learning for Handwritten Text Line Recognition (ID 5)
Jérôme Louradour, Christopher Kermorvant
A Deep and Tractable Density Estimator (ID 7)
Benigno Uria, Iain Murray, Hugo Larochelle
Multi-Column Deep Neural Networks for Offline Handwritten Chinese Character Classification (ID 11)
Dan Ciresan, Juergen Schmidhuber
End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks (ID 12)
Dimitri Palaz, Ronan Collobert, Mathew Magimai.-Doss
Scalable Wide Sparse Learning for Connectomics (ID 15)
Jeremy Maitin-Shepard, Pieter Abbeel
Is deep learning really necessary for word embeddings? (ID 16)
Rémi Lebret, Joël Legrand, Ronan Collobert
Recurrent Conditional Random Fields (ID 18)
Kaisheng Yao, Baolin Peng, Geoffrey Zweig, Dong Yu, Xiaolong Li, Feng Gao
Recurrent Convolutional Neural Networks for Scene Parsing (ID 20)
Pedro Pinheiro, Ronan Collobert
Backpropagation in Sequential Deep Belief Networks (ID 22)
Galen Andrew, Jeff Bilmes
Learning semantic representations for the phrase translation model (ID 23)
Jianfeng Gao, Xiaodong He, Wen-tau Yih, Li Deng
Event-driven Contrastive Divergence in Spiking Neural Networks (ID 25)
Emre Neftci, Bruno Pedroni, Gert Cauwenberghs, Kenneth Kreutz-Delgado, Srinjoy Das
Dynamics of learning in deep linear neural networks [supp] (ID 27)
Andrew Saxe, James McClelland, Surya Ganguli
Exploring Deep and Recurrent Architectures for Optimal Control (ID 28)
Sergey Levine
Analyzing noise in autoencoders and deep networks (ID 29)
Ben Poole, Jascha Sohl-Dickstein, Surya Ganguli
Structured Recurrent Temporal Restricted Boltzmann Machines (ID 30)
Roni Mittelman, Benjamin Kuipers, Silvio Savarese, Honglak Lee
Learning Deep Representations via Multiplicative Interactions between Factors of Variation (ID 31)
Scott Reed, Honglak Lee
Learning Input and Recurrent Weight Matrices in Echo State Networks (ID 32)
Hamid Palangi, Li Deng, Rabab Ward
Learning Sum-Product Networks with Direct and Indirect Variable Interactions (ID 33)
Amirmohammad Rooshenas, Daniel Lowd
Bidirectional Recursive Neural Networks for Token-Level Labeling with Structure (ID 34)
Ozan Irsoy, Claire Cardie
Estimating Dependency Structures for non-Gaussian Components (ID 38)
Hiroaki Sasaki, Michael Gutmann, Hayaru Shouno, Aapo Hyvarinen
Multimodal Neural Language Models (ID 42)
Ryan Kiros, Ruslan Salakhutdinov, Richard Zemel
Non-degenerate Priors for Arbitrarily Deep Networks (ID 43)
David Duvenaud, Oren Rippel, Ryan Adams, Zoubin Ghahramani
Learning Multilingual Word Representations using a Bag-of-Words Autoencoder (ID 44)
Stanislas Lauly, Alex Boulanger, Hugo Larochelle
Multilingual Deep Learning (ID 45)
Sarath Chandar A P, Mitesh M. Khapra, Balaraman Ravindran, Vikas Raykar, Amrita Saha
Learned-norm pooling for deep neural networks (ID 46)
Caglar Gulcehre, Kyunghyun Cho, Razvan Pascanu, Yoshua Bengio
Transition-based Dependency Parsing Using Recursive Neural Networks (ID 47)
Pontus Stenetorp

Sarah R. Allen, Lisa Hellerstein.
Approximation algorithms for reducing classification cost in ensembles of classifiers
Yudong Chen, Jiaming Xu.
Statistical-Computational Tradeoffs in Planted Models: The High-Dimensional Setting
Junkyu Lee, William Lam, Rina Dechter.
Benchmark on DAOOPT and GUROBI with the PASCAL2 Inference Challenge Problems
Hastagiri P. Vanchinathan, Andreas Marfurt, Charles-Antoine Robelin, Donald Kossmann, Andreas Krause
Adaptively Selecting Valuable Diverse Sets via Gaussian Processes and Submodularity
Alon Milchgrub, Rina Dechter. On Minimal Tree-Inducing Cycle-Cutsets and Their Use in a Cutset-Driven Local Search
Adarsh Prasad, Stefanie Jegelka, Dhruv Batra.
Submodular Maximization and Diversity in Structured Output Spaces(Supplement)
K. S. Sesh Kumar, Francis Bach.
Maximizing submodular functions using probabilistic graphical models
Kui Tang, Tony Jebara.
Network Ranking With Bethe Pseudomarginals
Baharan Mirzasoleiman, Amin Karbasi, Andreas Krause, Rik Sarkar.
Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
Hidekazu Oiwa, Issei Sato, Hiroshi Nakagawa
Novel Sparse Modeling by L2 + L0 Regularization
Vitaly Feldman, Jan Vondrak
Approximation of Submodular and XOS Functions by Juntas with Applications to Learning

Accepted Papers

Joint Learning of Modular Structures from Multiple Data Types

Elham Azizi
Large-margin Structured Learning for Link Ranking

Stephen H. Bach, Bert Huang and Lise Getoor
Dynamic Structural Equation Models for Tracking Cascades over Social Networks

Brian Baingana, Gonzalo Mateos and Georgios B. Giannakis
Evolving Groups of Political Interest in Social News

Roja Bandari, Hazhir Rahmandad and Vwani Roychowdhury
Validating Collective Classification Using Cohorts

Eric Bax, James Li, Abdullah Sonmez and Zehra Cataltepe
Content + Context Networks for User Classification in Twitter

William M. Campbell, Elisabeth Baseman and Kara Greenfield
Bayesian Nonparametric Random Graphs

François Caron and Emily B. Fox
Random Walk Features for Network-aware Topic Models

Ahmed Hefny, Geoffrey Gordon and Katia Sycara
Infinite Degree Corrected Stochastic Blockmodel

Tue Herlau, Mikkel N. Schmidt and Morten Morup
The HIM Glocal Metric and Kernel for Network Comparison and Classification

Giuseppe Jurman, Roberto Visintainer, Michele Filosi, Samantha Riccadonna and Cesare Furlanello
CuttingEdge: Influence Minimization in Networks

Elias Khalil, Bistra Dilkina and Le Song
Sequential Monte Carlo Inference of MMSB for Dynamic Social Networks

Tomoki Kobayashi and Koji Eguchi
Variational Bayesian Inference Algorithms for Network Infinite Relational Model

Takuya Konishi, Takatomi Kubo, Kazuho Watanabe and Kazushi Ikeda
Exchangeable Databases and their Functional Representation

James Robert Lloyd, Peter Orbanz, Zoubin Ghahramani and Daniel M. Roy
Exploiting Direct and Indirect Information for Friend Suggestion in ZingMe

Kien Duy Nguyen, Tuan Pham Minh, Quang Nhat Nguyen and Thanh Trung Nguyen
Ranking Networks

Amirali Salehi-Abari and Craig Boutilier
Inferring Multilateral Relations from Dynamic Pairwise Interactions

Aaron Schein, Juston Moore and Hanna Wallach
A New Mathematical Space for Social Networks

Anshumali Shrivastava and Ping Li
Accurate Spectral Clustering for Community Detection in MapReduce

Serafeim Tsironis, Mauro Sozio and Michalis Vazirgiannis
Measuring the Statistical Significance of Local Connections in Directed Networks

James D. Wilson, Shankar Bhamidi and Andrew B. Nobel
Stochastic Slice Models: Consistent Estimation with Application to Missing Covariates Imputations

Justin J. Yang and Edoardo M. Airoldi
Nonparametric Estimation of Exchangeable Graph Models

Justin J. Yang, Christina Q. Han and Edoardo M. Airoldi
Community Detection in Networks with Node Features

Yuan Zhang, Elizaveta Levina and Ji Zhu

Linear Bandits, Matrix Completion, and Recommendation Systems [pdf]

Efficient coordinate-descent for orthogonal matrices through Givens rotations [pdf][supplementary]

Improved Greedy Algorithms for Sparse Approximation of a Matrix in terms of Another Matrix [pdf]

Preconditioned Krylov solvers for kernel regression [pdf]

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms [pdf][supplementary]

Dimension Independent Matrix Square using MapReduce [pdf]

Active Learning of Intuitive Sound Qualities (Huang, Duvenaud, Arnold, Partridge, and Oberholtzer) [pdf]
There is often a mismatch between the high-level goals an artist wants to express and what the parameters of a synthesizer allow them to control. To enable composers to directly adjust personalized high-level qualities during sound synthesis, our system actively learns functions that map from the space of synthesizer control parameters to perceived levels of high-level qualities.
Automatic Construction and Natural-Language Summarization of Additive Nonparametric Models (Lloyd, Duvenaud, Grosse, Tenenbaum, and Ghahramani) [pdf][supplement1][supplement2]
To complement recently introduced automatic model-construction and search methods, we demonstrate an automatic model-summarization procedure. After building an additive nonparametric regression model, our method constructs a report which visualizes and explains in words the meaning and relevance of each component. These reports enable human model-checking and the understanding of complex modeling assumptions and structure. We demonstrate this procedure on two time-series, showing that the automatically constructed models identify clearly interpretable structures that can be automatically described in simple natural language.
Designing Constructive Machine Learning Models based on Generalied Linear Learning Techniques (Kordjamshidi and Moens) [pdf]
We propose a general framework for designing machine learning models that deal with constructing complex structures in the output space. The goal is to provide an abstraction layer to easily represent and design constructive learning models. The learning approach is based on generalized linear training techniques, and exploits techniques from combinatorial optimization to deal with the complexity of the underlying inference required in this type of models. This approach also allows to consider global structural characteristics and constraints over the output elements in an efficient training and prediction setting. The use case focuses on building spatial meaning representations from text to instantiate a virtual world.
Learning Graphical Concepts (Ellis, Dechter, Adams, and Tenenbaum) [pdf]
How can machine learning techniques be used to solve problems whose solutions are best represented as computer programs? For example, suppose a researcher wants to design a probabilistic graphical model for a novel domain. Searching the space of probabilistic models automatically is notoriously difficult, especially difficult when latent variables are involved. However, researchers seem able to easily adapt commonly used modeling motifs to new domains. In doing so, they draw on abstractions such as trees, chains, grids and plates to constrain and direct the kinds of models they produce. This suggests that before we ask machine learning algorithms to discover parsimonious models of new domains, we should develop techniques that enable our algorithms to automatically learn these ?graphical concepts? in much the same way that researchers themselves do, by seeing examples in the literature. One natural way to think of these graphical concepts is as programs that take sets of random variables and produce graphical models that relate them. In this work, we describe the CEC algorithm, which attempts to learn a distribution over programs by incrementally finding program components that commonly help to solve problems in a given domain, and we show preliminary results indicating that CEC is able to discover the graphical concepts that underlie many of the common graphical model structures.
The Constructive Learning Problem: An Efficient Approach for Hypergraphs (Costa and Sorescu) [pdf]
Discriminative systems that can deal with input graphs are known, however, generative/constructive approaches that can output (hyper)graphs belonging with high probability to a desired class, are less studied. Here we propose an approach that, differently from common graph grammars inference systems, is computationally efficient and robust to the presence of outliers in the training sample. We report experimental results in a de-novo molecular synthesis problem. We show that we can construct compounds that, once added to the original training set can improve the performance of a binary classification predictor.
Analyzing Probabilistic Models Generated by EDAs for Simplified Protein Folding Problems (Santana, Mendiburu, and Lozano) [pdf]
Estimation of distribution algorithms (EDAs) are optimization methods that construct at each step a probabilistic graphical model (PGM) of the best evaluated solutions. The model serves as a concise representation of the regularities shared by the good solutions and can serve to unveil structural characteristics of the problem domain. In this paper we use the PGMs learned by EDAs in the optimization of 15, 575 instances of the hydrophobic-polar (HP) functional protein folding model to analyze the relationship between the information contained in the PGMs? structures and the quality of the EDA?s solutions.
Anticipating the Future By Constructing Human Activities using Object Affordances (Koppula and Saxena) [pdf]
An important aspect of human perception is anticipation and anticipating which activities will a human do next (and how to do them) in useful for many applications, for example, anticipation enables an assistive robot to plan ahead for reactive responses in the human environments. In this work, we present a constructive approach for generating various possible future human activities by reasoning about the rich spatial-temporal relations through object affordances. We represent each possible future using an anticipatory temporal conditional random field (ATCRF) where we sample the nodes and edges corresponding to future object trajectories and human poses from a generative model. We then represent the distribution over the potential futures using a set of constructed ATCRF particles. In extensive evaluation on CAD-120 human activity RGB-D dataset, for new subjects (not seen in the training set), we obtain an activity anticipation accuracy (defined as whether one of top three predictions actually happened) of 75.4%, 69.2% and 58.1% for an anticipation time of 1, 3 and 10 seconds respectively. 1
Learning Global-to-Local Discrete Components with Nonparametric Bayesian Feature Construction (Heo, Lee, and Zhang) [pdf]
Finding common latent components from data is an important step in many data mining applications. These latent variables are typically categorical and there are many sources of categorical variables, including dichotomous, nominal, ordinal, and cardinal values. Thus it is important to be able to represent the discrete components (categories) in a flexible way. Here we propose a nonparametric Bayesian approach to learning "plastic" discrete components by considering the uncertainty of the number of components with the Indian buffet processes (IBP). As observation models, we use the product of experts (PoE) to utilize sharper representation power and sparse over-completeness. We apply the proposed method to optical hand-written digit datasets and demonstrate its capability of finding flexible global-to-local components that can be used to describe and generate the observed digit images faithfully.
Racing Tracks Improvisation (Wang and Missura) [pdf][supplement]
Procedural content generation is a popular technique in the game development. One of its typical applications is generation of game levels. This paper presents a method to generate tracks for racing games, by viewing racing track generation as a discrete sequence prediction problem. To solve it we combine two techniques from music improvisation. We show that this method is capable of generating new racing tracks which appear to be interesting enough.
STONES: Stochastic Technique for Generating Songs (Kamp and Manea) [pdf]
We propose a novel approach for automatically constructing new songs from a set of given compositions that involves sampling a melody line as well as the corresponding harmonies given by chords. The song is sampled from a hierarchical Markov model that captures the implicit properties of good composed songs from a set of existing ones. We empirically show that songs generated by our approach are closer to music composed by humans than those of existing methods.
Constructing Cocktails from a Cocktail Map (Paurat, Garnett, and Gärtner) [pdf]
Consider a dataset that describes cocktails by the amount of ingredients used and a lower dimensional embedding of it that can be considered a map of cocktails. The problem we tackle is to query an arbitrary point of interest in this lower dimensional embedding and retrieve a newly constructed cocktail which embeds to that queried location. To do so, we formulate the task as a constrained optimization problem and consider the resulting ingredient mix as a 'hot' candidate. Starting off with a very basic formulation that merely demands the necessities of our problem to be fulfilled, we incorporate additional desired conditions into the problem formulation and compare the resulting cocktail recipes.
Supervised graph summarization for structuring academic search results (Mirylenka and Passerini) [pdf]
In this paper we address the problem of visualizing the query results of the academic search services. We suggest representing the search results as concise topic hierarchies, and propose a method of building such hierarchies through summarization of the intermediate large topic graphs. We describe a supervised learning technique for summarizing the topic graphs in the most informative way using sequential structured prediction, and discuss our ongoing work on the interactive acquisition of the training examples.
Hybrid SRL with Optimization Modulo Theories (Teso, Sebastiani, and Passerini) [pdf]
Generally speaking, the goal of constructive learning could be seen as, given an example set of structured objects, to generate novel objects with similar properties. From a statistical-relational learning (SRL) viewpoint, the task can be interpreted as a constraint satisfaction problem, i.e. the generated objects must obey a set of soft constraints, whose weights are estimated from the data. Traditional SRL approaches rely on (finite) First-Order Logic (FOL) as a description language, and on MAX-SAT solvers to perform inference. Alas, FOL is unsuited for constructive problems where the objects contain a mixture of Boolean and numerical variables. It is in fact difficult to implement, e.g. linear arithmetic constraints within the language of FOL. In this paper we propose a novel class of hybrid SRL methods that rely on Satisfiability Modulo Theories, an alternative class of formal languages that allow to describe, and reason over, mixed Boolean-numerical objects and constraints. The resulting methods, which we call Learning Modulo Theories, are formulated within the structured output SVM framework, and employ a weighted SMT solver as an optimization oracle to perform efficient inference and discriminative max margin weight learning. We also present a few examples of constructive learning applications enabled by our method.

Varun Aggarwal, Shashank Srikant, and Vinay Shashidhar
Principles for using Machine Learning in the Assessment of Open Response Items: Programming Assessment as a Case Study
Sumit Basu, Chuck Jacobs and Lucy Vanderwende
Powergrading: a Clustering Approach to Amplify Human Effort for Short Answer Grading
Franck Dernoncourt, Choung Do, Sherif Halawa, Una-May O’Reilly, Colin Taylor, Kalyan Veeramachaneni and Sherwin Wu
MOOCVIZ: A Large Scale, Open Access,Collaborative, Data Analytics Platform for MOOCs
Jorge Diez, Oscar Luaces, Amparo Alonso-Betanzos, Alicia Troncoso and Antonio Bahamonde
Peer Assessment in MOOCs Using Preference Learning via Matrix Factorization
Stephen E. Fancsali
Data-driven causal modeling of “gaming the system” and off-task behavior in Cognitive Tutor Algebra
Damien Follet
A three-steps classification algorithm to assist criteria grid assessment
Peter W. Foltz and Mark Rosenstein
Tracking Student Learning in a State-Wide Implementation of Automated Writing Scoring
Jose P. Gonzalez-Brenes, Yun Huang and Peter Brusilovsky
FAST: Feature-Aware Student Knowledge Tracing
Fang Han, Kalyan Veeramachaneni and Una-May O’Reilly
Analyzing student behavior during problem solving in MOOCs
Mohammad Khajah, Rowan M. Wing, Robert V. Lindsey and Michael C. Mozer
Incorporating Latent Factors Into Knowledge Tracing To Predict Individual Differences In Learning
Robert V. Lindsey, Jeff D. Shroyer, Harold Pashler and Michael C. Mozer
Improving students’ long-term knowledge retention through personalized review
Yun-En Liu, Travis Mandel, Zoran Popovic and Emma Brunskill
Towards Automatic Experimentation of Educational Knowledge
Andras Lorincz, Gyongyver Molnar, Laszlo A. Jeni, Zoltan Toser, Attila Rausch and Jeffrey F. Cohn
Towards entertaining and efficient educational games
Travis Mandel, Yun-En Liu, Zoran Popovic, Sergey Levin and Emma Brunskill
Unbiased Offline Evaluation of Policy Representations for Educational Games
Sergiy Nesterko, Svetlana Dotsenko, Qiuyi Hu, Daniel Seaton, Justin Reich, Isaac Chuang, and Andrew Ho
Evaluating Geographic Data in MOOCs
Andy Nguyen, Christopher Piech, Jonathan Huang and Leonidas Guibas
Codewebs: Scalable Code Search for MOOCs
Zachary A. Pardos
Simulation study of a HMM based automatic resource recommendation system
Arti Ramesh, Dan Goldwasser, Bert Huang, Snigdha Chaturvedi, Hal Daume III and Lise Getoor
Modeling Learner Engagement in MOOCs using Probabilistic Soft Logic
Nihar B. Shah, Joseph K. Bradley, Abhay Parekh, Martin Wainwright and Kannan Ramchandran
A Case for Ordinal Peer Evaluation in MOOCs
Adish Singla, Ilija Bogunovic, Gabor Bartok, Amin Karbasi and Andreas Krause
On Actively Teaching the Crowd to Classify
Glenda S. Stump, Jennifer DeBoer, Jonathan Whittinghill and Lori Breslow
Development of a Framework to Classify MOOC Discussion Forum Posts: Methodology and Challenges
Weiyi Sun, Siwei Lyu, Hui Jin and Jianwei Zhang
Analyzing Online Learning Discourse using Probabilistic Topic Models
Joseph Jay Williams
Applying Cognitive Science to Online Learning
Joseph Jay Williams and Betsy Williams
Using Interventions to Improve Online Learning
Diyii Yang, Tanmay Sinha, David Adamson and Carolyn Penstein Rose
“Turn on, Tune in, Drop out”: Anticipating student dropouts in Massive Open Online Courses

Poster Session I

Yuxin Chen, Hiroaki Shioi, Cesar Antonio Fuentes Montesinos, Lian Pin Koh, Serge Wich, Andreas Krause.

Active Detection for Biodiversity Monitoring via Adaptive Submodularity.

Christopher R. Dance, Stephane Clinchant, Onno R. Zoeter.

Approximate Inference for a Non-Homogeneous Poisson Model of On-Street Parking. [pdf]

George Mathews, John Vial, Sanjeev Jha, Gregoire Mariethoz, Nickens Okello, Suhinthan Maheswararajah, Dom De Re, Michael Smith.

Bayesian Inference of the Hydraulic Properties of Deep Geological Formations.

Simon O’Callaghan, Alistair Reid, Lachlan McCalman, Edwin V. Bonilla, Fabio Ramos

Bayesian Joint Inversions for the Exploration and Characterization of Geothermal Targets. [pdf]

Jun Yu, Weng-Keen Wong, Steve Kelling.

Clustering Species Accumulation Curves to Identify Groups of Citizen Scientists with Similar Skill Levels. [pdf]

Kalyan Veeramachaneni, Teasha Feldman-Fitzthum, Una-May O’Reilly, Alfredo Cuesta-Infante.

Copula-Based Wind Resource Assessment. [pdf]

Danny Panknin, Tammo Krueger, Mikio Braun, Klaus-Robert Muller, Siegmund Duell.

Detecting changes in Wind Turbine Sensory Data. [pdf]

Shan Xue, Alan Fern, Daniel Sheldon.

Dynamic Resource Allocation for Optimizing Population Diffusion.

Nidhi Singh.

Green-Aware Workload Prediction for Non-stationary Environments.

Mingjun Zhong, Nigel Goddard, Charles Sutton.

Interleaved Factorial Non-Homogeneous Hidden Markov Models for Energy Disaggregation. [pdf]

Poster Session II

Tao Sun, Daniel Sheldon, Akshat Kumar.

Message Passing for Collective Graphical Models. [pdf]

Jun Yu, Rebecca A. Hutchinson, Weng-Keen Wong.

Modeling Misidentification of Bird Species by Citizen Scientists. [pdf]

Anna Ogawa, Akiko Takeda, Toru Namerikawa.

Photovoltaic Output Prediction Using Auto-regression with Support Vector Machine. [pdf]

Rebecca A. Hutchinson, Thomas G. Dietterich.

Posterior Regularization for Occupancy Models.

Xiaojian Wu , Daniel Sheldon, Shlomo Zilberstein.

Stochastic Network Design for River Networks. [pdf]

Daniel Urieli, Peter Stone.

TacTex’13- An Adaptive Champion Power Trading Agent.

Bingsheng Wang, Haili Dong, Chang-Tien Lu.

Using Step Variant Convolutional Neural Networks for Energy Disaggregation. [pdf]

Angela Fernandez, Carlos M. Alaiz, Ana M. Gonzalez, Julia Diaz, Jose R. Dorronsoro

Local Anisotropic Diffusion Detection of Wind Ramps. [pdf]

Mahsa Ghafrianzadeh, Claire Monteleoni.

Climate Prediction via Matrix Completion. [pdf]

8:20-8:40 Factorie
8:40-9:00 pySPACE
9:00 - 9:30 Coffee break
9:30 - 10:30 Demos (15 min for highlights)

ClowdFlows
EPAC
Hivemall
HLearn
Information Theoretical Estimators Toolbox
ISSE
MLlib
PRoNTo
Shogun
RLPy

AFTERNOON SESSION (3:30-6:30)

3:30-4:15 Invited speaker: Fernando Perez, ipython
4:15-4:35 scikit-learn
4:35-4:55 rOpenGov

Domain Adaptation as Learning with Auxiliary Information
Shai Ben-David, Ruth Urner

Sample Complexity of Sequential Multi-task Reinforcement Learning
Emma Brunskill, Lihong Li

Sequential Transfer in Multi-armed Bandit with Logarithmic Transfer Regret

Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

Class-wise Density-ratios for Covariate Shift
Yun-Qian Miao, Ahmed K. Farahat, Mohamed S. Kamel

Domain adaptation for sequence labeling using hidden Markov models
Edouard Grave, Guillaume Obozinski, Francis Bach

Retrieval of Experiments: Sequential Dirichlet Process Mixtures in Model Space
Ritabrata Dutta, Sohan Seth, Samuel Kaski

Multitask Learning with Feature Selection for Groups of Related Tasks
Meenakshi Mishra, Jun Huan

Restricted Transfer Learning for Text Categorization
Rajhans Samdani, Gideon Mann

Transform-based Domain Adaptation for Big Data
Erik Rodner, Judy Hoffman, Trevor Darrell, Jeff Donahue, Kate Saenko

A PAC-Bayesian bound for Lifelong Learning

Anastasia Pentina, Christoph H. Lampert

Multi-task Bilinear Classifiers for Visual Domain Adaptation
Jiaolong Xu, Sebastian Ramos, Xu Hu, David Vazquez, Antonio M. Lopez

Tree-Based Ensemble Multi-Task Learning Method for Classification and Regression

Jaak Simm, Ildefons Magrans de Abril, Masashi Sugiyama

Domain Adaptation of Majority Votes via Perturbed Variation-based Label Transfer
Emilie Morvant

Multilinear Spectral Regularization for Kernel-based Multitask Learning
Marco Signoretto, Johan A.K. Suykens

Reinforcement Learning with Multi-Fidelity Simulators

Sameer Singh, Sebastian Riedel, and Andrew McCallum. Anytime belief propagation using sparse domains.

W00085459.jpg was taken on December 02, 2013 and received on Earth December 04, 2013. The camera was pointing toward SATURN at approximately 710,353 miles (1,143,202 kilometers) away, and the image was taken using the MT2 and CL2 filters. This image has not been validated or calibrated.

Image Credit: NASA/JPL/Space Science Institute

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Nuit Blanche Referenced in the Dead Tree World!

The Big Picture in Compressive Sensing was mentioned in an article of La Recherche, the french speaking equivalent/competitor to Science. October 2010 issue, page 20-21.
Wired Magazine had a piece on Compressed Sensing featuring links to this blog and the Big Picture. (March 1, 2010)
Emmanuel Candes and Terry Tao wrote about Nuit Blanche in the Dec. '08 issue of the IEEE Information Theory Society Newsletter
Xiaochuan Pan, Emil Sidky and Michael Vannier wrote about Nuit Blanche in Why do commercial CT scanners still employ traditional, filtered back-projection for image reconstruction?.
Check also the acknowledgments in this Ghost Imaging paper and this one.

Nuit Blanche

Page Views on Nuit Blanche since July 2010

Thursday, May 31, 2018

McKernel: A Library for Approximate Kernel Expansions in Log-linear Time - implementation -

Friday, July 29, 2016

Stochastic Frank-Wolfe Methods for Nonconvex Optimization

Thursday, August 13, 2015

Cuckoo Linear Algebra

Tuesday, June 16, 2015

Fast and Guaranteed Tensor Decomposition via Sketching

Saturday, September 20, 2014

Saturday Morning Videos: Random Functions for Dependence and Component Analysis, Randomized Nonlinear Component Analysis and the Automatic Statistician

Wednesday, September 10, 2014

Randomized Nonlinear Component Analysis - implementation -

Saturday, September 06, 2014

Saturday Morning Videos: Some ICML 2014 presentations.

Monday, August 18, 2014

Fastfood: Approximate Kernel Expansions in Loglinear Time - The Paper -

Tuesday, July 22, 2014

Context Aware Recommendation Systems ( Lei Tang, Xavier Amatriain)

Monday, July 21, 2014

Video Stream: GraphLab Conference 2014

Saturday, July 12, 2014

Saturday Morning Videos: Machine Learning Summer School Pittsburgh 2014, Muthu Muthukrishnan

Tuesday, January 14, 2014

How Close is Compressive Sensing to Random Features with Random Kitchen Sinks?

Thursday, December 05, 2013

#NIPS2013 papers, workshops ...

CONTRIBUTED TALKS

POSTERS

Accepted Papers

Accepted Papers

Poster Session I

Poster Session II

AFTERNOON SESSION (3:30-6:30)

Printfriendly