Monday, May 30, 2016

Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks

Chris just sent me the following: 
Hi Igor-

In earlier communications you've been interested in the intersections of compressed sensing and neural computation.  We've just released a preprint that adds one more to the list: low-rank factorization.  The preprint below extends our earlier work on determining the memory capacity of recurrent neural networks (RNNs).  RNNs are being used as layers in deep networks when one wants to introduce some memory to handle time-varying inputs.

Using random matrix techniques, we show memory capacity bounds for one type of RNN (linear echo state networks) with multidimensional inputs.  Those inputs can be sparse in a basis, or have dependencies that result in a low-rank matrix (with no sparsity).  In either case, we show that the network size must scale linearly with the information rate in the data, resulting in networks that can be much smaller than the dimension of the input being remembered.

http://arxiv.org/abs/1605.08346



regards,
chris
Thanks Chris! And all this with phase transition diagrams, woohoo!

Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks by  Adam Charles, Dong Yin, Christopher Rozell

Recurrent neural networks (RNNs) have drawn interest from machine learning researchers because of their effectiveness at preserving past inputs for time-varying data processing tasks. To understand the success and limitations of RNNs, it is critical that we advance our analysis of their fundamental memory properties. We focus on echo state networks (ESNs), which are RNNs with simple memoryless nodes and random connectivity. In most existing analyses, the short-term memory (STM) capacity results conclude that the ESN network size must scale linearly with the input size for unstructured inputs. The main contribution of this paper is to provide general results characterizing the STM capacity for linear ESNs with multidimensional input streams when the inputs have common low-dimensional structure: sparsity in a basis or significant statistical dependence between inputs. In both cases, we show that the number of nodes in the network must scale linearly with the information rate and poly-logarithmically with the ambient input dimension. The analysis relies on advanced applications of random matrix theory and results in explicit non-asymptotic bounds on the recovery error. Taken together, this analysis provides a significant step forward in our understanding of the STM properties in RNNs.
 
 Because somehow these phase diagrams are the great equalizers.
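To make the memory claim concrete, here is a minimal numerical sketch (illustrative code only, not the authors' implementation; the orthogonal recurrent matrix, the sizes and the ISTA recovery step are my own choices): a linear echo state network with random connectivity stores a sparse multidimensional input sequence in its state, and the full sequence is then recovered from that single N-dimensional state by l1 minimization, with far fewer nodes than the total sequence length.

import numpy as np

# Toy linear ESN: x_{t+1} = W x_t + V s_t, with W a random orthogonal recurrent
# matrix and V a random input matrix. After T steps the state is a linear
# sketch of the whole input sequence; if that sequence is sparse, it can be
# recovered from far fewer nodes than M*T entries.
rng = np.random.default_rng(0)
N, M, T, k = 100, 20, 30, 8                      # nodes, input dim, steps, sparsity
W, _ = np.linalg.qr(rng.standard_normal((N, N)))
V = rng.standard_normal((N, M)) / np.sqrt(N)

s = np.zeros(M * T)                              # stacked sequence [s_0; ...; s_{T-1}]
s[rng.choice(M * T, k, replace=False)] = rng.standard_normal(k)

# Final state x_T = A s with A = [W^(T-1) V, ..., W V, V]
A = np.hstack([np.linalg.matrix_power(W, T - 1 - t) @ V for t in range(T)])
x_T = A @ s

# Recover the whole sequence from the N-dimensional state with plain ISTA (l1)
lam, L = 1e-2, np.linalg.norm(A, 2) ** 2
z = np.zeros_like(s)
for _ in range(3000):
    z = z + A.T @ (x_T - A @ z) / L
    z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)

print("relative recovery error:", np.linalg.norm(z - s) / np.linalg.norm(s))

Here the network has N = 100 nodes while the remembered sequence has M*T = 600 entries, which is the kind of regime the paper's bounds address.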
 
 
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Saturday, May 28, 2016

Saturday Morning Videos: International Conference on Learning Representations (ICLR) 2016, San Juan



Hugo mentioned it on his Twitter feed: the ICLR videos are out. The papers are listed here.

Opening Remarks

06:43 Opening, Hugo Larochelle


Keynote Talks




Best Paper Awards


Lectures


Credits: NASA/JHUAPL/SwRI



Saturday Morning Video: Proving them wrong, SpaceX First-stage landing from the Onboard Camera

After the success of the DC-X, I still remember vividly a certain amount of annoyance at an MIT aerospace professor who testified before Congress about his doubts on SSTO and related concepts (reusable TSTO). As with many other things in life, sometimes the best way to fight a position is to simply prove it dead wrong. Evidence #4628:

SpaceX first-stage landing (congrats Damaris and Andrew)
 
 
 
 
 
 
 

Friday, May 27, 2016

A proposal for a NIPS Workshop or Symposium on "Mapping Machine Learning to Hardware"

So now that the NIPS paper deadline is over, I am interested in organizing a workshop on how Machine Learning is being mapped to Hardware. If you are interested, or know somebody who might be, please get in touch with me and we will improve on this proposal (and yes, the title can change too). Here is the first draft:

Dear Colleagues,

Mapping Machine Learning to Hardware

With the recent successes of Deep Learning, we are beginning to see a set of new specialized hardware dedicated to making some of these computations faster, more energy efficient, or both. These efforts rely either on CMOS (CPUs, GPUs, FPGAs, ASICs) or on more exotic technologies (bio, memristors, quantum chips, photonics, etc.) as they seek to address a specific trade-off in mapping Machine Learning algorithms to a particular hardware technology.


Conversely, there has been quite an effort on the empirical side to devise deep network architectures that can handle binary coefficients, so as to be efficiently implementable on low-complexity hardware.
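For concreteness, here is a toy sketch of that idea in the spirit of BinaryConnect/XNOR-Net style binarization (my own illustrative code, not any specific published implementation): real-valued weights are replaced by their sign times a single scaling factor, so a layer can be evaluated with additions and one multiply.

import numpy as np

def binarize(W):
    # Replace real-valued weights by their sign, scaled by the mean magnitude,
    # so the binary layer approximates the full-precision one.
    alpha = np.abs(W).mean()
    return alpha * np.sign(W)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
x = rng.standard_normal(8)
print(W @ x)             # full-precision layer output
print(binarize(W) @ x)   # binary-weight approximation, hardware-friendly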


A somewhat related issue is the recent interest of the sensing community in mapping the first layers of sensing hardware onto the first layers of models such as deep networks. This approach has the potential to change the way image reconstruction and signal processing will be performed in the future.


This workshop will bring together researchers at the interface of machine learning, hardware implementation, sensing, physics and biology.


The goals of the workshop are
  • to present how machine learning computations and algorithms are mapped and improved as a result of new hardware technologies and architectures;
  • to evaluate the different trade-offs currently investigated in these approaches;
  • to understand how sensors may change as a result of this mapping to Machine Learning algorithms;
  • to evaluate the constraints placed on recent deep learning architectures so as to reduce redundancy and enable a simpler mapping between computing hardware and models.
 
 
 
 

Riemannian stochastic variance reduced gradient on Grassmann manifold - implementation -

 
 
 
Bamdev just sent me the following:
 
Dear Igor,

I wish to share our recent technical report on "Riemannian stochastic variance reduced gradient on Grassmann manifold", available at http://arxiv.org/abs/1605.07367. In this paper, we extend the Euclidean SVRG algorithm to compact Riemannian manifolds. The results are encouraging.

Additionally, the codes are available at https://bamdevmishra.com/codes/rsvrg/. We also provide a template file, Riemannian_svrg.m, that is compatible with the Manopt toolbox [1].


Regards,
Bamdev
[1] http://manopt.org

Thanks Bamdev ! Here is the paper: Riemannian stochastic variance reduced gradient on Grassmann manifold by Hiroyuki Kasai, Hiroyuki Sato, Bamdev Mishra

Stochastic variance reduction algorithms have recently become popular for minimizing the average of a large, but finite, number of loss functions. In this paper, we propose a novel Riemannian extension of the Euclidean stochastic variance reduced gradient algorithm (R-SVRG) to a compact manifold search space. To this end, we show the developments on the Grassmann manifold. The key challenges of averaging, addition, and subtraction of multiple gradients are addressed with notions like logarithm mapping and parallel translation of vectors on the Grassmann manifold. We present a global convergence analysis of the proposed algorithm with decay step-sizes and a local convergence rate analysis under fixed step-size with some natural assumptions. The proposed algorithm is applied on a number of problems on the Grassmann manifold like principal components analysis, low-rank matrix completion, and the Karcher mean computation. In all these cases, the proposed algorithm outperforms the standard Riemannian stochastic gradient descent algorithm.
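For readers unfamiliar with optimization on the Grassmann manifold, here is a bare-bones sketch (plain Riemannian gradient descent for PCA, not the authors' R-SVRG; the sizes, step size and QR retraction are my own illustrative choices): the Euclidean gradient is projected onto the tangent space and a retraction brings the iterate back onto the manifold.

import numpy as np

rng = np.random.default_rng(0)
d, r = 20, 3
X = rng.standard_normal((500, d))
C = X.T @ X / 500                                 # sample covariance

U, _ = np.linalg.qr(rng.standard_normal((d, r)))  # a point on Grassmann(d, r)

def riemannian_grad(U):
    G = -2 * C @ U                                # Euclidean gradient of f(U) = -tr(U^T C U)
    return G - U @ (U.T @ G)                      # project onto the tangent space at U

def retract(U, H):
    Q, _ = np.linalg.qr(U + H)                    # QR retraction back onto the manifold
    return Q

step = 0.05
for _ in range(500):
    U = retract(U, -step * riemannian_grad(U))

# The span of U should align with the top-r eigenvectors of C
w, V = np.linalg.eigh(C)
print("principal angle cosines:", np.linalg.svd(V[:, -r:].T @ U, compute_uv=False))

R-SVRG layers variance-reduced stochastic gradients on top of this picture, which is where the logarithm map and parallel translation mentioned in the abstract come in: gradients computed at different points have to be transported to a common tangent space before they can be averaged or subtracted.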

 
 
 

Thursday, May 26, 2016

Harvesting Nature: from Computational Imaging to Optical Computing

Today, at the 2016 SIAM Conference on Imaging Science in Albuquerque, Laurent, the CTO and co-founder of LightOn, will be presenting at the Computational Imaging Systems minisymposium (9:30 AM-11:30 AM, Room: Alvarado Ballroom C).
Among the work Laurent will present, he will mention our earlier proof of concept (Random Projections through multiple optical scattering: Approximating kernels at the speed of light).
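As a rough software-only analogue of that idea, here is the classical random Fourier feature construction (illustrative code, not LightOn's optical implementation; sizes are arbitrary): inner products of randomly projected, nonlinearly transformed inputs approximate a kernel evaluation.

import numpy as np

rng = np.random.default_rng(0)
d, D = 10, 2000                          # input dimension, number of random features
W = rng.standard_normal((D, d))          # random projection directions
b = rng.uniform(0, 2 * np.pi, D)         # random phases

def phi(x):
    # Random Fourier features approximating the Gaussian kernel exp(-||x - y||^2 / 2)
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x = rng.standard_normal(d)
y = x + 0.3 * rng.standard_normal(d)
print("exact kernel:   ", np.exp(-np.linalg.norm(x - y) ** 2 / 2))
print("random features:", phi(x) @ phi(y))    # the two numbers should be close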
I am currently thinking about submitting a workshop or symposium proposal around Machine Learning and Hardware at NIPS. If you are interested or know of a similar endeavor, please get in touch and let's make this happen.
This minisymposium emphasizes the interaction between sensing hardware and computational methods in computational imaging systems.  Novel sensing hardware together with algorithms that exploit the unique properties of the data acquired by this hardware has enabled the development of imaging systems with remarkable abilities that could not be achieved via traditional methods. This minisymposium brings together four leaders in the field of computational photography, and provides a cross-section of recent advances in this area.  

9:30-9:55 Macroscopic Fourier Ptychography, Oliver Cossairt, Northwestern University, USA; Jason R. Holloway and Ashok Veeraraghavan, Rice University, USA; Manoj Sharma, Northwestern University, USA; Salman Asif, Rice University, USA; Nathan Matsuda, Northwestern University, USA; Roarke Horstmeyer, California Institute of Technology, USA

10:00-10:25 Lensless Imaging, Ashok Veeraraghavan, Rice University, USA
10:30-10:55 Photon-Efficient Reflectivity and Depth Imaging under Significant Ambient Light, Vivek K. Goyal, Boston University, USA
11:00-11:25 Harvesting Nature: from Computational Imaging to Optical Computing, Laurent Daudet, Université Paris-Diderot, France
Thank you Mario for the invitation.


Tuesday, May 24, 2016

Compressively characterizing high-dimensional entangled states with complementary, random filtering

How is compressive sensing changing the world? Here is one way: it makes quantum mechanics simpler to understand: Compressively characterizing high-dimensional entangled states with complementary, random filtering by Gregory A. Howland, Samuel H. Knarr, James Schneeloch, Daniel J. Lum, John C. Howell

The resources needed to conventionally characterize a quantum system are overwhelmingly large for high-dimensional systems. This obstacle may be overcome by abandoning traditional cornerstones of quantum measurement, such as general quantum states, strong projective measurement, and assumption-free characterization. Following this reasoning, we demonstrate an efficient technique for characterizing high-dimensional, spatial entanglement with one set of measurements. We recover sharp distributions with local, random filtering of the same ensemble in momentum followed by position---something the uncertainty principle forbids for projective measurements. Exploiting the expectation that entangled signals are highly correlated, we use fewer than 5,000 measurements to characterize a 65,536-dimensional state. Finally, we use entropic inequalities to witness entanglement without a density matrix. Our method represents the sea change unfolding in quantum measurement where methods influenced by the information theory and signal-processing communities replace unscalable, brute-force techniques---a progression previously followed by classical sensing.
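As a very rough classical analogue of the measurement idea (a toy sketch with no quantum content; the complementary 0/1 filter pairs and the OMP recovery are my own simplifications, not the paper's protocol): a sparse distribution over a grid is sensed with a few hundred random filters and recovered from those few numbers.

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
d, k, m = 32, 10, 300                    # grid side, sparsity, number of filters
p = np.zeros(d * d)                      # sparse "distribution" over a d x d grid
p[rng.choice(d * d, k, replace=False)] = rng.random(k)

F = rng.integers(0, 2, size=(m, d * d)).astype(float)   # random 0/1 filter patterns
y = F @ p - (1 - F) @ p                  # each filter paired with its complement
Phi = 2 * F - 1                          # the equivalent +/-1 sensing matrix

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
omp.fit(Phi, y)
print("relative error:", np.linalg.norm(omp.coef_ - p) / np.linalg.norm(p))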
 
 Previously:

Thesis: Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks - implementation -


Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks by  Philipp Gysel

Convolutional neural networks (CNN) have achieved major breakthroughs in recent years. Their performance in computer vision has matched and in some areas even surpassed human capabilities. Deep neural networks can capture complex non-linear features; however, this ability comes at the cost of high computational and memory requirements. State-of-the-art networks require billions of arithmetic operations and millions of parameters. To enable embedded devices such as smartphones, Google glasses and monitoring cameras with the astonishing power of deep learning, dedicated hardware accelerators can be used to decrease both execution time and power consumption. In applications where fast connection to the cloud is not guaranteed or where privacy is important, computation needs to be done locally. Many hardware accelerators for deep neural networks have been proposed recently. A first important step of accelerator design is hardware-oriented approximation of deep networks, which enables energy-efficient inference. We present Ristretto, a fast and automated framework for CNN approximation. Ristretto simulates the hardware arithmetic of a custom hardware accelerator. The framework reduces the bit-width of network parameters and outputs of resource-intense layers, which reduces the chip area for multiplication units significantly. Alternatively, Ristretto can remove the need for multipliers altogether, resulting in an adder-only arithmetic. The tool fine-tunes trimmed networks to achieve high classification accuracy. Since training of deep neural networks can be time-consuming, Ristretto uses highly optimized routines which run on the GPU. This enables fast compression of any given network. Given a maximum tolerance of 1%, Ristretto can successfully condense CaffeNet and SqueezeNet to 8-bit. The code for Ristretto is available.
 
 
 The page for the Ristretto project is at: http://lepsucd.com/?page_id=621
From the page:
 
Ristretto is an automated CNN-approximation tool which condenses 32-bit floating point networks. Ristretto is an extension of Caffe and allows testing, training and fine-tuning of networks with limited numerical precision.

Ristretto In a Minute

  • Ristretto Tool: The Ristretto tool performs automatic network quantization and scoring, using different bit-widths for number representation, to find a good balance between compression rate and network accuracy.
  • Ristretto Layers: Ristretto reimplements Caffe-layers with quantized numbers.
  • Testing and Training: Thanks to Ristretto’s smooth integration into Caffe, network description files can be manually changed to quantize different layers. The bit-width used for different layers, as well as other parameters, can be set in the network’s prototxt file. This allows condensed networks to be tested and trained directly, without any need for recompilation.

Approximation Schemes

Ristretto allows for three different quantization strategies to approximate Convolutional Neural Networks:
  • Dynamic Fixed Point: A modified fixed-point format with more flexibility (a toy sketch follows this list).
  • Mini Floating Point: Bit-width reduced floating point numbers.
  • Power-of-two parameters: Layers with power-of-two parameters don’t need any multipliers when implemented in hardware.
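As a rough illustration of the first scheme above (my own toy code, not Ristretto's implementation): dynamic fixed point picks the split between integer and fractional bits per tensor so the largest value still fits, then rounds and saturates.

import numpy as np

def dynamic_fixed_point(x, bit_width=8):
    # Choose the integer length so the largest magnitude fits (one bit for sign),
    # spend the remaining bits on the fraction, then round and saturate.
    int_len = int(np.ceil(np.log2(np.max(np.abs(x)) + 1e-12))) + 1
    frac_len = bit_width - int_len
    step = 2.0 ** (-frac_len)
    q = np.round(x / step) * step
    return np.clip(q, -2.0 ** (int_len - 1), 2.0 ** (int_len - 1) - step)

rng = np.random.default_rng(0)
w = rng.standard_normal(6) * 0.3
print(w)
print(dynamic_fixed_point(w, bit_width=8))   # 8-bit dynamic fixed-point version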

Documentation

Cite us

Our approximation framework was presented in an extended abstract at ICLR’16. Check out our poster. All results can be reproduced with our code on Github. If Ristretto helps your research project, please cite us:
@article{gysel2016hardware,
  title={Hardware-oriented Approximation of Convolutional Neural Networks},
  author={Gysel, Philipp and Motamedi, Mohammad and Ghiasi, Soheil},
  journal={arXiv preprint arXiv:1604.03168},
  year={2016}
}
 
 
 
 
 

Monday, May 23, 2016

The Shallow Side of Deep Learning


On his Twitter feed, Atlas mentioned three preprints: the first allows the building of deeper constructions out of kernels (shallow methods). The next one shows how some deep nets can be considered as a sort of ensemble of shallower nets. Finally, the last one makes the case that dropout and stochastic depth are somehow equivalent to averaging over shallower networks.
 
  
End-to-End Kernel Learning with Supervised Convolutional Kernel Networks by Julien Mairal

In this paper, we propose a new image representation based on a multilayer kernel machine that performs end-to-end learning. Unlike traditional kernel methods, where the kernel is handcrafted or adapted to data in an unsupervised manner, we learn how to shape the kernel for a supervised prediction problem. We proceed by generalizing convolutional kernel networks, which originally provide unsupervised image representations, and we derive backpropagation rules to optimize model parameters. As a result, we obtain a new type of convolutional neural network with the following properties: (i) at each layer, learning filters is equivalent to optimizing a linear subspace in a reproducing kernel Hilbert space (RKHS), where we project data, (ii) the network may be learned with supervision or without, (iii) the model comes with a natural regularization function (the norm in the RKHS). We show that our method achieves reasonably competitive performance on some standard "deep learning" image classification datasets such as CIFAR-10 and SVHN, and also state-of-the-art results for image super-resolution, demonstrating the applicability of our approach to a large variety of image-related tasks.
 
 

Residual Networks are Exponential Ensembles of Relatively Shallow Networks by Andreas Veit, Michael Wilber, Serge Belongie

In this work, we introduce a novel interpretation of residual networks showing they are exponential ensembles. This observation is supported by a large-scale lesion study that demonstrates they behave just like ensembles at test time. Subsequently, we perform an analysis showing these ensembles mostly consist of networks that are each relatively shallow. For example, contrary to our expectations, most of the gradient in a residual network with 110 layers comes from an ensemble of very short networks, i.e., only 10-34 layers deep. This suggests that in addition to describing neural networks in terms of width and depth, there is a third dimension: multiplicity, the size of the implicit ensemble. Ultimately, residual networks do not resolve the vanishing gradient problem by preserving gradient flow throughout the entire depth of the network - rather, they avoid the problem simply by ensembling many short networks together. This insight reveals that depth is still an open research question and invites the exploration of the related notion of multiplicity.
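To see that unraveling concretely, here is a tiny numerical check (my own toy example with linear residual blocks, where the path expansion is exact; the paper's argument is about general residual units):

import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n, dim = 3, 4
blocks = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(n)]  # linear "residual" blocks
x = rng.standard_normal(dim)

# Forward pass through the residual network y = (I + F_n) ... (I + F_1) x
y = x.copy()
for F in blocks:
    y = y + F @ y

# The same output written as a sum over all 2^n paths, each path either
# skipping or applying each block: an implicit ensemble of shallow networks.
y_paths = np.zeros(dim)
for choice in product([False, True], repeat=n):
    z = x.copy()
    for use_block, F in zip(choice, blocks):
        if use_block:
            z = F @ z
    y_paths += z

print(np.allclose(y, y_paths))   # True: the residual net equals the sum over paths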
 
 
Swapout: Learning an ensemble of deep architectures by  Saurabh Singh, Derek Hoiem, David Forsyth

We describe Swapout, a new stochastic training method, that outperforms ResNets of identical network structure yielding impressive results on CIFAR-10 and CIFAR-100. Swapout samples from a rich set of architectures including dropout, stochastic depth and residual architectures as special cases. When viewed as a regularization method, swapout not only inhibits co-adaptation of units in a layer, similar to dropout, but also across network layers. We conjecture that swapout achieves strong regularization by implicitly tying the parameters across layers. When viewed as an ensemble training method, it samples a much richer set of architectures than existing methods such as dropout or stochastic depth. We propose a parameterization that reveals connections to existing architectures and suggests a much richer set of architectures to be explored. We show that our formulation suggests an efficient training method and validate our conclusions on CIFAR-10 and CIFAR-100, matching state-of-the-art accuracy. Remarkably, our 32-layer wider model performs similarly to a 1001-layer ResNet model.
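The core sampling rule is compact enough to sketch (illustrative code only, simplified from the paper's description; np.tanh stands in for a residual block's transform): each unit draws two Bernoulli gates and outputs y = theta1 * x + theta2 * F(x), which covers dropping the unit, skipping the block as in stochastic depth, a plain feedforward unit, and a residual unit as special cases.

import numpy as np

rng = np.random.default_rng(0)

def swapout(x, F, p1=0.5, p2=0.5):
    # Two independent Bernoulli masks per unit: y = theta1 * x + theta2 * F(x).
    # (0,0) drops the unit, (1,0) skips the block like stochastic depth,
    # (0,1) gives a plain feedforward unit, (1,1) gives a residual unit.
    theta1 = (rng.random(x.shape) < p1).astype(x.dtype)
    theta2 = (rng.random(x.shape) < p2).astype(x.dtype)
    return theta1 * x + theta2 * F(x)

x = rng.standard_normal(8)
print(swapout(x, np.tanh))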
 

 
 
 

Saturday, May 21, 2016

Saturday Morning Videos: Random Instances and Phase Transitions Workshop @ Simons Institute


May 2, 2016. Dawn LAMO Image 79 of Ceres.
Zadeni Crater, at 80 miles (128 kilometers) wide, is a prominent impact feature in the southern hemisphere of Ceres. This image from NASA's Dawn spacecraft shows terrain in Zadeni Crater.


The Random Instances and Phase Transitions workshop took place at the Simons Institute at the beginning of the month. The hashtag for this workshop was #SimonsRIPT. All the videos and attendant presentations can be found in the links below:



