While at NIPS2016, I talked to Chris about his poster on the second day of the Brains and Bits workshop (Saturday). A generic issue in Machine Learning is designing algorithms that can remember, and that preferably have a long-term memory (think LSTM). There is a steady stream of new architectures in that area, and even a need to avoid, or be robust to, catastrophic forgetting. Yet the underlying issue is figuring out whether there is a way to evaluate that memory from the hyperparameters of the network. Adam, Dong and Chris ask a somewhat similar question: can we figure out the connection between the size of the network and the size of the signal it can remember? They look at the problem from the standpoint of the information content of the signal and how that number is connected to the size of the network. They do this for a linear ESN, and they use the artillery of compressive sensing to show that the size of the network depends, more or less, on the information content of the signal (not on the size/dimensionality of the signal). They show this for sparse, then structured, then low-rank signals, and even show the acid test/phase transition (figure above). Wow! Definitely food for thought for the hardware side of things.
Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks by Adam Charles, Dong Yin, Christopher Rozell
Recurrent neural networks (RNNs) have drawn interest from machine learning researchers because of their effectiveness at preserving past inputs for time-varying data processing tasks. To understand the success and limitations of RNNs, it is critical that we advance our analysis of their fundamental memory properties. We focus on echo state networks (ESNs), which are RNNs with simple memoryless nodes and random connectivity. In most existing analyses, the short-term memory (STM) capacity results conclude that the ESN network size must scale linearly with the input size for unstructured inputs. The main contribution of this paper is to provide general results characterizing the STM capacity for linear ESNs with multidimensional input streams when the inputs have common low-dimensional structure: sparsity in a basis or significant statistical dependence between inputs. In both cases, we show that the number of nodes in the network must scale linearly with the information rate and poly-logarithmically with the ambient input dimension. The analysis relies on advanced applications of random matrix theory and results in explicit non-asymptotic bounds on the recovery error. Taken together, this analysis provides a significant step forward in our understanding of the STM properties in RNNs.
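To make the setting concrete, here is a minimal sketch of the kind of experiment the abstract points to, not the authors' code. A linear ESN updates its state as x[n] = W x[n-1] + Z s[n], so after N steps the state is a linear measurement x = A s of the whole input history, with A = [W^(N-1) Z, ..., W Z, Z]. Everything specific below is an assumption made for illustration: the random orthogonal W, the unit-norm Gaussian feed-forward matrix Z, the sizes, and the helper names (`make_linear_esn`, `run_esn`, `memory_matrix`, `omp`). The recovery step uses plain orthogonal matching pursuit as a simple stand-in for the sparse recovery analyzed in the paper.

```python
import numpy as np


def make_linear_esn(n_nodes, n_inputs, rng):
    """Assumed setup: random orthogonal W, unit-norm Gaussian columns in Z."""
    W, _ = np.linalg.qr(rng.standard_normal((n_nodes, n_nodes)))
    Z = rng.standard_normal((n_nodes, n_inputs))
    Z /= np.linalg.norm(Z, axis=0)
    return W, Z


def run_esn(W, Z, s):
    """Run x[n] = W x[n-1] + Z s[n] over the rows of s; return the final state."""
    x = np.zeros(W.shape[0])
    for s_n in s:
        x = W @ x + Z @ s_n
    return x


def memory_matrix(W, Z, n_steps):
    """Build A = [W^(N-1) Z, ..., W Z, Z] so that the final state is A @ s.ravel()."""
    n_nodes, n_inputs = Z.shape
    A = np.empty((n_nodes, n_steps * n_inputs))
    block = Z.copy()
    for n in range(n_steps - 1, -1, -1):
        A[:, n * n_inputs:(n + 1) * n_inputs] = block
        block = W @ block
    return A


def omp(A, x, k):
    """Orthogonal matching pursuit: greedy estimate with at most k nonzeros."""
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        corr = np.abs(A.T @ residual)
        if support:
            corr[support] = 0.0  # never pick the same column twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, support], x, rcond=None)
        residual = x - A[:, support] @ coef
    s_hat = np.zeros(A.shape[1])
    s_hat[support] = coef
    return s_hat


# Illustrative sizes (assumptions): 100 nodes, 4 input channels, 60 time steps,
# so 240 unknowns are observed through a 100-dimensional network state.
rng = np.random.default_rng(0)
n_nodes, n_inputs, n_steps, k = 100, 4, 60, 5
W, Z = make_linear_esn(n_nodes, n_inputs, rng)

s = np.zeros((n_steps, n_inputs))
active = rng.choice(s.size, size=k, replace=False)
s.flat[active] = rng.choice([-1.0, 1.0], size=k)

x = run_esn(W, Z, s)
A = memory_matrix(W, Z, n_steps)
s_hat = omp(A, x, k)
rel_err = np.linalg.norm(s_hat - s.ravel()) / np.linalg.norm(s)
print(f"relative recovery error: {rel_err:.3e}")
```

Varying `n_nodes` and `k` while keeping `n_steps * n_inputs` fixed is a rough way to probe the dependence on information content rather than ambient dimension. This toy setup carries none of the guarantees derived in the paper, so recovery may fail for some seeds and sizes.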