###
Short and Deep: Sketching and Neural Networks

This is a revised version of an earlier preprint.
Short and Deep: Sketching and Neural Networks by Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar
Data-independent methods for dimensionality reduction such as random projections,
sketches, and feature hashing have become increasingly popular in recent
years. These methods often seek to reduce dimensionality while preserving the
hypothesis class, resulting in inherent lower bounds on the size of projected data.
For example, preserving linear separability requires Ω(1/γ2
) dimensions, where γ
is the margin, and in the case of polynomial functions, the number of required dimensions
has an exponential dependence on the polynomial degree. Despite these
limitations, we show that the dimensionality can be reduced further while maintaining
performance guarantees, using improper learning with a slightly larger
hypothesis class. In particular, we show that any sparse polynomial function of a
sparse binary vector can be computed from a compact sketch by a single-layer neural
network, where the sketch size has a logarithmic dependence on the polynomial
degree. A practical consequence is that networks trained on sketched data are
compact, and therefore suitable for settings with memory and power constraints.
We empirically show that our approach leads to networks with fewer parameters
than related methods such as feature hashing, at equal or better performance.

**Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !**

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.
## 1 comment:

I noticed this paper entitled "Exponential Capacity in an Autoencoder Neural Network with a Hidden Layer."

https://arxiv.org/pdf/1705.07441.pdf

I would kinda guess the exponential capacity is because real value weights are used to encode the binary output. With finite precision arithmetic say, 16 bit half floats or 32 bit floats there probably is an optimal number of weights to sum together to get a result. After that you should likely use locality sensitive hashing to switch in different weight vectors. I also noticed recently that the new nVidia GPU chip offers a 120 Tflop half float matrix operation that might be suitable for random projections. If you were lucky you might get say 40 million 65536-point RPs per second out of it. That would be about 4000 times faster than I can get from my dual core CPU using SIMD.

Post a Comment