From the paper:
Non-linear operators, consisting of a redundant linear transformation followed by a point-wise non-linearity and a local pooling, are fundamental building blocks in deep convolutional networks. This is due to their capacity to generate local invariance while preserving discriminative information (LeCun et al., 1998; Bruna & Mallat, 2013).
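As a rough illustration of that building block, here is a minimal NumPy sketch: an overcomplete (redundant) linear transform, a point-wise non-linearity (ReLU, one common choice), and max pooling over local groups of feature coordinates. The function name and dimensions are illustrative, not from the paper.

```python
import numpy as np

def nonlinear_operator(x, W, pool_size=2):
    """Sketch of the block described above: redundant linear transform,
    point-wise non-linearity, and local pooling (illustrative only)."""
    z = W @ x                      # linear transformation; W is overcomplete (more rows than cols)
    a = np.maximum(z, 0.0)        # point-wise non-linearity (ReLU here)
    # local pooling: max over non-overlapping groups of pool_size coordinates
    return a.reshape(-1, pool_size).max(axis=1)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))   # 8 outputs from 4 inputs: a redundant transform
x = rng.standard_normal(4)
print(nonlinear_operator(x, W))   # 4 pooled, non-negative activations
```

Pooling over groups of coordinates is what produces the local invariance the quote refers to: small changes in the input that move energy within a pool group leave the pooled output nearly unchanged.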
Unsupervised Learning of Spatiotemporally Coherent Metrics by Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun
Current state-of-the-art object detection and recognition algorithms rely on supervised training, and most benchmark datasets contain only static images. In this work we study feature learning in the context of temporally coherent video data. We focus on training convolutional features on unlabeled video data, using only the assumption that adjacent video frames contain semantically similar information. This assumption is exploited to train a convolutional pooling auto-encoder regularized by slowness and sparsity. We show that nearest neighbors of a query frame in the learned feature space are more likely to be its temporal neighbors.
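The objective sketched in the abstract can be written as a reconstruction term plus two regularizers: a slowness term penalizing differences between the pooled codes of adjacent frames, and a sparsity term on the codes. Below is a hedged NumPy sketch under simplifying assumptions: a linear decoder `W_d`, L2 pooling over pairs of code coordinates, and illustrative weights `alpha`, `beta` that are not the paper's values.

```python
import numpy as np

def l2_pool(z, group=2):
    """L2 pooling over non-overlapping groups of code coordinates."""
    return np.sqrt((z.reshape(-1, group) ** 2).sum(axis=1))

def coherence_loss(x_t, x_t1, z_t, z_t1, W_d, alpha=0.5, beta=0.1):
    """Sketch of a slowness- and sparsity-regularized auto-encoder loss
    for two adjacent frames x_t, x_t1 with codes z_t, z_t1.
    W_d, alpha, beta are assumptions, not the paper's exact formulation."""
    recon = np.sum((W_d @ z_t - x_t) ** 2) + np.sum((W_d @ z_t1 - x_t1) ** 2)
    slowness = np.abs(l2_pool(z_t) - l2_pool(z_t1)).sum()   # temporal coherence
    sparsity = np.abs(z_t).sum() + np.abs(z_t1).sum()       # L1 on the codes
    return recon + alpha * slowness + beta * sparsity

rng = np.random.default_rng(0)
W_d = rng.standard_normal((6, 4))        # linear decoder (assumed)
x_t, x_t1 = rng.standard_normal((2, 6))  # two adjacent frames (flattened)
z_t, z_t1 = rng.standard_normal((2, 4))  # their codes
print(coherence_loss(x_t, x_t1, z_t, z_t1, W_d))
```

Because the slowness penalty acts on pooled codes rather than raw codes, features are free to shift within a pool group between frames, which is what makes nearest neighbors in the learned feature space tend to be temporal neighbors.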