From the paper:
Non-linear operators consisting of a redundant linear transformation followed by a point-wise non-linearity and a local pooling are fundamental building blocks in deep convolutional networks. This is due to their capacity to generate local invariance while preserving discriminative information (LeCun et al., 1998; Bruna & Mallat, 2013).
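To make the three stages in that quote concrete, here is a minimal sketch in plain Python. The function name `pooled_features`, the ReLU non-linearity, and the non-overlapping max pooling are illustrative assumptions, not choices taken from the paper; the matrix `W` has more rows than columns, which is what "redundant" means here.

```python
def pooled_features(x, W, pool_size=2):
    """Hypothetical helper: linear map, point-wise non-linearity, local pooling."""
    # Linear transformation: one output per row of W (more rows than inputs = redundant).
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    # Point-wise non-linearity (ReLU is an assumed choice).
    a = [max(v, 0.0) for v in z]
    # Local pooling: max over non-overlapping groups of adjacent outputs (assumed choice).
    return [max(a[i:i + pool_size]) for i in range(0, len(a), pool_size)]


x = [1.0, -2.0]
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # 4 outputs from 2 inputs
features = pooled_features(x, W)
```

Pooling discards which unit in each group fired, which is the source of the local invariance, while the redundancy of `W` leaves enough units to keep inputs distinguishable.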
Unsupervised Learning of Spatiotemporally Coherent Metrics by Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun
Current state-of-the-art object detection and recognition algorithms rely on supervised training, and most benchmark datasets contain only static images. In this work we study feature learning in the context of temporally coherent video data. We focus on training convolutional features on unlabeled video data, using only the assumption that adjacent video frames contain semantically similar information. This assumption is exploited to train a convolutional pooling auto-encoder regularized by slowness and sparsity. We show that nearest neighbors of a query frame in the learned feature space are more likely to be its temporal neighbors.
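One way to read "regularized by slowness and sparsity" is as a reconstruction loss on a pair of adjacent frames plus two penalties on their feature vectors. The sketch below is an assumed form for illustration only: the function name `coherence_loss`, the L1 penalties, and the weights `alpha` and `beta` are hypothetical, and the paper's actual objective may differ in where each term is applied.

```python
def coherence_loss(x_t, x_next, recon_t, recon_next, z_t, z_next,
                   alpha=1.0, beta=0.1):
    """Hypothetical loss for one pair of adjacent frames.

    x_*     : input frames (flattened to lists of floats)
    recon_* : the auto-encoder's reconstructions of those frames
    z_*     : the feature vectors computed for those frames
    """
    def squared_error(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    def l1_norm(a):
        return sum(abs(u) for u in a)

    # Reconstruction: the features must retain enough information to rebuild each frame.
    reconstruction = squared_error(x_t, recon_t) + squared_error(x_next, recon_next)
    # Slowness: features of adjacent frames should change little.
    slowness = l1_norm([u - v for u, v in zip(z_t, z_next)])
    # Sparsity: few features should be active for any one frame.
    sparsity = l1_norm(z_t) + l1_norm(z_next)
    return reconstruction + alpha * slowness + beta * sparsity


loss = coherence_loss(
    x_t=[1.0, 0.0], x_next=[0.0, 1.0],
    recon_t=[1.0, 0.0], recon_next=[0.0, 0.0],
    z_t=[1.0, 0.0], z_next=[0.5, 0.0],
)
```

The reconstruction term matters because slowness alone is trivially minimized by constant features; requiring the frames to be rebuilt rules out that degenerate solution.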