Thursday, October 01, 2015

Thesis: Unsupervised Feature Learning in Computer Vision by Rostislav Goroshin

Much of computer vision has been devoted to the question of representation through feature extraction. Ideal features transform raw pixel intensity values to a representation in which common problems such as object identification, tracking, and segmentation are easier to solve. Recently, deep feature hierarchies have proven to be immensely successful at solving many problems in computer vision. In the supervised setting, these hierarchies are trained to solve specific problems by minimizing an objective function of the data and problem-specific label information. Recent findings suggest that despite being trained on a specific task, the learned features can be transferred across multiple visual tasks. These findings suggest that there exists a generically useful feature representation for natural visual data. This work aims to uncover the principles that lead to these generic feature representations in the unsupervised setting, which does not require problem-specific label information. We begin by reviewing relevant prior work, particularly the literature on auto-encoder networks and energy-based learning. We introduce a new regularizer for auto-encoders that plays an analogous role to the partition function in probabilistic graphical models. Next we explore the role of specialized encoder architectures for sparse inference. The remainder of the thesis explores visual feature learning from video. We establish a connection between slow-feature learning and metric learning, and experimentally demonstrate that semantically coherent metrics can be learned from natural videos. Finally, we posit that useful features linearize natural image transformations in video. To this end, we introduce a new architecture and loss for training deep feature hierarchies that linearize the transformations observed in unlabeled natural video sequences by learning to predict future frames in the presence of uncertainty.

From Ross's recent work:

[1] *New* Learning to Linearize Under Uncertainty
Ross Goroshin, Michael Mathieu, Yann LeCun
[2] Unsupervised Learning of Spatiotemporally Coherent Metrics
Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun
[3] Unsupervised Feature Learning from Temporal Data
Ross Goroshin, Joan Bruna, Arthur Szlam, Jonathan Tompson, David Eigen, Yann LeCun, NIPS 2014 Deep Learning Workshop, Montreal, QC and ICLR 2015 Workshop, San Diego, CA
[4] Efficient Object Localization Using Convolutional Networks
Jonathan Tompson, Ross Goroshin, Arjun Jain, Yann LeCun, Chris Bregler
[5] Saturating Auto-Encoders
Rostislav Goroshin and Yann LeCun, International Conference on Learning Representations (ICLR 2013), Scottsdale, AZ

1 comment:

Sergio Pissanetzky said...

"These findings suggest that there exists a generically useful feature representation for natural visual data."
Of course it does, and it is the causal set. And of course the hierarchies of features are directly derived without any need for training. However, it's not just for "natural visual data", it is universal for all information.