Nuit Blanche: gvnn: Neural Network Library for Geometric Computer Vision

Wednesday, July 27, 2016

gvnn: Neural Network Library for Geometric Computer Vision

I learned about Spatial transformers at a Deep Learning meetup in Paris from a presentation by Alban Desmaison (yes, I sometimes do attend other awesome meetups). Spatial transformers are "aiming at boosting the geometric invariance of CNNs". They use transformations from computer vision and camera models to help deep neural networks learn better or faster, by providing explicit models for generic invariances such as rotation, translation, etc. It is a little bit unsettling, since most defenders of deep neural architectures would rather add data than get help from vision models :-) Anyway, let's see this as another instance of the Great Convergence, where Machine Learning, Computer Graphics, the old computer vision paradigm and signal processing meet. Today, we have a library written in Torch that implements several of these transformations as layers for inclusion in neural networks.
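To make the mechanism a bit more concrete, here is a minimal sketch of the affine case of a spatial transformer, in plain Python rather than Torch. In the original Spatial Transformer Networks design, a small localization network predicts the transformation parameters theta, a grid generator maps each output pixel to a source location, and a bilinear sampler reads the input at that location. This sketch assumes theta is simply given (no localization network, no gradients), uses coordinates normalized to [-1, 1] as in the STN convention, and the function names `affine_grid`, `bilinear_sample` and `rotation_theta` are hypothetical, not the gvnn or Torch API.

```python
import math


def affine_grid(theta, height, width):
    """Grid generator: map each output pixel's normalized coords (x_t, y_t)
    in [-1, 1] to source coords (x_s, y_s) = theta @ [x_t, y_t, 1]."""
    grid = []
    for i in range(height):
        y_t = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            x_t = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sampler: read `image` (list of rows) at the normalized source coords
    in `grid` with bilinear interpolation; out-of-bounds pixels count as 0."""
    h, w = len(image), len(image[0])

    def pixel(r, c):
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = []
    for row in grid:
        out_row = []
        for x_s, y_s in row:
            # Convert normalized coordinates back to pixel coordinates.
            x = (x_s + 1.0) * (w - 1) / 2.0
            y = (y_s + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(x), math.floor(y)
            dx, dy = x - x0, y - y0
            # Weighted average of the four neighbouring pixels.
            value = ((1 - dx) * (1 - dy) * pixel(y0, x0)
                     + dx * (1 - dy) * pixel(y0, x0 + 1)
                     + (1 - dx) * dy * pixel(y0 + 1, x0)
                     + dx * dy * pixel(y0 + 1, x0 + 1))
            out_row.append(value)
        out.append(out_row)
    return out


def rotation_theta(angle):
    """2x3 affine matrix for a rotation by `angle` radians (no translation)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0]]
```

With theta set to the identity, the sampler returns the input unchanged; with `rotation_theta(math.pi / 2)` on a 3x3 image, output pixel (i, j) reads input pixel (j, 2 - i). Because bilinear interpolation is piecewise linear in both the sampling coordinates and the pixel values, gradients can flow back to theta, which is what lets such a layer be trained end-to-end inside a network.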

Let me just add that while these transformers do help neural networks reduce the number of network coefficients, the remaining coefficients are probably summarizing many of the geometric invariances that we collectively have not yet discovered (this is a sense I got from a presentation by Stephane Mallat a while back, when he mentioned the connection between deep neural networks and his scattering networks).

Unrelated but related to the transformers endeavor:

gvnn: Neural Network Library for Geometric Computer Vision by Ankur Handa, Michael Bloesch, Viorica Patraucean, Simon Stent, John McCormac, Andrew Davison

We introduce gvnn, a neural network library in Torch aimed towards bridging the gap between classic geometric computer vision and deep learning. Inspired by the recent success of Spatial Transformer Networks, we propose several new layers which are often used as parametric transformations on the data in geometric computer vision. These layers can be inserted within a neural network much in the spirit of the original spatial transformers and allow backpropagation to enable end-to-end learning of a network involving any domain knowledge in geometric computer vision. This opens up applications in learning invariance to 3D geometric transformation for place recognition, end-to-end visual odometry, depth estimation and unsupervised learning through warping with a parametric transformation for image reconstruction error.
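The 3D layers mentioned in the abstract can be illustrated in the same spirit. One common unconstrained parametrization of a 3D rotation is the SO(3) exponential map (Rodrigues' formula), which turns an axis-angle 3-vector into a rotation matrix; applying it together with a translation and then a pinhole camera projection yields pixel coordinates that a bilinear sampler could use to warp one image into another view, the kind of warping behind a reconstruction error for unsupervised learning. This is a sketch under those assumptions, in plain Python without gradients; the function names and the pinhole parameters `fx`, `fy`, `cx`, `cy` are illustrative and are not gvnn's actual layers or API.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def so3_exp(omega):
    """Rodrigues' formula: axis-angle 3-vector -> 3x3 rotation matrix,
    R = I + sin(theta) K + (1 - cos(theta)) K^2, with K the skew matrix
    of the unit rotation axis and theta the rotation angle."""
    wx, wy, wz = omega
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    if theta < 1e-12:
        # Near-zero rotation: use the first-order approximation I + [omega]_x.
        return [[1.0, -wz, wy], [wz, 1.0, -wx], [-wy, wx, 1.0]]
    kx, ky, kz = wx / theta, wy / theta, wz / theta
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    K2 = matmul(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c * K2[i][j]
             for j in range(3)] for i in range(3)]


def transform_points(R, t, points):
    """Rigid-body transform: p' = R p + t for each 3D point p."""
    return [tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i]
                  for i in range(3)) for p in points]


def project_pinhole(points, fx, fy, cx, cy):
    """Pinhole projection (u, v) = (fx X / Z + cx, fy Y / Z + cy);
    assumes every point lies in front of the camera (Z > 0)."""
    return [(fx * X / Z + cx, fy * Y / Z + cy) for X, Y, Z in points]
```

For example, `so3_exp([0.0, 0.0, math.pi / 2])` rotates the point (1, 0, 0) onto (0, 1, 0), and projecting (1, 0.5, 2) with `fx = fy = 100` and `cx = cy = 50` gives pixel (100, 75). Parametrizing the rotation as a free 3-vector, rather than as nine matrix entries, keeps the output a valid rotation whatever values a network predicts.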
