Nuit Blanche: Tonight: Paris Machine Learning meetup Hors Série 1 (Season 5) Self-Supervised Imitation, Pierre Sermanet

Thursday, January 04, 2018

Tonight: Paris Machine Learning meetup Hors Série 1 (Season 5) Self-Supervised Imitation, Pierre Sermanet

Video streaming starts at 6:30PM Paris time.

Pierre Sermanet is our guest for the first Paris Machine Learning Hors Série of the season. The talk should last about 30 to 45 minutes. Thanks to Meritis for hosting this meetup and providing the food and drinks afterwards. Doors open at 6:15PM, the talk runs from 6:30 to 7:15PM, and food and drinks follow from 7:15 to 8:15PM Paris time.

Title: Self-Supervised Imitation (the presentation slides are 400MB)

Abstract: We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints. We study how these representations can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a triplet loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm. While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human. Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems. Video results, open-source code and dataset are available at https://sermanet.github.io/imitate
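
To make the triplet loss in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' code (that is linked above). It assumes embeddings are plain lists of floats, uses a standard hinge-style triplet loss on squared Euclidean distances, and picks a margin of 0.2 purely for illustration. The function names `squared_distance`, `triplet_loss` and `imitation_reward` are hypothetical, and the reward shown (negative squared distance to the demonstration embedding) is a simplifying assumption, not necessarily the exact reward used in the talk.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on embedding vectors.

    anchor:   embedding of a frame from one viewpoint
    positive: embedding of the same moment seen from another viewpoint
    negative: embedding of a temporally nearby frame (similar looking,
              but functionally different)
    margin:   illustrative value only, an assumption of this sketch
    """
    pull = squared_distance(anchor, positive)  # should become small
    push = squared_distance(anchor, negative)  # should become large
    return max(0.0, pull - push + margin)


def imitation_reward(robot_embedding, demo_embedding):
    """Assumed reward: the closer the robot's embedding is to the
    demonstration's embedding at the same step, the higher the reward."""
    return -squared_distance(robot_embedding, demo_embedding)


# The loss is zero once the positive is closer than the negative by the margin.
satisfied = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])
# The loss is positive when the temporal neighbor sits closer than the other view.
violated = triplet_loss([0.0, 0.0], [1.0, 0.0], [0.1, 0.0])
```

Minimizing this loss over many triplets pulls simultaneous views together and pushes temporal neighbors apart, which is the attract/repel signal the abstract describes. In practice the embeddings would come from a neural network and the loss would be averaged over a batch.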