Dídac Surís*, Adrià Recasens*, David Bau, David Harwath, James Glass, Antonio Torralba
Massachusetts Institute of Technology

In this paper, we propose a framework for learning through drawing. Our goal is to learn the correspondence between spoken words and abstract visual attributes from a dataset of spoken descriptions of images. We manipulate the learned representations of GANs to edit semantic concepts in their generated outputs, and use these GAN-generated images to train a model with a triplet loss.
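The triplet loss mentioned above encourages an anchor embedding to lie closer to a matching (positive) example than to a non-matching (negative) one by at least a margin. A minimal sketch of such a loss is shown below; the function name, squared-Euclidean distance, and margin value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on embedding vectors.

    Pulls the anchor toward the positive and pushes it away from the
    negative until their squared distances differ by at least `margin`.
    """
    d_pos = np.sum((anchor - positive) ** 2)  # distance to matching example
    d_neg = np.sum((anchor - negative) ** 2)  # distance to non-matching example
    return max(d_pos - d_neg + margin, 0.0)

# Illustrative usage with 2-D embeddings:
a = np.array([0.0, 0.0])
p = np.array([0.0, 0.0])   # identical to anchor
n = np.array([1.0, 0.0])   # one unit away
print(triplet_loss(a, p, n))  # → 0.0 (negative already beyond the margin)
```

In the paper's setting, the anchor and positive/negative examples would be embeddings of spoken descriptions and GAN-generated images, so that matching audio-image pairs end up close in the shared embedding space.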
Download our paper


Dídac Surís*, Adrià Recasens*, David Bau, David Harwath, James Glass and Antonio Torralba
Computer Vision and Pattern Recognition (CVPR), 2019


Acknowledgements

This research was partially supported by the Toyota Research Institute.