Emergent translation in multi-agent communication

Paper Summary

Posted on Thursday, March 15, 2018 · Tags: Summaries, ML

Details:

  • Authors: J. Lee, K. Cho, J. Weston, and D. Kiela
  • Link: arXiv
  • Tags: Neural Machine Translation, Image Captioning, Unsupervised Learning
  • Year: 2017
  • Conference: ICLR 2018
  • Implementation: Official in PyTorch

Summary

Problem

Learn to translate between two languages given only unaligned image-captioning datasets in each language, i.e., without any parallel corpus. Translation is thus learned in a broadly unsupervised fashion.

How do they solve it?

  • The authors propose a two-agent game that, as a by-product, yields a model that can translate between the given languages without using a parallel corpus. To achieve this, each agent is equipped with three modules: a native speaker module, a foreign language encoder, and an image encoder.
    • Native Speaker Module (NSM): An image captioning model tasked with describing an image as well as possible. It is a GRU that is fed the image features as its first input, producing a hidden state from which a token is generated via a fully connected layer and straight-through (ST) Gumbel-softmax sampling; unrolling the GRU further produces the full description.
    • Foreign Language Encoder (FLE): Also an RNN; it maps a text in the language foreign to the agent into a feature vector of dimension D. This text is the other agent's description of the image, produced in that agent's own language.
    • Image Encoder (IE): A CNN that encodes the image into features of the same dimension D as the FLE output. The aim is for these image encodings to be as close as possible to the message encodings the agent obtains from its FLE. (All three modules are sketched in code at the end of this summary.)
  • The overall process is then as follows (_1 and _2 denote the two agents; a code sketch follows below):
    • Image → NSM_1 → FLE_2 → Feature ← IE_2 ← (set of candidate images, one of them being the original image)
  • From this, translation can be done as follows (see the sketch below):
    • Text in language 1 → FLE_2 → NSM_2 → Text in language 2
  • The loss functions used are as follows (sketched below):
    • A cross-entropy loss on the NSM, the same as in an image captioning model, assessing the quality of the generated descriptions.
    • A cross-entropy loss over the candidate images, where each candidate is scored by the inverse of the mean squared distance between its IE embedding and the FLE embedding of the message passed by the other agent; the target is the original image. See Eq. 1 of the paper.
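
To make the three modules concrete, here is a minimal PyTorch sketch. All class names, argument names, and dimensions are mine rather than the official implementation's, and for simplicity the speaker is fed a D-dimensional feature (assuming the image has already been passed through a pretrained CNN and projected into the shared space):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    """IE: maps pre-extracted CNN features into the shared D-dim space."""
    def __init__(self, cnn_dim, feat_dim):
        super().__init__()
        self.proj = nn.Linear(cnn_dim, feat_dim)

    def forward(self, cnn_feats):                  # (batch, cnn_dim)
        return self.proj(cnn_feats)                # (batch, D)

class NativeSpeakerModule(nn.Module):
    """NSM: a GRU captioner seeded with a D-dim feature; it emits discrete
    tokens via straight-through Gumbel-softmax so gradients can flow."""
    def __init__(self, feat_dim, hidden_dim, vocab_size, max_len=15):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden_dim)
        self.embed = nn.Linear(vocab_size, hidden_dim, bias=False)  # embeds one-hot tokens
        self.cell = nn.GRUCell(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)
        self.vocab_size, self.max_len = vocab_size, max_len

    def forward(self, feats, tau=1.0):             # feats: (batch, D)
        h = torch.tanh(self.init_h(feats))         # image feature seeds the hidden state
        tok = torch.zeros(feats.size(0), self.vocab_size, device=feats.device)
        tok[:, 0] = 1.0                            # index 0 plays the role of <bos>
        message = []
        for _ in range(self.max_len):
            h = self.cell(self.embed(tok), h)
            # hard one-hot sample forward, soft gradients backward (ST-Gumbel)
            tok = F.gumbel_softmax(self.out(h), tau=tau, hard=True)
            message.append(tok)
        return torch.stack(message, dim=1)         # (batch, max_len, vocab)

class ForeignLanguageEncoder(nn.Module):
    """FLE: an RNN mapping a (one-hot) foreign message to a D-dim feature."""
    def __init__(self, vocab_size, hidden_dim, feat_dim):
        super().__init__()
        self.embed = nn.Linear(vocab_size, hidden_dim, bias=False)
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, feat_dim)

    def forward(self, message):                    # (batch, len, vocab)
        _, h = self.rnn(self.embed(message))
        return self.proj(h.squeeze(0))             # (batch, D)
```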
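One round of the game, reusing the modules above. Scoring candidates by negative mean squared distance to the message embedding paraphrases the summary's reading of Eq. 1, so the exact scoring function may differ from the paper's:

```python
def play_round(ie_1, nsm_1, fle_2, ie_2, target_cnn, candidates_cnn):
    """Agent 1 describes the target image in its language; agent 2 must
    pick that image out of a candidate set using only the message.

    target_cnn:     (batch, cnn_dim)     CNN features of the target image
    candidates_cnn: (batch, K, cnn_dim)  K candidates, one being the target
    """
    message = nsm_1(ie_1(target_cnn))              # discrete language-1 message
    msg_emb = fle_2(message)                       # (batch, D)
    cand_emb = ie_2(candidates_cnn)                # (batch, K, D)
    # score each candidate by negative mean squared distance to the message
    scores = -((cand_emb - msg_emb.unsqueeze(1)) ** 2).mean(dim=-1)  # (batch, K)
    return scores
```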
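Translation at test time then simply chains the two modules of the target-language agent, since its FLE understands language 1 and its NSM speaks language 2:

```python
def translate_1_to_2(fle_2, nsm_2, text_lang1):
    """text_lang1: (batch, len, vocab_1) one-hot sentence in language 1."""
    feature = fle_2(text_lang1)   # encode the source sentence into the shared D-dim space
    return nsm_2(feature)         # decode it in language 2, as if it were an image encoding
```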
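Finally, minimal sketches of the two losses, assuming the `scores` produced by `play_round` above and per-step pre-softmax NSM outputs obtained by teacher forcing on ground-truth captions (how the paper weights and combines the two terms is not reproduced here):

```python
def listener_loss(scores, target_idx):
    """Cross-entropy over the K candidates, scored as in play_round (cf. Eq. 1).
    target_idx: (batch,) index of the original image among the candidates."""
    return F.cross_entropy(scores, target_idx)

def captioning_loss(step_logits, gold_caption):
    """Standard image-captioning cross-entropy on the NSM.
    step_logits:  (batch, len, vocab) per-step NSM outputs under teacher forcing
    gold_caption: (batch, len) ground-truth token indices"""
    return F.cross_entropy(step_logits.flatten(0, 1), gold_caption.flatten())
```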