

Title: From Co-Occurrence to Correspondence
Abstract:
While supervised learning methods for classification and structured prediction are very effective in many domains, they require detailed and precise labeling of large amounts of data. Weakly or ambiguously labeled data present major challenges as well as opportunities. For example, when building a machine translation system, we typically have large amounts of translated sentences to learn from, but no word- or phrase-level correspondence. Copious images and videos on the web or on your hard drive are typically labeled with captions saying who and what is in the picture, but not where and when. The challenges are both theoretical and algorithmic: under what assumptions can we guarantee effective and efficient learning of precise correspondence from pure co-occurrence? I will describe our ongoing work on weakly supervised learning approaches for machine translation and parsing of images, videos and text.