Goals: Machine translation systems are built by automatically extracting patterns from a database of sentence-translation pairs. Sometimes this is hard to do without the help of a bilingual dictionary. For example, suppose we have these pairs of Chinese sentences and their English translations:
From Sentences 1 and 2, we can infer that "wo" in Chinese means "I" in English, because these two words always occur in the same sentence-translation pairs. Similarly, since "xihuan" and "like" always appear in the same pairs, we can infer that they are translations of each other. Then, by process of elimination, we can conclude that "pingguo" translates to "apples" in Sentence 1. Once the system has learned these translation equivalents, it can translate new sentences. This kind of pattern extraction is the main way machine translation systems are developed today.
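The co-occurrence reasoning above can be sketched as a toy Python program. This is a minimal illustration under stated assumptions, not how production systems work (they rely on statistical models rather than exact matching). The three sentence pairs are hypothetical, built from the words discussed here; "ni"/"you" and "he"/"and" are added by us, and the third pair exists only so that "wo" and "xihuan" occur in different sets of pairs. The function names `occurrence_signatures` and `infer_alignments` are illustrative, not part of any tool used in this project. Each word is mapped to the set of pairs it appears in; a Chinese word and an English word with identical sets, and no other word sharing that set, are treated as translations.

```python
from collections import defaultdict

# Hypothetical toy corpus of (Chinese pinyin, English) sentence pairs.
PAIRS = [
    ("wo xihuan pingguo", "I like apples"),
    ("wo xihuan caomei he xiangjiao", "I like strawberries and bananas"),
    ("ni xihuan pingguo", "you like apples"),
]


def occurrence_signatures(sentences):
    """Map each word to the frozenset of sentence indices it appears in."""
    where = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for word in sentence.split():
            where[word].add(i)
    return {word: frozenset(idx) for word, idx in where.items()}


def infer_alignments(pairs):
    """Group source and target words by identical occurrence signatures.

    A group with exactly one source word and one target word is taken as a
    translation pair; any other group is left ambiguous.
    """
    src_sig = occurrence_signatures([src for src, _ in pairs])
    tgt_sig = occurrence_signatures([tgt for _, tgt in pairs])

    # Each signature maps to (source words, target words) sharing it.
    groups = defaultdict(lambda: ([], []))
    for word, sig in src_sig.items():
        groups[sig][0].append(word)
    for word, sig in tgt_sig.items():
        groups[sig][1].append(word)

    aligned, ambiguous = {}, []
    for src_words, tgt_words in groups.values():
        if len(src_words) == 1 and len(tgt_words) == 1:
            aligned[src_words[0]] = tgt_words[0]
        else:
            ambiguous.append((sorted(src_words), sorted(tgt_words)))
    return aligned, ambiguous


aligned, ambiguous = infer_alignments(PAIRS)
print(aligned)    # → {'wo': 'I', 'xihuan': 'like', 'pingguo': 'apples', 'ni': 'you'}
print(ambiguous)  # → [(['caomei', 'he', 'xiangjiao'], ['and', 'bananas', 'strawberries'])]
```

The leftover group illustrates the limitation: "caomei", "he", and "xiangjiao" all occur in exactly the same pair as "strawberries", "and", and "bananas", so occurrence patterns alone cannot say which word maps to which.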
However, in cases like "caomei" and "xiangjiao" in Sentence 2, it is not clear which one refers to "strawberries" and which one refers to "bananas": the database does not contain enough examples to reveal the pattern. In such cases, a bilingual dictionary would help. The goal of this annotation project is to produce word alignments for some standard databases of sentence-translation pairs. These vetted word alignments will help us construct a trustworthy bilingual dictionary, which we can then use to support our machine translation research.
To get started: