Human annotators are critical for creating the datasets needed to train statistical learning algorithms. However, several factors limit the creation of large annotated datasets, such as annotation cost and limited access to qualified annotators. In recent years, researchers have investigated overcoming this data bottleneck through crowdsourcing: the delegation of a particular task to a large group of individuals rather than a single person, usually via an online marketplace.
This thesis is concerned with crowdsourcing annotation tasks that aid the training, tuning, or evaluation of statistical learners across a variety of tasks in natural language processing. The tasks reflect a spectrum of annotation complexity, from simple class label selection, through selecting textual segments from a document, to composing sentences from scratch. The annotation setups were novel, involving new types of annotators, tasks, and data, as well as new algorithms that can handle such data.
The thesis is divided into two main parts: the first part deals with text classification, and the second part deals with machine translation (MT).
The first part covers two instances of the text classification task. The first is identifying dialectal Arabic sentences and distinguishing them from Standard Arabic sentences. We use crowdsourcing to create a large annotated dataset of Arabic sentences, which is used to train and evaluate language models for each Arabic variety. The second is a sentiment analysis task: distinguishing positive movie reviews from negative ones. We introduce a new type of annotation, called rationales, which complements traditional class labels and aids in learning system parameters that generalize better to unseen data.
In the second part, we examine how crowdsourcing can benefit machine translation. We start with the evaluation of MT systems, showing the potential of crowdsourcing for editing MT output. We also present RYPT, a new MT evaluation metric that is based on human judgment and well-suited to a crowdsourced setting. Finally, we demonstrate that crowdsourcing can help collect translations to create a parallel dataset. We discuss a set of features that help distinguish well-formed translations from ill-formed ones, and we show that crowdsourced translation yields results of near-professional quality at a fraction of the cost.
Throughout the thesis, we will be concerned with ensuring that the collected data is of high quality, and we will employ a set of quality control measures for that purpose. These methods will be helpful not only in detecting spammers and unfaithful annotators, but also in identifying those who are simply unable to perform the task properly, which is a subtler form of undesired behavior.
Omar Zaidan completed his undergraduate studies at St. Lawrence University, graduating summa cum laude in 2004 with a B.Sc. in Computer Science (with Honors) and Mathematics (with Honors), and a minor in Chemistry. He then joined the Computer Science Ph.D. program at Johns Hopkins University, initially working under the supervision of Christian Scheideler. He later became affiliated with the Center for Language and Speech Processing, working first with Jason Eisner and then with Chris Callison-Burch, his thesis advisor.
At Hopkins, he was head T.A. for several courses, including Natural Language Processing and Artificial Intelligence; taught the department’s Introduction to Java summer offering in 2007; developed Z-MERT, an open source package for MT parameter tuning; created the Arabic Online Commentary dataset, consisting of over 50M words of Arabic reader commentary; and gained considerable experience with Amazon’s Mechanical Turk. He was also a member of the organizing committees for the Workshop on Statistical Machine Translation (WMT) in 2010 and 2011, and the North East Student Colloquium on Artificial Intelligence (NESCAI) in 2010.
Omar has joined Microsoft Research as a senior software engineer.