Paraphrasing, communicating the same meaning with different surface forms, is one of the core characteristics of natural language and one of the greatest challenges for automatic language processing. In this research, we investigate approaches to paraphrasing entire sentences within the constraints of a given task, which we call monolingual sentence rewriting. We focus on three representative tasks: sentence compression, text simplification, and grammatical error correction.
Monolingual rewriting can be thought of as translating between two varieties of English (such as from complex to simple), and our approach is therefore inspired by statistical machine translation. In machine translation, a large quantity of parallel data is necessary to model the transformations from input to output text. Parallel bilingual data occurs naturally for common language pairs (such as English and French), but for monolingual sentence rewriting, little parallel data exists, and annotation is costly. We modify the statistical machine translation pipeline to harness monolingual resources and insights into task constraints, drastically reducing the amount of annotated data needed to train a robust system. Our method generates more meaning-preserving and grammatical sentences than earlier approaches while requiring less task-specific data.
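The "translation within English" framing can be illustrated with a toy sketch (this is an illustration only, not the system described here): a hypothetical phrase table maps complex phrases to simpler paraphrases, analogous to the phrase tables used in phrase-based statistical machine translation.

```python
# Toy illustration of monolingual rewriting as translation within English.
# PHRASE_TABLE is a made-up example, not a real resource.
PHRASE_TABLE = {
    "utilize": "use",
    "in order to": "to",
    "a large quantity of": "a lot of",
}

def simplify(sentence: str) -> str:
    """Greedily apply phrase substitutions, longest source phrase first."""
    out = sentence
    for src in sorted(PHRASE_TABLE, key=len, reverse=True):
        out = out.replace(src, PHRASE_TABLE[src])
    return out

print(simplify("We utilize a large quantity of data in order to train the model."))
# -> "We use a lot of data to train the model."
```

A real system would instead learn such mappings from data and score candidate rewrites with translation and language models, but the sketch shows the basic substitution view of rewriting.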
Courtney Napoles is a PhD candidate in the Computer Science Department and the Center for Language and Speech Processing at Johns Hopkins University, where she is co-advised by Chris Callison-Burch and Benjamin Van Durme. During her PhD, she interned at Educational Testing Service (ETS) and Yahoo Research. She is the recipient of an NSF Graduate Research Fellowship and holds a Bachelor’s degree in Psychology from Princeton University with a Certificate in Linguistics. Before graduate school, she edited non-fiction books for a trade publisher.