Robust Text Correction for Grammar and Fluency

Keisuke Sakaguchi, Johns Hopkins University

Robustness has always been a desirable property for natural language processing (NLP). In many cases, NLP models (e.g., parsers) and downstream applications (e.g., machine translation) perform poorly when the input contains noise such as spelling errors, grammatical errors, and disfluencies. In this thesis, I present robust error correction models for language learners’ texts at three levels of granularity: character-, token-, and sentence-level errors.

Character- and token-level errors in language learners’ writing relate to grammar, and the NLP community has focused on these error types for a long time. I review prior work on grammatical error correction (GEC), focusing particularly on the last ten years, and present new models for character- and token-level error correction. At the character level, I introduce a semi-character recurrent neural network, motivated by a finding in psycholinguistics known as the Cmabrigde Uinervtisy (Cambridge University) effect. For token-level robustness, I propose an error-repair dependency parsing algorithm for ungrammatical texts, which parses sentences and corrects grammatical errors simultaneously.
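The intuition behind the semi-character representation can be sketched in a few lines: each word is encoded by its first character, an order-free bag of its internal characters, and its last character, so internal jumbling leaves the encoding unchanged. The sketch below is illustrative (the lowercase-only alphabet and helper names are assumptions, not the thesis implementation):

```python
import string
import numpy as np

ALPHABET = string.ascii_lowercase  # assumed character inventory for this sketch

def semi_char_vector(word):
    """Semi-character encoding: one-hot first character, bag-of-characters
    counts for internal characters, one-hot last character."""
    n = len(ALPHABET)
    vec = np.zeros(3 * n)
    word = word.lower()
    vec[ALPHABET.index(word[0])] = 1              # first character (position-sensitive)
    for ch in word[1:-1]:                         # internal characters (order-free)
        vec[n + ALPHABET.index(ch)] += 1
    vec[2 * n + ALPHABET.index(word[-1])] = 1     # last character (position-sensitive)
    return vec

# "Cmabrigde" and "Cambridge" share first/last characters and the same
# internal bag, so they map to the same vector.
assert np.array_equal(semi_char_vector("Cmabrigde"), semi_char_vector("Cambridge"))
```

A recurrent network reading these word vectors therefore sees jumbled and correct spellings identically, mirroring the human reading effect the model is named after.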

The NLP community has also extended the scope of errors to phrase- and sentence-level errors, where fluency comes into play as an important notion. This broader scope and the notion of fluency bring new challenges for GEC, such as evaluation metrics and data annotation. After introducing a method for efficient human judgment collection using Bayesian online updates, I present a new annotation scheme and dataset for sentence-level GEC, followed by a neural encoder-decoder GEC model that directly optimizes toward an evaluation metric to avoid exposure bias.
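To make the Bayesian online-update idea concrete, here is a minimal TrueSkill-style sketch: each system's quality is a Gaussian (mean, standard deviation), and every pairwise human judgment shifts the means toward the observed outcome and shrinks the variances. The constants and function names are illustrative assumptions, not the thesis's exact procedure:

```python
import math

BETA = 4.166  # performance-noise parameter (assumed value for this sketch)

def phi(x):   # standard normal pdf
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):   # standard normal cdf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def update_win(winner, loser):
    """One Bayesian online update (no draws) on a pairwise judgment.
    Each system is (mu, sigma); returns the updated pair of pairs."""
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = math.sqrt(2 * BETA ** 2 + sig_w ** 2 + sig_l ** 2)
    t = (mu_w - mu_l) / c
    v = phi(t) / Phi(t)          # mean-shift factor
    w = v * (v + t)              # variance-shrink factor
    mu_w += sig_w ** 2 / c * v
    mu_l -= sig_l ** 2 / c * v
    sig_w *= math.sqrt(max(1 - sig_w ** 2 / c ** 2 * w, 1e-9))
    sig_l *= math.sqrt(max(1 - sig_l ** 2 / c ** 2 * w, 1e-9))
    return (mu_w, sig_w), (mu_l, sig_l)

# Both systems start at the same prior; one judgment already separates them.
a, b = (25.0, 8.333), (25.0, 8.333)
a, b = update_win(a, b)  # annotator preferred system a
```

Because uncertain systems move more per judgment, the collection process can concentrate comparisons where they are most informative, which is what makes human judgment collection efficient.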

Finally, I conclude the thesis and outline ideas and suggestions for future research in GEC.

Speaker Biography

Keisuke Sakaguchi is a Ph.D. candidate advised by Benjamin Van Durme and Matt Post in the Department of Computer Science and the Center for Language and Speech Processing at Johns Hopkins University. His research focuses on robust natural language processing (NLP) for ungrammatical, noisy texts and on NLP for educational applications. He received an Outstanding Paper Award at ACL 2017. He holds an M.Eng. in Information Science from the Nara Institute of Science and Technology, an M.A. in Psycho- and Neurolinguistics from the University of Essex, and a B.A. in Literature (majoring in Philosophy) from Waseda University.