Human language artifacts represent a plentiful source of rich, unstructured information created by reporters, scientists, and analysts. In this thesis we provide approaches for adding structure: extracting and linking entities, events, and relationships from a collection of documents about a common topic. We pursue this linking at two levels of abstraction. At the document level, we propose models for aligning the entities and events described in coherent and related discourses; these models are useful for deduplicating repeated claims, finding implicit arguments to events, and measuring semantic overlap between documents. At a higher level of abstraction, we construct and employ knowledge graphs containing salient entities and relations linked to supporting documents; these graphs can be augmented with facts and summaries to give users a structured understanding of the information in a large collection.
Travis Wolfe is a Ph.D. candidate in Computer Science at Johns Hopkins University, advised by Mark Dredze and Benjamin Van Durme. He obtained a B.S. in Statistics and Information Systems from Carnegie Mellon University in 2011 and an M.S. in Computer Science from Johns Hopkins University in 2014. His work focuses on information extraction and machine learning for natural language processing.