Compositional Models for Information Extraction

Mark Dredze, Johns Hopkins University

Relation extraction systems are the backbone of many end-user applications, including question answering and web search. They are also increasingly used in clinical text analysis with electronic health record (EHR) data to advance goals in population health. Advances in machine learning have led to new neural models for learning effective representations directly from data. Yet for many tasks, years of research have created hand-engineered features that yield state-of-the-art performance. This is the case in relation extraction, in which a system consumes natural language and produces a structured, machine-readable representation of relationships between entities, for example when extracting medication references from clinical notes.

Speaker Biography

Mark Dredze is an Assistant Research Professor in Computer Science at Johns Hopkins University and a research scientist at the Human Language Technology Center of Excellence. He is also affiliated with the Center for Language and Speech Processing and the Center for Population Health Information Technology, and holds a secondary appointment in the Department of Health Sciences Informatics in the School of Medicine. He obtained his PhD from the University of Pennsylvania in 2009. Prof. Dredze has wide-ranging research interests in developing machine learning models for natural language processing (NLP) applications. Within machine learning, he develops new methods for graphical models, deep neural networks, topic models and online learning, and has worked in a variety of learning settings, such as semi-supervised learning, transfer learning, domain adaptation and large-scale learning. Within NLP he focuses on information extraction but has considered a wide range of NLP tasks, including syntax, semantics, sentiment and spoken language processing. Beyond his work in core areas of computer science, Prof. Dredze has pioneered new applications of these technologies in public health informatics, including work with social media data, biomedical articles and clinical texts. He has published widely in health journals including the Journal of the American Medical Association (JAMA), the American Journal of Preventive Medicine (AJPM), Vaccine, and the Journal of the American Medical Informatics Association (JAMIA). His work is regularly covered by major media outlets, including NPR, the New York Times and CNN.