We focus on minimally supervised ("low-resource") and massively multilingual techniques in machine learning (ML) and natural language processing (NLP). We apply these methods to machine translation, speech recognition, lexicon induction, and historical linguistics. We are also the core of the Universal Morphology (UniMorph) project.
We are led by David Yarowsky, ACL Fellow and Treasurer, Professor of Computer Science, and member of the multi-departmental Center for Language and Speech Processing at Johns Hopkins University (JHU).
On campus? Visit us in Hackerman 226.
The paper is a tour de force in harnessing multiple on-line resources of varying quality.
I don’t know…you’re running Moses on languages it’s never seen before.
- Accepted at ACL 2019:
  - Learning Morphosyntactic Analyzers from the Bible via Iterative Annotation Projection across 26 Languages by Garrett Nicolai and David Yarowsky
  - Meaning to Form: Measuring Systematicity as Information by Tiago Pimentel, Arya D. McCarthy, Damian Blasi, Brian Roark and Ryan Cotterell
- Accepted at NAACL 2019: Massively Multilingual Adversarial Speech Recognition by Oliver Adams, Matthew Wiesner, Shinji Watanabe and David Yarowsky
- Cross-language information projection
- Cross-domain knowledge transfer
- Active learning and human computation
- Creative bootstrapping from multiple knowledge sources
- Translation discovery without aligned bilingual text
- Exploiting language universals and language family relationships
Natural Language Processing
- Inflectional and derivational morphology
- Word sense disambiguation
- Broad-coverage core NLP tools for 800+ world languages
- Biographic fact extraction
- Characterizing communicants
We're still adding earlier papers! For now, be sure to check Google Scholar.
(Student co-authors, including undergraduates. Bolded if David advised their dissertation or supervised their postdoc)
- Chris Kirov at Google
- Dylan Lewis
- Steven Shearing
- Ryan Newell
- Lawrence Wolf-Sonkin
- Patrick Xia
- John Hewitt
- John Sylak-Glassman at Mapbox
- Nidhi Vyas
- Sarah Mihuc
- Roger Que at Google
- Jin Yong Shin
- Ann Irvine
- Svitlana Volkova
- Mozhi Zhang
- Delip Rao at Amazon (Alexa)
- Elliot F. Drábek at Atreca
- Nikesh Garera at Treebo
- Charles Schafer at Google
- Gideon Mann, Head of Data Science at Bloomberg
- Silviu Cucerzan, Principal Research Manager at Microsoft Bing
- Richard Wicentowski, Chair of Computer Science at Swarthmore College
- Radu Florian at IBM Research
- Grace Ngai at Hong Kong Polytechnic University
- John Henderson at MITRE