We focus on minimally supervised ("low-resource") and massively multilingual techniques in machine learning (ML) and natural language processing (NLP). We apply these methods to machine translation, speech recognition, lexicon induction, and historical linguistics. We are also the core of the Universal Morphology (UniMorph) project.
We are led by David Yarowsky, ACL Fellow and Treasurer, Professor of Computer Science, and member of the multi-departmental Center for Language and Speech Processing at Johns Hopkins University (JHU). He is also affiliated with the Human Language Technology Center of Excellence.
On campus? Visit us in Hackerman 226.
The paper is a tour de force in harnessing multiple on-line resources of varying quality.
I don’t know…you’re running it on languages it’s never seen before.
- Accepted at CoNLL 2019: "Weird Inflects but OK: Making Sense of Morphological Generation Errors" by Kyle Gorman, Arya D. McCarthy, Ryan Cotterell, Ekaterina Vylomova, Miikka Silfverberg, and Magdalena Markowska
- Accepted at EMNLP-IJCNLP 2019: "Modeling Color Terminology Across Thousands of Languages" by Arya D. McCarthy, Winston Wu, Aaron Mueller, Bill Watson, and David Yarowsky
- Accepted at EMNLP-IJCNLP 2019: "Quantity doesn’t buy quality syntax with neural language models" by Marten van Schijndel, Aaron Mueller, and Tal Linzen
- We're all over the place this summer! David's students have internships at Facebook AI, BBN, and Pacific Northwest National Labs.
- Cross-language information projection
- Cross-domain knowledge transfer
- Active learning and human computation
- Creative bootstrapping from multiple knowledge sources
- Translation discovery without aligned bilingual text
- Exploiting language universals and language family relationships
Natural Language Processing
- Inflectional and derivational morphology
- Word sense disambiguation
- Broad-coverage core NLP tools for 800+ world languages
- Biographic fact extraction
- Characterizing communicants
We're still adding earlier papers! For now, be sure to check Google Scholar.
- Trevor Lee (going to Booz Allen Hamilton)
(Student co-authors, including undergraduates. Names are bolded if David advised their dissertation or supervised their postdoc.)
- Chris Kirov at Google
- Dylan Lewis
- Steven Shearing
- Ryan Newell at Amazon
- Lawrence Wolf-Sonkin at Google
- Patrick Xia
- John Hewitt, now PhD student at Stanford
- John Sylak-Glassman at Mapbox
- Nidhi Vyas at Apple
- Sarah Mihuc
- Roger Que at Google
- Jin Yong Shin
- Ann Irvine, Head of Data Science at Arceo
- Svitlana Volkova, Senior Research Scientist at Pacific Northwest National Labs
- Mozhi Zhang
- Delip Rao at Amazon (Alexa)
- Elliot F. Drábek at Atreca
- Nikesh Garera at Treebo
- Shane Bergsma
- Charles Schafer at Google
- Gideon Mann, Head of Data Science at Bloomberg
- Silviu Cucerzan, Principal Research Manager at Microsoft Bing
- Richard Wicentowski, Chair of Computer Science at Swarthmore College
- Radu Florian at IBM Research
- Grace Ngai at Hong Kong Polytechnic University
- John Henderson at MITRE