Benjamin Van Durme :: vandurme@cs.jhu.edu

Johns Hopkins University
Assistant Research Professor Department of Computer Science
Center for Language and Speech Processing
Assistant Research Professor Department of Cognitive Science
Research Scientist Human Language Technology Center of Excellence


My work falls in the areas of natural language processing (specifically computational semantics) and streaming/randomized algorithms.

PhD students: Courtney Napoles (educational NLP), Xuchen Yao (question answering), Svitlana Volkova (social media), Frank Ferraro (semantics), Keith Levin (algorithms, speech), Rachel Rudinger (semantics), Chandler May (algorithms), Pushpendre Rastogi (semantics), and Keisuke Sakaguchi (educational NLP).
Current post-docs: Charley Beller
Previous post-docs: Margaret Mitchell, now at MSR [photo]
Shane Bergsma (with David Yarowsky and Ken Church), now at Saskatchewan.

Kyle Rawlins and I are teaching Event Semantics in Fall 2013.

Max Thomas and I are consolidating local development efforts under an HLTCOE account on GitHub; more things will be moving there over time.

I lead the Text efforts at HLTCOE, which include social media (partial photo) and knowledge population from text. Our work is especially concerned with exploring methods that scale and are robust across languages and domains. Some of my collaborators include David Yarowsky, Glen Coppersmith, Matthew Post, and Mark Dredze.

I collaborate closely with Chris Callison-Burch, under the larger group heading: "Textual Choreography (TeCho)" (photo), a confederation of students doing research in text-mungy things, such as Text2Text rewriting, paraphrase acquisition, question answering, cross-document co-reference, semantic role labeling, etc. Chris has recently moved to Penn, but we continue to co-advise a handful of students and remain engaged on various projects.

Ashwin Lall and I have collaborated on algorithms for handling large quantities of (streaming) data, furthered by early discussions with Miles Osborne and David Talbot. Lately I've worked with Aren Jansen to create exceptionally scalable tools for searching raw speech data, as part of the Zero Resource effort at the COE. Vladimir Braverman, of JHU CS, is a randomized algorithm theorist I talk to regularly. Many of these efforts are bundled into the Jerboa package.

Further collaborators include Jason Eisner, and students including Matt Gormley, Juri Ganitkevitch, and Travis Wolfe.

My thesis was joint in Computer Science and Linguistics, titled: Extracting Implicit Knowledge from Text. The committee consisted of Len Schubert and Dan Gildea of Rochester Computer Science, Greg Carlson of Rochester Linguistics, and William Cohen of the Machine Learning Department at CMU. This work fell under the KNEXT project, aimed at extracting commonsense knowledge from text. While a PhD student I spent two summers as an intern at Google Research, working with Marius Pasca on knowledge extraction from search engine query logs. Prior to Rochester I was a student in the Language Technologies Institute at CMU, working with Eric Nyberg and colleagues on Question Answering (which eventually helped lead to IBM's Watson). Before that I worked as a research engineer in the AI Division of the Advanced Technology Laboratory of Lockheed Martin.


user: va nd ur me domain: cs.jhu.edu


my last name in bibtex is: {Van Durme}
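
As a minimal sketch of how that looks in practice, an entry for one of the papers listed below might be written as follows. The citation key is a hypothetical placeholder and the choice of fields is an assumption; the point is only the extra braces around the last name, which keep BibTeX from splitting it and treating "Van" as part of the first name.

```bibtex
% Hypothetical citation key; author, title, venue and year are taken from the 2009 list below.
@inproceedings{vandurme2009streaming,
  author    = {{Van Durme}, Benjamin and Lall, Ashwin},
  title     = {Streaming Pointwise Mutual Information},
  booktitle = {NIPS},
  year      = {2009}
}
```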


---------- 2014 ----------

Matthew R. Gormley, Margaret Mitchell, Benjamin Van Durme and Mark Dredze. Low-Resource Semantic Role Labeling. ACL. 2014.

Xuchen Yao and Benjamin Van Durme. Information Extraction over Structured Data: Question Answering with Freebase. ACL. 2014.

Svitlana Volkova, Glen Coppersmith and Benjamin Van Durme. Inferring User Political Preferences from Streaming Communications. ACL. 2014.

Jennifer Drexler, Pushpendre Rastogi, Jacqueline Aguilar, Benjamin Van Durme and Matt Post. A Wikipedia-based Corpus for Contextualized Machine Translation. LREC. 2014.

---------- 2013 ----------

Francis Ferraro, Benjamin Van Durme and Yanif Ahmad. Evaluating Progress in Probabilistic Programming through Topic Models. NIPS Workshop on Topic Models: Computation, Application, and Evaluation. 2013.

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch and Peter Clark. Semi-Markov Phrase-based Monolingual Alignment. EMNLP. 2013. [pdf] [demo] [code]

Margaret Mitchell, Jacqui Aguilar, Theresa Wilson and Benjamin Van Durme. Open Domain Targeted Sentiment. EMNLP. 2013. [pdf]

Jonathan Gordon and Benjamin Van Durme. Reporting Bias and Knowledge Extraction. Automated Knowledge Base Construction (AKBC) 2013: The 3rd Workshop on Knowledge Extraction, at CIKM 2013. (Best Paper) [pdf]

Shane Bergsma and Benjamin Van Durme. Using Conceptual Class Attributes to Characterize Social Media Users. ACL. 2013. [pdf]

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch and Peter Clark. A Lightweight and High Performance Monolingual Word Aligner. ACL Short. 2013. [pdf] [demo] [code]

Xuchen Yao, Benjamin Van Durme and Peter Clark. Automatic Coupling of Answer Extraction and Information Retrieval. ACL Short. 2013. [pdf]

Travis Wolfe, Benjamin Van Durme, Mark Dredze, Nicholas Andrews, Charley Beller, Chris Callison-Burch, Jay DeYoung, Justin Snyder, Jonathan Weese, Tan Xu, and Xuchen Yao. PARMA: A Predicate Argument Aligner. ACL Short. 2013. [pdf] [code]

Xuchen Yao, Benjamin Van Durme, Peter Clark and Chris Callison-Burch. Answer Extraction as Sequence Tagging with Tree Edit Distance. NAACL. 2013. [pdf] [code]

Shane Bergsma, Mark Dredze, Benjamin Van Durme, Theresa Wilson, and David Yarowsky. Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter. NAACL. 2013. [pdf] [data]

Juri Ganitkevitch, Benjamin Van Durme and Chris Callison-Burch. PPDB: The Paraphrase Database. NAACL Short. 2013. [pdf] [data]

David Etter, Francis Ferraro, Ryan Cotterell, Olivia Buzek, and Benjamin Van Durme. Nerit: Named Entity Recognition for Informal Text. Technical Report 11. Human Language Technology Center of Excellence, Johns Hopkins University. July, 2013. [pdf]

---------- 2012 ----------

Benjamin Van Durme. Streaming Analysis of Discourse Participants. EMNLP. 2012. [pdf]

Aren Jansen, Benjamin Van Durme, and Pascal Clark. The JHU-HLTCOE Spoken Web Search System for MediaEval 2012. Proceedings of the MediaEval 2012 Workshop. [pdf]

Aren Jansen and Benjamin Van Durme. Indexing Raw Acoustic Features for Scalable Zero Resource Search. InterSpeech. 2012. [pdf]

Juri Ganitkevitch, Benjamin Van Durme and Chris Callison-Burch. Monolingual Distributional Similarity for Text-to-Text Generation. STARSEM. 2012. [pdf]

Matthew R. Gormley, Mark Dredze, Benjamin Van Durme, and Jason Eisner. Shared Components Topic Models. NAACL. 2012. [pdf]

Xuchen Yao, Benjamin Van Durme and Chris Callison-Burch. Expectations of Word Sense in Parallel Corpora. NAACL Short. 2012. [pdf]

Brian Kjersten and Benjamin Van Durme. Space Efficiencies in Discourse Modeling via Conditional Random Sampling. NAACL Short. 2012. [pdf]

Courtney Napoles, Matthew R. Gormley and Benjamin Van Durme. Annotated Gigaword. NAACL Workshop: AKBC-WEKEX. 2012. [pdf]

Francis Ferraro, Matt Post and Benjamin Van Durme. Judging Grammaticality with Count-Induced Tree Substitution Grammars. NAACL Workshop: BEA. 2012. [pdf]

Francis Ferraro, Benjamin Van Durme and Matt Post. Toward Tree Substitution Grammars with Latent Annotations. NAACL Workshop: Inducing Linguistic Structure. 2012. [pdf]

Benjamin Van Durme. Jerboa: A Toolkit for Randomized and Streaming Algorithms. Technical Report 7, Human Language Technology Center of Excellence, Johns Hopkins University. 2012. [pdf] [git]

Prabhakaran et al. Statistical Modality Tagging from Rule-based Annotations and Crowdsourcing. ACL Workshop: ExProM. 2012. [pdf]

---------- 2011 ----------

Matthew R. Gormley, Mark Dredze, Benjamin Van Durme, and Jason Eisner. Shared Components Topic Models with Application to Selectional Preference. NIPS Workshop: Learning Semantics. 2011.

Aren Jansen and Benjamin Van Durme. Efficient Spoken Term Discovery using Randomized Algorithms. ASRU. 2011. [pdf]

Tsz Ping Chan, Chris Callison-Burch and Benjamin Van Durme. Reranking Bilingually Extracted Paraphrases Using Monolingual Distributional Similarity. EMNLP Workshop: GEMS. 2011. [pdf] [slides]

Juri Ganitkevitch, Chris Callison-Burch, Courtney Napoles and Benjamin Van Durme. Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation. EMNLP. 2011. [pdf]

Xuchen Yao and Benjamin Van Durme. Nonparametric Bayesian Word Sense Induction. ACL Workshop: Textgraphs. 2011. [pdf]

Courtney Napoles, Chris Callison-Burch, Juri Ganitkevitch and Benjamin Van Durme. Paraphrastic Sentence Compression with a Character-based Metric: Tightening without Deletion. ACL Workshop on Monolingual Text-To-Text Generation. 2011. [pdf]

Courtney Napoles, Benjamin Van Durme and Chris Callison-Burch. Evaluating sentence compression: Pitfalls and suggested remedies. ACL Workshop on Monolingual Text-To-Text Generation. 2011. [pdf]

Byung Gyu Ahn, Chris Callison-Burch and Benjamin Van Durme. WikiTopics: What is popular on Wikipedia and why. ACL Workshop on Automatic Summarization for Different Genres, Media, and Languages. 2011. [pdf]

Benjamin Van Durme and Ashwin Lall. Efficient Online Locality Sensitive Hashing via Reservoir Counting. ACL Short. 2011. [pdf] [bib] [slides]

Shane Bergsma and Benjamin Van Durme. Learning Bilingual Lexicons using the Visual Similarity of Labeled Web Images. IJCAI. 2011. [pdf] [bib]

Azevedo et al. Reports of the AAAI 2010 Fall Symposia. AI Magazine. Spring 2011. [pdf]

---------- 2010 ----------

Benjamin Van Durme. Extracting Implicit Knowledge from Text. PhD Thesis. University of Rochester. 2010. [link]

Lenhart K. Schubert, Benjamin Van Durme, and Marzieh Bazrafshan. Entailment Inference in a Natural Logic-like General Reasoner. AAAI Fall Symposium on Commonsense Knowledge (CSK10). 2010. [pdf] [bib]

Benjamin Van Durme and Ashwin Lall. Online Generation of Locality Sensitive Hash Signatures. ACL Short. 2010. [pdf] [bib] [slides]

Jonathan Gordon, Benjamin Van Durme and Lenhart K. Schubert. Learning from the Web: Extracting General World Knowledge from Noisy Text. AAAI Workshop on Collaboratively-built Knowledge Sources and Artificial Intelligence. 2010. [pdf]

Jonathan Gordon, Benjamin Van Durme and Lenhart K. Schubert. Evaluation of Commonsense Knowledge with Mechanical Turk. NAACL Workshop on Amazon Mechanical Turk. 2010. [pdf]

---------- 2009 ----------

Benjamin Van Durme and Ashwin Lall. Streaming Pointwise Mutual Information. NIPS. 2009. [pdf] [bib]

Jonathan Gordon, Benjamin Van Durme and Lenhart K. Schubert. Weblogs as a Source for Extracting General World Knowledge. K-CAP. 2009. [pdf]

Benjamin Van Durme and Ashwin Lall. Probabilistic Counting with Randomized Storage. IJCAI. 2009. [pdf] [slides]

Benjamin Van Durme and Daniel Gildea. Topic Models for Corpus-centric Knowledge Generalization. Technical Report TR-946, Department of Computer Science, University of Rochester, Rochester, NY 14627, June 2009. [pdf]

Ting Qian, Benjamin Van Durme and Lenhart K. Schubert. Building a Semantic Lexicon of English Nouns via Bootstrapping. NAACL Student Research Workshop. 2009.

Benjamin Van Durme, Phillip Michalak and Lenhart K. Schubert. Deriving Generalized Knowledge from Corpora using WordNet Abstraction. EACL. 2009. [pdf] [bib]

Benjamin Van Durme, Austin Frank and T. Florian Jaeger. Comparing Sources of Corpus Frequency Information. The 22nd Annual Meeting of the CUNY Conference on Human Sentence Processing (CUNY-09). 2009.

---------- 2008 ----------

Lenhart K. Schubert and Benjamin Van Durme. Open Extraction of General Knowledge through Compositional Semantics. NSF Symposium on Semantic Knowledge Discovery, Organization and Use. 2008. [pdf]

Benjamin Van Durme and Lenhart K. Schubert. Open Knowledge Extraction through Compositional Language Processing. Symposium on Semantics in Systems for Text Processing (STEP). 2008. [pdf]

Benjamin Van Durme, Ting Qian and Lenhart K. Schubert. Class-Driven Attribute Extraction. COLING. 2008. [pdf]

Austin F. Frank, Celeste Kidd, Matthew Post, Benjamin Van Durme and T. Florian Jaeger. The Web as a Psycholinguistic Resource. Presented at the 5th International Workshop on Language Production. Annapolis, MD. July 28-30, 2008.

Benjamin Van Durme and Marius Pasca. Finding Cars, Goddesses and Enzymes: Parametrizable Acquisition of Labeled Instances for Open-Domain Information Extraction. AAAI. 2008. [pdf]

Benjamin Van Durme. Notes on the Acquisition of Conditional Knowledge. Technical Report TR-937, Department of Computer Science, University of Rochester, Rochester, NY 14627. June 2008. [pdf]

Dekang Lin, Shaojun Zhao, Benjamin Van Durme and Marius Pasca. Mining Parenthetical Translations from the Web by Word Alignment. The 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-08: HLT). Columbus, Ohio, USA. June 15-20, 2008. [pdf]

Marius Pasca and Benjamin Van Durme. Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs. ACL. 2008. [pdf]

---------- 2007 ----------

Marius Pasca, Benjamin Van Durme and Nikesh Garera. The Role of Documents vs. Queries in Extracting Class Attributes from Text. ACM Sixteenth Conference on Information and Knowledge Management (CIKM 2007). Lisboa, Portugal. November 6-9, 2007. [pdf]

Marius Pasca and Benjamin Van Durme. What You Seek is What You Get: Extraction of Class Attributes from Query Logs. International Joint Conference on Artificial Intelligence (IJCAI-07). Hyderabad, India. February 2007. [pdf]

---------- 2004 ----------

Anna Kupsc, Teruko Mitamura, Benjamin Van Durme and Eric Nyberg. Pronominal Anaphora Resolution for Unrestricted Text. LREC. Lisbon, Portugal. May 24-30, 2004.

---------- 2003 ----------

Benjamin Van Durme, Yifen Huang, Anna Kupsc and Eric Nyberg. Towards Light Semantic Processing for Question Answering. HLT/NAACL Workshop on Text Meaning. Edmonton, Canada. May 31, 2003. [pdf]

E. Nyberg, T. Mitamura, J. Callan, J. Carbonell, R. Frederking, K. Collins-Thompson, L. Hiyakumoto, Y. Huang, C. Huttenhower, S. Judy, J. Ko, A. Kupsc, L. Lita, V. Pedro, D. Svoboda and B. Van Durme. The JAVELIN Question-Answering System at TREC 2003: A Multi-Strategy Approach with Dynamic Planning. 12th Text REtrieval Conference, November 2003.