Benjamin Van Durme :: vandurme@cs.jhu.edu

Johns Hopkins University
Assistant Research Professor Department of Computer Science
Center for Language and Speech Processing
Assistant Research Professor Department of Cognitive Science
Research Scientist Human Language Technology Center of Excellence


keywords: natural language processing, semantics, streaming data, randomized algorithms, language models, author attributes, question answering, paraphrases, information extraction, ...

(co-)Advisees: Courtney Napoles (eduNLP), Svitlana Volkova (social media), Frank Ferraro (semantics), Keith Levin (algorithms, speech), Rachel Rudinger (semantics), Chandler May (algorithms), Pushpendre Rastogi (semantics), Keisuke Sakaguchi (eduNLP), and Tongfei Chen.
Student collaborators: Matt Gormley (ML, information extraction), Juri Ganitkevitch (MT, paraphrasing), and Travis Wolfe (ML, information extraction).
Previous students: Xuchen Yao (question answering), now at AI2 Incubator Program
Previous post-docs: Charley Beller (semantics, social media), now at IBM
Margaret Mitchell (IE), now at MSR [photo]
Shane Bergsma (collaborated with), now at Saskatchewan.

Some of my collaborators at the JHU HLTCOE include David Yarowsky, Jason Eisner, Glen Coppersmith, Matthew Post, James Mayfield, Paul McNamee, Craig Harman, Max Thomas, Jacqueline Aguilar and Mark Dredze.

Chris Callison-Burch and I work together with students on paraphrastic-type things (despite his move all the way from JHU to UPenn).

Ashwin Lall, Miles Osborne and I collaborate on algorithms for handling large quantities of (streaming) data. Lately I've applied this thinking to work with Aren Jansen and Keith Levin to create exceptionally scalable tools for searching raw speech data, as part of the Zero Resource effort at the COE. Vladimir Braverman, of JHU CS, is a randomized algorithm theorist I (try to) talk to regularly. Many of these efforts are bundled into the Jerboa package.

My thesis was joint in Computer Science and Linguistics, titled: Extracting Implicit Knowledge from Text. The committee consisted of Len Schubert and Dan Gildea of Rochester Computer Science, Greg Carlson of Rochester Linguistics, and William Cohen of the Machine Learning Department at CMU. This work fell under the KNEXT project, aimed at extracting commonsense knowledge from text. I spent two summers while a PhD student as an intern at Google Research, working with Marius Pasca on knowledge extraction from search engine query logs. Prior to Rochester I was a student in the Language Technologies Institute at CMU, working with Eric Nyberg and colleagues on Question Answering (which eventually helped lead to IBM's Watson). Before that I worked as a research engineer in the AI Division of the Advanced Technology Laboratory of Lockheed Martin.


user: va nd ur me domain: cs.jhu.edu


my last name in bibtex is: {Van Durme}


---------- 2014 ----------

Information Extraction over Structured Data: Question Answering with Freebase. Xuchen Yao and Benjamin Van Durme. ACL. 2014. [pdf]

Low-Resource Semantic Role Labeling. Matthew R. Gormley, Margaret Mitchell, Benjamin Van Durme and Mark Dredze. ACL. 2014. [pdf]

Inferring User Political Preferences from Streaming Communications. Svitlana Volkova, Glen Coppersmith and Benjamin Van Durme. ACL. 2014. [pdf]

Particle Filter Rejuvenation and Latent Dirichlet Allocation. Chandler May, Alex Clemmer and Benjamin Van Durme. ACL Short. 2014. [pdf]

Exponential Reservoir Sampling for Streaming Language Models. Miles Osborne, Ashwin Lall and Benjamin Van Durme. ACL Short. 2014. [pdf] [slides]

Biases in Predicting the Human Language Model. Alex Fine, Austin Frank, T. Florian Jaeger and Benjamin Van Durme. ACL Short. 2014. [pdf]

I’m a Belieber: Social Roles via Self-identification and Conceptual Attributes. Charley Beller, Rebecca Knowles, Craig Harman, Shane Bergsma, Margaret Mitchell and Benjamin Van Durme. ACL Short. 2014. [pdf]

Freebase QA: Information Extraction or Semantic Parsing? Xuchen Yao, Jonathan Berant and Benjamin Van Durme. ACL Workshop: Semantic Parsing. 2014. [pdf]

Efficient Elicitation of Annotations for Human Evaluation of Machine Translation. Keisuke Sakaguchi, Matt Post and Benjamin Van Durme. ACL Workshop: WMT. 2014. [pdf]

A Comparison of the Events and Relations Across ACE, ERE, TAC-KBP, and FrameNet Annotation Standards. Jacqueline Aguilar, Charley Beller, Paul McNamee, Benjamin Van Durme, Stephanie Strassel, Zhiyi Song and Joe Ellis. ACL Workshop: EVENTS. 2014. [pdf]

Is the Stanford Dependency Representation Semantic? Rachel Rudinger and Benjamin Van Durme. ACL Workshop: EVENTS. 2014. [pdf]

Augmenting FrameNet Via PPDB. Pushpendre Rastogi and Benjamin Van Durme. ACL Workshop: EVENTS. 2014. [pdf]

Predicting Fine-grained Social Roles with Selectional Preferences. Charley Beller, Craig Harman and Benjamin Van Durme. ACL Workshop: LACSS. 2014. [pdf]

A Wikipedia-based Corpus for Contextualized Machine Translation. Jennifer Drexler, Pushpendre Rastogi, Jacqueline Aguilar, Benjamin Van Durme and Matt Post. LREC. 2014. [pdf]

---------- 2013 ----------

Evaluating Progress in Probabilistic Programming through Topic Models. Francis Ferraro, Benjamin Van Durme and Yanif Ahmad. NIPS Workshop on Topic Models: Computation, Application, and Evaluation. 2013.

Semi-Markov Phrase-based Monolingual Alignment. Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch and Peter Clark. EMNLP. 2013. [pdf] [demo] [code]

Open Domain Targeted Sentiment. Margaret Mitchell, Jacqui Aguilar, Theresa Wilson and Benjamin Van Durme. EMNLP. 2013. [pdf]

Reporting Bias and Knowledge Extraction. Jonathan Gordon and Benjamin Van Durme. Automated Knowledge Base Construction (AKBC) 2013: The 3rd Workshop on Knowledge Extraction, at CIKM 2013. (Best Paper) [pdf]

Using Conceptual Class Attributes to Characterize Social Media Users. Shane Bergsma and Benjamin Van Durme. ACL. 2013. [pdf]

A Lightweight and High Performance Monolingual Word Aligner. Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch and Peter Clark. ACL Short. 2013. [pdf] [demo] [code]

Automatic Coupling of Answer Extraction and Information Retrieval. Xuchen Yao, Benjamin Van Durme and Peter Clark. ACL Short. 2013. [pdf]

PARMA: A Predicate Argument Aligner. Travis Wolfe, Benjamin Van Durme, Mark Dredze, Nicholas Andrews, Charley Beller, Chris Callison-Burch, Jay DeYoung, Justin Snyder, Jonathan Weese, Tan Xu, and Xuchen Yao. ACL Short. 2013. [pdf] [code]

Answer Extraction as Sequence Tagging with Tree Edit Distance. Xuchen Yao, Benjamin Van Durme, Peter Clark and Chris Callison-Burch. NAACL. 2013. [pdf] [code]

Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter. Shane Bergsma, Mark Dredze, Benjamin Van Durme, Theresa Wilson, and David Yarowsky. NAACL. 2013. [pdf] [data]

PPDB: The Paraphrase Database. Juri Ganitkevitch, Benjamin Van Durme and Chris Callison-Burch. NAACL Short. 2013. [pdf] [data]

Nerit: Named Entity Recognition for Informal Text. David Etter, Francis Ferraro, Ryan Cotterell, Olivia Buzek, and Benjamin Van Durme. Technical Report 11. Human Language Technology Center of Excellence, Johns Hopkins University. July, 2013. [pdf]

---------- 2012 ----------

Streaming Analysis of Discourse Participants. Benjamin Van Durme. EMNLP. 2012. [pdf]

The JHU-HLTCOE Spoken Web Search System for MediaEval 2012. Aren Jansen, Benjamin Van Durme, and Pascal Clark. Proceedings of the MediaEval 2012 Workshop. [pdf]

Indexing Raw Acoustic Features for Scalable Zero Resource Search. Aren Jansen and Benjamin Van Durme. InterSpeech. 2012. [pdf]

Monolingual Distributional Similarity for Text-to-Text Generation. Juri Ganitkevitch, Benjamin Van Durme and Chris Callison-Burch. STARSEM. 2012. [pdf]

Shared Components Topic Models. Matthew R. Gormley, Mark Dredze, Benjamin Van Durme, and Jason Eisner. NAACL. 2012. [pdf]

Expectations of Word Sense in Parallel Corpora. Xuchen Yao, Benjamin Van Durme and Chris Callison-Burch. NAACL Short. 2012. [pdf]

Space Efficiencies in Discourse Modeling via Conditional Random Sampling. Brian Kjersten and Benjamin Van Durme. NAACL Short. 2012. [pdf]

Annotated Gigaword. Courtney Napoles, Matthew R. Gormley and Benjamin Van Durme. NAACL Workshop: AKBC-WEKEX. 2012. [pdf]

Judging Grammaticality with Count-Induced Tree Substitution Grammars. Francis Ferraro, Matt Post and Benjamin Van Durme. NAACL Workshop: BEA. 2012. [pdf]

Toward Tree Substitution Grammars with Latent Annotations. Francis Ferraro, Benjamin Van Durme and Matt Post. NAACL Workshop: Inducing Linguistic Structure. 2012. [pdf]

Jerboa: A Toolkit for Randomized and Streaming Algorithms. Benjamin Van Durme. Technical Report 7, Human Language Technology Center of Excellence, Johns Hopkins University. 2012. [pdf] [git]

Statistical Modality Tagging from Rule-based Annotations and Crowdsourcing. Prabhakaran et al. ACL Workshop: ExProM. 2012. [pdf]

---------- 2011 ----------

Shared Components Topic Models with Application to Selectional Preference. Matthew R. Gormley, Mark Dredze, Benjamin Van Durme, and Jason Eisner. NIPS Workshop: Learning Semantics. 2011.

Efficient Spoken Term Discovery using Randomized Algorithms. Aren Jansen and Benjamin Van Durme. ASRU. 2011. [pdf]

Reranking Bilingually Extracted Paraphrases Using Monolingual Distributional Similarity. Tsz Ping Chan, Chris Callison-Burch and Benjamin Van Durme. EMNLP Workshop: GEMS. 2011. [pdf] [slides]

Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation. Juri Ganitkevitch, Chris Callison-Burch, Courtney Napoles and Benjamin Van Durme. EMNLP. 2011. [pdf]

Nonparametric Bayesian Word Sense Induction. Xuchen Yao and Benjamin Van Durme. ACL Workshop: Textgraphs. 2011. [pdf]

Paraphrastic Sentence Compression with a Character-based Metric: Tightening without Deletion. Courtney Napoles, Chris Callison-Burch, Juri Ganitkevitch and Benjamin Van Durme. ACL Workshop on Monolingual Text-To-Text Generation. 2011. [pdf]

Evaluating sentence compression: Pitfalls and suggested remedies. Courtney Napoles, Benjamin Van Durme and Chris Callison-Burch. ACL Workshop on Monolingual Text-To-Text Generation. 2011. [pdf]

WikiTopics: What is popular on Wikipedia and why. Byung Gyu Ahn, Chris Callison-Burch and Benjamin Van Durme. ACL Workshop on Automatic Summarization for Different Genres, Media, and Languages. 2011. [pdf]

Efficient Online Locality Sensitive Hashing via Reservoir Counting. Benjamin Van Durme and Ashwin Lall. ACL Short. 2011. [pdf] [bib] [slides]

Learning Bilingual Lexicons using the Visual Similarity of Labeled Web Images. Shane Bergsma and Benjamin Van Durme. IJCAI. 2011. [pdf] [bib]

Reports of the AAAI 2010 Fall Symposia. Azevedo et al. AI Magazine. Spring 2011. [pdf]

---------- 2010 ----------

Extracting Implicit Knowledge from Text. Benjamin Van Durme. PhD Thesis. University of Rochester. 2010. [link]

Entailment Inference in a Natural Logic-like General Reasoner. Lenhart K. Schubert, Benjamin Van Durme, and Marzieh Bazrafshan. AAAI Fall Symposium on Commonsense Knowledge (CSK10). 2010. [pdf] [bib]

Online Generation of Locality Sensitive Hash Signatures. Benjamin Van Durme and Ashwin Lall. ACL Short. 2010. [pdf] [bib] [slides]

Learning from the Web: Extracting General World Knowledge from Noisy Text. Jonathan Gordon, Benjamin Van Durme and Lenhart K. Schubert. AAAI Workshop on Collaboratively-built Knowledge Sources and Artificial Intelligence. 2010. [pdf]

Evaluation of Commonsense Knowledge with Mechanical Turk. Jonathan Gordon, Benjamin Van Durme and Lenhart K. Schubert. NAACL Workshop on Amazon Mechanical Turk. 2010. [pdf]

---------- 2009 ----------

Streaming Pointwise Mutual Information. Benjamin Van Durme and Ashwin Lall. NIPS. 2009. [pdf] [bib]

Weblogs as a Source for Extracting General World Knowledge. Jonathan Gordon, Benjamin Van Durme and Lenhart K. Schubert. K-CAP. 2009. [pdf]

Probabilistic Counting with Randomized Storage. Benjamin Van Durme and Ashwin Lall. IJCAI. 2009. [pdf] [slides]

Topic Models for Corpus-centric Knowledge Generalization. Benjamin Van Durme and Daniel Gildea. Technical Report TR-946, Department of Computer Science, University of Rochester, Rochester, NY 14627, June 2009. [pdf]

Building a Semantic Lexicon of English Nouns via Bootstrapping. Ting Qian, Benjamin Van Durme and Lenhart K. Schubert. NAACL Student Research Workshop. 2009.

Deriving Generalized Knowledge from Corpora using WordNet Abstraction. Benjamin Van Durme, Phillip Michalak and Lenhart K. Schubert. EACL. 2009. [pdf] [bib]

Comparing Sources of Corpus Frequency Information. Benjamin Van Durme, Austin Frank and T. Florian Jaeger. The 22nd Annual Meeting of the CUNY Conference on Human Sentence Processing (CUNY-09). 2009.

---------- 2008 ----------

Open Extraction of General Knowledge through Compositional Semantics. Lenhart K. Schubert and Benjamin Van Durme. NSF Symposium on Semantic Knowledge Discovery, Organization and Use. 2008. [pdf]

Open Knowledge Extraction through Compositional Language Processing. Benjamin Van Durme and Lenhart K. Schubert. Symposium on Semantics in Systems for Text Processing (STEP). 2008. [pdf]

Class-Driven Attribute Extraction. Benjamin Van Durme, Ting Qian and Lenhart K. Schubert. COLING. 2008. [pdf]

The Web as a Psycholinguistic Resource. Austin F. Frank, Celeste Kidd, Matthew Post, Benjamin Van Durme and T. Florian Jaeger. Presented at the 5th International Workshop on Language Production. Annapolis, MD. July 28-30, 2008.

Finding Cars, Goddesses and Enzymes: Parametrizable Acquisition of Labeled Instances for Open-Domain Information Extraction. Benjamin Van Durme and Marius Pasca. AAAI. 2008. [pdf]

Notes on the Acquisition of Conditional Knowledge. Benjamin Van Durme. Technical Report TR-937, Department of Computer Science, University of Rochester, Rochester, NY 14627. June 2008. [pdf]

Mining Parenthetical Translations from the Web by Word Alignment. Dekang Lin, Shaojun Zhao, Benjamin Van Durme and Marius Pasca. The 46th Annual Meeting of the Association of Computational Linguistics: Human Language Technologies (ACL-08: HLT). Columbus, Ohio, USA. June 15-20, 2008. [pdf]

Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs. Marius Pasca and Benjamin Van Durme. ACL. 2008. [pdf]

---------- 2007 ----------

The Role of Documents vs. Queries in Extracting Class Attributes from Text. Marius Pasca, Benjamin Van Durme, Nikesh Garera. ACM Sixteenth Conference on Information and Knowledge Management (CIKM 2007). Lisboa, Portugal. November 6-9, 2007. [pdf]

What You Seek is What You Get: Extraction of Class Attributes from Query Logs. Marius Pasca and Benjamin Van Durme. Hyderabad, India, February 2007. International Joint Conference on Artificial Intelligence (IJCAI-07). [pdf]

---------- 2004 ----------

Pronominal Anaphora Resolution for Unrestricted Text. Anna Kupsc, Teruko Mitamura, Benjamin Van Durme, Eric Nyberg. Lisbon, Portugal, May 24-30 2004. LREC.

---------- 2003 ----------

Towards Light Semantic Processing for Question Answering. Benjamin Van Durme, Yifen Huang, Anna Kupsc, Eric Nyberg. Edmonton, Canada, May 31 2003. HLT/NAACL Workshop on Text Meaning. [pdf]

The JAVELIN Question-Answering System at TREC 2003: A Multi-Strategy Approach with Dynamic Planning. E. Nyberg, T. Mitamura, J. Callan, J. Carbonell, R. Frederking, K. Collins-Thompson, L. Hiyakumoto, Y. Huang, C. Huttenhower, S. Judy, J. Ko, A. Kupsc, L. Lita, V. Pedro, D. Svoboda and B. Van Durme. 12th Text REtrieval Conference, November 2003.