Benjamin Van Durme :: vandurme@cs.jhu.edu

Johns Hopkins University
Assistant Research Professor, Department of Computer Science
Center for Language and Speech Processing
Research Scientist, Human Language Technology Center of Excellence


My work is in the areas of natural language processing (specifically computational semantics) and streaming/randomized algorithms.

PhD students (co-)advised: Courtney Napoles, Byung Gyu Ahn, Xuchen Yao, Svitlana Volkova, Keith Levin, and Frank Ferraro.
Awesome post-docs: Margaret Mitchell.
Prior students: Charley (Tsz Ping) Chan (MS).

Kyle Rawlins and I are co-teaching Event Semantics in Fall 2013; stay tuned!

Jointly with Chris Callison-Burch, I run the Textual Choreography (TeCho) group (photo): a loose confederation of students doing research in text-mungy things, such as Text2Text rewriting, paraphrase acquisition, question answering, logical form generation, topic detection and tracking (TDT) in news streams, etc.

I co-lead the Text efforts at HLTCOE, including social media (partial photo). Lately I've been concerned with applying my work in streaming algorithms to create dynamic, memory-efficient models for analyzing social media. Collaborators include David Yarowsky, Glen Coppersmith, Theresa Wilson, and Mark Dredze.

Ashwin Lall and I have collaborated on algorithms for handling large quantities of (streaming) data, furthered by early discussions with Miles Osborne and David Talbot. Lately I've worked with Aren Jansen to create exceptionally scalable tools for searching raw speech data, as part of the Zero Resource effort at the COE. Other local JHU colleagues on this front include Yanif Ahmad and Vladimir Braverman. Much of this work is bundled into the Jerboa package. Alex Clemmer works on this line remotely from Utah.

Further collaborators include: Matthew Post, Jason Eisner, Shane Bergsma, and students including Matt Gormley, Juri Ganitkevitch, Olivia Buzek, Travis Wolfe and Jonny Weese.

My thesis was joint in Computer Science and Linguistics, titled: Extracting Implicit Knowledge from Text. The committee consisted of Len Schubert and Dan Gildea of Rochester Computer Science, Greg Carlson of Rochester Linguistics, and William Cohen of the Machine Learning Department at CMU. This work fell under the KNEXT project, aimed at extracting commonsense knowledge from text. I spent two summers while a PhD student as an intern at Google Research, working with Marius Pasca on knowledge extraction from search engine query logs. Prior to Rochester I was a student in the Language Technologies Institute at CMU, working with Eric Nyberg and colleagues on Question Answering (which eventually helped lead to IBM's Watson). Before that I worked as a research engineer in the AI Division of the Advanced Technology Laboratory of Lockheed Martin.


user: va nd ur me domain: cs.jhu.edu


my last name in bibtex is: {Van Durme}


---------- 2013 ----------

Juri Ganitkevitch, Benjamin Van Durme and Chris Callison-Burch. PPDB: The Paraphrase Database. NAACL Short. 2013.

Xuchen Yao, Benjamin Van Durme, Peter Clark and Chris Callison-Burch. Answer Extraction as Sequence Tagging with Tree Edit Distance. NAACL. 2013.

Shane Bergsma, Mark Dredze, Benjamin Van Durme, Theresa Wilson, and David Yarowsky. Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter. NAACL. 2013.

---------- 2012 ----------

Benjamin Van Durme. Streaming Analysis of Discourse Participants. EMNLP. 2012. [pdf]

Aren Jansen, Benjamin Van Durme, and Pascal Clark. The JHU-HLTCOE Spoken Web Search System for MediaEval 2012. Proceedings of the MediaEval 2012 Workshop.

Aren Jansen and Benjamin Van Durme. Indexing Raw Acoustic Features for Scalable Zero Resource Search. InterSpeech. 2012.

Juri Ganitkevitch, Benjamin Van Durme and Chris Callison-Burch. Monolingual Distributional Similarity for Text-to-Text Generation. STARSEM. 2012. [pdf]

Matthew R. Gormley, Mark Dredze, Benjamin Van Durme, and Jason Eisner. Shared Components Topic Models. NAACL. 2012. [pdf]

Xuchen Yao, Benjamin Van Durme and Chris Callison-Burch. Expectations of Word Sense in Parallel Corpora. NAACL Short. 2012. [pdf]

Brian Kjersten and Benjamin Van Durme. Space Efficiencies in Discourse Modeling via Conditional Random Sampling. NAACL Short. 2012. [pdf]

Courtney Napoles, Matthew R. Gormley and Benjamin Van Durme. Annotated Gigaword. NAACL Workshop: AKBC-WEKEX. 2012. [pdf]

Frank Ferraro, Matt Post and Benjamin Van Durme. Judging Grammaticality with Count-Induced Tree Substitution Grammars. NAACL Workshop: BEA. 2012. [pdf]

Frank Ferraro, Benjamin Van Durme and Matt Post. Toward Tree Substitution Grammars with Latent Annotations. NAACL Workshop: Inducing Linguistic Structure. 2012. [pdf]

Benjamin Van Durme. Jerboa: A Toolkit for Randomized and Streaming Algorithms. Technical Report 7, Human Language Technology Center of Excellence, Johns Hopkins University. 2012. [pdf] [git]

Prabhakaran et al. Statistical Modality Tagging from Rule-based Annotations and Crowdsourcing. ACL Workshop: ExProM. 2012. [pdf]

---------- 2011 ----------

Matthew R. Gormley, Mark Dredze, Benjamin Van Durme, and Jason Eisner. Shared Components Topic Models with Application to Selectional Preference. NIPS Workshop: Learning Semantics. 2011.

Aren Jansen and Benjamin Van Durme. Efficient Spoken Term Discovery using Randomized Algorithms. ASRU. 2011. [pdf]

Tsz Ping Chan, Chris Callison-Burch and Benjamin Van Durme. Reranking Bilingually Extracted Paraphrases Using Monolingual Distributional Similarity. EMNLP Workshop: GEMS. 2011. [pdf] [slides]

Juri Ganitkevitch, Chris Callison-Burch, Courtney Napoles and Benjamin Van Durme. Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation. EMNLP. 2011. [pdf]

Xuchen Yao and Benjamin Van Durme. Nonparametric Bayesian Word Sense Induction. ACL Workshop: Textgraphs. 2011. [pdf]

Courtney Napoles, Chris Callison-Burch, Juri Ganitkevitch and Benjamin Van Durme. Paraphrastic Sentence Compression with a Character-based Metric: Tightening without Deletion. ACL Workshop on Monolingual Text-To-Text Generation. 2011. [pdf]

Courtney Napoles, Benjamin Van Durme and Chris Callison-Burch. Evaluating sentence compression: Pitfalls and suggested remedies. ACL Workshop on Monolingual Text-To-Text Generation. 2011. [pdf]

Byung Gyu Ahn, Chris Callison-Burch and Benjamin Van Durme. WikiTopics: What is popular on Wikipedia and why. ACL Workshop on Automatic Summarization for Different Genres, Media, and Languages. 2011. [pdf]

Benjamin Van Durme and Ashwin Lall. Efficient Online Locality Sensitive Hashing via Reservoir Counting. ACL Short. 2011. [pdf] [bib] [slides]

Shane Bergsma and Benjamin Van Durme. Learning Bilingual Lexicons using the Visual Similarity of Labeled Web Images. IJCAI. 2011. [pdf] [bib]

Azevedo et al. Reports of the AAAI 2010 Fall Symposia. AI Magazine. Spring 2011. [pdf]

---------- 2010 ----------

Benjamin Van Durme. Extracting Implicit Knowledge from Text. PhD Thesis. University of Rochester. 2010. [link]

Lenhart K. Schubert, Benjamin Van Durme, and Marzieh Bazrafshan. Entailment Inference in a Natural Logic-like General Reasoner. AAAI Fall Symposium on Commonsense Knowledge (CSK10). 2010. [pdf] [bib]

Benjamin Van Durme and Ashwin Lall. Online Generation of Locality Sensitive Hash Signatures. ACL Short. 2010. [pdf] [bib] [slides]

Jonathan Gordon, Benjamin Van Durme and Lenhart K. Schubert. Learning from the Web: Extracting General World Knowledge from Noisy Text. AAAI Workshop on Collaboratively-built Knowledge Sources and Artificial Intelligence. 2010. [pdf]

Jonathan Gordon, Benjamin Van Durme and Lenhart K. Schubert. Evaluation of Commonsense Knowledge with Mechanical Turk. NAACL Workshop on Amazon Mechanical Turk. 2010. [pdf]

---------- 2009 ----------

Benjamin Van Durme and Ashwin Lall. Streaming Pointwise Mutual Information. NIPS. 2009. [pdf] [bib]

Jonathan Gordon, Benjamin Van Durme and Lenhart K. Schubert. Weblogs as a Source for Extracting General World Knowledge. K-CAP. 2009. [pdf]

Benjamin Van Durme and Ashwin Lall. Probabilistic Counting with Randomized Storage. IJCAI. 2009. [pdf] [slides]

Benjamin Van Durme and Daniel Gildea. Topic Models for Corpus-centric Knowledge Generalization. Technical Report TR-946, Department of Computer Science, University of Rochester, Rochester, NY 14627, June 2009. [pdf]

Ting Qian, Benjamin Van Durme and Lenhart K. Schubert. Building a Semantic Lexicon of English Nouns via Bootstrapping. NAACL Student Research Workshop. 2009.

Benjamin Van Durme, Phillip Michalak and Lenhart K. Schubert. Deriving Generalized Knowledge from Corpora using WordNet Abstraction. EACL. 2009. [pdf] [bib]

Benjamin Van Durme, Austin Frank and T. Florian Jaeger. Comparing Sources of Corpus Frequency Information. The 22nd Annual Meeting of the CUNY Conference on Human Sentence Processing (CUNY-09). 2009.

---------- 2008 ----------

Lenhart K. Schubert and Benjamin Van Durme. Open Extraction of General Knowledge through Compositional Semantics. NSF Symposium on Semantic Knowledge Discovery, Organization and Use. 2008. [pdf]

Benjamin Van Durme and Lenhart K. Schubert. Open Knowledge Extraction through Compositional Language Processing. Symposium on Semantics in Systems for Text Processing (STEP). 2008. [pdf]

Benjamin Van Durme, Ting Qian and Lenhart K. Schubert. Class-Driven Attribute Extraction. COLING. 2008. [pdf]

Austin F. Frank, Celeste Kidd, Matthew Post, Benjamin Van Durme and T. Florian Jaeger. The Web as a Psycholinguistic Resource. Presented at the 5th International Workshop on Language Production. Annapolis, MD. July 28-30, 2008.

Benjamin Van Durme and Marius Pasca. Finding Cars, Goddesses and Enzymes: Parametrizable Acquisition of Labeled Instances for Open-Domain Information Extraction. AAAI. 2008. [pdf]

Benjamin Van Durme. Notes on the Acquisition of Conditional Knowledge. Technical Report TR-937, Department of Computer Science, University of Rochester, Rochester, NY 14627. June 2008. [pdf]

Dekang Lin, Shaojun Zhao, Benjamin Van Durme and Marius Pasca. Mining Parenthetical Translations from the Web by Word Alignment. The 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-08: HLT). Columbus, Ohio, USA. June 15-20, 2008. [pdf]

Marius Pasca and Benjamin Van Durme. Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs. ACL. 2008. [pdf]

---------- 2007 ----------

Marius Pasca, Benjamin Van Durme, Nikesh Garera. The Role of Documents vs. Queries in Extracting Class Attributes from Text. ACM Sixteenth Conference on Information and Knowledge Management (CIKM 2007). Lisboa, Portugal. November 6-9, 2007. [pdf]

Marius Pasca and Benjamin Van Durme. What You Seek is What You Get: Extraction of Class Attributes from Query Logs. Hyderabad, India, February 2007. International Joint Conference on Artificial Intelligence (IJCAI-07). [pdf]

---------- 2004 ----------

Anna Kupsc, Teruko Mitamura, Benjamin Van Durme, Eric Nyberg. Pronominal Anaphora Resolution for Unrestricted Text. Lisbon, Portugal, May 24-30 2004. LREC.

---------- 2003 ----------

Benjamin Van Durme, Yifen Huang, Anna Kupsc, Eric Nyberg. Towards Light Semantic Processing for Question Answering. Edmonton, Canada, May 31 2003. HLT/NAACL Workshop on Text Meaning. [pdf]

E. Nyberg, T. Mitamura, J. Callan, J. Carbonell, R. Frederking, K. Collins-Thompson, L. Hiyakumoto, Y. Huang, C. Huttenhower, S. Judy, J. Ko, A. Kupsc, L. Lita, V. Pedro, D. Svoboda and B. Van Durme. The JAVELIN Question-Answering System at TREC 2003: A Multi-Strategy Approach with Dynamic Planning. 12th Text REtrieval Conference, November 2003.