Benjamin Van Durme :: vandurme@cs.jhu.edu

Johns Hopkins University
Assistant Research Professor, Department of Computer Science
Center for Language and Speech Processing
Research Scientist, Human Language Technology Center of Excellence


My work is in the areas of natural language processing (specifically computational semantics) and streaming/randomized algorithms.

Jointly with Chris Callison-Burch, I run the Textual Choreography (TeCho) group: a loose confederation of students doing research in text-mungy things, such as Text2Text rewriting, paraphrase acquisition, question generation, logical form generation, topic detection and tracking (TDT) in news streams, etc. This group includes Katherine Wu, our talented undergraduate research assistant/lab manager.

Jointly with David Yarowsky, I co-lead the Language Understanding effort at HLTCOE. Lately in this area I've been concerned with applying my work in streaming algorithms to create dynamic, memory-efficient models for analyzing social media. My collaborators in this space include Theresa Wilson and Mark Dredze at COE, and Owen Rambow at Columbia.

I (co-)supervise: Courtney Napoles, Byung Gyu Ahn, Xuchen Yao, Brian Kjersten, Charley (Tsz Ping) Chan, and Frank Ferraro.

I interact with a variety of researchers at JHU, such as Kyle Rawlins, Matthew Post, Jason Eisner, and Shane Bergsma, as well as students including Matt Gormley, Juri Ganitkevitch, Olivia Buzek, and Jonny Weese.

Ashwin Lall and I collaborate on algorithms for handling large quantities of (streaming) data. My interest in this area was partially motivated by conversations with Miles Osborne and David Talbot. Locally I discuss these matters with Glen Coppersmith, Damianos Karakos, Aren Jansen, Yanif Ahmad, and Vladimir Braverman.

On December 17th, 2009, I defended my thesis, titled: Extracting Implicit Knowledge from Text. My committee consisted of Len Schubert and Dan Gildea of Rochester Computer Science, Greg Carlson of Rochester Linguistics, and William Cohen of the Machine Learning Department at CMU. This work fell under the KNEXT project, aimed at extracting commonsense knowledge from text.

I spent the summers of '06 and '07 doing research at Google with Marius Pasca. This work was focussed on finding characteristic attributes (e.g., "mayor") for concept classes (e.g., "city"), as well as collecting such classes automatically. This work benefited from interactions with Deepak Ravichandran and Dekang Lin.

I am an alumnus of the HLP Lab, led by T. Florian Jaeger, of the Brain and Cognitive Science Department at Rochester, where I looked for ways to commingle ideas from computer science and psycholinguistics. I was primarily concerned with measuring the impact of corpus source selection when deriving n-gram statistics (these statistics are known to correlate with human performance in language production and comprehension, but which collection gives the best frequencies?). This involved collaborations with Austin Frank, Alex Fine, and Celeste Kidd.

While a grad student at CMU (working primarily with Eric Nyberg and Bob Frederking), I was involved with the department's efforts in the AQUAINT project on Question Answering, as well as Project HALO, aimed at building a system that could answer AP science exam questions.


user: va nd ur me domain: cs.jhu.edu


my last name in bibtex is: {Van Durme}


---------- 2012 ----------

Juri Ganitkevitch, Benjamin Van Durme and Chris Callison-Burch. Monolingual Distributional Similarity for Text-to-Text Generation. STARSEM. 2012.

Matthew R. Gormley, Mark Dredze, Benjamin Van Durme, and Jason Eisner. Shared Components Topic Models. NAACL. 2012.

Xuchen Yao, Benjamin Van Durme and Chris Callison-Burch. Expectations of Word Sense in Parallel Corpora. NAACL Short. 2012.

Brian Kjersten and Benjamin Van Durme. Space Efficiencies in Discourse Modeling via Conditional Random Sampling. NAACL Short. 2012.

Courtney Napoles, Matthew R. Gormley and Benjamin Van Durme. Annotated Gigaword. NAACL Workshop: AKBC-WEKEX. 2012.

Frank Ferraro, Matt Post and Benjamin Van Durme. Judging Grammaticality with Count-Induced Tree Substitution Grammars. NAACL Workshop: BEA. 2012.

Frank Ferraro, Matt Post and Benjamin Van Durme. Toward Tree Substitution Grammars with Latent Annotations. NAACL Workshop: Inducing Linguistic Structure. 2012.

Prabhakaran et al. Statistical Modality Tagging from Rule-based Annotations and Crowdsourcing. ACL Workshop: ExProM. 2012.

---------- 2011 ----------

Matthew R. Gormley, Mark Dredze, Benjamin Van Durme, and Jason Eisner. Shared Components Topic Models with Application to Selectional Preference. NIPS Workshop: Learning Semantics. 2011.

Aren Jansen and Benjamin Van Durme. Efficient Spoken Term Discovery using Randomized Algorithms. ASRU. 2011. [pdf]

Tsz Ping Chan, Chris Callison-Burch and Benjamin Van Durme. Reranking Bilingually Extracted Paraphrases Using Monolingual Distributional Similarity. EMNLP Workshop: GEMS. 2011. [pdf] [slides]

Juri Ganitkevitch, Chris Callison-Burch, Courtney Napoles and Benjamin Van Durme. Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation. EMNLP. 2011. [pdf]

Xuchen Yao and Benjamin Van Durme. Nonparametric Bayesian Word Sense Induction. ACL Workshop: Textgraphs. 2011. [pdf]

Courtney Napoles, Chris Callison-Burch, Juri Ganitkevitch and Benjamin Van Durme. Paraphrastic Sentence Compression with a Character-based Metric: Tightening without Deletion. ACL Workshop on Monolingual Text-To-Text Generation. 2011. [pdf]

Courtney Napoles, Benjamin Van Durme and Chris Callison-Burch. Evaluating sentence compression: Pitfalls and suggested remedies. ACL Workshop on Monolingual Text-To-Text Generation. 2011. [pdf]

Byung Gyu Ahn, Chris Callison-Burch and Benjamin Van Durme. WikiTopics: What is popular on Wikipedia and why. ACL Workshop on Automatic Summarization for Different Genres, Media, and Languages. 2011. [pdf]

Benjamin Van Durme and Ashwin Lall. Efficient Online Locality Sensitive Hashing via Reservoir Counting. ACL Short. 2011. [pdf] [bib] [slides]

Shane Bergsma and Benjamin Van Durme. Learning Bilingual Lexicons using the Visual Similarity of Labeled Web Images. IJCAI. 2011. [pdf] [bib]

Azevedo et al. Reports of the AAAI 2010 Fall Symposia. AI Magazine. Spring 2011. [pdf]

---------- 2010 ----------

Benjamin Van Durme. Extracting Implicit Knowledge from Text. PhD Thesis. University of Rochester. 2010. [link]

Lenhart K. Schubert, Benjamin Van Durme, and Marzieh Bazrafshan. Entailment Inference in a Natural Logic-like General Reasoner. AAAI Fall Symposium on Commonsense Knowledge (CSK10). 2010. [pdf] [bib]

Benjamin Van Durme and Ashwin Lall. Online Generation of Locality Sensitive Hash Signatures. ACL Short. 2010. [pdf] [bib] [slides]

Jonathan Gordon, Benjamin Van Durme and Lenhart K. Schubert. Learning from the Web: Extracting General World Knowledge from Noisy Text. AAAI Workshop on Collaboratively-built Knowledge Sources and Artificial Intelligence. 2010. [pdf]

Jonathan Gordon, Benjamin Van Durme and Lenhart K. Schubert. Evaluation of Commonsense Knowledge with Mechanical Turk. NAACL Workshop on Amazon Mechanical Turk. 2010. [pdf]

---------- 2009 ----------

Benjamin Van Durme and Ashwin Lall. Streaming Pointwise Mutual Information. NIPS. 2009. [pdf] [bib]

Jonathan Gordon, Benjamin Van Durme and Lenhart K. Schubert. Weblogs as a Source for Extracting General World Knowledge. K-CAP. 2009. [pdf]

Benjamin Van Durme and Ashwin Lall. Probabilistic Counting with Randomized Storage. IJCAI. 2009. [pdf] [slides]

Benjamin Van Durme and Daniel Gildea. Topic Models for Corpus-centric Knowledge Generalization. Technical Report TR-946, Department of Computer Science, University of Rochester, Rochester, NY 14627, June 2009. [pdf]

Ting Qian, Benjamin Van Durme and Lenhart K. Schubert. Building a Semantic Lexicon of English Nouns via Bootstrapping. NAACL Student Research Workshop. 2009.

Benjamin Van Durme, Phillip Michalak and Lenhart K. Schubert. Deriving Generalized Knowledge from Corpora using WordNet Abstraction. EACL. 2009. [pdf] [bib]

Benjamin Van Durme, Austin Frank and T. Florian Jaeger. Comparing Sources of Corpus Frequency Information. The 22nd Annual Meeting of the CUNY Conference on Human Sentence Processing (CUNY-09). 2009.

---------- 2008 ----------

Lenhart K. Schubert and Benjamin Van Durme. Open Extraction of General Knowledge through Compositional Semantics. NSF Symposium on Semantic Knowledge Discovery, Organization and Use. 2008. [pdf]

Benjamin Van Durme and Lenhart K. Schubert. Open Knowledge Extraction through Compositional Language Processing. Symposium on Semantics in Systems for Text Processing (STEP). 2008. [pdf]

Benjamin Van Durme, Ting Qian and Lenhart K. Schubert. Class-Driven Attribute Extraction. COLING. 2008. [pdf]

Austin F. Frank, Celeste Kidd, Matthew Post, Benjamin Van Durme and T. Florian Jaeger. The Web as a Psycholinguistic Resource. Presented at the 5th International Workshop on Language Production. Annapolis, MD. July 28-30, 2008.

Benjamin Van Durme and Marius Pasca. Finding Cars, Goddesses and Enzymes: Parametrizable Acquisition of Labeled Instances for Open-Domain Information Extraction. AAAI. 2008. [pdf]

Benjamin Van Durme. Notes on the Acquisition of Conditional Knowledge. Technical Report TR-937, Department of Computer Science, University of Rochester, Rochester, NY 14627. June 2008. [pdf]

Dekang Lin, Shaojun Zhao, Benjamin Van Durme and Marius Pasca. Mining Parenthetical Translations from the Web by Word Alignment. The 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-08: HLT). Columbus, Ohio, USA. June 15-20, 2008. [pdf]

Marius Pasca and Benjamin Van Durme. Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs. ACL. 2008. [pdf]

---------- 2007 ----------

Marius Pasca, Benjamin Van Durme, Nikesh Garera. The Role of Documents vs. Queries in Extracting Class Attributes from Text. ACM Sixteenth Conference on Information and Knowledge Management (CIKM 2007). Lisboa, Portugal. November 6-9, 2007. [pdf]

Marius Pasca and Benjamin Van Durme. What You Seek is What You Get: Extraction of Class Attributes from Query Logs. Hyderabad, India, February 2007. International Joint Conference on Artificial Intelligence (IJCAI-07). [pdf]

---------- 2004 ----------

Anna Kupsc, Teruko Mitamura, Benjamin Van Durme, Eric Nyberg. Pronominal Anaphora Resolution for Unrestricted Text. Lisbon, Portugal, May 24-30 2004. LREC.

---------- 2003 ----------

Benjamin Van Durme, Yifen Huang, Anna Kupsc, Eric Nyberg. Towards Light Semantic Processing for Question Answering. Edmonton, Canada, May 31 2003. HLT/NAACL Workshop on Text Meaning. [pdf]

E. Nyberg, T. Mitamura, J. Callan, J. Carbonell, R. Frederking, K. Collins-Thompson, L. Hiyakumoto, Y. Huang, C. Huttenhower, S. Judy, J. Ko, A. Kupsc, L. Lita, V. Pedro, D. Svoboda and B. Van Durme. The JAVELIN Question-Answering System at TREC 2003: A Multi-Strategy Approach with Dynamic Planning. 12th Text REtrieval Conference, November 2003.