David A. Smith

I am now at the University of Massachusetts, Amherst.

But in the past:

Ph.D. student in Natural Language Processing at Johns Hopkins University
Computer Science & Engineering Building 322
3400 N. Charles St.
Baltimore, MD 21218
dasmithATSIGNjhu.edu

Research Interests: machine translation, natural language parsing, semi-supervised machine learning methods, digital libraries

Some of my research interests have web pages: syntax for statistical machine translation and the Dyna language for dynamic programming.

Formerly: Head Programmer, Perseus Project, Tufts University

August 2006: Charles Schafer and I presented a tutorial, Overview of Statistical Machine Translation [pdf], at the Association for Machine Translation in the Americas.

Fall 2005: Noah Smith and I designed and taught a course on Empirical Research Methods in Computer Science.

See also my curriculum vitae in PDF.

Refereed Conference Proceedings

Natural Language Processing and Machine Learning

David A. Smith and Jason Eisner. Bootstrapping feature-rich dependency parsers with entropic priors. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 667-677, 2007. Nominated for best paper award. [ PDF | PowerPoint slides ]

David A. Smith and Noah A. Smith. Probabilistic models of nonprojective dependency trees. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 132-140, 2007. [ PDF | PowerPoint slides ]

Keith Hall, Jiří Havelka, and David A. Smith. Log-linear models of non-projective trees, k-best MST parsing and tree-ranking. In Proceedings of the CoNLL Shared Task, pages 962-966, 2007.

David A. Smith and Jason Eisner. Quasi-synchronous grammars: Alignment by soft projection of syntactic dependencies. In Proceedings of the HLT-NAACL Workshop on Statistical Machine Translation, pages 23-30, 2006. [ PDF | PowerPoint slides ]

Markus Dreyer, David A. Smith, and Noah A. Smith. Vine parsing and minimum risk reranking for speed and precision. In Proceedings of the CoNLL Shared Task, pages 201-205, 2006. [ PDF ]

David A. Smith and Jason Eisner. Minimum risk annealing for training log-linear models. In Proceedings of the International Conference on Computational Linguistics and the Association for Computational Linguistics, pages 787-794, 2006. [ PDF ]

Noah A. Smith, David A. Smith, and Roy W. Tromble. Context-based morphological disambiguation with random fields. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 475-482, 2005. [ PDF ]

F.J. Och, D. Gildea, S. Khudanpur, A. Sarkar, K. Yamada, A. Fraser, S. Kumar, L. Shen, D. Smith, K. Eng, V. Jain, Z. Jin, and D. Radev. A smorgasbord of features for statistical machine translation. In Proceedings of the Conference on Human Language Technology and the North American Association for Computational Linguistics, pages 161-168, 2004. [ PDF ]

David A. Smith and Noah A. Smith. Bilingual parsing with factored estimation: Using English to parse Korean. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 49-56, 2004. [ PDF ]

Information Extraction and Retrieval

David A. Smith and Gideon S. Mann. Bootstrapping toponym classifiers. In Proceedings of the HLT-NAACL Workshop on Analysis of Geographic References, pages 45-49, 2003. [ PDF ]

David A. Smith. Detecting and browsing events in unstructured text. In Proceedings of the 25th Annual ACM SIGIR Conference, pages 73-80, Tampere, Finland, August 2002. [ PDF ]

David A. Smith. Detecting events with date and place information in unstructured text. In Proceedings of the 2nd ACM+IEEE Joint Conference on Digital Libraries, pages 191-196, Portland, OR, July 2002. [ PDF ]

David A. Smith and Gregory Crane. Disambiguating geographic names in a historical digital library. In Proceedings of the European Conference on Digital Libraries (ECDL), pages 127-136, Darmstadt, Germany, September 2001. [ PDF ]

Digital Libraries

Gregory Crane, Clifford E. Wulfman, Lisa M. Cerrato, Anne Mahoney, Thomas L. Milbank, David Mimno, Jeffrey A. Rydberg-Cox, David A. Smith, and Christopher York. Towards a cultural heritage digital library. In Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2003, pages 75-86, Houston, TX, June 2003. [ PDF ]

David A. Smith, Anne Mahoney, and Gregory Crane. Integrating harvesting into digital library content. In Proceedings of the 2nd ACM+IEEE Joint Conference on Digital Libraries, pages 183-184, Portland, OR, July 2002. [ PDF ]

Gregory Crane, David A. Smith, and Clifford E. Wulfman. Building a hypertextual digital library in the humanities: A case study on London. In Proceedings of the First ACM+IEEE Joint Conference on Digital Libraries, pages 426-434, Roanoke, VA, June 2001. Best paper award. [ PDF ]

David A. Smith, Anne Mahoney, and Jeffrey A. Rydberg-Cox. Management of XML documents in an integrated digital library. In Proceedings of Extreme Markup Languages 2000, pages 219-224, Montreal, August 2000.

Refereed Journal Articles

David A. Smith, Jason Eisner, and Noah A. Smith. Minimum risk annealing: Case studies in nonprojective dependency parsing and machine translation. Submitted to Computational Linguistics, 2008.

Gregory R. Crane, Robert F. Chavez, Anne Mahoney, Thomas L. Milbank, Jeffrey A. Rydberg-Cox, David A. Smith, and Clifford E. Wulfman. Drudgery and deep thought: Designing a digital library for the humanities. Communications of the Association for Computing Machinery, 44(5):35-40, 2001. [ PDF ]

David A. Smith, Jeffrey A. Rydberg-Cox, and Gregory R. Crane. The Perseus Project: A digital library for the humanities. Literary and Linguistic Computing, 15(1):15-25, 2000.

David A. Smith, Anne Mahoney, and Jeffrey A. Rydberg-Cox. Management of XML documents in an integrated digital library. Markup Languages: Theory and Practice, 2(3):205-214, 2000. [ PDF ]

David A. Smith. Textual variation and version control in the TEI. Computers and the Humanities, 33(1-2):103-112, 1999.

Other Publications

David A. Smith. Debabelizing libraries: Machine translation by and for digital collections. D-Lib Magazine, 12(3), March 2006. [ HTML ]

Anne Mahoney, Jeffrey A. Rydberg-Cox, David A. Smith, and Clifford E. Wulfman. Generalizing the Perseus XML document manager. In Linguistic Exploration: Workshop on Web-based Language Documentation and Description, Philadelphia, December 2000. [ HTML ]