I have accepted a tenure-track job offer from the Computer and Information Sciences Department at the University of Pennsylvania. On the personal front, Dawn and I got married this September.
I am an associate research professor in the computer science department at Johns Hopkins University. My research interests include statistical machine translation, data-driven paraphrasing, crowdsourcing, and evaluation metrics.
I am the Chair of the NAACL Executive Board, and I am an action editor for the new journal Transactions of the ACL (TACL). I recently finished my term on the editorial board of Computational Linguistics.
I co-organize the annual Workshop on Statistical Machine Translation (WMT), which runs several shared tasks covering machine translation quality, confidence estimation for MT, and automatic evaluation metrics. WMT13 will be held at ACL in Bulgaria.
My PhD student Omar Zaidan handed in his thesis last April. You can watch Omar's thesis defense talk on Vimeo.
My research group is currently developing Joshua, an open source decoder for statistical machine translation, which uses synchronous context free grammars. We have recently updated the software so that it extracts linguistically informed translation rules. You can find information about the latest version on the Joshua decoder web site.
I released my software for generating paraphrases from parallel corpora, along with step-by-step instructions on how to use it.
I work with lots of very talented students at Johns Hopkins University. My research group is a small army (here's a group photo):
I co-supervise many of my students with Ben Van Durme. I work with two other machine translation researchers at the HLTCOE: Adam Lopez and Matt Post. We co-taught a Machine Translation course in Spring 2012.
Past members: Serena Jeblee (now a grad student at CMU/LTI), Dmitry Kachaev (now a Presidential Innovation Fellow), Omar Zaidan (now at Microsoft Research), Charley Chan (now at Bloomberg in NYC), Alex Klementiev (now a postdoc at Saarland University), Lane Schwartz (now at AFRL), Zhifei Li (now at Google), Wren Thornton (now doing a PhD in cognitive science at Indiana University).
I served on the thesis committees of Chang Hu, Hala Almaghout, Emily Tucker Prud'hommeaux, Omar Zaidan, Lane Schwartz, Zhifei Li, Nitin Madnani, Yuval Marton, Elliott Drabek and Roy Tromble.
Findings of the 2012 Workshop on Statistical Machine Translation. Callison-Burch, Chris and Koehn, Philipp and Monz, Christof and Post, Matt and Soricut, Radu and Specia, Lucia, 2012. In Proceedings of WMT12. [abstract] [bib]
Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing. Matt Post, Chris Callison-Burch, and Miles Osborne, 2012. In Proceedings of WMT12. [abstract] [bib]
@InProceedings{post-callisonburch-osborne:2012:WMT,
author = {Post, Matt and Callison-Burch, Chris and Osborne, Miles},
title = {Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing},
booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation},
month = {June},
year = {2012},
address = {Montr{\'e}al, Canada},
publisher = {Association for Computational Linguistics},
pages = {401--409},
url = {http://www.aclweb.org/anthology/W12-3152}
}
Using Categorial Grammar to Label Translation Rules. Jonathan Weese, Chris Callison-Burch, and Adam Lopez, 2012. In Proceedings of WMT12. [abstract] [bib]
@InProceedings{weese-callisonburch-lopez:2012:WMT,
author = {Weese, Jonathan and Callison-Burch, Chris and Lopez, Adam},
title = {Using Categorial Grammar to Label Translation Rules},
booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation},
month = {June},
year = {2012},
address = {Montr{\'e}al, Canada},
publisher = {Association for Computational Linguistics},
pages = {222--231},
url = {http://cs.jhu.edu/~ccb/publications/using-categorial-grammar-to-label-translation-rules.pdf}
}
Joshua 4.0: Packing, PRO, and Paraphrases. Juri Ganitkevitch, Yuan Cao, Jonathan Weese, Matt Post, and Chris Callison-Burch, 2012. In Proceedings of WMT12. [abstract] [bib]
@InProceedings{ganitkevitch-EtAl:2012:WMT,
author = {Ganitkevitch, Juri and Cao, Yuan and Weese, Jonathan and Post, Matt and Callison-Burch, Chris},
title = {Joshua 4.0: Packing, PRO, and Paraphrases},
booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation},
month = {June},
year = {2012},
address = {Montr{\'e}al, Canada},
publisher = {Association for Computational Linguistics},
pages = {283--291},
url = {http://cs.jhu.edu/~ccb/publications/joshua-4.0.pdf}
}
Processing Informal, Romanized Pakistani Text Messages. Ann Irvine, Jonathan Weese, and Chris Callison-Burch, 2012. In Proceedings of the NAACL Workshop on Language in Social Media. [abstract] [bib]
@InProceedings{irvine-weese-callisonburch:2012:LSM,
author = {Irvine, Ann and Weese, Jonathan and Callison-Burch, Chris},
title = {Processing Informal, Romanized Pakistani Text Messages},
booktitle = {Proceedings of the Second Workshop on Language in Social Media},
month = {June},
year = {2012},
address = {Montr{\'e}al, Canada},
publisher = {Association for Computational Linguistics},
pages = {75--78},
url = {http://www.aclweb.org/anthology/W12-2109}
}
Monolingual Distributional Similarity for Text-to-Text Generation. Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch, 2012. In Proceedings of *SEM 2012. [abstract] [bib]
@InProceedings{Ganitkevitch-etal:2012:StarSEM,
author = {Juri Ganitkevitch and Benjamin Van Durme and Chris Callison-Burch},
title = {Monolingual Distributional Similarity for Text-to-Text Generation},
booktitle = {*SEM First Joint Conference on Lexical and Computational Semantics},
month = {June},
year = {2012},
address = {Montreal},
publisher = {Association for Computational Linguistics},
url = {http://cs.jhu.edu/~ccb/publications/monolingual-distributional-similarity-for-text-to-text-generation.pdf}
}
Machine Translation of Arabic Dialects. Rabih Zbib, Erika Malchiodi, Jacob Devlin, David Stallard, Spyros Matsoukas, Richard Schwartz, John Makhoul, Omar F. Zaidan and Chris Callison-Burch, 2012. In Proceedings of NAACL 2012. [abstract] [bib]
@InProceedings{Zbib-etal:2012:NAACL,
author = {Rabih Zbib and Erika Malchiodi and Jacob Devlin and David Stallard and Spyros Matsoukas and Richard Schwartz and John Makhoul and Omar F. Zaidan and Chris Callison-Burch},
title = {Machine Translation of Arabic Dialects},
booktitle = {The 2012 Conference of the North American Chapter of the Association for Computational Linguistics},
month = {June},
year = {2012},
address = {Montreal},
publisher = {Association for Computational Linguistics},
url = {http://cs.jhu.edu/~ccb/publications/machine-translation-of-arabic-dialects.pdf}
}
Toward Statistical Machine Translation without Parallel Corpora. Alex Klementiev, Ann Irvine, Chris Callison-Burch, and David Yarowsky, 2012. In Proceedings of EACL 2012. [abstract] [bib]
@InProceedings{klementiev-etal:2012:EACL,
author = {Alex Klementiev and Ann Irvine and Chris Callison-Burch and David Yarowsky},
title = {Toward Statistical Machine Translation without Parallel Corpora},
booktitle = {Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics},
month = {April},
year = {2012},
address = {Avignon, France},
publisher = {Association for Computational Linguistics}
}
Use of Modality and Negation in Semantically-Informed Syntactic MT. Kathryn Baker, Bonnie Dorr, Michael Bloodgood, Chris Callison-Burch, Wes Filardo, Christine Piatko, Lori Levin, and Scott Miller, 2012. Computational Linguistics, Vol. 38, No. 2, pages 411–438. [abstract] [bib]
@article{baker-etal:2012:CL,
author = {Kathryn Baker and Bonnie Dorr and Michael Bloodgood and Chris Callison-Burch and Nathaniel Filardo and Christine Piatko and Lori Levin and Scott Miller},
title = {Use of Modality and Negation in Semantically-Informed Syntactic MT},
journal = {Computational Linguistics},
year = {2012},
volume = {38},
number = {2},
pages = {411--438}
}
Findings of the 2011 Workshop on Statistical Machine Translation. Chris Callison-Burch, Philipp Koehn, Christof Monz, and Omar Zaidan, 2011. In Proceedings of Workshop on Statistical Machine Translation (WMT11). [abstract] [bib]
Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation. Juri Ganitkevitch, Chris Callison-Burch, Courtney Napoles, and Benjamin Van Durme, 2011. In Proceedings of EMNLP 2011. [abstract] [bib]
@InProceedings{ganitkevitch-EtAl:2011:EMNLP,
author = {Ganitkevitch, Juri and Callison-Burch, Chris and Napoles, Courtney and Van Durme, Benjamin},
title = {Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation},
booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing},
month = {July},
year = {2011},
address = {Edinburgh, Scotland, UK.},
publisher = {Association for Computational Linguistics},
pages = {1168--1179},
url = {http://www.aclweb.org/anthology/D11-1108}
}
Reranking Bilingually Extracted Paraphrases Using Monolingual Distributional Similarity. Charley Chan, Chris Callison-Burch, and Benjamin Van Durme, 2011. In Proceedings of GEometrical Models of Natural Language Semantics (GEMS-2011). [abstract] [bib]
@InProceedings{chan-callisonburch-vandurme:2011:GEMS,
author = {Chan, Tsz Ping and Callison-Burch, Chris and Van Durme, Benjamin},
title = {Reranking Bilingually Extracted Paraphrases Using Monolingual Distributional Similarity},
booktitle = {Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics},
month = {July},
year = {2011},
address = {Edinburgh, UK},
publisher = {Association for Computational Linguistics},
pages = {33--42},
url = {http://www.aclweb.org/anthology/W11-2504}
}
Joshua 3.0: Syntax-based Machine Translation with the Thrax Grammar Extractor. Jonathan Weese, Juri Ganitkevitch, Chris Callison-Burch, Matt Post and Adam Lopez, 2011. In Proceedings of WMT11. [abstract] [bib]
@InProceedings{weese-EtAl:2011:WMT,
author = {Weese, Jonathan and Ganitkevitch, Juri and Callison-Burch, Chris and Post, Matt and Lopez, Adam},
title = {Joshua 3.0: Syntax-based Machine Translation with the Thrax Grammar Extractor},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
year = {2011},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {478--484},
url = {http://www.aclweb.org/anthology/W11-2160}
}
WikiTopics: What is Popular on Wikipedia and Why. Byung Gyu Ahn, Ben Van Durme and Chris Callison-Burch, 2011. In Proceedings of ACL Workshop on Automatic Summarization for Different Genres, Media, and Languages. [abstract] [bib]
@InProceedings{ahn-vandurme-callisonburch:2011:SummarizationWorkshop,
author = {Ahn, Byung Gyu and Van Durme, Benjamin and Callison-Burch, Chris},
title = {WikiTopics: What is Popular on Wikipedia and Why},
booktitle = {Proceedings of the Workshop on Automatic Summarization for Different Genres, Media, and Languages},
month = {June},
year = {2011},
address = {Portland, Oregon},
publisher = {Association for Computational Linguistics},
pages = {33--40},
url = {http://www.aclweb.org/anthology/W11-0505}
}
Evaluating Sentence Compression: Pitfalls and Suggested Remedies. Courtney Napoles, Ben Van Durme, and Chris Callison-Burch, 2011. In Proceedings of Workshop on Monolingual Text-To-Text Generation (Text-To-Text-2011). [abstract] [bib]
@InProceedings{napoles-vandurme-callisonburch:2011:T2TW-2011,
author = {Napoles, Courtney and Van Durme, Benjamin and Callison-Burch, Chris},
title = {Evaluating Sentence Compression: Pitfalls and Suggested Remedies},
booktitle = {Proceedings of the Workshop on Monolingual Text-To-Text Generation},
month = {June},
year = {2011},
address = {Portland, Oregon},
publisher = {Association for Computational Linguistics},
pages = {91--97},
url = {http://www.aclweb.org/anthology/W11-1611}
}
Paraphrastic Sentence Compression with a Character-based Metric: Tightening without Deletion. Courtney Napoles, Chris Callison-Burch, Juri Ganitkevitch, Ben Van Durme, 2011. In Proceedings of Workshop on Monolingual Text-To-Text Generation (Text-To-Text-2011). [abstract] [bib]
@InProceedings{napoles-EtAl:2011:T2TW-2011,
author = {Napoles, Courtney and Callison-Burch, Chris and Ganitkevitch, Juri and Van Durme, Benjamin},
title = {Paraphrastic Sentence Compression with a Character-based Metric: Tightening without Deletion},
booktitle = {Proceedings of the Workshop on Monolingual Text-To-Text Generation},
month = {June},
year = {2011},
address = {Portland, Oregon},
publisher = {Association for Computational Linguistics},
pages = {84--90},
url = {http://www.aclweb.org/anthology/W11-1610}
}
Paraphrase Fragment Extraction from Monolingual Comparable Corpora. Rui Wang and Chris Callison-Burch, 2011. In Proceedings of Fourth Workshop on Building and Using Comparable Corpora (BUCC). [abstract] [bib]
@InProceedings{wang-callisonburch:2011:BUCC,
author = {Wang, Rui and Callison-Burch, Chris},
title = {Paraphrase Fragment Extraction from Monolingual Comparable Corpora},
booktitle = {Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web},
month = {June},
year = {2011},
address = {Portland, Oregon},
publisher = {Association for Computational Linguistics},
pages = {52--60},
url = {http://www.aclweb.org/anthology/W11-1208}
}
The Arabic Online Commentary Dataset: An Annotated Dataset of Informal Arabic with High Dialectal Content. Omar Zaidan and Chris Callison-Burch, 2011. In Proceedings ACL-2011. [abstract] [bib] [data]
@InProceedings{zaidan-callisonburch:2011:ACL-HLT2011,
author = {Zaidan, Omar F. and Callison-Burch, Chris},
title = {The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
year = {2011},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {37--41},
url = {http://www.aclweb.org/anthology/P11-2007}
}
Crowdsourcing Translation: Professional Quality from Non-Professionals. Omar Zaidan and Chris Callison-Burch, 2011. In Proceedings ACL-2011. [abstract] [bib]
@InProceedings{zaidan-callisonburch:2011:ACL-HLT2011,
author = {Zaidan, Omar F. and Callison-Burch, Chris},
title = {Crowdsourcing Translation: Professional Quality from Non-Professionals},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
year = {2011},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {1220--1229},
url = {http://www.aclweb.org/anthology/P11-1122}
}
Incremental Syntactic Language Models for Phrase-based Translation. Lane Schwartz, Chris Callison-Burch, William Schuler and Stephen Wu, 2011. In Proceedings ACL-2011. [abstract] [bib] [errata]
@InProceedings{schwartz-EtAl:2011:ACL-HLT20111,
author = {Schwartz, Lane and Callison-Burch, Chris and Schuler, William and Wu, Stephen},
title = {Incremental Syntactic Language Models for Phrase-based Translation},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
year = {2011},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {620--631},
url = {http://www.aclweb.org/anthology/P11-1063}
}
Predicting Human-Targeted Translation Edit Rate via Untrained Human Annotators. Omar Zaidan and Chris Callison-Burch, 2010. In Proceedings NAACL-2010. [abstract] [bib]
@InProceedings{zaidan-callisonburch:2010:NAACLHLT,
author = {Zaidan, Omar F. and Callison-Burch, Chris},
title = {Predicting Human-Targeted Translation Edit Rate via Untrained Human Annotators},
booktitle = {Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
month = {June},
year = {2010},
address = {Los Angeles, California},
publisher = {Association for Computational Linguistics},
pages = {369--372},
url = {http://www.aclweb.org/anthology/N10-1057}
}
Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach. Kathryn Baker, Michael Bloodgood, Chris Callison-Burch, Bonnie Dorr, Scott Miller, Christine Piatko, Nathaniel W. Filardo, and Lori Levin, 2010. In Proceedings of AMTA-2010. [abstract] [bib]
@InProceedings{Baker-EtAl:2010:AMTA,
author = {Kathryn Baker and Michael Bloodgood and Chris Callison-Burch and Bonnie J. Dorr and Nathaniel W. Filardo and Lori Levin and Scott Miller and Christine Piatko},
title = {Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach},
booktitle = {Proceedings of The Ninth Biennial Conference of the Association for Machine Translation in the Americas},
address = {Denver, Colorado},
url = {http://www.mt-archive.info/AMTA-2010-Baker.pdf},
year = {2010}
}
Joshua 2.0: A Toolkit for Parsing-Based Machine Translation with Syntax, Semirings, Discriminative Training and Other Goodies. Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Ann Irvine, Sanjeev Khudanpur, Lane Schwartz, Wren N. G. Thornton, Ziyuan Wang, Jonathan Weese and Omar F. Zaidan, 2010. In Proceedings of Workshop on Statistical Machine Translation (WMT10). [abstract] [bib]
@InProceedings{li-EtAl:2010:WMT,
author = {Li, Zhifei and Callison-Burch, Chris and Dyer, Chris and Ganitkevitch, Juri and Irvine, Ann and Khudanpur, Sanjeev and Schwartz, Lane and Thornton, Wren and Wang, Ziyuan and Weese, Jonathan and Zaidan, Omar},
title = {Joshua 2.0: A Toolkit for Parsing-Based Machine Translation with Syntax, Semirings, Discriminative Training and Other Goodies},
booktitle = {Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR},
month = {July},
year = {2010},
address = {Uppsala, Sweden},
publisher = {Association for Computational Linguistics},
pages = {133--137},
url = {http://www.aclweb.org/anthology/W10-1718}
}
Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation. Chris Callison-Burch, Philipp Koehn, Christof Monz, Kay Peterson, Mark Przybocki, Omar Zaidan, 2010. In Proceedings of Workshop on Statistical Machine Translation (WMT10). [abstract] [bib]
@InProceedings{callisonburch-EtAl:2010:WMT,
author = {Callison-Burch, Chris and Koehn, Philipp and Monz, Christof and Peterson, Kay and Przybocki, Mark and Zaidan, Omar},
title = {Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation},
booktitle = {Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR},
month = {July},
year = {2010},
address = {Uppsala, Sweden},
publisher = {Association for Computational Linguistics},
pages = {17--53},
url = {http://www.aclweb.org/anthology/W10-1703}
}
Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation. Michael Bloodgood and Chris Callison-Burch, 2010. In Proceedings ACL-2010. [abstract] [bib]
@InProceedings{bloodgood-callisonburch:2010:ACL,
author = {Bloodgood, Michael and Callison-Burch, Chris},
title = {Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation},
booktitle = {Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics},
month = {July},
year = {2010},
address = {Uppsala, Sweden},
publisher = {Association for Computational Linguistics},
pages = {854--864},
url = {http://www.aclweb.org/anthology/P10-1088}
}
Creating Speech and Language Data With Amazon’s Mechanical Turk. Chris Callison-Burch and Mark Dredze, 2010. In Proceedings NAACL-2010 Workshop on Creating Speech and Language Data With Amazon’s Mechanical Turk. [abstract] [bib]
@InProceedings{callisonburch-dredze:2010:MTURK,
author = {Callison-Burch, Chris and Dredze, Mark},
title = {Creating Speech and Language Data With Amazon's Mechanical Turk},
booktitle = {Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk},
month = {June},
year = {2010},
address = {Los Angeles},
publisher = {Association for Computational Linguistics},
pages = {1--12},
url = {http://www.aclweb.org/anthology/W10-0701}
}
Using Mechanical Turk to Build Machine Translation Evaluation Sets. Michael Bloodgood and Chris Callison-Burch, 2010. In Proceedings NAACL-2010 Workshop on Creating Speech and Language Data With Amazon’s Mechanical Turk. [abstract] [bib]
@InProceedings{bloodgood-callisonburch:2010:MTURK,
author = {Bloodgood, Michael and Callison-Burch, Chris},
title = {Using Mechanical Turk to Build Machine Translation Evaluation Sets},
booktitle = {Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk},
month = {June},
year = {2010},
address = {Los Angeles},
publisher = {Association for Computational Linguistics},
pages = {208--211},
url = {http://www.aclweb.org/anthology/W10-0733}
}
Crowdsourced Accessibility: Elicitation of Wikipedia Articles. Scott Novotney and Chris Callison-Burch, 2010. In Proceedings NAACL-2010 Workshop on Creating Speech and Language Data With Amazon’s Mechanical Turk. [abstract] [bib]
@InProceedings{novotney-callisonburch:2010:MTURK,
author = {Novotney, Scott and Callison-Burch, Chris},
title = {Crowdsourced Accessibility: Elicitation of Wikipedia Articles},
booktitle = {Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk},
month = {June},
year = {2010},
address = {Los Angeles},
publisher = {Association for Computational Linguistics},
pages = {41--44},
url = {http://www.aclweb.org/anthology/W10-0706}
}
Cheap Facts and Counter-Facts. Rui Wang and Chris Callison-Burch, 2010. In Proceedings NAACL-2010 Workshop on Creating Speech and Language Data With Amazon’s Mechanical Turk. [abstract] [bib]
@InProceedings{wang-callisonburch:2010:MTURK,
author = {Wang, Rui and Callison-Burch, Chris},
title = {Cheap Facts and Counter-Facts},
booktitle = {Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk},
month = {June},
year = {2010},
address = {Los Angeles},
publisher = {Association for Computational Linguistics},
pages = {163--167},
url = {http://www.aclweb.org/anthology/W10-0725}
}
Stream-based Translation Models for Statistical Machine Translation. Abby Levenberg, Chris Callison-Burch, and Miles Osborne, 2010. In Proceedings NAACL-2010. [abstract] [bib]
@InProceedings{levenberg-callisonburch-osborne:2010:NAACLHLT,
author = {Levenberg, Abby and Callison-Burch, Chris and Osborne, Miles},
title = {Stream-based Translation Models for Statistical Machine Translation},
booktitle = {Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
month = {June},
year = {2010},
address = {Los Angeles, California},
publisher = {Association for Computational Linguistics},
pages = {394--402},
url = {http://www.aclweb.org/anthology/N10-1062}
}
Cheap, Fast and Good Enough: Automatic Speech Recognition with Non-Expert Transcription. Scott Novotney and Chris Callison-Burch, 2010. In Proceedings NAACL-2010. [abstract] [bib]
@InProceedings{novotney-callisonburch:2010:NAACLHLT,
author = {Novotney, Scott and Callison-Burch, Chris},
title = {Cheap, Fast and Good Enough: Automatic Speech Recognition with Non-Expert Transcription},
booktitle = {Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
month = {June},
year = {2010},
address = {Los Angeles, California},
publisher = {Association for Computational Linguistics},
pages = {207--215},
url = {http://www.aclweb.org/anthology/N10-1024}
}
Integrating Output from Specialized Modules in Machine Translation: Transliteration in Joshua. Ann Irvine, Mike Kayser, Zhifei Li, Wren Thornton, and Chris Callison-Burch, 2010. In The Prague Bulletin of Mathematical Linguistics (PBML), Number 93, January 2010. [abstract] [bib]
@article{Irvine-EtAl:2010:PBML,
author = {Ann Irvine and Mike Kayser and Zhifei Li and Wren Thornton and Chris Callison-Burch},
title = {Integrating Output from Specialized Modules in Machine Translation: Transliteration in {J}oshua},
journal = {The Prague Bulletin of Mathematical Linguistics},
volume = {93},
pages = {107--116},
year = {2010}
}
Visualizing Data Structures in Parsing-Based Machine Translation. Jonathan Weese and Chris Callison-Burch, 2010. In The Prague Bulletin of Mathematical Linguistics (PBML), Number 93, January 2010. [abstract] [bib]
@article{Weese-CallisonBurch:2010:PBML,
author = {Jonathan Weese and Chris Callison-Burch},
title = {Visualizing Data Structures in Parsing-based Machine Translation},
journal = {The Prague Bulletin of Mathematical Linguistics},
volume = {93},
pages = {127--136},
year = {2010}
}
Hierarchical Phrase-Based Grammar Extraction in Joshua: Suffix Arrays and Prefix Trees. Lane Schwartz and Chris Callison-Burch, 2010. In The Prague Bulletin of Mathematical Linguistics (PBML), Number 93, January 2010. [abstract] [bib]
@article{Schwartz-CallisonBurch:2010:PBML,
author = {Lane Schwartz and Chris Callison-Burch},
title = {Hierarchical Phrase-Based Grammar Extraction in Joshua: Suffix Arrays and Prefix Trees},
journal = {The Prague Bulletin of Mathematical Linguistics},
volume = {93},
pages = {157--166},
year = {2010}
}
Semantically Informed Machine Translation (SIMT). Kathy Baker, Steven Bethard, Michael Bloodgood, Ralf Brown, Chris Callison-Burch, Glen Coppersmith, Bonnie Dorr, Wes Filardo, Kendall Giles, Anni Irvine, Mike Kayser, Lori Levin, Justin Martineau, Jim Mayfield, Scott Miller, Aaron Phillips, Andrew Philpot, Christine Piatko, Lane Schwartz and David Zajic. SCALE 2009 Summer Workshop Final Report. Tech report for the Human Language Technology Center Of Excellence (HLTCOE). [abstract] [bib]
@techreport{Baker-EtAl:2010:HLTCOE,
author = {Kathy Baker and Steven Bethard and Michael Bloodgood and Ralf Brown and Chris Callison-Burch and Glen Coppersmith and Bonnie Dorr and Wes Filardo and Kendall Giles and Anni Irvine and Mike Kayser and Lori Levin and Justin Martineau and Jim Mayfield and Scott Miller and Aaron Phillips and Andrew Philpot and Christine Piatko and Lane Schwartz and David Zajic},
title = {Semantically Informed Machine Translation},
institution = {Human Language Technology Center of Excellence, Johns Hopkins University},
address = {Baltimore, MD},
number = {002},
url = {http://web.jhu.edu/bin/u/l/HLTCOE-TechReport-002-SIMT.pdf},
year = {2010}
}
Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon's Mechanical Turk. Chris Callison-Burch, 2009. In Proceedings of EMNLP 2009. [abstract] [bib] [NPR]
@InProceedings{callisonburch:2009:EMNLP,
author = {Callison-Burch, Chris},
title = {Fast, Cheap, and Creative: Evaluating Translation Quality Using {Amazon's} {Mechanical Turk}},
booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing},
month = {August},
year = {2009},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {286--295},
url = {http://www.aclweb.org/anthology/D/D09/D09-1030}
}
Feasibility of Human-in-the-loop Minimum Error Rate Training. Omar Zaidan and Chris Callison-Burch, 2009. In Proceedings of EMNLP 2009. [abstract] [bib]
@InProceedings{zaidan-callisonburch:2009:EMNLP,
author = {Zaidan, Omar F. and Callison-Burch, Chris},
title = {Feasibility of Human-in-the-loop Minimum Error Rate Training},
booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing},
month = {August},
year = {2009},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {52--61},
url = {http://www.aclweb.org/anthology/D/D09/D09-1006}
}
Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases. Yuval Marton, Chris Callison-Burch and Philip Resnik, 2009. In Proceedings of EMNLP 2009. [abstract] [bib]
@InProceedings{marton-callisonburch-resnik:2009:EMNLP,
author = {Marton, Yuval and Callison-Burch, Chris and Resnik, Philip},
title = {Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases},
booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing},
month = {August},
year = {2009},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {381--390},
url = {http://www.aclweb.org/anthology/D/D09/D09-1040}
}
Improving Translation Lexicon Induction from Monolingual Corpora via Dependency Contexts and Part-of-Speech Equivalences. Nikesh Garera, Chris Callison-Burch and David Yarowsky, 2009. In Proceedings of the Conference on Natural Language Learning (CoNLL). [poster] [abstract] [bib]
@InProceedings{garera-callisonburch-yarowsky:2009:CoNLL,
author = {Garera, Nikesh and Callison-Burch, Chris and Yarowsky, David},
title = {Improving Translation Lexicon Induction from Monolingual Corpora via Dependency Contexts and Part-of-Speech Equivalences},
booktitle = {Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)},
month = {June},
year = {2009},
address = {Boulder, Colorado},
publisher = {Association for Computational Linguistics},
pages = {129--137},
url = {http://www.aclweb.org/anthology/W09-1117}
}
Findings of the 2009 Workshop on Statistical Machine Translation. Chris Callison-Burch, Philipp Koehn, Christof Monz and Josh Schroeder, 2009. In Proceedings of Workshop on Statistical Machine Translation (WMT09). [slides] [abstract] [bib]
@InProceedings{callisonburch-EtAl:2009:WMT,
author = {Callison-Burch, Chris and Koehn, Philipp and Monz, Christof and Schroeder, Josh},
title = {Findings of the 2009 {W}orkshop on {S}tatistical {M}achine {T}ranslation},
booktitle = {Proceedings of the Fourth Workshop on Statistical Machine Translation},
month = {March},
year = {2009},
address = {Athens, Greece},
publisher = {Association for Computational Linguistics},
pages = {1--28},
url = {http://www.aclweb.org/anthology/W09-0401}
}
Joshua: An Open Source Toolkit for Parsing-based Machine Translation. Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Sanjeev Khudanpur, Lane Schwartz, Wren Thornton, Jonathan Weese and Omar Zaidan, 2009. In Proceedings of the Workshop on Statistical Machine Translation (WMT09). [slides] [keynote] [abstract] [bib]
@InProceedings{li-EtAl:2009:WMT1,
author = {Li, Zhifei and Callison-Burch, Chris and Dyer, Chris and Ganitkevitch, Juri and Khudanpur, Sanjeev and Schwartz, Lane and Thornton, Wren and Weese, Jonathan and Zaidan, Omar},
title = {{Joshua}: An Open Source Toolkit for Parsing-Based Machine Translation},
booktitle = {Proceedings of the Fourth Workshop on Statistical Machine Translation},
month = {March},
year = {2009},
address = {Athens, Greece},
publisher = {Association for Computational Linguistics},
pages = {135--139},
url = {http://www.aclweb.org/anthology/W09-0424}
}
Decoding in Joshua: Open Source, Parsing-Based Machine Translation. Zhifei Li, Chris Callison-Burch, Sanjeev Khudanpur, and Wren Thornton, 2009. In The Prague Bulletin of Mathematical Linguistics (PBML), Number 91, January 2009. [abstract] [bib]
@article{Li-EtAl:2010:PBML,
author = {Zhifei Li and Chris Callison-Burch and Sanjeev Khudanpur and Wren Thornton},
title = {Decoding in {J}oshua: Open Source, Parsing-Based Machine Translation},
journal = {The Prague Bulletin of Mathematical Linguistics},
volume = {91},
pages = {47--56},
year = {2009}
}
Syntactic Constraints on Paraphrases Extracted from Parallel Corpora. Chris Callison-Burch, 2008. In Proceedings of EMNLP 2008. [slides] [software] [abstract] [bib]
@InProceedings{callisonburch:2008:EMNLP,
author = {Callison-Burch, Chris},
title = {Syntactic Constraints on Paraphrases Extracted from Parallel Corpora},
booktitle = {Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing},
month = {October},
year = {2008},
address = {Honolulu, Hawaii},
publisher = {Association for Computational Linguistics},
pages = {196--205},
url = {http://www.aclweb.org/anthology/D08-1021}
}
ParaMetric: An Automatic Evaluation Metric for Paraphrasing. Chris Callison-Burch, Trevor Cohn, Mirella Lapata, 2008. In Proceedings of CoLing 2008. [slides] [keynote] [abstract] [bib]
@InProceedings{callisonburch-cohn-lapata:2008:Coling,
author = {Callison-Burch, Chris and Cohn, Trevor and Lapata, Mirella},
title = {ParaMetric: An Automatic Evaluation Metric for Paraphrasing},
booktitle = {Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)},
month = {August},
year = {2008},
address = {Manchester, UK},
publisher = {Coling 2008 Organizing Committee},
pages = {97--104},
url = {http://www.aclweb.org/anthology/C08-1013}
}
Further Meta-Evaluation of Machine Translation. Chris Callison-Burch, Cameron Fordyce, Philipp Koehn, Christof Monz and Josh Schroeder, 2008. In Proceedings of ACL-2008 Workshop on Statistical Machine Translation. [slides] [abstract] [bib]
@InProceedings{callisonburch-EtAl:2008:WMT,
author = {Callison-Burch, Chris and Fordyce, Cameron and Koehn, Philipp and Monz, Christof and Schroeder, Josh},
title = {Further Meta-Evaluation of Machine Translation},
booktitle = {Proceedings of the Third Workshop on Statistical Machine Translation},
month = {June},
year = {2008},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {70--106},
url = {http://www.aclweb.org/anthology/W/W08/W08-0309}
}
Constructing Corpora for the Development and Evaluation of Paraphrase Systems. Trevor Cohn, Chris Callison-Burch, Mirella Lapata, 2008. Computational Linguistics: Volume 34, Number 4. [abstract] [bib]
@article{cohn-callisonburch-lapata:2008:CL,
author = {Trevor Cohn and Chris Callison-Burch and Mirella Lapata},
title = {Constructing Corpora for the Development and Evaluation of Paraphrase Systems},
journal = {Computational Linguistics},
year = {2008},
volume = {34},
number = {4},
pages = {597--614}
}
Affinity Measures Based on the Graph Laplacian. Delip Rao, David Yarowsky, Chris Callison-Burch, 2008. In Proceedings of the 3rd Textgraphs workshop on Graph-based Algorithms for Natural Language Processing at CoLing 2008. [abstract] [bib]
@InProceedings{rao-yarowsky-callisonburch:2008:TG3,
author = {Rao, Delip and Yarowsky, David and Callison-Burch, Chris},
title = {Affinity Measures Based on the Graph {L}aplacian},
booktitle = {Coling 2008: Proceedings of the 3rd Textgraphs workshop on Graph-based Algorithms for Natural Language Processing},
month = {August},
year = {2008},
address = {Manchester, UK},
publisher = {Coling 2008 Organizing Committee},
pages = {41--48},
url = {http://www.aclweb.org/anthology/W08-2006}
}
Paraphrasing and Translation. Chris Callison-Burch, 2007. PhD Thesis, University of Edinburgh. [slides] [abstract] [bib]
Paraphrasing and translation have previously been treated as unconnected natural language processing tasks. Whereas translation represents the preservation of meaning when an idea is rendered in the words in a different language, paraphrasing represents the preservation of meaning when an idea is expressed using different words in the same language. We show that the two are intimately related. The major contributions of this thesis are as follows:
Whereas previous data-driven approaches to paraphrasing were dependent upon either data sources which were uncommon such as multiple translations of the same source text, or language specific resources such as parsers, our approach is able to harness more widely available parallel corpora and can be applied to any language which has a parallel corpus. The technique was evaluated by replacing phrases with their paraphrases, and asking judges whether the meaning of the original phrase was retained and whether the resulting sentence remained grammatical. Paraphrases extracted from a parallel corpus with manual alignments are judged to be accurate (both meaningful and grammatical) 75% of the time, retaining the meaning of the original phrase 85% of the time. Using automatic alignments, meaning can be retained at a rate of 70%.
Being a language independent and probabilistic approach allows our method to be easily integrated into statistical machine translation. A paraphrase model derived from parallel corpora other than the one used to train the translation model can be used to increase the coverage of statistical machine translation by adding translations of previously unseen words and phrases. If the translation of a word was not learned, but a translation of a synonymous word has been learned, then the word is paraphrased and its paraphrase is translated. Phrases can be treated similarly. Results show that augmenting a state-of-the-art SMT system with paraphrases in this way leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs, we increase the coverage of unique test set unigrams from 48% to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches.
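The back-off idea described above (paraphrase an unseen source phrase, then translate the paraphrase) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function name, the toy tables, and the choice to score a candidate by multiplying the paraphrase probability with the translation probability are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data structures: each maps a source phrase to a list of
# (candidate, probability) pairs. Real phrase tables are far larger.
Table = Dict[str, List[Tuple[str, float]]]


def translate_with_paraphrase_backoff(
    phrase: str, phrase_table: Table, paraphrases: Table
) -> List[Tuple[str, float]]:
    """Return (translation, score) candidates for `phrase`, best first.

    If the phrase has a learned translation, use it directly. Otherwise,
    fall back to translating its paraphrases, scoring each candidate by
    paraphrase probability times translation probability (an assumed,
    simplified scoring rule).
    """
    if phrase in phrase_table:
        return sorted(phrase_table[phrase], key=lambda c: -c[1])

    scores: Dict[str, float] = {}
    for para, p_para in paraphrases.get(phrase, []):
        for target, p_trans in phrase_table.get(para, []):
            score = p_para * p_trans
            # Keep the best-scoring route to each target translation.
            scores[target] = max(scores.get(target, 0.0), score)
    return sorted(scores.items(), key=lambda c: -c[1])


# Illustrative entries only; not taken from any real corpus.
phrase_table: Table = {"garantizar": [("guarantee", 0.8), ("ensure", 0.2)]}
paraphrases: Table = {"velar por": [("garantizar", 0.5)]}

candidates = translate_with_paraphrase_backoff("velar por", phrase_table, paraphrases)
```

Here the unseen phrase has no entry in the phrase table, so its candidates come entirely from the paraphrase; a phrase with no paraphrases either yields an empty list, which corresponds to the uncovered case.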
@PhdThesis{callisonburch:2007:thesis,
author = {Chris Callison-Burch},
title = {Paraphrasing and Translation},
school = {University of Edinburgh},
address = {Edinburgh, Scotland},
year = {2007},
url = {http://cs.jhu.edu/~ccb/publications/callison-burch-thesis.pdf}
}
(Meta-) Evaluation of Machine Translation. Chris Callison-Burch, Cameron Fordyce, Philipp Koehn, Christof Monz and Josh Schroeder, 2007. In Proceedings of ACL-2007 Workshop on Statistical Machine Translation. [slides] [abstract] [bib]
@InProceedings{callisonburch-EtAl:2007:WMT,
author = {Callison-Burch, Chris and Fordyce, Cameron and Koehn, Philipp and Monz, Christof and Schroeder, Josh},
title = {(Meta-) Evaluation of Machine Translation},
booktitle = {Proceedings of the Second Workshop on Statistical Machine Translation},
month = {June},
year = {2007},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {136--158},
url = {http://www.aclweb.org/anthology/W/W07/W07-0718}
}
Open Source Toolkit for Statistical Machine Translation: Factored Translation Models and Confusion Network Decoding. Philipp Koehn, Nicola Bertoldi, Ondrej Bojar, Chris Callison-Burch, Alexandra Constantin, Brooke Cowan, Chris Dyer, Marcello Federico, Evan Herbst, Hieu Hoang, Christine Moran, Wade Shen, and Richard Zens, 2007. CLSP Summer Workshop Final Report WS-2006, Johns Hopkins University. [abstract] [bib]
@techreport{Koehn-EtAl:2007:CLSP,
author = {Philipp Koehn and Nicola Bertoldi and Ondrej Bojar and Chris Callison-Burch and Alexandra Constantin and Brooke Cowan and Chris Dyer and Marcello Federico and Evan Herbst and Hieu Hoang and Christine Moran and Wade Shen and Richard Zens},
title = {Open Source Toolkit for Statistical Machine Translation: Factored Translation Models and Confusion Network Decoding},
institution = {Johns Hopkins University},
number = {WS-2006},
type = {CLSP Summer Workshop Final Report},
year = {2007}
}
Moses: Open Source Toolkit for Statistical Machine Translation. Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst, 2007. In Proceedings of the ACL-2007 Demo Session. [software] [abstract] [bib]
@InProceedings{koehn-EtAl:2007:PosterDemo,
author = {Koehn, Philipp and Hoang, Hieu and Birch, Alexandra and Callison-Burch, Chris and Federico, Marcello and Bertoldi, Nicola and Cowan, Brooke and Shen, Wade and Moran, Christine and Zens, Richard and Dyer, Chris and Bojar, Ondrej and Constantin, Alexandra and Herbst, Evan},
title = {Moses: Open Source Toolkit for Statistical Machine Translation},
booktitle = {Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions},
month = {June},
year = {2007},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {177--180},
url = {http://www.aclweb.org/anthology/P07-2045}
}
Paraphrase Substitution for Recognizing Textual Entailment. Wauter Bosma and Chris Callison-Burch, 2007. In Evaluation of Multilingual and Multimodal Information Retrieval, Lecture Notes in Computer Science, C. Peters et al., editors. [abstract] [bib]
@InProceedings{bosma-callisonburch:2006:CLEF,
author = {Wauter Bosma and Chris Callison-Burch},
title = {Paraphrase Substitution for Recognizing Textual Entailment},
booktitle = {Proceedings of CLEF},
year = {2006},
url = {http://cs.jhu.edu/~ccb/publications/paraphrase-substitution-for-recognizing-textual-entailment.pdf}
}
Improved Statistical Machine Translation Using Paraphrases. Chris Callison-Burch, Philipp Koehn and Miles Osborne, 2006. In Proceedings of NAACL-2006. [slides] [abstract] [bib]
@InProceedings{callisonburch-koehn-osborne:2006:HLT-NAACL06-Main,
author = {Callison-Burch, Chris and Koehn, Philipp and Osborne, Miles},
title = {Improved Statistical Machine Translation Using Paraphrases},
booktitle = {Proceedings of the Human Language Technology Conference of the NAACL, Main Conference},
month = {June},
year = {2006},
address = {New York City, USA},
publisher = {Association for Computational Linguistics},
pages = {17--24},
url = {http://www.aclweb.org/anthology/N/N06/N06-1003}
}
Re-evaluating the Role of Bleu in Machine Translation Research. Chris Callison-Burch, Miles Osborne and Philipp Koehn, 2006. In Proceedings of EACL-2006. [slides] [abstract] [bib]
@InProceedings{callisonburch-osborne-koehn:2006:EACL,
author = {Callison-Burch, Chris and Osborne, Miles and Koehn, Philipp},
title = {Re-evaluating the Role of BLEU in Machine Translation Research},
booktitle = {11th Conference of the European Chapter of the Association for Computational Linguistics},
month = {April},
year = {2006},
address = {Trento, Italy},
publisher = {Association for Computational Linguistics},
pages = {249--256},
url = {http://aclweb.org/anthology-new/E/E06/E06-1032}
}
Constraining the Phrase-Based, Joint Probability Statistical Translation Model. Alexandra Birch, Chris Callison-Burch, Miles Osborne and Philipp Koehn, 2006. In Proceedings of WMT06. [slides] [abstract] [bib]
@InProceedings{birch-EtAl:2006:WMT,
author = {Birch, Alexandra and Callison-Burch, Chris and Osborne, Miles and Koehn, Philipp},
title = {Constraining the Phrase-Based, Joint Probability Statistical Translation Model},
booktitle = {Proceedings on the Workshop on Statistical Machine Translation},
month = {June},
year = {2006},
address = {New York City},
publisher = {Association for Computational Linguistics},
pages = {154--157},
url = {http://www.aclweb.org/anthology/W/W06/W06-3123}
}
Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases. Chris Callison-Burch, Colin Bannard and Josh Schroeder, 2005. In Proceedings of ACL-2005. [slides] [abstract] [bib]
@InProceedings{callisonburch-bannard-schroeder:2005:ACL,
author = {Callison-Burch, Chris and Bannard, Colin and Schroeder, Josh},
title = {Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases},
booktitle = {Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05)},
month = {June},
year = {2005},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
pages = {255--262},
url = {http://www.aclweb.org/anthology/P05-1032},
}
Paraphrasing with Bilingual Parallel Corpora. Colin Bannard and Chris Callison-Burch, 2005. In Proceedings of ACL-2005. [slides] [abstract] [bib]
@InProceedings{bannard-callisonburch:2005:ACL,
author = {Bannard, Colin and Callison-Burch, Chris},
title = {Paraphrasing with Bilingual Parallel Corpora},
booktitle = {Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05)},
month = {June},
year = {2005},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
pages = {597--604},
url = {http://www.aclweb.org/anthology/P05-1074},
}
A Compact Data Structure for Searchable Translation Memories. Chris Callison-Burch, Colin Bannard and Josh Schroeder, 2005. In Proceedings of EAMT-2005. [slides] [abstract] [bib]
@InProceedings{callison-burch-EtAl:2005:EAMT,
author = {Chris Callison-Burch and Colin Bannard and Josh Schroeder},
title = {A Compact Data Structure for Searchable Translation Memories},
booktitle = {European Association for Machine Translation},
year = {2005}
}
Linear B System Description for the 2005 NIST MT Evaluation Exercise. Chris Callison-Burch, 2005. In Proceedings of Machine Translation Evaluation Workshop. [slides] [abstract] [bib]
@InProceedings{callisonburch:2005:NIST,
author = {Chris Callison-Burch},
title = {{Linear B} System Description for the 2005 {NIST} {MT} Evaluation Exercise},
booktitle = {Proceedings of Machine Translation Evaluation Workshop},
year = {2005}
}
Edinburgh System Description for the 2005 IWSLT Speech Translation Evaluation. Philipp Koehn, Amittai Axelrod, Alexandra Birch Mayne, Chris Callison-Burch, Miles Osborne, and David Talbot, 2005. In Proceedings of International Workshop on Spoken Language Translation. [abstract] [bib]
@InProceedings{Koehn-EtAl:2005:IWSLT,
author = {Philipp Koehn and Amittai Axelrod and Alexandra Birch and Chris Callison-Burch and Miles Osborne and David Talbot and Michael White},
title = {Edinburgh System Description for the 2005 {IWSLT} Speech Translation Evaluation},
booktitle = {Proceedings of International Workshop on Spoken Language Translation},
year = {2005},
url = {http://cs.jhu.edu/~ccb/publications/iwslt05-report.pdf}
}
Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora. Chris Callison-Burch, David Talbot and Miles Osborne, 2004. In Proceedings of ACL-2004. [slides] [abstract] [bib]
@inproceedings{callisonburch-talbot-osborne:2004:ACL,
author = {Callison-Burch, Chris and Talbot, David and Osborne, Miles},
title = {Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora},
booktitle = {Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL'04), Main Volume},
year = {2004},
month = {July},
address = {Barcelona, Spain},
pages = {175--182},
url = {http://www.aclweb.org/anthology/P04-1023},
}
Searchable Translation Memories. Chris Callison-Burch, Colin Bannard and Josh Schroeder, 2004. In Proceedings of ASLIB Translating and the Computer 26. [slides] [abstract] [bib]
@inproceedings{Callison-Burch:2004:ASLIB,
author = {Chris Callison-Burch and Colin Bannard and Josh Schroeder},
title = {Searchable Translation Memories},
booktitle = {Proceedings of ASLIB Translating and the Computer 26},
year = {2004}
}
Improved Statistical Translation Through Editing. Chris Callison-Burch, Colin Bannard and Josh Schroeder, 2004. In European Association for Machine Translation (EAMT-2004) Workshop. [slides] [abstract] [bib]
@inproceedings{Callison-Burch-EtAl:2004:EAMT,
author = {Chris Callison-Burch and Colin Bannard and Josh Schroeder},
title = {Improved Statistical Translation Through Editing},
booktitle = {European Association for Machine Translation},
year = {2004}
}
Statistical Natural Language Processing. Chris Callison-Burch and Miles Osborne, 2003. In A Handbook for Language Engineers, Ali Farghaly, Editor. [abstract] [bib]
Statistical natural language processing (SNLP) is a field lying in the intersection of natural language processing and machine learning. SNLP differs from traditional natural language processing in that instead of having a linguist manually construct some model of a given linguistic phenomenon, that model is instead (semi-) automatically constructed from linguistically annotated resources. Methods for assigning part-of-speech tags to words, categories to texts, parse trees to sentences, and so on, are (semi-) automatically acquired using machine learning techniques.
The recent trend of applying statistical techniques to natural language processing came largely from industrial speech recognition research groups at AT&T's Bell Laboratories and IBM's T.J. Watson Research Center. Statistical techniques in speech recognition have so vastly outstripped the performance of their non-statistical counterparts that rule-based speech recognition systems are essentially no longer an area of research. The success of machine learning techniques in speech processing led to an interest in applying them to a broader range of NLP applications. In addition to being useful from the perspective of producing high-quality results, as in speech recognition, SNLP systems are useful for a number of practical reasons. They are cheap and fast to produce, and they handle the wide variety of input required by a real-world application. SNLP is therefore especially useful in industry. In particular:
A common theme with many early SNLP systems was a pride in minimizing the amount of linguistic knowledge used in the system. For example, Fred Jelinek, the then leader of IBM's speech recognition research group, purportedly said, "Every time I fire a linguist, my performance goes up." The sentiment is rather shocking. Should Jelinek's statement strike fear into the hearts of all linguists reading this chapter? Is there a strong opposition between theoretical linguistics and SNLP? Will SNLP put linguists out of work?
We put forth a positive answer in this chapter: there is a useful role for linguistic expertise in statistical systems. Jelinek's infamous quote represents biases of the early days of SNLP. While a decade's worth of research has shown that SNLP can be an extremely powerful tool and is able to produce impressive results, recent trends indicate that using naive approaches that are divorced from linguistics can only go so far. There is therefore a revival of interest in integrating more sophisticated linguistic information into statistical models. For example, language models for speech recognition are moving from being word-based "ngram" models towards incorporating statistical grammars (Chelba and Jelinek, 1998, Charniak, 2001). So there is indeed a role for the linguist. This chapter will provide an entry point for linguists entering into the field of SNLP so that they may apply their expertise to enhance an already powerful approach to natural language processing.
Lest we represent SNLP as a completely engineering-oriented discipline, we point the interested reader to Abney (1996) which describes a number of ways in which SNLP might inform academic topics in linguistics. For example, SNLP can be useful for psycholinguistic research since systems typically encode graduated notions of well-formedness. This offers a more psychologically plausible alternative to the traditional binary grammatical/ungrammatical distinction. In a similarly academic vein, Johnson (1998) shows how Optimality Theory can be interpreted in terms of statistical models. This in turn suggests a number of interesting directions that OT might take.
The rest of this chapter is as follows: We begin by presenting a simple worked example designed to illustrate some of the aspects of SNLP in Section 1.2. After motivating the usefulness of SNLP, we then move onto the core methods used in SNLP: modeling, learning, data and evaluation (Sections 1.3, 1.4, 1.5, and 1.6 respectively). These core methods are followed by a brief review of some of the many applications of SNLP (Section 1.7). We conclude with a discussion (Section 1.8) where we make some comments about the current state of SNLP and possible future directions it might take.
@incollection{Callison-Burch2003b,
author = {Chris Callison-Burch and Miles Osborne},
title = {Statistical Natural Language Processing},
booktitle = {A Handbook for Language Engineers},
editor = {Ali Farghaly},
publisher = {CSLI},
year = {2003}
}
Bootstrapping Parallel Corpora. Chris Callison-Burch and Miles Osborne, 2003. In NAACL workshop "Building and Using Parallel Texts: Data Driven Machine Translation and Beyond". [slides] [abstract] [bib]
@inproceedings{CallisonBurch-Osborne:2003:PARALLEL,
author = {Callison-Burch, Chris and Osborne, Miles},
title = {Bootstrapping Parallel Corpora},
booktitle = {Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond},
editor = {Rada Mihalcea and Ted Pedersen},
url = {http://www.aclweb.org/anthology/W03-0310},
year = 2003,
pages = {44--49}
}
Co-training for Statistical Machine Translation. Chris Callison-Burch and Miles Osborne, 2003. In Proceedings of the 6th Annual CLUK Research Colloquium. [abstract] [bib]
@inproceedings{CallisonBurch-Osborne:2003:CLUK,
author = {Callison-Burch, Chris and Osborne, Miles},
title = {Co-Training For Statistical Machine Translation},
booktitle = {Proceedings of the 6th Annual CLUK Research Colloquium},
year = {2003}
}
Evaluating Question Answering Systems Using FAQ Answer Injection. Jochen Leidner and Chris Callison-Burch, 2003. In Proceedings of the 6th Annual CLUK Research Colloquium. [abstract] [bib]
@inproceedings{Leidner-CallisonBurch:2003:CLUK,
author = {Jochen L. Leidner and Chris Callison-Burch},
title = {Evaluating Question Answering Systems Using FAQ Answer Injection},
booktitle = {Proceedings of the 6th Annual CLUK Research Colloquium},
year = {2003}
}
Co-Training for Statistical Machine Translation. Chris Callison-Burch, 2002. Master's thesis, School of Informatics, University of Edinburgh. [slides] [abstract] [bib]
@MastersThesis{Callison-Burch2002,
author = {Chris Callison-Burch},
title = {Co-training for Statistical Machine Translation},
school = {University of Edinburgh},
year = {2002}
}
Upping the Ante for "Best of Breed" Machine Translation Providers. Chris Callison-Burch, 2001. In Proceedings of ASLIB Translating and the Computer 23, London, England. [abstract] [bib]
@inproceedings{Callison-Burch:2001:ASLIB,
title = {Upping the Ante for "Best of Breed" Machine Translation Providers},
author = {Chris Callison-Burch},
booktitle = {Proceedings of ASLIB Translating and the Computer 23},
year = {2001},
}
A program for automatically selecting the best output from multiple machine translation engines. Chris Callison-Burch and Raymond Flournoy, 2001. In Proceedings of the Machine Translation Summit VIII, Santiago de Compostela, Spain. [abstract] [bib]
@inproceedings{Callison-Burch-Flournoy:2001:MTSummit,
title = {A Program for Automatically Selecting the Best Output from Multiple Machine Translation Engines},
author = {Chris Callison-Burch and Raymond S. Flournoy},
booktitle = {Proceedings of the Machine Translation Summit VIII},
year = {2001},
}
Secondary Benefits of Feedback and User Interaction in Machine Translation Tools. Raymond Flournoy and Chris Callison-Burch, 2001. Workshop paper for "MT2010: Towards a Roadmap for MT" of the MT Summit VIII. [abstract] [bib]
@inproceedings{Flournoy-Callison-Burch:2001:MTSummit,
title = {Secondary Benefits of Feedback and User Interaction in Machine Translation Tools},
author = {Raymond S. Flournoy and Chris Callison-Burch},
booktitle = {Workshop paper for "MT2010: Towards a Roadmap for MT" of the MT Summit VIII},
year = {2001},
}
A Computer Model of a Grammar for English Questions. Chris Callison-Burch, 2000. Undergraduate thesis, Symbolic Systems Program, Stanford University. My undergraduate advisor was Ivan Sag. [handout] [abstract] [bib]
@MISC{Callison-Burch2000,
author = {Chris Callison-Burch},
title = {A Computer Model of a Grammar for English Questions},
school = {Stanford University},
address = {Palo Alto, California},
note = {Undergraduate honors thesis},
year = {2000}
}

Head of machine translation research at the HLTCOE

DARPA DEFT: Large-Scale Paraphrasing for Natural Language Understanding

EAGER: Combining natural language inference and data-driven paraphrasing

Computer Science Study Group phase 3: "Crowdsourcing Translation"

Acquisition and use of paraphrases in a knowledge-rich setting

Crowdsourcing Arabic Dialects

Computer Science Study Group phase 2: "BABEL: Bayesian Architecture Begetting Every Language"

Multi-level modeling of language and translation

EuroMatrixPlus: Bringing machine translation for European languages to the user

Translation of informal texts via Mechanical Turk

Global Autonomous Language Exploitation (GALE)

SCALE: Summer Camp for Applied Language Exploration

EuroMatrix: Statistical and hybrid machine translation between all European languages