Juri Ganitkevitch
PhD Student
Center for Language and Speech Processing
Johns Hopkins University
About Me
I'm a fourth-year PhD student in the Computer Science Department at Johns Hopkins University. More precisely, I work at the Center for Language and Speech Processing.
My advisor is Chris Callison-Burch. I also frequently consult with Ben Van Durme, Matt Post, Alexandre Klementiev, and Adam Lopez.
My primary interest is in large-scale statistical natural language transformation models and their applications. This includes paraphrasing, shallow semantic parsing, text-to-text generation, entailment recognition, and machine translation. I work on grammar acquisition, as well as decoding approaches and algorithms. I'm also curious about efficient processing of vast amounts of data, particularly randomized and approximate algorithms, probabilistic data structures, and online methods. I'm quite convinced that semi-supervised learning is a pretty good idea.
I interned twice with the Google Translate team in Mountain View (summers 2010 and 2011) and once with the Microsoft Research NLP group (summer 2012). At Google I worked with Ashish Venugopal, David Talbot, and Jakob Uszkoreit. My project at MSR was in collaboration with Chris Quirk and Bill Dolan.
I hold a Master's degree in Computer Science from JHU as well as a Diplom (equivalent to a Master's) in Computer Science from RWTH Aachen University. I did my Diplom thesis project, as well as some prior research assistant work, at the Human Language Technology and Pattern Recognition Group, advised by Sasa Hasan and Hermann Ney.
I also spent a year as a visiting Master's student at the ENST/Télécom Paris and took a year off school to work with the Voice Technology Group at IBM Germany Research & Development as a full-time intern. While working on my Diplom thesis I held a part-time software engineering position with Nuance Communications in Aachen, Germany.
My legal name, as per my passport, is Jurij Ganitkevic. It's the result of an unfortunate transliteration accident and I much prefer the old spelling of my name that you see above. I continue to use it in publications, and generally wherever I can get away with it.
Projects
- I'm involved in the Joshua decoder, an open-source statistical machine translation system developed at JHU and written in Java. We're trying to make it easily accessible. Have a go.
- I am one of the main contributors to Thrax, a sub-project of Joshua developed by Jonny Weese. Thrax is a Hadoop-based grammar extractor for SCFGs (it does both Hiero and grammars with rich syntactic labels). I'm chiefly responsible for its extensions towards paraphrase and distributional context signature extraction. It's open-source as well, so come and lend a hand.
- I also do some work on the cdec decoder, another open-source statistical machine translation system. This one is written by CMU's Chris Dyer.
- Feel free to check on my most recent misadventures on GitHub.
Publications
- Monolingual Distributional Similarity for Text-to-Text Generation
  J. Ganitkevitch, B. Van Durme, and C. Callison-Burch
  In Proceedings of *SEM; Montreal, Canada, June 2012.
- Joshua 4.0: Packing, PRO, and Paraphrases
  J. Ganitkevitch, Y. Cao, J. Weese, M. Post, and C. Callison-Burch
  In Proceedings of the Seventh Workshop on Statistical Machine Translation; Montreal, Canada, June 2012.
- Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
  J. Ganitkevitch, C. Callison-Burch, C. Napoles, and B. Van Durme
  In Proceedings of EMNLP; Edinburgh, United Kingdom, July 2011.
- Watermarking the Outputs of Structured Prediction with an Application in Statistical Machine Translation
  A. Venugopal, J. Uszkoreit, D. Talbot, F. Och, and J. Ganitkevitch
  In Proceedings of EMNLP; Edinburgh, United Kingdom, July 2011.
- Paraphrastic Sentence Compression with a Character-based Metric: Tightening without Deletion
  C. Napoles, C. Callison-Burch, J. Ganitkevitch, and B. Van Durme
  In Proceedings of the Workshop on Monolingual Text-To-Text Generation; Portland, USA, June 2011.
- Joshua 3.0: Syntax-based Machine Translation with the Thrax Grammar Extractor
  J. Weese, J. Ganitkevitch, C. Callison-Burch, M. Post, and A. Lopez
  In Proceedings of the Sixth Workshop on Statistical Machine Translation; Edinburgh, United Kingdom, July 2011.
- cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
  C. Dyer, A. Lopez, J. Ganitkevitch, J. Weese, F. Ture, P. Blunsom, H. Setiawan, V. Eidelman, and P. Resnik
  In Proceedings of ACL, Software Demonstrations; Uppsala, Sweden, July 2010.
- Joshua 2.0: A Toolkit for Parsing-Based Machine Translation with Syntax, Semirings, Discriminative Training and Other Goodies
  Z. Li, C. Callison-Burch, C. Dyer, J. Ganitkevitch, A. Irvine, L. Schwartz, W. Thornton, Z. Wang, J. Weese, and O. Zaidan
  In Proceedings of the Fifth Workshop on Statistical Machine Translation; Uppsala, Sweden, July 2010.
- An Enriched MT Grammar for Under $100
  O. Zaidan and J. Ganitkevitch
  In Proceedings of the Workshop on Creating Speech and Language Data With Amazon's Mechanical Turk; Los Angeles, USA, June 2010.
- Demonstration of Joshua: An Open Source Toolkit for Parsing-Based Machine Translation
  Z. Li, C. Callison-Burch, C. Dyer, J. Ganitkevitch, S. Khudanpur, L. Schwartz, W. Thornton, J. Weese, and O. Zaidan
  In Proceedings of ACL/IJCNLP, Software Demonstrations; Suntec, Singapore, August 2009.
- Joshua: An Open Source Toolkit for Parsing-Based Machine Translation
  Z. Li, C. Callison-Burch, C. Dyer, J. Ganitkevitch, S. Khudanpur, L. Schwartz, W. Thornton, J. Weese, and O. Zaidan
  In Proceedings of the Fourth Workshop on Statistical Machine Translation; Athens, Greece, March 2009.
- Triplet Lexicon Models for Statistical Machine Translation
  Sasa Hasan, Juri Ganitkevitch, Hermann Ney, and J. Andrés-Ferrer
  In Proceedings of EMNLP; Honolulu, Hawaii, October 2008.
- Speaker Adaptation using Maximum Likelihood Linear Regression
  Juri Ganitkevitch
  Seminar paper at RWTH Aachen University; Aachen, Germany, Summer 2005.
Contact
My email address is juri at CS dot JHU dot edu. You can also
follow my rather unprofessional musings on Twitter.