Welcome to my homepage. Like most pages out there, this page
is perpetually "under construction." That said, it is actually rather up to date:
NEW: I have uploaded my PhD thesis here.
You'll find a PDF of my dissertation as well as a brief summary. And for those interested, I have created
a video of my defense talk, which you can view here. Let me know
if you have any comments or questions!
NEW: I have accepted a job offer from Microsoft Research!! I will be joining the Machine Translation team as a senior software
engineer upon graduation.
Two new datasets are now available.
The first dataset
is a recreation of the English side of NIST's Urdu-English 2009 evaluation set,
created by crowdsourcing the translation task.
(This data accompanies
my ACL 2011 paper
on crowdsourcing translation.)
The second dataset
is a collection of dialectal annotations on Arabic sentences.
(This data accompanies
my ACL 2011 short paper
describing the Arabic Online Commentary (AOC) dataset (see below) and the related dialect annotation effort.)
I crawled the websites of three Arabic newspapers and extracted reader commentary
on their articles. The resulting Arabic Online Commentary dataset contains 3.1M segments,
corresponding to 52.1M words, much of which is in dialectal
Arabic, making this a unique Arabic dataset. The distribution also includes the body texts of the articles
themselves, which add another 1.4M segments, corresponding to 42.5M words (mostly
Modern Standard Arabic, MSA).
Check it out here.
I wrote a software package called MAISE, which makes it easy
to use Amazon's Mechanical Turk for human evaluation of AI/ML systems.
This package was used to run the manual evaluation component of the Workshop on Machine Translation
(in 2010 and 2011).
Check it out here.
If you are looking for our data and papers on rationales, click here.
If you are looking for the Z-MERT software I wrote, click here.
If you are looking for my RYPT material (aka human-in-the-loop MERT), click here.
I am interested in machine learning in general, especially applied to natural language
processing (NLP) tasks.
This has led me to collaborate with people like
Jason Eisner
and my advisor, Chris Callison-Burch.
I study ways to better utilize human
annotators and their knowledge, and I develop interesting(?) models that take advantage
of such human knowledge. In applying these techniques, I've focused on text classification
and on machine translation, but I try to formulate those models so that they are
as general and abstract as possible.
Other than NLP, I am interested in almost anything that allows machines to perform
tasks that we (humans) are really good at, like computer vision, pattern recognition,
game playing, and artificial intelligence in general. Basically anything that would
get a non-CS person to go, "whoa...you can make my computer do that?"
You sound cool. What is your geek code?
I don't have a geek code. I'm working on it though, I promise.