MAISE (MT)
A Flexible, Configurable, Extensible Open Source Package for Mass AI System Evaluation (currently for Machine Translation)
Department of Computer Science and The Center for Language and Speech Processing
Latest Release:
Latest Webpage Update:
1. Overview
MAISE is a package that allows researchers to evaluate the
output of their AI system(s) using human judgments collected via
Amazon Mechanical Turk (www.mturk.com).
MAISE is open source, easy to run, and platform-independent. Most importantly, it has
been proven to be completely bug-free. :-)
2. Description
Amazon's Mechanical Turk (MTurk) is a virtual
marketplace that allows anyone to create and post tasks to be completed by
human workers around the globe. Each instance of those tasks, called a Human
Intelligence Task (HIT), is completed by a worker in exchange for a small
reward set by the task's creator, which makes MTurk an attractive source of
inexpensively collected data.
One such kind of data is human evaluation of systems that are
attempting to do what humans are good at. For instance, if you construct
several systems that perform automatic speech transcription (i.e. converting
speech to text), and would like to know how well each of the systems performs,
you could create HITs on MTurk that 'showcase' the
transcriptions obtained by the different systems, and ask workers to tell you
which ones they like and which ones they find inferior. Such human feedback
would also be valuable because it would help identify systematic errors and
guide future development of your system(s).
The same can be applied to a variety of tasks besides speech
transcription, such as machine translation, object recognition, emotion
detection, etc.
The aim of the MAISE package is to streamline the
process of creating those evaluation tasks and uploading the relevant content
to MTurk to be judged, without having to familiarize oneself with
the mechanics, if you will, of Mechanical Turk. This lets you spend
more time improving your system and less time dealing with file
input and output and MTurk's sometimes finicky
interface.
Note: At the moment, MAISE is designed to aid the evaluation of machine
translation (MT) systems. However, it can be used for other AI/ML tasks
as well. Please see the FAQ.
3. Download, Licensing, and Citation
MAISE's source code, instructions,
documentation, and a tutorial are all included in the distribution.
MAISE is an open-source tool, licensed under the terms of the
GNU Lesser General Public License (LGPL). Therefore, it is free for personal
and scientific use by individuals and/or research groups. It may not be
modified or redistributed, publicly or privately, unless the licensing terms
are observed. If in doubt, contact the author for clarification and/or
explicit permission.
If you use MAISE in your work, please cite the software package
and include the URL in your paper.
4. The Mechanics of MAISE (Abbreviated Version)
MAISE is quite easy to use. There are a couple of Java
programs to compile, but there is no need to install anything, mess with
environment variables, etc. Whenever MAISE needs to communicate with MTurk, it
will rely on MTurk's Java SDK to do so.
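(Just to give a rough idea of what that looks like, here is a tiny sketch, under the assumption that the legacy MTurk SDK for Java is on your classpath and that your requester access keys live in an mturk.properties file. This is not code from MAISE itself, only an illustration of the kind of call MAISE makes on your behalf.)

import com.amazonaws.mturk.service.axis.RequesterService;
import com.amazonaws.mturk.util.PropertiesClientConfig;

// Illustrative sketch only (not MAISE code): connect to MTurk via the legacy
// MTurk SDK for Java, reading the requester's access keys from
// mturk.properties, and print the account balance as a quick sanity check.
public class MTurkConnectionSketch {
    public static void main(String[] args) {
        RequesterService service =
            new RequesterService(new PropertiesClientConfig("mturk.properties"));
        System.out.println("Account balance: $" + service.getAccountBalance());
    }
}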
You can use MAISE with any evaluation setup you like, as long
as you design the user interface for it. Currently, MAISE comes with existing
support for a particular evaluation setup that asks annotators to rank the
outputs of different systems relative to each other. When we say
"existing support" we mean the user interface is included, and so is
an analysis tool that can make sense of the judgments. This way, you don't
need to do anything extra to obtain rankings of the systems. You can read more
about this evaluation setup in the overview papers of the Workshop on
Statistical Machine Translation (WMT).
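(If you are curious what "making sense of the judgments" amounts to, the sketch below shows one common way of turning relative rankings into an overall ordering: score each system by the fraction of pairwise comparisons it wins. The class and field names here are made up purely for illustration; this is not MAISE's actual analysis tool, whose methodology is described in the WMT overview papers.)

import java.util.*;

// Hypothetical illustration (not MAISE's analysis tool): each judgment assigns
// a rank (1 = best) to several systems for one sentence; score each system by
// the fraction of pairwise comparisons it wins, skipping ties.
public class RankingSketch {

    static double[] pairwiseWinRate(List<Map<String, Integer>> judgments, List<String> systems) {
        Map<String, Integer> wins = new HashMap<>(), comparisons = new HashMap<>();
        for (Map<String, Integer> j : judgments) {
            for (String a : j.keySet()) {
                for (String b : j.keySet()) {
                    if (a.equals(b) || j.get(a).equals(j.get(b))) continue; // skip self and ties
                    comparisons.merge(a, 1, Integer::sum);
                    if (j.get(a) < j.get(b)) wins.merge(a, 1, Integer::sum); // lower rank = better
                }
            }
        }
        double[] rates = new double[systems.size()];
        for (int i = 0; i < systems.size(); i++) {
            String s = systems.get(i);
            rates[i] = comparisons.getOrDefault(s, 0) == 0
                     ? 0.0 : wins.getOrDefault(s, 0) / (double) comparisons.get(s);
        }
        return rates;
    }

    public static void main(String[] args) {
        // Two toy judgments over three systems (names are hypothetical).
        List<String> systems = Arrays.asList("sysA", "sysB", "sysC");
        List<Map<String, Integer>> judgments = Arrays.asList(
            Map.of("sysA", 1, "sysB", 2, "sysC", 3),
            Map.of("sysA", 2, "sysB", 1, "sysC", 2));
        double[] rates = pairwiseWinRate(judgments, systems);
        for (int i = 0; i < systems.size(); i++)
            System.out.printf("%s: %.2f%n", systems.get(i), rates[i]);
    }
}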
5. FAQ
Q:
It looks like MAISE was written for machine translation, not general ML/AI
tasks. Can I really use MAISE for my task?
A:
True, when MAISE was being written (before it was even named MAISE), it was meant to aid the evaluation of MT systems. It can
be used for other ML/AI tasks as well, but some of the supported features may
not make sense for your task, and you'll have to pretend they're not there.
Basically, you'll also have to trick MAISE, telling it silly things like which
languages you're translating to and from, only because it expects you to tell
it. It's fairly simple to do so, but I would be more than willing to help you
get started if you're not sure how. (This is the main reason why MAISE is not
in version 1.0 yet...I promise you MAISE v1.0 will not need to be tricked into
thinking it's doing MT!)
Q:
Why did you develop MAISE?
A:
My advisor at JHU, Chris Callison-Burch, recruited me to help him run the
manual evaluation component of the Workshop on Statistical Machine Translation (WMT).
Q:
Is any evaluation setup other than the ranking task supported?
A:
Not yet. I first focused on getting the ranking task supported, since it's the
main evaluation setup we used in WMT.
Q:
I have a feeling that you'd like to thank some people. So I'm going to let
you...
A:
Thanks for letting me thank some people. (See what I did there?) For starters,
I developed MAISE while I was funded by DARPA's GALE
Program, and in part by the EuroMatrixPlus Project. I
would like to thank Chris Callison-Burch, Ondrej Bojar, and everybody who gave feedback during the WMT evaluation campaigns.
Q:
I'd like to offer you a job. Are you interested?
A:
Yes! Check my website for my CV and publications, and drop me a line: ozaidan@cs.jhu.edu
Q:
But you didn't ask me what kind of job it is...
A:
I have questionable morals.
6. History
Note: Version changes in the first decimal place (e.g. v1.05
to v1.10) reflect significant changes, such as changes in functionality
or use. Changes in the second decimal place (e.g. v1.23 to v1.24)
reflect minor changes in the documentation, instructions, output, etc.
v0.20 (
Added more MTurk functions.
Added more options to existing functions, particularly the Retriever module.
v0.10 (
Initial release!
(coming soon)