A Flexible, Configurable, Extensible Open Source Package for
Mass AI System Evaluation (currently for Machine Translation)
MAISE is a package that allows researchers to evaluate the output of their AI system(s) using human judgments collected via Amazon Mechanical Turk (www.mturk.com). MAISE is open source, easy to run, and platform-independent. Most importantly, it has been proven to be completely bug-free. :-)
Amazon's Mechanical Turk (MTurk) is a virtual marketplace that allows anyone to create and post tasks to be completed by human workers around the globe. Each instance of those tasks is called a Human Intelligence Task (HIT).
One particularly valuable use of HITs is the human evaluation of systems that attempt to do what humans are good at. For instance, if you construct several systems that perform automatic speech transcription (i.e. converting speech to text) and would like to know how well each of them performs, you could create HITs on MTurk that 'showcase' the transcriptions produced by the different systems, and ask workers to tell you which ones they like and which ones they find inferior. Such human feedback is also valuable because it helps identify systematic errors and guide future development of your system(s).
The same can be applied to a variety of tasks besides speech transcription, such as machine translation, object recognition, emotion detection, etc.
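To make the idea concrete, a comparison HIT like the ones described above could be assembled along these lines. This is a minimal Python sketch; the function name and data layout are illustrative, not MAISE's actual format. The outputs are shuffled so annotators judge them blindly, with a private key recording which system produced each candidate:

```python
import random

def make_ranking_hit(source, system_outputs, rng=random):
    """Build one comparison HIT: shuffle the system outputs so workers
    judge them blindly, keeping a key that maps each displayed
    candidate back to the system that produced it.
    (Illustrative sketch only -- not MAISE's internal data format.)"""
    items = list(system_outputs.items())  # [(system_name, output), ...]
    rng.shuffle(items)
    return {
        "source": source,
        "candidates": [out for _, out in items],  # shown to the worker
        "key": [name for name, _ in items],       # kept private
    }

hit = make_ranking_hit(
    "Das ist ein Test.",
    {
        "sysA": "This is a test.",
        "sysB": "This is one test.",
        "sysC": "It is a test.",
    },
)
```

Once the worker submits a preference, the private key lets you translate "I liked candidate 2 best" back into "sysB won this comparison".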
The aim of the MAISE package is to streamline the process of creating those evaluation tasks and uploading the relevant content to MTurk to be judged, without your having to familiarize yourself with the mechanics of Mechanical Turk itself. This allows you to spend more time improving your system rather than dealing with file input and output and MTurk's sometimes finicky interface.
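As a taste of the plumbing MAISE hides from you: every HIT posted through the MTurk API carries a Question payload, commonly an ExternalQuestion XML blob that points MTurk at a web form hosting your task. The sketch below builds that blob; it shows generic MTurk API usage (the schema URL is Amazon's published one), not MAISE's internal code, and the task URL is a made-up example:

```python
# The XML namespace Amazon publishes for the ExternalQuestion schema.
EXTERNAL_QUESTION_XMLNS = (
    "http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas"
    "/2006-07-14/ExternalQuestion.xsd"
)

def external_question(url, frame_height=600):
    """Wrap a task URL in the ExternalQuestion XML that the MTurk API
    expects when creating a HIT hosted on your own server.
    (Generic MTurk plumbing -- not MAISE's internal code.)"""
    return (
        f'<ExternalQuestion xmlns="{EXTERNAL_QUESTION_XMLNS}">'
        f"<ExternalURL>{url}</ExternalURL>"
        f"<FrameHeight>{frame_height}</FrameHeight>"
        f"</ExternalQuestion>"
    )

# Hypothetical task URL, for illustration only.
xml = external_question("https://example.org/rank?hit=42")
```

MAISE takes care of generating and uploading payloads of this kind for every sentence you want judged, which is exactly the bookkeeping you would rather not do by hand.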
Note: At the moment, MAISE is designed to aid the evaluation of machine translation (MT) systems. However, it can be used for other AI/ML tasks as well. Please see the FAQ.
MAISE's source code, instructions, documentation, and a tutorial are all included in the distribution.
MAISE is an open-source tool, licensed under the terms of the GNU Lesser General Public License (LGPL). Therefore, it is free for personal and scientific use by individuals and/or research groups. It may not be modified or redistributed, publicly or privately, unless the licensing terms are observed. If in doubt, contact the author for clarification and/or an explicit permission.
If you use MAISE in your work, please cite the software package and include the URL in your paper.
4. The Mechanics of MAISE (Abbreviated Version)
MAISE is quite easy to use. There are a couple of Java programs to compile, but there is no need to install anything or mess with environment variables. Whenever MAISE needs to communicate with MTurk, it relies on MTurk's Java SDK to do so.
You can use MAISE with any evaluation setup you like, as long as you design the user interface for it. Currently, MAISE comes with existing support for a particular evaluation setup that asks annotators to rank the outputs of different systems relative to each other. When we say "existing support", we mean that the user interface is included, and so is an analysis tool that can make sense of the judgments; this way, you don't need to do anything extra to obtain rankings of the systems. You can read more about this evaluation setup in the overview papers of the Workshop on Statistical Machine Translation (WMT).
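As an illustration of what such an analysis tool computes, here is a minimal sketch that turns relative-ranking judgments into per-system scores. The "fraction of pairwise comparisons won" summary is in the spirit of the WMT overview papers, not necessarily MAISE's exact calculation, and the judgment format is an assumed one for this example:

```python
from collections import defaultdict
from itertools import combinations

def pairwise_scores(judgments):
    """Each judgment maps system names to ranks (1 = best).
    Every ranking of N systems implies N*(N-1)/2 pairwise comparisons;
    a system's score is the fraction of its comparisons it won, with
    ties skipped. (A simple summary in the spirit of the WMT overview
    papers -- not MAISE's exact analysis tool.)"""
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for ranks in judgments:
        for a, b in combinations(ranks, 2):
            if ranks[a] == ranks[b]:
                continue  # tie: neither system wins
            comparisons[a] += 1
            comparisons[b] += 1
            winner = a if ranks[a] < ranks[b] else b
            wins[winner] += 1
    return {s: wins[s] / comparisons[s] for s in comparisons}

scores = pairwise_scores([
    {"sysA": 1, "sysB": 2, "sysC": 3},  # annotator 1: A > B > C
    {"sysA": 2, "sysB": 1, "sysC": 2},  # annotator 2: B > A = C
])
```

In this toy example, sysB wins 3 of its 4 non-tied comparisons, sysA wins 2 of 3, and sysC wins none, so the induced ranking is sysB > sysA > sysC.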
Q: It looks like MAISE was written for machine translation, not general ML/AI tasks. Can I really use MAISE for my task?
A: True, when MAISE was being written (before it was even named MAISE), it was meant to aid the evaluation of MT systems. It can be used for other ML/AI tasks as well, but some of the supported features may not make sense for your task, and you'll have to pretend they're not there. Basically, you'll also have to trick MAISE, telling it silly things like which languages you're translating to and from, only because it expects you to tell it. It's fairly simple to do so, but I would be more than willing to help you get started if you're not sure how. (This is the main reason why MAISE is not in version 1.0 yet... I promise you MAISE v1.0 will not need to be tricked into thinking it's doing MT!)
Q: Why did you develop MAISE?
A: My advisor at JHU, Chris Callison-Burch, recruited me to help him run the manual evaluation component of the Workshop on Statistical Machine Translation (WMT).
Q: Does MAISE support evaluation setups other than ranking?
A: Not yet. I first focused on getting the ranking task supported, since it's the main evaluation setup we used in WMT.
Q: I have a feeling that you'd like to thank some people. So I'm going to let you...
A: Thanks for letting me thank some people. (See what I did there?) For starters, I developed MAISE while I was funded by DARPA's GALE Program, and in part by the EuroMatrixPlus Project. I would like to thank Chris Callison-Burch, Ondrej Bojar, and everybody who gave feedback during the WMT evaluation campaigns.
Q: I'd like to offer you a job. Are you interested?
A: Yes! Check my website for my CV and publications, and drop me a line: firstname.lastname@example.org
Q: But you didn't ask me what kind of job it is...
A: I have questionable morals.
Note: Version changes in the first decimal place (e.g. v1.05 to v1.10) reflect significant changes, such as changes in functionality or use. Changes in the second decimal place (e.g. v1.23 to v1.24) reflect minor changes in the documentation, instructions, output, etc.
Added more MTurk functions.
Added more options to existing functions, particularly the Retriever module.