Finding Consensus In Speech Recognition

Lidia Mangu, Johns Hopkins University

Most state-of-the-art speech recognizers output word lattices as a compact representation of a set of alternative hypotheses. We introduce a new framework for distilling information from word lattices in order to improve the accuracy of the recognition output and to obtain a more perspicuous representation of the alternative hypotheses. Our approach is motivated by a mismatch between the commonly used evaluation metric, word error rate, which counts errors word by word, and the standard maximum a posteriori probability hypothesis selection paradigm, which optimizes the posterior probability of the entire word sequence. Our technique is shown to improve recognition accuracy on two standard corpora.
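The following is a minimal sketch, not the implementation from this work, of the mismatch described above. It assumes a toy set of three already-aligned hypotheses with posterior probabilities: the MAP rule picks the single most probable word sequence, while a word-level consensus rule picks, in each aligned slot, the word with the highest total posterior, and the two can disagree.

```python
from collections import defaultdict

# Toy posterior distribution over three aligned hypotheses (assumed data).
hypotheses = {
    ("i", "veal", "fine"): 0.4,   # MAP winner as a whole sequence
    ("i", "feel", "fine"): 0.3,
    ("i", "feel", "find"): 0.3,
}

# MAP selection: the single hypothesis with the highest posterior.
map_hyp = max(hypotheses, key=hypotheses.get)

# Word-level consensus: in each slot, sum posteriors per word and
# keep the word with the largest total.
num_slots = len(next(iter(hypotheses)))
consensus = []
for slot in range(num_slots):
    posterior = defaultdict(float)
    for words, p in hypotheses.items():
        posterior[words[slot]] += p   # word posterior = sum over hypotheses
    consensus.append(max(posterior, key=posterior.get))

print("MAP:      ", " ".join(map_hyp))    # i veal fine
print("Consensus:", " ".join(consensus))  # i feel fine
```

Here "feel" carries a total posterior of 0.6 against 0.4 for "veal", so the consensus output differs from the MAP hypothesis and has a lower expected number of word errors under the given distribution.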

We then show that our algorithm can serve as an efficient lattice compression technique. Its success comes from its ability to discard low-probability words and recombine the remaining ones into a new set of hypotheses. In essence, our method is an estimator of word posterior probabilities, and as such it could benefit a number of other tasks, such as word spotting and confidence annotation. The new representation of the candidate hypotheses also provides a framework for finding linguistic cues and applying them to disambiguate the word-level confusion sets.
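As a rough illustration of the compression and confidence-annotation uses, here is a hedged sketch assuming a simple data layout (a confusion network represented as a list of slots, each mapping a word to its posterior); this is not the paper's lattice code. Pruning low-posterior words and renormalizing yields a much smaller hypothesis set, and the surviving posteriors double as word confidence scores.

```python
def prune_confusion_network(slots, threshold=0.05):
    """Drop words with posterior below threshold; renormalize each slot.
    Assumes at least one word per slot survives the threshold."""
    pruned = []
    for slot in slots:
        kept = {w: p for w, p in slot.items() if p >= threshold}
        total = sum(kept.values())
        pruned.append({w: p / total for w, p in kept.items()})
    return pruned

# Assumed toy network; posteriors in each slot sum to 1.
network = [
    {"i": 0.98, "eye": 0.02},
    {"feel": 0.55, "veal": 0.41, "fill": 0.04},
    {"fine": 0.70, "find": 0.30},
]

for slot in prune_confusion_network(network):
    best = max(slot, key=slot.get)
    print(best, f"(confidence {slot[best]:.2f})")
```

In this toy example the pruned network keeps only the plausible competitors in each slot, and the renormalized posterior of the winning word can be read off directly as a confidence annotation.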