Topic-Enhanced Models for Speech Recognition and Retrieval

Informal spoken content is being generated, stored, and shared on a mind-boggling scale across the globe. Smartphones and social media, among other technologies, have enabled the creation of large-volume repositories of user-generated, informal content in all spoken languages.

Most languages lack the resources typically needed to train automatic speech recognition (ASR) systems. We are nonetheless motivated by the robustness of topic information (content words, keywords, etc.) in the presence of ASR errors. We explore ways in which topic information can be used to enable efficient recognition and retrieval of this wealth of spoken documents. To that end, we consider the interrelated concepts of locality, repetition, and ‘subject of discourse’ in the context of three speech processing applications: ASR, speech retrieval via keyword search (KWS), and topic identification of speech.

We demonstrate how supervised and unsupervised models of topics, applicable to any language, can improve accuracy in accessing spoken content. In particular, we focus on two complementary aspects of topic information in lexical content: local context (locality or repetition of word usage) and broad context (the typical ‘subject matter’ definition of a topic). By augmenting ASR language models with topic information, we show consistent improvements in both recognition and retrieval metrics.

We add locality to bag-of-words topic identification models, quantify the relationship between topic information and keyword retrieval, and consider word repetition in terms of both keyword-based retrieval and language modeling. Finally, we combine these concepts and develop joint models of local and broad context via latent topic models.

Speaker Biography

Jonathan Wintrode received the A.B. degree cum laude in Computer Science from Harvard University in 2000 and the M.S. degree in Computer Science from the Naval Postgraduate School in 2005. He enrolled in the Computer Science Ph.D. program at Johns Hopkins University in 2010 and completed the M.S.E. degree in Computer Science in 2014. He won the Naval Postgraduate School Computer Science Department’s Outstanding Department of Defense Student Award in 2005. His research focuses on identifying and leveraging topic content for speech recognition and retrieval systems.