The task of query-by-example search is to retrieve, from a collection of data, the observations most similar to a given query. A common approach views the data as vertices in a graph whose edge weights reflect similarities between observations. In this graph-based framework, errors arise both from inaccurate measurement of these similarities and from the approximations required for fast retrieval. In this thesis, we use tools from graph inference to analyze and control these sources of error. We establish novel theoretical results related to representation learning and to vertex nomination, and use these results to control the effects of model misspecification, noisy similarity measurement and approximation error on search accuracy. We present a state-of-the-art system for query-by-example audio search in the context of low-resource speech recognition, which also serves as an illustrative example and testbed for applying our theoretical results.
Keith Levin is a Ph.D. candidate in Computer Science at Johns Hopkins University, where he works on graph inference, with applications to speech processing and neuroscience. Keith received B.S. degrees in Psychology and Linguistics from Northeastern University in 2011. Prior to joining Johns Hopkins University, he worked as a data analyst at BBN Technologies.