Twenty Questions with Noisy Answers for Object Detection and Tracking

Raphael Sznitman, Johns Hopkins University

In the traditional “twenty questions” game, the task at hand is to determine a fact or target location, by sequentially asking a knowledgeable oracle questions. This problem has been extensively studied in the past, and results on optimal questioning strategies are well understood. In this thesis however, we consider the case where the answers from the oracle are corrupted with noise from a known model. With this problem occurring both in nature and in a number of computer vision applications (i.e. object detection and localization, tracking, image registration) the goal then is to determine some policy, or sequence of questions, that reduces the uncertainty on the target location as much as possible.

We begin by presenting a Bayesian formulation of a simple and idealized parameter estimation problem. Starting with a prior distribution on the parameter, principles in dynamic programming and information theory can be used to characterize an optimal policy when minimizing the expected entropy of the distribution of the parameter. We then show the existence of a simple greedy policy that is globally optimal. Given these results, we describe a series of stochastic optimization algorithms that embody the noisy twenty questions game paradigm in the context of computer vision. These algorithms are referred to as: Active Testing. We describe the benefit of using this technique in two real-world applications: (i) face detection and localization, and (ii) tool tracking during retinal microsurgery. In the first application, we show that substantial computational gains over existing approaches are achieved when localizing faces in images. In the second, we tackle a much more challenging real-world application where one must find the position and orientation of a surgical tool during surgery. Our approach provides a new and innovative way to perform fast and reliable tool tracking in cases when the tool moves in and out of the field of view often.