The Spatial Inductive Bias of Deep Learning

Benjamin R. Mitchell, Johns Hopkins University

In the past few years, Deep Learning has become the method of choice for producing state-of-the-art results on machine learning problems involving images, text, and speech. The explosion of interest in these techniques has resulted in a large number of successful applications, but relatively few studies exploring the nature of and reason for that success.

This dissertation is an examination of the inductive biases that underpin the success of deep learning, focusing in particular on the success of Convolutional Neural Networks (CNNs) on image data. We show that CNNs rely on a type of spatial structure being present in the data, and then describe ways this type of structure can be quantified. We further demonstrate that a similar type of inductive bias can be explicitly introduced into a variety of other techniques, including non-connectionist ones. The result is both a better understanding of why deep learning works and a set of tools that can be used to improve the performance of a wide range of machine learning methods on these tasks.

Speaker Biography

Benjamin R. Mitchell received a B.A. in Computer Science from Swarthmore College in 2005, and an M.S.E. in Computer Science from the Johns Hopkins University in 2008. He received a certification from the JHU Preparing Future Faculty Teaching Academy in 2016.

He worked as a Teaching Assistant and a Research Assistant from 2005 to 2008, and he has been an Instructor at the Johns Hopkins University since 2009. He has taught courses including Introductory Programming in Java, Intermediate Programming in C/C++, Artificial Intelligence, and Computer Ethics. In 2015, he received the Professor Joel Dean Award for Excellence in Teaching, and he was a finalist for the Whiting School of Engineering Excellence in Teaching Award in 2016.

In addition to the field of machine learning, he has peer-reviewed publications in fields including operating systems, mobile robotics, medical robotics, and semantic modeling.