Robot-assisted surgery has enabled scalable, transparent capture of high-quality data during surgery, and this has in turn opened many new research opportunities. Among these opportunities are those that aim to improve the objectivity and efficiency of surgical training, which include making performance assessment and feedback more objective and consistent; providing more specific or localized assessment and feedback; delegating these responsibilities to machines, which can in principle provide feedback in any desired quantity; and having machines go even further, for example by optimizing practice routines in the form of a virtual coach. In this thesis, we focus on a foundation that serves all of these objectives: automated surgical activity recognition, or in other words the ability to automatically determine which activities a surgeon is performing and when those activities are taking place.
First, we introduce the use of recurrent neural networks (RNNs) for localizing and classifying surgical activities from motion data. Here we show for the first time that this task is possible at the level of maneuvers, which, unlike the activities considered in prior work, are already part of surgical training curricula. Second, we investigate unsupervised learning from surgical motion data: we show that predicting future motion from past motion with RNNs, using motion data alone, leads to meaningful and useful representations of surgical motion. This approach leads to the discovery of surgical activities from unannotated data, and to state-of-the-art performance for querying a database of surgical activity with motion-based queries. Finally, we depart from a common yet limiting assumption in nearly all prior work on surgical activity recognition: that annotated training data, which is difficult and expensive to acquire, is available in abundance. We demonstrate for the first time that both gesture recognition and maneuver recognition are feasible even when very few annotated sequences are available, and that future-prediction-based representation learning, carried out prior to the recognition phase, yields significant performance improvements when annotated data is scarce.
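The future-prediction idea above can be illustrated with a minimal, self-contained sketch. This is not the thesis's actual architecture: it substitutes a fixed random recurrent encoder with a least-squares linear readout (an echo-state-style simplification of a trained RNN encoder), and the "motion data" is a synthetic sinusoidal stand-in for real kinematics. The key structure is the same: a recurrent state summarizes past motion, a readout predicts the next motion sample, and the hidden states serve as representations of the motion so far.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for kinematic data: T timesteps of D-dim "motion".
T, D, H = 200, 4, 32
t = np.arange(T)
X = np.stack([np.sin(0.05 * t + p) for p in np.linspace(0.0, 2.0, D)], axis=1)
X += 0.01 * rng.standard_normal(X.shape)

# Fixed random recurrent encoder (a simplification of a trained RNN).
W_in = 0.5 * rng.standard_normal((H, D))
W_h = rng.standard_normal((H, H))
W_h *= 0.9 / max(abs(np.linalg.eigvals(W_h)))  # spectral radius < 1 for stability

def encode(X):
    """Run the recurrence; each hidden state summarizes the motion so far."""
    h = np.zeros(H)
    states = []
    for x in X:
        h = np.tanh(W_in @ x + W_h @ h)
        states.append(h)
    return np.array(states)

Hs = encode(X)

# Fit a linear readout to predict the NEXT motion sample from the hidden state.
A, Y = Hs[:-1], X[1:]
W_out, *_ = np.linalg.lstsq(A, Y, rcond=None)
mse = np.mean((A @ W_out - Y) ** 2)

# Hs is the learned representation: in the thesis, such representations are
# what gets clustered (activity discovery) or compared (motion-based queries).
```

Because the next sample of a smooth trajectory is well captured by the recurrent state, the prediction error lands near the noise floor; the point is that no activity labels were used at any stage.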
Robert DiPietro is a PhD student at Johns Hopkins University in the Computer Science Department, where he is advised by Gregory D. Hager. His current research focuses on unsupervised representation learning and data-efficient segmentation and classification for time-series data, primarily within the domain of robot-assisted surgery. Before joining Hopkins, Robert obtained his BS in applied physics and his MS in electrical engineering at Northeastern University, and worked for three years as an associate research staff member at MIT Lincoln Laboratory.