Probably Approximately Correct Robust Policy Search with Applications to Mobile Robotics

This thesis studies the problem of designing reliable control laws for robotic systems operating in uncertain environments. We tackle this problem with policy search, which uses stochastic optimization to iteratively refine the parameters of a control law drawn from a fixed policy class. We introduce several new approaches to stochastic policy optimization based on probably approximately correct (PAC) bounds on the expected performance of control policies. These algorithms, referred to as PAC Robust Policy Search (PROPS), directly minimize an upper confidence bound on the expected cost of trajectories, rather than the expected cost itself as standard approaches do. We compare the performance of PROPS to that of existing policy search algorithms on a set of challenging robot control scenarios in simulation: a car with side slip and a quadrotor navigating through obstacle-ridden environments. We show that the optimized bound accurately predicts future performance and yields improved robustness, as measured by lower average cost and lower probability of collision.
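The difference between minimizing a bound and minimizing the mean can be seen in a small sketch. The abstract does not state which PAC bound PROPS optimizes, so the code below substitutes the empirical Bernstein bound of Maurer and Pontil (2009). It assumes i.i.d. trajectory costs normalized to [0, 1], and it only ranks two fixed candidate policies rather than performing policy search. The function name and the cost data are hypothetical.

```python
import math
import statistics


def empirical_bernstein_ucb(costs, delta):
    """Upper confidence bound on E[cost], valid with probability >= 1 - delta.

    Stand-in bound (empirical Bernstein, Maurer & Pontil 2009), assuming
    i.i.d. costs in [0, 1]; not necessarily the bound used by PROPS.
    """
    n = len(costs)
    if n < 2:
        raise ValueError("need at least two sampled costs")
    log_term = math.log(2.0 / delta)
    var = statistics.variance(costs)  # unbiased sample variance
    return (statistics.fmean(costs)
            + math.sqrt(2.0 * var * log_term / n)
            + 7.0 * log_term / (3.0 * (n - 1)))


# Hypothetical sampled trajectory costs for two candidate policies.
# "risky" is usually cheap but collides (cost 1.0) in 10% of rollouts.
candidates = {
    "risky": [0.1] * 90 + [1.0] * 10,
    "safe": [0.25] * 100,
}

# Selecting by the sample mean favors the risky policy ...
by_mean = min(candidates, key=lambda k: statistics.fmean(candidates[k]))
# ... while selecting by the upper confidence bound favors the safe one.
by_bound = min(candidates,
               key=lambda k: empirical_bernstein_ucb(candidates[k], delta=0.05))
print(by_mean, by_bound)  # prints: risky safe
```

Because this bound grows with the sample variance, a policy that is cheap on average but occasionally collides can rank behind a slightly costlier but consistent one, which mirrors the robustness argument made above.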

Next, we develop a technique that uses robot motion trajectories to build a high-quality stochastic dynamics model, which is then used in simulation to train control policies with associated performance guarantees. We demonstrate the idea by collecting dynamics data from a 1/5-scale agile ground vehicle, fitting a stochastic dynamics model, and training a policy in simulation to drive around an oval track at up to 6.5 m/s while avoiding obstacles. We show that the control policy can be transferred back to the real vehicle with little loss in predicted performance. Furthermore, we show empirically that simulation-derived performance guarantees carry over to the actual vehicle when it executes a policy optimized using a deep stochastic dynamics model fit to vehicle data.
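The fit-then-simulate pipeline can be illustrated with a deliberately simple model. The thesis uses a deep stochastic dynamics model; the sketch below instead assumes a linear-Gaussian model x' = A x + B u + w with w ~ N(0, Σ), fit by least squares. The function names and the synthetic data, which stand in for logged vehicle trajectories, are hypothetical.

```python
import numpy as np


def fit_linear_gaussian(states, controls, next_states):
    """Least-squares fit of x' = A x + B u + w, w ~ N(0, Sigma).

    Assumed linear-Gaussian stand-in for a learned stochastic model.
    states: (N, n), controls: (N, m), next_states: (N, n).
    """
    X = np.hstack([states, controls])  # (N, n + m) regressors
    theta, *_ = np.linalg.lstsq(X, next_states, rcond=None)  # (n + m, n)
    n = states.shape[1]
    A, B = theta[:n].T, theta[n:].T
    resid = next_states - X @ theta
    Sigma = np.atleast_2d(np.cov(resid, rowvar=False))  # noise covariance
    return A, B, Sigma


def sample_rollout(A, B, Sigma, x0, controls, rng):
    """Sample one trajectory from the fitted stochastic model."""
    xs = [np.asarray(x0, dtype=float)]
    for u in controls:
        w = rng.multivariate_normal(np.zeros(len(x0)), Sigma)
        xs.append(A @ xs[-1] + B @ u + w)
    return np.array(xs)


# Synthetic "logged" data from a known system (hypothetical values).
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 0.9]])
B_true = np.array([[0.0], [0.1]])
N = 5000
states = rng.normal(size=(N, 2))
controls = rng.normal(size=(N, 1))
noise = 0.01 * rng.normal(size=(N, 2))
next_states = states @ A_true.T + controls @ B_true.T + noise

A, B, Sigma = fit_linear_gaussian(states, controls, next_states)
# Sampled rollouts like this one would feed a simulated policy evaluation.
traj = sample_rollout(A, B, Sigma, np.zeros(2), np.ones((20, 1)), rng)
```

Sampling many such rollouts per policy is what lets cost statistics, and any bound computed from them, be estimated entirely in simulation; how well they carry over to hardware depends on how faithful the fitted model is.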

Finally, we develop an actor-critic variant of the PROPS algorithm that supports both episode-based and step-based evaluation and sampling strategies. This variant is more data-efficient and is expected to compute higher-quality policies faster. We evaluate the algorithm empirically in simulation on a challenging robot navigation task, using a high-fidelity deep stochastic model of an agile ground vehicle, and on a benchmark set of continuous control tasks. We compare its performance to that of the original trajectory-based PROPS.

Speaker Biography

Matt Sheckells received his B.S. in Computer Science and Physics from the Johns Hopkins University in 2014. He stayed at Johns Hopkins to complete his Ph.D., receiving the Computer Science Department Dean’s Fellowship and the WSE-APL Fellowship. His research in the Autonomous Systems, Control, and Optimization Lab focused on planning and control for robotic systems, including flying vehicles and high-speed, off-road vehicles. During his Ph.D., Matt worked as a teaching assistant and lectured for the Applied Optimal Control and Non-linear Control and Planning in Robotics courses. As part of JHU’s Team CoSTAR, Matt won the 2016 KUKA Innovation Award. He also spent the summer of 2016 working as a Software Intern at Zoox, Inc.