Oblique Sparse Projection Forest Optimization: A Faster Randomer Forest

James Browne, Johns Hopkins University
Host: Randal Burns

The proliferation of scientific and industrial sensors is causing an accelerating deluge of data, and processing that data into actionable knowledge requires fast and accurate machine learning methods. Decision forests, widely used methods known for their versatility, state-of-the-art inference, and fast model training, are a class of algorithms well suited to processing data at this scale. Oblique Sparse Projection Forests (OSPFs) are a subset of decision forests that provide inference superior to other methods. Although OSPFs provide state-of-the-art inference and have a computational complexity similar to that of other popular decision forests, no OSPF implementation scales beyond trivially sized datasets.

We explore whether OSPF training and inference speeds can compete with those of other popular decision forest variants despite an algorithmic incompatibility that prevents OSPFs from using traditional forest training optimizations. First, using R, we implement a highly extensible proof-of-concept version of a recently conceived OSPF, Randomer Forest, which has been shown to provide state-of-the-art results on many datasets, and we provide this system for general use via CRAN. We then develop and implement a postprocessing method, Forest Packing, that packs the nodes of a trained forest into a novel data structure and modifies the ensemble traversal method to accelerate forest-based inference. Finally, we develop FastRerF, an optimized version of Randomer Forest that dynamically performs forest packing during training.

The initial implementation in R provided training speeds in line with other decision forest systems and scaled better with additional resources, but it used an excessive amount of memory and provided slow inference speeds. Forest Packing increased inference throughput by almost an order of magnitude compared to other systems while greatly reducing prediction latency. FastRerF trains models faster than other popular decision forest systems when using similar parameters, and it trains Random Forests faster than the current state of the art. Overall, we provide data scientists with a novel OSPF system, with R and Python front ends, that trains and predicts faster than other decision forest implementations.

Speaker Biography

James Browne received a Bachelor's degree in Computer Science from the United States Military Academy at West Point in 2002. In 2012 he received a dual Master of Science degree in Computer Science and Applied Mathematics from the Naval Postgraduate School in Monterey, California, where he received the Rear Admiral Grace Murray Hopper Computer Science Award for excellence in computer science research. He enrolled in the Computer Science Ph.D. program at Johns Hopkins University in 2016 and, after graduation, will become an instructor in the Electrical Engineering and Computer Science Department at the United States Military Academy.