The big data revolution has profoundly changed, among many other things, how we perceive business, research, and applications. However, to fully realize the potential of big data, certain computational and statistical challenges need to be addressed. In this talk, I will present my research on facilitating the deployment of machine learning methodologies and algorithms in big data applications. I will first present robust methods that can account for uncertain or abnormal observations. Then I will present a generic regularization scheme that automatically extracts compact and informative representations from heterogeneous, multi-modal, multi-array, time-series, and structured data. Next, I will discuss two gradient algorithms that are computationally very efficient for this regularization scheme, and I will mention their theoretical convergence properties and computational requirements. Finally, I will present a distributed machine learning framework that allows us to process extremely large-scale datasets and models. I will conclude the talk by sharing some future directions that I am pursuing or plan to pursue.
Yaoliang Yu is currently a research scientist affiliated with the Center for Machine Learning and Health and the Machine Learning Department at Carnegie Mellon University. He obtained his PhD in computing science (under Dale Schuurmans and Csaba Szepesvari) from the University of Alberta (Canada, 2013), and he received the PhD Dissertation Award from the Canadian Artificial Intelligence Association in 2015.