BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Department of Computer Science - ECPv5.12.3//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Department of Computer Science
X-ORIGINAL-URL:https://www.cs.jhu.edu
X-WR-CALDESC:Events for Department of Computer Science
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20190310T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20191103T020000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20191105T104500
DTEND;TZID=America/New_York:20191105T114500
DTSTAMP:20220112T094615Z
CREATED:20210629T210720Z
LAST-MODIFIED:20210629T210720Z
UID:1962355-1572950700-1572954300@www.cs.jhu.edu
SUMMARY:CS Seminar: Michael Mahoney\, ICSI and Department of Statistics\, University of California at Berkeley – “Why Deep Learning Works: Traditional and Heavy-Tailed Implicit Self-Regularization in Deep Neural Networks”
DESCRIPTION:Location\nHackerman Hall B-17\n\nAbstract\nWhile deep neural networks (DNNs) have achieved remarkable success in computer vision and natural language processing\, they are complex\, heavily-engineered\, often ad hoc systems\, and progress toward understanding why they work (arguably a prerequisite for using them in consumer-sensitive and scientific applications) has been much more modest.  To understand why deep learning works\, Random Matrix Theory (RMT) has been applied to analyze the weight matrices of DNNs\, including both production quality\, pre-trained models and smaller models trained from scratch.  Empirical and theoretical results clearly indicate that the DNN training process itself implicitly implements a form of self-regularization\, implicitly sculpting a more regularized energy or penalty landscape.  Building on results in RMT\, most notably its extension to Universality classes of Heavy-Tailed matrices\, and applying them to these empirical results\, we develop a phenomenological theory to identify 5+1 Phases of Training\, corresponding to increasing amounts of implicit self-regularization.  For smaller and/or older DNNs\, this implicit self-regularization is like traditional Tikhonov regularization\, in that there appears to be a “size scale” separating signal from noise.  For state-of-the-art DNNs\, however\, we identify a novel form of heavy-tailed self-regularization\, similar to the self-organization seen in the statistical physics of disordered but strongly-correlated systems.  We will describe validating predictions of this theory; how this can explain the so-called generalization gap; and how one can use it to develop novel metrics that predict trends in generalization accuracies for pre-trained production-scale DNNs.  Coupled with work on energy landscape theory and heavy-tailed spin glasses\, it also provides an explanation of why deep learning works.\n\nBio\nMichael W. Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI).  He works on algorithmic and statistical aspects of modern large-scale data analysis.  Much of his recent research has focused on large-scale machine learning\, including randomized matrix algorithms and randomized numerical linear algebra\, geometric network analysis tools for structure extraction in large informatics graphs\, scalable implicit regularization methods\, computational methods for neural network analysis\, and applications in genetics\, astronomy\, medical imaging\, social network analysis\, and internet data analysis.  He received his PhD from Yale University with a dissertation in computational statistical mechanics\, and he has worked and taught at Yale University in the mathematics department\, at Yahoo Research\, and at Stanford University in the mathematics department.  Among other things\, he is on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI)\, he was on the National Research Council’s Committee on the Analysis of Massive Data\, he co-organized the Simons Institute’s fall 2013 and 2018 programs on the foundations of data science\, he ran the Park City Mathematics Institute’s 2016 PCMI Summer Session on The Mathematics of Data\, and he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets.  He is currently the Director of the NSF/TRIPODS-funded FODA (Foundations of Data Analysis) Institute at UC Berkeley.\n\nVideo\nWatch seminar video.
URL:https://www.cs.jhu.edu/event/cs-seminar-michael-mahoney-icsi-and-department-of-statistics-university-of-california-at-berkeley-why-deep-learning-works-traditional-and-heavy-tailed-implicit-self-regularizati/
END:VEVENT
END:VCALENDAR