BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Department of Computer Science - ECPv5.12.3//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Department of Computer Science
X-ORIGINAL-URL:https://www.cs.jhu.edu
X-WR-CALDESC:Events for Department of Computer Science
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20200308T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20201101T060000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20201015T104500
DTEND;TZID=America/New_York:20201015T120000
DTSTAMP:20220112T093414Z
CREATED:20210629T210724Z
LAST-MODIFIED:20210629T210724Z
UID:1962485-1602758700-1602763200@www.cs.jhu.edu
SUMMARY:CS Seminar Series: Carlee Joe-Wong\, Carnegie Mellon University – “Optimizing the Cost of Distributed Learning”
DESCRIPTION:Location: Zoom link: https://wse.zoom.us/j/95467665624\n\n
 Abstract: As machine learning models are trained on ever-larger and more complex datasets\, it has become standard to distribute this training across multiple physical computing devices. Such an approach offers a number of potential benefits\, including reduced training time and storage needs due to parallelization. Distributed stochastic gradient descent (SGD) is a common iterative framework for training machine learning models: in each iteration\, local workers compute parameter updates on a local dataset. These are then sent to a central server\, which aggregates the local updates and pushes global parameters back to local workers to begin a new iteration. Distributed SGD\, however\, can be expensive in practice: training a typical deep learning model might require several days and thousands of dollars on commercial cloud platforms. Cloud-based services that allow occasional worker failures (e.g.\, locating some workers on Amazon spot or Google preemptible instances) can reduce this cost\, but may also reduce the training accuracy. We quantify the effect of worker failure and recovery rates on the model accuracy and wall-clock training time\, and show both analytically and experimentally that these performance bounds can be used to optimize the SGD worker configurations. In particular\, we can optimize the number of workers that utilize spot or preemptible instances. Compared to heuristic worker configuration strategies and standard on-demand instances\, we dramatically reduce the cost of training a model\, with modest increases in training time and the same level of accuracy. Finally\, we discuss implications of our work for federated learning environments\, which use a variant of distributed SGD. Two major challenges in federated learning are unpredictable worker failures and a heterogeneous (non-i.i.d.) distribution of data across the workers\, and we show that our characterization of distributed SGD’s performance under worker failures can be adapted to this setting.\n\n
 Bio: Carlee Joe-Wong is an Assistant Professor of Electrical and Computer Engineering at Carnegie Mellon University. She received her A.B.\, M.A.\, and Ph.D. degrees from Princeton University in 2011\, 2013\, and 2016\, respectively. Dr. Joe-Wong’s research is in optimizing networked systems\, particularly on applying machine learning and pricing to data and computing networks. From 2013 to 2014\, she was the Director of Advanced Research at DataMi\, a startup she co-founded from her Ph.D. research on mobile data pricing. She has received several awards for her work\, including the ARO Young Investigator Award in 2019\, the NSF CAREER Award in 2018\, and the INFORMS ISS Design Science Award in 2014.\n\n
 Carlee will be available for a Q&A after her talk until 1 PM.\n\n
 Host: Department of Computer Science\n\n
 Video: Watch seminar video.
URL:https://www.cs.jhu.edu/event/cs-seminar-series-carlee-joe-wong-carnegie-mellon-university-carnegie-mellon-university-optimizing-the-cost-of-distributed-learning/
END:VEVENT
END:VCALENDAR