When: Oct 17 2024 @ 10:30 AM
Where: B-17 Hackerman Hall
Categories:
Computer Science Seminar Series.

Refreshments are available starting at 10:30 a.m. The seminar will begin at 10:45 a.m.

Abstract

Prohibitive pretraining costs make pretraining research a rare sight; the same is not true of analyzing, using, and fine-tuning pretrained models. This talk focuses on one way to improve models scientifically, in small, measurable steps. Specifically, it introduces the concept of merging multiple fine-tuned or parameter-efficient fine-tuned models into one, and discusses what we understand about merging, how it works, more recent methods, and how iteratively merging models may allow collaborative continual pretraining.
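For readers unfamiliar with the idea, the simplest form of model merging combines several fine-tuned models by averaging their parameters in weight space (the talk also covers more recent methods such as TIES merging). Below is a minimal sketch of that baseline, assuming PyTorch models that share an identical architecture; the function and variable names are illustrative, not taken from the talk.

import torch

def merge_models(models, weights=None):
    """Average the parameters of several fine-tuned models into one state dict."""
    if weights is None:
        # Uniform averaging by default; weights can be tuned per model.
        weights = [1.0 / len(models)] * len(models)
    reference = models[0].state_dict()
    merged = {}
    for name, tensor in reference.items():
        if torch.is_floating_point(tensor):
            # Weighted average of the corresponding parameter across models.
            merged[name] = sum(w * m.state_dict()[name] for w, m in zip(weights, models))
        else:
            # Non-float buffers (e.g., step counters) are copied from the first model.
            merged[name] = tensor.clone()
    return merged

# Usage: load the merged parameters into a fresh copy of the shared architecture.
# base_model.load_state_dict(merge_models([finetuned_a, finetuned_b]))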

Speaker Biography

Leshem Choshen is a postdoctoral researcher at the Massachusetts Institute of Technology and IBM who aims to study model development openly and collaboratively, make pretraining research feasible, and evaluate models efficiently. To that end, they co-created model merging, TIES merging, and the BabyLM Challenge. They were chosen for postdoctoral Rothschild and Fulbright fellowships and received a Best PhD Thesis Award from the Israeli Association for Artificial Intelligence, as well as a Blavatnik Prize for Computer Science. With broad interests in natural language processing and machine learning, Choshen has also worked on reinforcement learning, on understanding how neural networks learn, and on Project Debater, the first machine system capable of holding a formal debate (as of 2019), which was featured on the cover of Nature.
