BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Department of Computer Science - ECPv6.15.18//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Department of Computer Science
X-ORIGINAL-URL:https://www.cs.jhu.edu
X-WR-CALDESC:Events for Department of Computer Science
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20250309T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20251102T020000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20260308T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20261101T020000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20270314T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20271107T020000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20260312T103000
DTEND;TZID=America/New_York:20260312T120000
DTSTAMP:20260415T050203
CREATED:20260225T214147Z
LAST-MODIFIED:20260225T214147Z
UID:1993564-1773311400-1773316800@www.cs.jhu.edu
SUMMARY:CS Seminar Series: Data-Centric Machine Learning and Foundation Models for Molecular Discovery
DESCRIPTION:Refreshments are available starting at 10:30 a.m. The seminar will begin at 10:45 a.m. \nAbstract\nFoundation models and generative AI are changing how we search and design molecules across chemistry\, biology\, and materials. However\, progress is limited by a basic mismatch: chemical space is enormous (often estimated to exceed 10^{63} candidates)\, while labeled measurements are scarce (often only a few hundred to a few thousand per property). In addition\, real applications require optimizing multiple\, sometimes conflicting\, properties at once\, such as potency and toxicity for drugs or permeability and selectivity for gas-separation membranes. \nIn this talk\, Gang Liu presents a data-to-discovery workflow for molecular virtual screening and a foundation model for inverse molecular design under multi-property constraints. First\, he develops data-centric learning methods for small and imbalanced datasets. By learning interpretable subgraph rationales and using them for data augmentation and confidence-based self-training\, his models improve prediction accuracy while giving structure-level explanations that scientists can validate. Second\, Liu introduces Graph Diffusion Transformers (Graph DiTs) for multi-conditional molecular generation\, and shows how combining Graph DiTs with large language models leads to multimodal foundation models that can interleave text\, molecules\, and multi-step reactions for controllable design and retrosynthesis. Third\, he translates these advances into practical tools and shared resources\, including the open-source library torch-molecule and an open polymer challenge that connects machine learning researchers with domain scientists. 
 \nLiu concludes with case studies in sustainable materials\, including gas-separation membranes\, where these methods helped drive experimentally validated discoveries\, and he outlines a roadmap toward multi-scale\, multimodal molecular foundation models and agent systems that work in tighter loops with experiments. \nSpeaker Biography\nGang Liu is a fifth-year PhD student at the University of Notre Dame\, working on generative AI and foundation models for molecular discovery. He has published as (co-)first author at the Conference on Neural Information Processing Systems (NeurIPS)\, the ACM SIGKDD Conference on Knowledge Discovery and Data Mining\, and the International Conference on Learning Representations\, as well as in IEEE Transactions on Knowledge and Data Engineering\, ACM Transactions on Knowledge Discovery from Data\, and Cell Reports Physical Science. Liu’s work has been supported by an IBM PhD Fellowship Award and featured by MIT News\, the University of Notre Dame’s College of Engineering\, and Snap Research. He is the author of two books on deep learning for polymers and is the creator of torch-molecule\, an open-source toolkit for molecular discovery. Liu led the NeurIPS 2025 Open Polymer Challenge\, which attracted more than 10\,000 registrations and 50\,000 submissions from over 100 countries.
URL:https://www.cs.jhu.edu/event/cs-seminar-series-data-centric-machine-learning-and-foundation-models-for-molecular-discovery/
LOCATION:228 Malone Hall
CATEGORIES:Seminars and Lectures
END:VEVENT
END:VCALENDAR