When: Mar 12 2026 @ 10:30 AM
Where: 228 Malone Hall
Categories:
Computer Science Seminar Series

Refreshments are available starting at 10:30 a.m. The seminar will begin at 10:45 a.m.

Abstract

Foundation models and generative AI are changing how we search and design molecules across chemistry, biology, and materials. However, progress is limited by a basic mismatch: chemical space is enormous (often estimated to exceed 10^63 candidates), while labeled measurements are scarce (often only a few hundred to a few thousand per property). In addition, real applications require optimizing multiple, sometimes conflicting, properties at once, such as potency and toxicity for drugs or permeability and selectivity for gas-separation membranes.

In this talk, Gang Liu presents a data-to-discovery workflow for molecular virtual screening and a foundation model for inverse molecular design under multi-property constraints. First, he develops data-centric learning methods for small and imbalanced datasets. By learning interpretable subgraph rationales and using them for data augmentation and confidence-based self-training, his models improve prediction accuracy while giving structure-level explanations that scientists can validate. Second, Liu introduces Graph Diffusion Transformers (Graph DiTs) for multi-conditional molecular generation and shows how combining Graph DiTs with large language models leads to multimodal foundation models that can interleave text, molecules, and multi-step reactions for controllable design and retrosynthesis. Third, he translates these advances into practical tools and shared resources, including the open-source library torch-molecule and an open polymer challenge that connects machine learning researchers with domain scientists.

Liu concludes with case studies in sustainable materials—including gas-separation membranes, where these methods helped drive experimentally validated discoveries—and he outlines a roadmap toward multi-scale, multimodal molecular foundation models and agent systems that work in tighter loops with experiments.

Speaker Biography

Gang Liu is a fifth-year PhD student at the University of Notre Dame, working on generative AI and foundation models for molecular discovery. He has published as (co-)first author at the Conference on Neural Information Processing Systems (NeurIPS), the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, and the International Conference on Learning Representations, as well as in IEEE Transactions on Knowledge and Data Engineering, ACM Transactions on Knowledge Discovery from Data, and Cell Reports Physical Science. Liu’s work has been supported by an IBM PhD Fellowship Award and featured by MIT News, the University of Notre Dame’s College of Engineering, and Snap Research. He is the author of two books on deep learning for polymers and is the creator of torch-molecule, an open-source toolkit for molecular discovery. Liu led the NeurIPS 2025 Open Polymer Challenge, which attracted more than 10,000 registrations and 50,000 submissions from over 100 countries.

Zoom link »