When: Apr 27 2026 @ 12:00 PM
Where: 216 Hodson Hall
Categories: Computer Science & CLSP Seminar Series

Abstract

Recent “reasoning models” improve LLM performance by generating chains of thought before producing an answer, with the reasoning process itself optimized through reinforcement learning. This paradigm has delivered impressive results on math and coding benchmarks, but three fundamental challenges limit its broader applicability. First, most RL methods require a verifiable reward, which is easy when there is a single correct answer, but far less clear for open-ended tasks like writing or summarization. Second, reasoning traces are expensive: Longer chains mean slower training and slower inference, and models often overthink simple problems. Third, many real-world tasks involve very long inputs such as entire books or scientific papers, yet current reasoning methods were designed for short contexts. In this talk, Mirella Lapata argues that a single principle, reasoning by proxy, can address all three challenges. When a reward cannot be verified directly, a frozen language model’s perplexity reduction can serve as a proxy reward, enabling RL-trained reasoning for open-ended generation without any human labels. When full reasoning traces are too costly, compact latent embeddings can serve as proxy thoughts, enabling the model to “think silently” and nearly match full RL performance with 70–92% fewer tokens. And when the input is too long to reason over efficiently, a minimal informational subset can serve as a proxy context on which reasoning is learned and then transferred to the full input, allowing a small model to match one 10× its size. In each case, the key insight is the same: By learning on something cheaper that preserves the essential signal, we can bring RL-based reasoning to settings where direct approaches are intractable.
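The first proxy, perplexity reduction as a reward signal, can be sketched numerically. The snippet below is an illustrative assumption about how such a reward might be computed, not the speaker's implementation: a frozen scorer LM assigns per-token log-probabilities to a target output, conditioned either on the input alone or on the input plus a generated reasoning trace, and the reward is the resulting drop in perplexity.

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities."""
    avg_nll = -sum(log_probs) / len(log_probs)
    return math.exp(avg_nll)

def proxy_reward(logp_without_reasoning, logp_with_reasoning):
    """Hypothetical proxy reward: the frozen LM's perplexity reduction
    on the target when the reasoning trace is added to the context.
    Positive reward means the reasoning made the target more predictable."""
    return perplexity(logp_without_reasoning) - perplexity(logp_with_reasoning)

# Toy made-up log-probs for the same target under the two conditions:
without = [-2.0, -1.5, -2.5]  # frozen LM scores target given input only
with_r = [-1.0, -0.8, -1.2]   # same target, with reasoning prepended
reward = proxy_reward(without, with_r)  # positive: reasoning helped
```

Because the scorer is frozen and needs no gold labels beyond the target text itself, a reward like this could in principle drive RL on open-ended generation tasks where no single verifiable answer exists.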

Speaker Biography

Mirella Lapata is a professor of natural language processing in the School of Informatics at the University of Edinburgh. Her research focuses on getting computers to understand, reason with, and generate natural language. She is the recipient of the 2025 British Computer Society (BCS) Lovelace Medal for Computing Research and was the inaugural winner of its Karen Spärck Jones Award. Lapata is a Fellow of the Royal Society of Edinburgh, the Association for Computational Linguistics (ACL), and Academia Europaea. She has received a European Research Council Consolidator Grant, a Royal Society Wolfson Research Merit Award, and a Turing AI World-Leading Researcher Fellowship. She served as president of the ACL Special Interest Group on Linguistic Data and Corpus-Based Approaches to Natural Language Processing in 2018 and has received multiple Best Paper Awards at leading NLP venues.

Zoom link »