Speaker: Teodor Marinov

Affiliation: Johns Hopkins University

Title: Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning

Abstract: Reinforcement Learning (RL) is a general scenario in which agents interact with the environment to achieve some goal. The environment and an agent's interactions are typically modeled as a Markov decision process (MDP), which can represent a rich variety of tasks. But for which MDPs can an agent, or an RL algorithm, succeed? Answering this requires a theoretical analysis of the complexity of an MDP. In this talk I will present information-theoretic lower bounds for a large class of MDPs, derived by studying a certain semi-infinite LP. Further, I will show that existing algorithms enjoy tighter gap-dependent regret bounds (similar to those in the stochastic multi-armed bandit problem); however, they are unable to achieve the information-theoretic lower bounds, even in MDPs with deterministic transitions, unless there is a unique optimal policy.
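For context on the comparison the abstract draws (this formula is not from the talk itself): in the stochastic multi-armed bandit problem, the classical gap-dependent regret bound for UCB-style algorithms takes the form

```latex
% Classical gap-dependent regret bound for stochastic multi-armed bandits,
% where \Delta_a denotes the suboptimality gap of arm a and T the horizon.
\mathrm{Regret}(T) \;\le\; \sum_{a : \Delta_a > 0} O\!\left(\frac{\log T}{\Delta_a}\right)
```

The talk concerns the extent to which analogous instance-dependent bounds are attainable in episodic RL.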

When: Wednesday, March 10, 2021, 12:00–1:00 PM (Eastern Time)
Where: https://wse.zoom.us/j/91450299380
Cost: free
Event page: https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-teodor-marinov/