BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//128.220.13.64//NONSGML kigkonsult.se iCalcreator 2.20//
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Johns Hopkins Algorithms and Complexity
X-WR-CALDESC:
X-FROM-URL:https://www.cs.jhu.edu/~mdinitz/theory
X-WR-TIMEZONE:America/New_York
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:STANDARD
DTSTART:20211107T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RDATE:20221106T020000
TZNAME:EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20220313T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:ai1ec-356@www.cs.jhu.edu/~mdinitz/theory
DTSTAMP:20220520T005549Z
CATEGORIES:
CONTACT:
DESCRIPTION:Speaker: Teodor Marinov\nAffiliation: Johns Hopkins University
\nTitle: Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bo
unds for Episodic Reinforcement Learning\nAbstract:\nReinforcement Learnin
g (RL) is a general scenario where agents interact with the environment to
achieve some goal. The environment and an agent’s interactions are typica
lly modeled as a Markov decision process (MDP)\, which can represent a ric
h variety of tasks. But\, for which MDPs can an agent or an RL algorithm s
ucceed? This requires a theoretical analysis of the complexity of an MDP.
In this talk I will present information-theoretic lower bounds for a large
class of MDPs. The lower bounds are based on studying a certain semi-infi
 nite LP. Further\, I will show that while existing algorithms enjoy tighte
 r gap-dependent regret bounds (similar to those for the stochastic multi-a
 rmed bandit problem)\, they cannot achieve the information-theoretic lower
  bounds even in MDPs with deterministic transitions\, unless there is a un
 ique optimal policy.
DTSTART;TZID=America/New_York:20210310T120000
DTEND;TZID=America/New_York:20210310T130000
LOCATION:https://wse.zoom.us/j/91450299380
SEQUENCE:0
SUMMARY:[Theory Seminar] Teodor Marinov
URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-teodor-mari
nov/
X-COST-TYPE:free
END:VEVENT
END:VCALENDAR