A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon
Garcia, Frédérick, Seydina Ndiaye
ICML'98
(gzipped PostScript, 96 KB)
Abstract: In this article we consider the particular framework of non-stationary
finite-horizon Markov Decision Processes. After establishing a
relationship between the finite-horizon total reward criterion
and the average-reward criterion in the finite-horizon setting, we define
QH-learning and RH-learning for finite-horizon MDPs. Then we
introduce the Ordinary Differential Equation (ODE) method to
conduct a learning rate analysis of QH-learning and RH-learning.
RH-learning appears to be a version of QH-learning with matrix-valued
stepsizes, the corresponding gain matrix being very close to the
optimal matrix which results from the ODE analysis. Experimental
results confirm this performance hierarchy.