A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon

Garcia, Frédérick, Seydina Ndiaye
A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon
ICML'98 (gzipped PostScript, 96 KB)

Abstract: In this article we consider the framework of non-stationary finite-horizon Markov Decision Processes (MDPs). After establishing a relationship between the finite-horizon total-reward criterion and the average-reward criterion in the finite-horizon setting, we define QH-learning and RH-learning for finite-horizon MDPs. We then use the Ordinary Differential Equation (ODE) method to analyze the learning rates of QH-learning and RH-learning. RH-learning appears to be a version of QH-learning with matrix-valued step sizes, the corresponding gain matrix being very close to the optimal matrix that results from the ODE analysis. Experimental results confirm the performance hierarchy that this analysis suggests.