Kernel-Based Reinforcement Learning

Dirk Ormoneit, Saunak Sen
Department of Statistics, Stanford University, Technical Report No. 1999-8

Abstract: Kernel-based methods have recently attracted increased attention in the machine learning literature as reliable tools for regression and classification tasks. In this work, we consider a kernel-based approach to reinforcement learning and show that it produces a consistent estimate of the true value function in a continuous Markov Decision Process. Such consistency typically cannot be obtained with parametric value function estimates such as neural networks. As further contributions, we derive the asymptotic distribution of the kernel-based estimate and establish optimal convergence rates. The asymptotic distribution is then used to derive formulas for the asymptotic bias inherent in the kernel-based approximation. Although reinforcement learning estimates are generally biased because of the maximum operator involved, this is, to our knowledge, the first theoretical result of this kind. The suggested bias formulas may serve as the basis for bias-correction techniques that can be used in practice to improve the estimate of the value function.
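To make the kernel-based approach concrete, here is a minimal sketch of kernel-based approximate value iteration. It assumes a one-dimensional state space, a finite action set, a fixed bandwidth, and a Gaussian kernel whose weights are normalized to sum to one over the sampled transitions of each action. It iterates one common form of the kernel-based operator, V(x) = max_a sum_s k_a(x_s, x) (r_s + gamma V(y_s)), where (x_s, r_s, y_s) are observed transitions under action a. The names `kernel_value_iteration`, `kernel_weights`, `bandwidth`, and the toy problem are hypothetical illustrations, not the report's reference implementation.

```python
import math
import random


def gaussian(u):
    # Gaussian "mother kernel" phi; the kernel choice is an assumption of this sketch.
    return math.exp(-0.5 * u * u)


def kernel_weights(starts, x, bandwidth):
    """Normalized weights k(x_s, x) = phi(|x_s - x| / b) / sum_j phi(|x_j - x| / b)."""
    raw = [gaussian((s - x) / bandwidth) for s in starts]
    total = sum(raw)
    if total == 0.0:  # x is numerically far from every sample: use uniform weights
        return [1.0 / len(starts)] * len(starts)
    return [w / total for w in raw]


def kernel_value_iteration(samples, gamma=0.9, bandwidth=0.1, tol=1e-9, max_iter=10_000):
    """Iterate V(x) = max_a sum_s k_a(x_s, x) * (r_s + gamma * V(y_s)) to a fixed point.

    samples maps each action a to its observed transitions [(x_s, r_s, y_s), ...].
    Returns a callable V(x) that can be evaluated at any 1-D state x.
    """
    # The fixed point only has to be stored at the sampled successor states y_s.
    support = sorted({y for trans in samples.values() for (_, _, y) in trans})
    starts = {a: [x for (x, _, _) in trans] for a, trans in samples.items()}
    # Weights from each support point to each action's start states never change.
    w_support = {
        a: {z: kernel_weights(starts[a], z, bandwidth) for z in support} for a in samples
    }

    def q_value(a, weights, v):
        # Kernel-weighted average of the one-step returns observed under action a.
        return sum(w * (r + gamma * v[y]) for w, (_, r, y) in zip(weights, samples[a]))

    v = {z: 0.0 for z in support}
    for _ in range(max_iter):
        new_v = {z: max(q_value(a, w_support[a][z], v) for a in samples) for z in support}
        delta = max(abs(new_v[z] - v[z]) for z in support)
        v = new_v
        if delta < tol:  # normalized weights make the update a gamma-contraction
            break

    def value(x):
        # Out-of-sample evaluation: one more kernel-weighted backup at x.
        return max(q_value(a, kernel_weights(starts[a], x, bandwidth), v) for a in samples)

    return value


if __name__ == "__main__":
    # Hypothetical toy problem: states in [0, 1], two deterministic moves,
    # reward equal to the successor state, so moving right is always better.
    rng = random.Random(0)
    xs = [rng.random() for _ in range(50)]
    moves = {"left": lambda x: max(0.0, x - 0.2), "right": lambda x: min(1.0, x + 0.2)}
    samples = {a: [(x, f(x), f(x)) for x in xs] for a, f in moves.items()}
    V = kernel_value_iteration(samples)
    for x in (0.1, 0.5, 0.9):
        print(f"V({x}) = {V(x):.3f}")
```

Because the weights for each action form a convex combination, the iteration is a contraction and the estimate stays within [r_min, r_max] / (1 - gamma). The `max` over actions is where the bias mentioned above enters: since the maximum is convex, the expected maximum of noisy action-value estimates is at least the maximum of their expectations (Jensen's inequality).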