Shaping in Reinforcement Learning by Changing the Physics of the Problem

Randlov, Jette
ICML-2000

Abstract: Children learn to ride a bicycle by using training wheels. They are actually trying to learn one task (riding without training wheels) by practicing another. In general, solving a difficult problem can be made easier by training on other problems. This is the basic idea of shaping. It is essential to ensure that time spent on the modified task will help in solving the original one. In this paper we prove that, given a finite MDP with a bounded reward signal and discount factor gamma < 1, if a sequence of tasks converges to the original task, then the optimal value functions of those tasks converge to the optimal value function of the original task as well.
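The convergence claim can be illustrated numerically. The sketch below is not the paper's construction: it assumes a randomly generated finite MDP as the "original task", models "changed physics" as a hypothetical blend of the original transition probabilities P with some other dynamics Q (P_eps = (1 - eps) * P + eps * Q, rewards unchanged), and solves each task with standard value iteration. The helper names `value_iteration`, `random_mdp` and `blend` are illustrative only.

```python
import random


def value_iteration(P, R, gamma, tol=1e-10):
    """Return optimal state values V*(s) of a finite MDP.

    P[s][a][t] is the probability of moving from state s to state t under
    action a; R[s][a] is the expected immediate reward; 0 <= gamma < 1.
    """
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        # Bellman optimality backup for every state.
        V_new = [
            max(R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
                for a in range(len(P[s])))
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            return V


def random_mdp(n_states, n_actions, rng):
    """Hypothetical test task: random transition rows and rewards in [-1, 1]."""
    P, R = [], []
    for _ in range(n_states):
        rows, rewards = [], []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])  # normalise to a distribution
            rewards.append(rng.uniform(-1.0, 1.0))
        P.append(rows)
        R.append(rewards)
    return P, R


def blend(P, Q, eps):
    """Assumed 'modified physics': mix original dynamics P with dynamics Q."""
    return [[[(1 - eps) * p + eps * q for p, q in zip(pa, qa)]
             for pa, qa in zip(Ps, Qs)]
            for Ps, Qs in zip(P, Q)]


rng = random.Random(0)
gamma = 0.9
P, R = random_mdp(5, 3, rng)   # the original task
Q, _ = random_mdp(5, 3, rng)   # hypothetical alternative dynamics
V_star = value_iteration(P, R, gamma)

# A sequence of modified tasks whose dynamics approach the original ones.
epsilons = [0.5, 0.1, 0.01, 0.001]
errors = []
for eps in epsilons:
    V_eps = value_iteration(blend(P, Q, eps), R, gamma)
    errors.append(max(abs(a - b) for a, b in zip(V_eps, V_star)))
    print(f"eps={eps:<6} max |V*_eps - V*| = {errors[-1]:.2e}")
```

For this particular blending, a standard contraction argument bounds the gap by max |V*_eps - V*| <= 2 * gamma * eps * max|V*| / (1 - gamma), so the printed errors shrink toward zero as eps does; the sketch only illustrates the abstract's claim on one example and does not reproduce the paper's proof.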