Shaping in Reinforcement Learning by Changing the Physics of the Problem
Randlov, Jette
ICML-2000
Abstract: Children learn to ride a bicycle by using training wheels.
They are actually trying to learn one task (riding without
training wheels) by training on another one. In general,
solving a difficult problem can be facilitated by training
on related problems. This is the basic idea of shaping. It is
essential to ensure that time spent on the modified
task will help solve the original one. In this paper we
prove that, given a finite MDP with a bounded reward signal
and a discount factor gamma < 1, if a sequence of
tasks converges to the original task, then the optimal value
functions of those tasks converge to that of the original task as well.
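
The convergence claim can be illustrated numerically. The sketch below is not taken from the paper: the two-state MDP, the "training wheels" transition model, and the names value_iteration, mix, P_ORIGINAL and P_EASY are all hypothetical, chosen only to show the sup-norm gap between optimal value functions shrinking as the modified task approaches the original one.

```python
# Illustrative sketch only: a made-up 2-state, 2-action MDP, not the paper's bicycle task.
GAMMA = 0.9  # discount factor, must be < 1

# R[s][a]: bounded reward. State 1 ("upright") pays 1, state 0 ("fallen") pays 0.
R = [[0.0, 0.0], [1.0, 1.0]]

# P[s][a][t]: probability of moving from state s to state t under action a.
P_ORIGINAL = [[[0.9, 0.1], [0.8, 0.2]],
              [[0.5, 0.5], [0.3, 0.7]]]
# Hypothetical "training wheels" physics: every action leads to the upright state.
P_EASY = [[[0.0, 1.0], [0.0, 1.0]],
          [[0.0, 1.0], [0.0, 1.0]]]


def value_iteration(P, R, gamma, tol=1e-10):
    """Return the optimal value function of a finite MDP as a list."""
    n = len(P)
    V = [0.0] * n
    while True:
        V_new = [
            max(R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
                for a in range(len(P[s])))
            for s in range(n)
        ]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new


def mix(P_a, P_b, eps):
    """Blend two transition models: (1 - eps) * P_a + eps * P_b."""
    return [[[(1 - eps) * pa + eps * pb for pa, pb in zip(row_a, row_b)]
             for row_a, row_b in zip(s_a, s_b)]
            for s_a, s_b in zip(P_a, P_b)]


V_star = value_iteration(P_ORIGINAL, R, GAMMA)

# A sequence of tasks converging to the original one as eps -> 0.
EPSILONS = [0.5, 0.25, 0.1, 0.01, 0.0]
gaps = []
for eps in EPSILONS:
    V_eps = value_iteration(mix(P_ORIGINAL, P_EASY, eps), R, GAMMA)
    gaps.append(max(abs(x - y) for x, y in zip(V_eps, V_star)))

for eps, gap in zip(EPSILONS, gaps):
    print(f"eps={eps:<5} sup-norm gap to original V* = {gap:.6f}")
```

Here the rewards are held fixed and only the transition probabilities (the "physics") are varied, which is one simple way to build a sequence of tasks that converges to the original. A single small example like this is consistent with the stated result but is no substitute for the proof.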