Reinforcement Learning With Continuous Action Values
Dimitrakakis, Christos
unpublished
Abstract: The problem of reinforcement learning in the case of a continuous action set
remains largely unsolved. This paper offers a possible solution by attempting to model
softmax action selection in the continuous case through the use of a distribution whose
moments are modified using the TD-error update. Appropriate updates for all moments of
the distribution are derived and an actor-critic implementation of the method is described.
The effectiveness of this approach is demonstrated by a set of experiments.
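
The general idea in the abstract, an actor whose action distribution is adjusted by the TD-error from a critic, can be illustrated with a minimal sketch. This is not the paper's derivation: it assumes a Gaussian action distribution, uses the standard likelihood-ratio (score-function) updates for the mean and log standard deviation, and runs on a hypothetical single-step task with reward -(action - target)^2. The names `train`, `target`, `alpha_actor`, `alpha_sigma`, `alpha_critic` and `sigma_floor` are invented for illustration.

```python
import math
import random


def train(target=2.0, steps=10000, seed=0,
          alpha_actor=0.01, alpha_sigma=0.001, alpha_critic=0.1,
          sigma_floor=0.2):
    """Gaussian actor-critic on a one-step task (illustrative assumptions only)."""
    rng = random.Random(seed)
    mu, log_sigma = 0.0, 0.0   # actor: mean and log standard deviation
    value = 0.0                # critic: estimate of the expected reward

    for _ in range(steps):
        sigma = math.exp(log_sigma)
        action = rng.gauss(mu, sigma)

        # Hypothetical reward, highest when the action hits the target.
        reward = -(action - target) ** 2

        # Episodes last one step, so the TD-error reduces to r - V.
        td_error = reward - value
        value += alpha_critic * td_error

        # Mean update: the Gaussian score (a - mu) / sigma^2, multiplied by
        # sigma^2 to keep the step size bounded (an assumption of this sketch).
        z = (action - mu) / sigma
        mu += alpha_actor * td_error * (action - mu)

        # Spread update: d log p / d log sigma = z^2 - 1.
        log_sigma += alpha_sigma * td_error * (z * z - 1.0)

        # Keep some exploration by bounding sigma from below.
        log_sigma = max(log_sigma, math.log(sigma_floor))

    return mu, math.exp(log_sigma), value


if __name__ == "__main__":
    mu, sigma, value = train()
    print(f"mean={mu:.3f} sigma={sigma:.3f} value={value:.3f}")
```

In this sketch the mean drifts toward actions that produce a positive TD-error, and the spread narrows as the reward near the mean becomes predictable. The floor on sigma is a practical safeguard added here, not something taken from the abstract.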