Comparing Value-Function Estimation Algorithms in Undiscounted Problems
Ferenc Beleznay, Tamas Grobler, Csaba Szepesvari
unpublished
Abstract: We compare scaling properties of several value-function estimation algorithms.
In particular, we prove that Q-learning can scale exponentially slowly with the number of states.
We identify the reasons for the slow convergence and show that both TD($\lambda$) and learning with a fixed learning rate enjoy rather fast convergence, just like the model-based method.
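
The contrast between decaying and fixed learning rates can be illustrated with a toy experiment. The sketch below is not the construction used in the paper; it assumes a hypothetical deterministic chain with a single action per state (so the Q-learning update reduces to a one-step value update), a reward of 1 per step, and a terminal state of value 0. The function name `sweep_errors` and the step-size schedules are illustrative choices.

```python
def sweep_errors(n_states, n_sweeps, step_size):
    """Absolute value errors after repeated one-step updates on a chain.

    Assumed toy problem (not taken from the paper): states 0..n_states-1
    each move one step to the right with reward 1, and state n_states is
    terminal with value 0. The true undiscounted value of state i is
    therefore n_states - i.
    """
    values = [0.0] * (n_states + 1)  # last entry is the terminal state
    for t in range(1, n_sweeps + 1):
        alpha = step_size(t)  # learning rate used throughout sweep t
        # Visit states in trajectory order, starting from state 0.
        for i in range(n_states):
            target = 1.0 + values[i + 1]  # undiscounted one-step target
            values[i] += alpha * (target - values[i])
    return [abs(values[i] - (n_states - i)) for i in range(n_states)]


if __name__ == "__main__":
    decaying = sweep_errors(20, 1000, lambda t: 1.0 / t)  # 1/t schedule
    fixed = sweep_errors(20, 1000, lambda t: 0.5)         # constant rate
    print("start-state error, 1/t rate:  ", decaying[0])
    print("start-state error, fixed rate:", fixed[0])
```

In this toy setting the start-state error stays large under the 1/t schedule after 1000 sweeps, while the fixed rate drives it essentially to zero. One way to read this: correct values spread backward from the terminal state at a pace governed by the summed step sizes, and the sum of 1/t grows only logarithmically in the number of sweeps, so the number of sweeps needed grows roughly exponentially with the chain length.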