Max-norm Projections for Factored MDPs

Guestrin, Carlos , Daphne Koller, Ronald Parr
Max-norm Projections for Factored MDPs
AAAI Spring Symposium, Stanford, California, March 2001 (Postscript - 323KB )

Abstract: Markov Decision Processes (MDPs) provide a coherent mathematical framework for planning under uncertainty. However, exact MDP solution algorithms require the manipulation of a \emph{value function}, which specifies a value for each state in the system. Most real-world MDPs are too large for such a representation to be feasible, preventing the use of exact MDP algorithms. Various approximate solution algorithms have been proposed, many of which use a linear combination of basis functions to provide a compact approximation to the value function. Almost all of these algorithms use an approximation based on the (weighted) $\cL_2$-norm (Euclidean distance); this approach prevents the application of standard convergence results for MDP algorithms, all of which use max-norm. This paper makes two contributions. First, it presents the first approximate MDP solution algorithms --- both value and policy iteration --- that use max-norm projection, thereby directly optimizing the quantity required to obtain the best error bounds. Second, it shows how these algorithms can be applied efficiently in the context of \emph{factored MDPs}, where the transition model is specified using a dynamic Bayesian network and actions may be taken sequentially or in parallel.