Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes
Wang, Gang , Sridhar MahadevanHierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes
International Conference on Machine Learning (ICML-99)
( gzipped Postscript - 250 )
Abstract: Manufacturing is a challenging real-world domain for applying
MDP-based reinforcement learning algorithms. In this paper we focus on
a class of ``coupled'' semi-Markov decision processes (SMDPs), which
arise in a large number of manufacturing problems. The nature of the
interaction is more complex than problems previously considered: the
available actions and their transition probabilities in each sub-SMDP
are affected by other neighboring sub-SMDPs. This interaction defeats
the standard approach of solving each sub-SMDP in parallel by a
reinforcement learning algorithm, and putting the solutions
together. A novel approach is presented whereby each sub-SMDP is
solved explicitly taking into account the diffferent modes of
interaction, and a greedy algorithm is used to combine these policies
into a solution of the overall problem. We present detailed
experimental results for a 12-machine transfer line. In particular,
we use an average-reward algorithm for semi-Markov decision processes
called SMART for learning low-cost policies for different sub-SMDPs,
and a greedy algorithm for combining the base level policies to obtain
an overall policy for running the transfer line. We show that the
hierarchical approach is not only much faster than a "flat" algorithm,
but also outperforms two well-known heuristics for running transfer
lines used in many factories today.