Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes

Wang, Gang , Sridhar Mahadevan
Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes
International Conference on Machine Learning (ICML-99) ( gzipped Postscript - 250 )

Abstract: Manufacturing is a challenging real-world domain for applying MDP-based reinforcement learning algorithms. In this paper we focus on a class of ``coupled'' semi-Markov decision processes (SMDPs), which arise in a large number of manufacturing problems. The nature of the interaction is more complex than problems previously considered: the available actions and their transition probabilities in each sub-SMDP are affected by other neighboring sub-SMDPs. This interaction defeats the standard approach of solving each sub-SMDP in parallel by a reinforcement learning algorithm, and putting the solutions together. A novel approach is presented whereby each sub-SMDP is solved explicitly taking into account the diffferent modes of interaction, and a greedy algorithm is used to combine these policies into a solution of the overall problem. We present detailed experimental results for a 12-machine transfer line. In particular, we use an average-reward algorithm for semi-Markov decision processes called SMART for learning low-cost policies for different sub-SMDPs, and a greedy algorithm for combining the base level policies to obtain an overall policy for running the transfer line. We show that the hierarchical approach is not only much faster than a "flat" algorithm, but also outperforms two well-known heuristics for running transfer lines used in many factories today.