Reinforcement Learning Repository at MSU

Demos and Implementations (Domains)

This section contains programs that demonstrate reinforcement learning in action, illustrating the core concepts and common algorithms. These programs may provide a useful starting point for applying reinforcement learning to real problems and for advancing research in this area. Wherever possible, source code is included.

Please note that use of this software is restricted: you must read the license agreement and agree to its terms before downloading any software from this site. Downloading the software constitutes consent to those terms.

If you would like to contribute source code or suggest improvements to what is included here, please contact Natalia Hernandez or Sridhar Mahadevan.


  • Cart-Pole Problem
    Simulation of the cart-and-pole dynamic system and a procedure for learning to balance the pole. Both are described in Barto, Sutton, and Anderson, "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems," IEEE Trans. Syst., Man, Cybern., Vol. SMC-13, pp. 834--846, Sept.--Oct. 1983. Written by Rich Sutton. (A sketch of the dynamics appears after this list.)
    Source code: cpole.tar (16 K; requires C compiler)

  • Cell Phone
    Interactive Java demonstration illustrating the improvement gained by applying RL to the problem of dynamic channel allocation in cellular telephones, by Satinder Singh at the University of Colorado.
  • Elevator
    Fortran simulation of an elevator, written by James Lewis and provided by Christos Cassandras of the UMass ECE Dept. The reinforcement learning addition to the elevator simulation was implemented by Bob Crites (CS Dept., UMass) and John McNulty, and is described in the paper Improving Elevator Performance Using Reinforcement Learning.
    Source code: elevator.tar.gz (284 K) or elevator.tar (814 K). Both require a C compiler and the f2c library to convert the Fortran to C, as the simulation incorporates C random-number routines.

  • Grid World
    Simulation of an agent learning to move to a user-defined goal square on a grid. It uses Q-learning and was written by Sridhar Mahadevan. (A tabular Q-learning sketch appears after this list.)
    Source code: grid.tar (72 K; requires C compiler and X11 libraries)

  • Machine Maintenance
    CSIM simulation of a production system that integrates SMART, a model-free average-reward algorithm, to determine the optimal machine-maintenance policy. It was written by Nicholas Marchalleck and Abhijit Gosavi, and is described in Self-Improving Factory Simulation Using Continuous-Time Average-Reward Reinforcement Learning by Mahadevan et al. (The SMART update rule is sketched after this list.)
    Source code: maint.tar (268 K; requires CSIM v.17 and C++ compiler)
  • MDP Q-learning: Implements Q-learning on a given MDP, using semi-uniform exploration (sketched after this list).
    Source code: mdp-q.tar (64 K; requires GNU C compiler)
  • Mountain-Car Problem:
    Simulation of a car learning the proper acceleration to get up a mountain. It uses Q-learning with CMAC as a function approximator (a tile-coding sketch appears after this list). It is described in (among other papers) Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding by Rich Sutton, and was developed by Sridhar Mahadevan.
    Source code: mcar.tar (157 K; requires X11 libraries and C++ compiler)

  • Network Routing: Demonstrates an RL network-routing algorithm written by Justin Boyan and Michael Littman, described in Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach, Advances in Neural Information Processing Systems (PostScript, 155 K). The Q-routing update it uses is sketched after this list.
    Source code: network-router.tar (222 K; requires C compiler and the wish windowing shell, part of Tcl/Tk)
  • Proposed Standard for Reinforcement Learning Software
    This standard, developed by Rich Sutton and Juan Carlos Santamaria, is intended to facilitate RL research and development, and is available for C++ and Common Lisp. (A hypothetical illustration of the agent-environment loop such a standard specifies appears below.)
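
The following is a minimal sketch of the cart-pole dynamics from the Barto, Sutton, and Anderson paper cited above, integrated with Euler's method. The constants are the values commonly used with this benchmark; the bang-bang +/-10 N force and the crude controller in main() are illustrative assumptions, not a transcription of cpole.tar.

    /* Cart-pole dynamics sketch (Barto, Sutton & Anderson 1983), Euler
     * integration.  Constants are the customary benchmark values; the
     * controller below is a crude illustration, not the learning code. */
    #include <math.h>
    #include <stdio.h>

    #define GRAVITY   9.8
    #define MASS_CART 1.0
    #define MASS_POLE 0.1
    #define LENGTH    0.5     /* half the pole length (m) */
    #define FORCE_MAG 10.0    /* magnitude of the bang-bang force (N) */
    #define TAU       0.02    /* integration time step (s) */

    /* Advance the state (x, x_dot, theta, theta_dot) one step under the
     * action 'push_right' (1 pushes right, 0 pushes left). */
    void cart_pole_step(double *x, double *x_dot,
                        double *theta, double *theta_dot, int push_right)
    {
        double force = push_right ? FORCE_MAG : -FORCE_MAG;
        double total_mass = MASS_CART + MASS_POLE;
        double costh = cos(*theta), sinth = sin(*theta);

        double temp = (force + MASS_POLE * LENGTH * (*theta_dot)
                       * (*theta_dot) * sinth) / total_mass;
        double theta_acc = (GRAVITY * sinth - costh * temp)
            / (LENGTH * (4.0 / 3.0
                         - MASS_POLE * costh * costh / total_mass));
        double x_acc = temp
            - MASS_POLE * LENGTH * theta_acc * costh / total_mass;

        *x         += TAU * (*x_dot);
        *x_dot     += TAU * x_acc;
        *theta     += TAU * (*theta_dot);
        *theta_dot += TAU * theta_acc;
    }

    int main(void)
    {
        double x = 0, x_dot = 0, theta = 0.05, theta_dot = 0; /* slight tilt */
        int t;
        for (t = 0; t < 100; t++)     /* push toward the lean, crudely */
            cart_pole_step(&x, &x_dot, &theta, &theta_dot, theta > 0);
        printf("after %d steps: x=%.3f theta=%.3f\n", t, x, theta);
        return 0;
    }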
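
The Grid World and MDP Q-learning entries above both rely on the one-step Q-learning backup. Below is a minimal, self-contained sketch of tabular Q-learning on a 1-D corridor with the goal in the rightmost cell; the corridor, the reward of 1 at the goal, and all constants are illustrative assumptions rather than the contents of grid.tar.

    /* Tabular Q-learning sketch on a hypothetical 1-D corridor. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N_STATES  10
    #define N_ACTIONS 2          /* 0 = left, 1 = right */
    #define GOAL      (N_STATES - 1)
    #define ALPHA     0.1        /* learning rate */
    #define GAMMA     0.9        /* discount factor */
    #define EPSILON   0.1        /* exploration probability */

    static double Q[N_STATES][N_ACTIONS];

    /* Greedy action for state s, breaking ties randomly. */
    static int greedy(int s)
    {
        if (Q[s][0] == Q[s][1])
            return rand() % N_ACTIONS;
        return Q[s][1] > Q[s][0];
    }

    int main(void)
    {
        int episode, s, a, s2;
        double r;
        srand(1);
        for (episode = 0; episode < 1000; episode++) {
            for (s = 0; s != GOAL; s = s2) {
                /* epsilon-greedy action selection */
                a  = ((double)rand() / RAND_MAX < EPSILON)
                     ? rand() % N_ACTIONS : greedy(s);
                s2 = a ? s + 1 : (s > 0 ? s - 1 : 0);
                r  = (s2 == GOAL) ? 1.0 : 0.0;
                /* one-step Q-learning backup */
                Q[s][a] += ALPHA * (r + GAMMA * Q[s2][greedy(s2)] - Q[s][a]);
            }
        }
        for (s = 0; s < GOAL; s++)
            printf("state %d: learned action = %s\n",
                   s, greedy(s) ? "right" : "left");
        return 0;
    }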
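
The machine-maintenance entry uses SMART, whose core backup charges each transition's sojourn time at the current average-reward rate rho and re-estimates rho from cumulative reward and time on greedy steps. The sketch below shows that update rule in isolation; the state and action counts and the toy transition in main() are hypothetical.

    /* SMART average-reward update sketch.  Sizes and the sample
     * transition are illustrative assumptions, not maint.tar. */
    #include <stdio.h>

    #define N_STATES  4
    #define N_ACTIONS 2
    #define ALPHA     0.1

    static double R[N_STATES][N_ACTIONS];  /* average-adjusted values */
    static double rho;                     /* avg. reward per unit time */
    static double total_reward, total_time;

    static double max_R(int s)
    {
        double m = R[s][0];
        int a;
        for (a = 1; a < N_ACTIONS; a++)
            if (R[s][a] > m)
                m = R[s][a];
        return m;
    }

    /* One SMART backup for transition (s,a) -> s2 taking 'tau' time
     * units and yielding immediate reward 'r'; 'greedy' flags a
     * non-exploratory action, which is when rho is re-estimated. */
    void smart_update(int s, int a, int s2, double r, double tau, int greedy)
    {
        R[s][a] += ALPHA * (r - rho * tau + max_R(s2) - R[s][a]);
        if (greedy) {
            total_reward += r;
            total_time   += tau;
            rho = total_reward / total_time;
        }
    }

    int main(void)
    {
        /* toy transition: state 0, action 1, 5 time units, reward 2 */
        smart_update(0, 1, 2, 2.0, 5.0, 1);
        printf("R[0][1]=%.3f  rho=%.3f\n", R[0][1], rho);
        return 0;
    }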
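
Semi-uniform exploration, as used by the MDP Q-learning entry, picks the current greedy action with a fixed probability and otherwise draws uniformly from all actions, so the greedy action keeps some extra probability mass. A minimal sketch follows; the 0.8 exploitation probability and the Q-values in main() are made up for illustration, not taken from mdp-q.tar.

    /* Semi-uniform exploration sketch. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N_ACTIONS 4
    #define P_BEST    0.8   /* illustrative exploitation probability */

    /* Pick an action given the Q-values of the current state. */
    int semi_uniform_action(const double q[N_ACTIONS])
    {
        int a, best = 0;
        for (a = 1; a < N_ACTIONS; a++)
            if (q[a] > q[best])
                best = a;
        if ((double)rand() / RAND_MAX < P_BEST)
            return best;               /* exploit */
        return rand() % N_ACTIONS;     /* explore uniformly */
    }

    int main(void)
    {
        double q[N_ACTIONS] = { 0.1, 0.7, 0.3, 0.2 };  /* made-up values */
        int counts[N_ACTIONS] = { 0 }, i;
        srand(1);
        for (i = 0; i < 10000; i++)
            counts[semi_uniform_action(q)]++;
        for (i = 0; i < N_ACTIONS; i++)
            printf("action %d chosen %d times\n", i, counts[i]);
        return 0;
    }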
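
The mountain-car entry approximates its value function with a CMAC (tile coding): several overlapping, offset tilings of the two-dimensional state, with the value read off as the sum of one weight per tiling. The sketch below shows the idea over the usual (position, velocity) state; the ranges, tiling count, and resolution are illustrative, not those used in mcar.tar.

    /* CMAC (tile-coding) sketch over a 2-D state.  All sizes here are
     * illustrative assumptions. */
    #include <stdio.h>

    #define N_TILINGS 5
    #define N_TILES   9            /* tiles per dimension per tiling */
    #define POS_MIN  (-1.2)
    #define POS_MAX    0.5
    #define VEL_MIN  (-0.07)
    #define VEL_MAX    0.07

    static double w[N_TILINGS][N_TILES][N_TILES]; /* one weight per tile */

    /* Index of the tile covering v in [lo, hi] for tiling t, each tiling
     * shifted by a fraction of a tile width. */
    static int tile_index(double v, double lo, double hi, int t)
    {
        double width  = (hi - lo) / (N_TILES - 1);
        double offset = width * t / N_TILINGS;
        int i = (int)((v - lo + offset) / width);
        if (i < 0) i = 0;
        if (i >= N_TILES) i = N_TILES - 1;
        return i;
    }

    /* Value estimate: sum of the one active tile's weight per tiling. */
    double cmac_value(double pos, double vel)
    {
        double sum = 0.0;
        int t;
        for (t = 0; t < N_TILINGS; t++)
            sum += w[t][tile_index(pos, POS_MIN, POS_MAX, t)]
                    [tile_index(vel, VEL_MIN, VEL_MAX, t)];
        return sum;
    }

    /* Move the estimate toward 'target', splitting the step over tilings. */
    void cmac_update(double pos, double vel, double target, double alpha)
    {
        double delta = alpha * (target - cmac_value(pos, vel)) / N_TILINGS;
        int t;
        for (t = 0; t < N_TILINGS; t++)
            w[t][tile_index(pos, POS_MIN, POS_MAX, t)]
             [tile_index(vel, VEL_MIN, VEL_MAX, t)] += delta;
    }

    int main(void)
    {
        int i;
        for (i = 0; i < 100; i++)
            cmac_update(-0.5, 0.0, 1.0, 0.5);
        printf("value near training point: %.3f\n", cmac_value(-0.5, 0.0));
        printf("value far away:            %.3f\n", cmac_value(0.4, 0.06));
        return 0;
    }

Because nearby states share tiles, training at one point generalizes to its neighbors while leaving distant states untouched, which is the property that makes the coarse coding work.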
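
The network-routing entry is based on Boyan and Littman's Q-routing, in which node x keeps an estimate Q_x(d, y) of the time to deliver a packet to destination d via neighbor y, and revises it from the queueing and transmission delay just observed plus y's own best estimate. A minimal sketch follows; the four-node setting and the delays in main() are hypothetical.

    /* Q-routing update sketch (Boyan & Littman).  Topology and delays
     * below are illustrative assumptions. */
    #include <stdio.h>

    #define N_NODES 4
    #define ALPHA   0.5

    /* Q[x][d][y]: node x's estimated delivery time to d via neighbor y. */
    static double Q[N_NODES][N_NODES][N_NODES];

    /* Smallest estimate held at node y for destination d (for simplicity,
     * minimized over all nodes other than y itself). */
    static double best_estimate(int y, int d)
    {
        double best = 1e9;
        int z;
        if (y == d)
            return 0.0;                /* packet already delivered */
        for (z = 0; z < N_NODES; z++)
            if (z != y && Q[y][d][z] < best)
                best = Q[y][d][z];
        return best;
    }

    /* After forwarding a packet for d from x to y, having spent 'q_delay'
     * in x's queue and 's_delay' in transmission, update x's estimate. */
    void q_route_update(int x, int y, int d, double q_delay, double s_delay)
    {
        double target = q_delay + s_delay + best_estimate(y, d);
        Q[x][d][y] += ALPHA * (target - Q[x][d][y]);
    }

    int main(void)
    {
        /* a few hops of a packet bound for node 3 */
        q_route_update(0, 1, 3, 1.0, 0.5);
        q_route_update(1, 2, 3, 2.0, 0.5);
        q_route_update(0, 1, 3, 1.0, 0.5);  /* x revises its estimate again */
        printf("Q[0][3][1] = %.3f\n", Q[0][3][1]);
        return 0;
    }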
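
Finally, as a purely hypothetical illustration of what a standard agent-environment interface buys you, the sketch below factors a run into environment calls, agent calls, and a reusable interaction loop. The names and signatures here are assumptions for illustration only; consult the Sutton-Santamaria standard itself for the actual C++ and Common Lisp interfaces.

    /* Hypothetical agent-environment interface sketch; not the actual
     * proposed standard's API. */
    #include <stdio.h>

    typedef int State;
    typedef int Action;

    /* --- environment side: a trivial 5-state corridor ----------------- */
    static State env_start(void) { return 0; }
    static State env_step(State s, Action a, double *reward, int *terminal)
    {
        State s2  = (a == 1) ? s + 1 : (s > 0 ? s - 1 : 0);
        *terminal = (s2 == 4);
        *reward   = *terminal ? 1.0 : 0.0;
        return s2;
    }

    /* --- agent side: always move right -------------------------------- */
    static Action agent_start(State s)          { (void)s; return 1; }
    static Action agent_step(double r, State s) { (void)r; (void)s; return 1; }

    /* --- the interaction loop a shared standard lets everyone reuse --- */
    int main(void)
    {
        double reward;
        int terminal = 0, steps = 1;
        State  s = env_start();
        Action a = agent_start(s);
        while (!terminal) {
            s = env_step(s, a, &reward, &terminal);
            if (!terminal)
                a = agent_step(reward, s);
            steps++;
        }
        printf("episode finished in %d steps\n", steps);
        return 0;
    }

With agent and environment behind fixed call points like these, any compliant agent can be run against any compliant environment without modifying the loop, which is the interoperability the proposed standard aims at.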