The objective of this project is to develop a highly adaptive method for hand-eye-head coordinated object manipulation in an unstructured environment, without requiring a complete description of the world or explicit camera calibration. Unlike conventional robot systems, which depend heavily on accurate global positions of the manipulator and objects, the system under development learns, through interactive visual feedback, the unknown and nonlinear relationships among the sensed objects, the hand, and the visual sensors. The robot manipulator is equipped with a visual recognition system, SHOSLIF-O, which recognizes objects and the robot hand in images. In the learning phase, several hierarchical networks are automatically generated to map the image-plane space to 3-D space and to the arm control space. In the performance phase, these networks control the manipulator to perform learned tasks, such as detecting an object, finding the object's grasping position, finding the object's 3-D location, moving the arm to the detected object, picking up the object, moving one object to another, and pouring water from one cup into another. Each of these actions is coded as a task command. The human user issues a single command for the machine to learn each task; in the performance phase, the user issues a series of commands and the machine executes each task in turn, according to what it has learned for that task.
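The calibration-free idea above can be illustrated with a minimal sketch. The actual system uses hierarchical SHOSLIF networks; here, as a simplified and purely hypothetical stand-in, a flat nearest-neighbor lookup over stored (image feature, arm configuration) pairs plays the role of the learned image-to-arm mapping. All class and variable names are illustrative, not taken from the project.

```python
# Minimal sketch of a learned, calibration-free image-to-arm mapping.
# A flat nearest-neighbor lookup stands in for the hierarchical networks.
import math

class LearnedHandEyeMap:
    """Maps image-plane observations to arm control values from examples."""

    def __init__(self):
        self.samples = []  # list of (image_feature, arm_config) pairs

    def learn(self, image_feature, arm_config):
        """Learning phase: store one observed correspondence."""
        self.samples.append((tuple(image_feature), tuple(arm_config)))

    def control(self, image_feature):
        """Performance phase: return the arm configuration whose stored
        image feature is closest to the current observation."""
        return min(self.samples,
                   key=lambda s: math.dist(s[0], image_feature))[1]

# Learning phase: the robot observes its own hand at known arm settings.
hand_eye = LearnedHandEyeMap()
hand_eye.learn(image_feature=(10, 20), arm_config=(0.1, 0.5, 0.3))
hand_eye.learn(image_feature=(40, 25), arm_config=(0.4, 0.6, 0.2))

# Performance phase: a detected object at (38, 24) maps to the nearest
# learned arm configuration -- no explicit camera calibration involved.
print(hand_eye.control((38, 24)))  # -> (0.4, 0.6, 0.2)
```

A real implementation would interpolate among neighbors and refine the mapping hierarchically, but the key property is the same: the relationship between image space and arm control space is acquired from interaction rather than computed from calibrated camera geometry.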
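The two-phase command protocol (one command per task during learning, a series of commands during performance) can be sketched as follows. This is an assumption-laden illustration: the command names and the dictionary-based dispatch are invented here for clarity, not taken from the project.

```python
# Hypothetical sketch of the two-phase command protocol: the learning
# phase associates each task command with a learned behavior; the
# performance phase executes a series of commands one at a time.

class TaskRobot:
    def __init__(self):
        self.learned = {}  # task command -> learned behavior (a callable)

    def learn_task(self, command, behavior):
        """Learning phase: one user command teaches one task."""
        self.learned[command] = behavior

    def perform(self, commands):
        """Performance phase: execute each command in order as learned."""
        results = []
        for command in commands:
            if command not in self.learned:
                raise KeyError(f"task not learned: {command}")
            results.append(self.learned[command]())
        return results

robot = TaskRobot()
robot.learn_task("FIND_OBJECT", lambda: "object located in image")
robot.learn_task("MOVE_ARM", lambda: "arm moved to object")
robot.learn_task("PICK_UP", lambda: "object grasped")

# The user issues a series of commands; each is executed as learned.
print(robot.perform(["FIND_OBJECT", "MOVE_ARM", "PICK_UP"]))
```

In the described system each behavior would invoke the learned hierarchical networks rather than a fixed routine; the point of the sketch is only that task knowledge is keyed by command and replayed on demand.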