
Proceedings Paper

Novel reinforcement learning approach for difficult control problems
Author(s): Georges A. Becus; Edward A. Thompson

Paper Abstract

We review work conducted over the past several years aimed at developing reinforcement learning architectures, based on and inspired by associative control process (ACP) networks, for solving difficult control problems. We briefly review ACP networks, which are able to reproduce many classical instrumental conditioning test results observed in animal research and to engage in real-time, closed-loop, goal-seeking interactions with their environment. Chronologically, our contributions begin with the ideally interfaced ACP network, which is endowed with hierarchical, attention, and failure recognition interface mechanisms that greatly enhance the capabilities of the original ACP network. When solving the cart-pole problem, it achieves 100 percent reliability and a reduction in training time similar to that of Baird and Klopf's modified ACP network, and additionally an order of magnitude reduction in the number of failures experienced for successful training. Next we introduced the command and control center/internal drive (Cid) architecture for artificial neural learning systems. It consists of a hierarchy of command and control centers governing motor selection networks. Internal drives, similar to hunger, thirst, or reproduction in biological systems, are formed within the controller to facilitate learning. The efficiency, reliability, and adjustability of this architecture were demonstrated on the benchmark cart-pole control problem. A comparison with other artificial learning systems indicates that it learns over 100 times faster than Barto et al.'s adaptive search element/adaptive critic element and experiences fewer failures by more than an order of magnitude, while remaining capable of being fine-tuned by the user, online, for improved performance without additional training. Finally, we present work in progress on a 'peaks and valleys' scheme, which moves away from the one-dimensional learning mechanism currently found in Cid and shows promise in solving even more difficult learning control problems such as the truck backer-upper.

Paper Details

Date Published: 26 September 1997
PDF: 12 pages
Proc. SPIE 3208, Intelligent Robots and Computer Vision XVI: Algorithms, Techniques, Active Vision, and Materials Handling (26 September 1997); doi: 10.1117/12.290288
Georges A. Becus, Univ. of Cincinnati (United States)
Edward A. Thompson, Univ. of Cincinnati (United States)

Published in SPIE Proceedings Vol. 3208:
Intelligent Robots and Computer Vision XVI: Algorithms, Techniques, Active Vision, and Materials Handling
David P. Casasent, Editor
