An advantage of the supervised learning model is that it is well-defined. It is detailed enough to be useful but simple enough to be analyzed. Details of specific applications are abstracted away. The model has been criticized as an artificial and limited model of learning, however, amounting to nothing more than nonlinear regression—a way to fit a function to a set of data points. Indeed, in many practical applications neural networks are used mainly for function approximation and nothing more is asked. Perhaps the major limitation is the requirement for a teacher to specify in detail the correct output for each and every input. This is not how people learn to walk, for example.
Obviously, there is much more to learning than function approximation, so researchers interested in more realistic learning systems must consider additional factors. The model can be extended in many ways, however, and simplified abstract models like this are likely to be useful as core components in a larger system. Some proposals, for example, surround a supervised learning module with key subsystems designed to translate available information into the detailed signals required by the simplified model. Extensions such as these are fascinating but beyond the scope of this book. Other abstract models at similar levels of complexity include unsupervised learning and reinforcement learning.
Unsupervised Learning
A requirement for supervised learning is the presence of a teacher to specify the target output for every input. In unsupervised learning, there is no teacher. The training data is unlabeled and there are no targets. Instead, the system adapts to regularities in the data according to rules implicit in its design. The nature of the regularities found by the system depends on details of its design, so the teacher is, in a sense, built into the system. Unsupervised learning is useful because unlabeled data is often more readily available than labeled data.
Some systems extract a set of prototype patterns from the training set; given an input, the most similar prototype is recalled. Parameters of the system determine how similarity is defined. In statistics, unsupervised learning often refers to clustering algorithms or probability density approximation. The k-means algorithm and vector quantization are examples.
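As a rough illustration of prototype extraction and recall, the following is a minimal k-means sketch. The toy data, the function names, and the choice of squared Euclidean distance as the similarity measure are all assumptions made for this example; the prototypes are initialized from the first k points only to keep the run deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance; this choice defines 'similarity' here."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(prototypes, point):
    """Index of the prototype most similar to the given point."""
    return min(range(len(prototypes)),
               key=lambda i: squared_distance(prototypes[i], point))

def k_means(points, k, iterations=20):
    # Assumption: start from the first k points (deterministic, not robust).
    prototypes = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group each point with its nearest prototype.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(prototypes, p)].append(p)
        # Update step: move each prototype to the mean of its group.
        for i, group in enumerate(groups):
            if group:  # an empty group leaves its prototype unchanged
                prototypes[i] = [sum(c) / len(group) for c in zip(*group)]
    return prototypes

# Unlabeled toy data forming two loose groups (hypothetical values).
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
prototypes = k_means(data, k=2)

# Recall: a new input retrieves the most similar stored prototype.
recalled = prototypes[nearest(prototypes, (4.5, 4.8))]
```

No labels are supplied at any point; the grouping follows entirely from the distance function and the update rule, which is the sense in which the teacher is built into the design.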
Unsupervised learning modules are sometimes used as a component of a supervised learning system. To be useful, the unsupervised model must partition the data in a way that preserves the information needed for supervised learning.
Autoassociative networks are on the borderline between supervised and unsupervised learning. Given an input, the network is trained to reproduce the identical pattern at the output. The network acts as an autoencoder, mapping an input pattern to itself. This may seem pointless, but if the system is constrained by a bottleneck in a small middle layer, the network is forced to find an efficient internal representation of the pattern that preserves as much information as possible. Ideally, it will strip away nonessentials and reproduce only the significant features of the pattern, perhaps making it more useful for other purposes. Alternatively, the output of the bottleneck layer may be useful in itself as a compressed representation of the input pattern. This is related to principal components analysis (see appendix section B.1).
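The bottleneck idea can be sketched with a very small linear autoencoder. Everything in this example is an assumption chosen for illustration: the two-dimensional training patterns lie on a single line, the network has one linear hidden unit, and the learning rate, initial weights, and epoch count are arbitrary. It is not meant to represent any particular published network.

```python
# Training patterns lie on a line in 2-D, so one number can encode each one.
patterns = [(t, 0.5 * t) for t in (-1.0, -0.5, 0.5, 1.0)]

w = [0.1, 0.1]  # encoder weights: 2 inputs -> 1 bottleneck unit
v = [0.1, 0.1]  # decoder weights: 1 bottleneck unit -> 2 outputs
rate = 0.05     # learning rate (assumed value)

def encode(x):
    """Compressed representation: a single number."""
    return w[0] * x[0] + w[1] * x[1]

def decode(h):
    """Reconstruct a 2-D pattern from the bottleneck value."""
    return [v[0] * h, v[1] * h]

def reconstruction_error(x):
    y = decode(encode(x))
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x))

error_before = sum(reconstruction_error(x) for x in patterns)

for epoch in range(500):
    for x in patterns:
        h = encode(x)
        y = decode(h)
        e = [y[i] - x[i] for i in range(2)]  # the target is the input itself
        dh = v[0] * e[0] + v[1] * e[1]       # error passed back to the bottleneck
        for i in range(2):
            v[i] -= rate * e[i] * h
            w[i] -= rate * dh * x[i]

error_after = sum(reconstruction_error(x) for x in patterns)
code = encode((1.0, 0.5))  # one number standing in for a two-number pattern
```

Because the patterns here have only one degree of freedom, a single hidden unit can reproduce them almost exactly. With a linear bottleneck like this one, the learned code spans the same direction that principal components analysis would find for this data.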
Reinforcement Learning
Reinforcement learning is a more realistic model of low-level learning in humans and animals. Reinforcement learning resembles supervised learning in that there is a defined goal, but the objective is defined more abstractly. Instead of a teacher providing detailed targets for each and every output, the only feedback is a sparse reinforcement signal that grades the system response as "good" or "bad" without providing further details. The reinforcement may be sparse in time as well as space. Game playing is a commonly mentioned example: the outcome of a game of chess is a single win-lose signal rather than a detailed list of which moves should have been made at each step in the game. In general, the system produces outputs that act on an external environment and affect the reinforcement eventually received. The training objective is to maximize the amount of positive reinforcement received over time.
In many reinforcement learning models, the key element is a subsystem which learns to predict the future reinforcement expected given the current network inputs and outputs. If this prediction can be learned accurately, then target signals for supervised training of the action selection module can be derived from changes in the predicted reinforcement.
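One way to make this concrete is a tabular actor-critic sketch on a short chain of states. The environment, the constants, and the names below are assumptions for illustration, and the predictor here depends only on the current state rather than on inputs and outputs together. The critic `value` learns to predict future reinforcement, and the change in that prediction, `delta`, serves as the training signal for the action preferences `pref`.

```python
import math
import random

N_STATES = 5   # states 0..4; both ends are terminal (hypothetical task)
START = 2
GAMMA, ALPHA, BETA = 0.9, 0.1, 0.1  # discount and learning rates (assumed)

def run(episodes=2000, seed=0):
    rng = random.Random(seed)
    value = [0.0] * N_STATES  # critic: predicted future reinforcement
    pref = [0.0] * N_STATES   # actor: preference for moving right
    for _ in range(episodes):
        s = START
        while 0 < s < N_STATES - 1:
            p_right = 1.0 / (1.0 + math.exp(-pref[s]))
            go_right = rng.random() < p_right
            s_next = s + 1 if go_right else s - 1
            # Sparse feedback: "good" at the right end, "bad" at the left.
            if s_next == N_STATES - 1:
                r = 1.0
            elif s_next == 0:
                r = -1.0
            else:
                r = 0.0
            # Change in predicted reinforcement (terminal values stay at 0).
            delta = r + GAMMA * value[s_next] - value[s]
            value[s] += ALPHA * delta
            # The same signal trains action selection: raise the probability
            # of the chosen action when things went better than predicted.
            chosen = 1.0 if go_right else 0.0
            pref[s] += BETA * delta * (chosen - p_right)
            s = s_next
    return value, pref

value, pref = run()
```

Only the two end states ever deliver reinforcement, yet every interior state receives a training signal at every step, because the learned prediction fills in for the missing detailed targets.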
Supervised Learning with a Distal Teacher
As noted, the supervised learning model has been criticized because it puts a heavy burden on the teacher to specify detailed, low-level target signals for every possible input. The model is not completely unrealistic, however, because there may be higher-level targets available from which the low-level targets needed for network training can be derived. Supervised learning with a distal teacher is intermediate between regular supervised learning and reinforcement learning (figure 2.4). The system output targets are more informative than in reinforcement learning, but less informative than in regular supervised learning. As in reinforcement learning, the network outputs act as inputs to another system, T, which transforms the network outputs into the final output. When T is well-defined or can be accurately modeled, errors in the overall system output can be translated backwards to the low-level network-output error signals needed for network training. When the overall targets can be specified simply, the teacher's job is simpler.
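The backward translation of errors can be written out with the chain rule. The notation is an assumption introduced here, since the text names only the transform T: let y be the network output, z = T(y) the system output, and d the desired system output, and suppose T is differentiable.

```latex
E = \tfrac{1}{2}\,\lVert \mathbf{d} - \mathbf{z} \rVert^{2},
\qquad \mathbf{z} = T(\mathbf{y}),
\qquad
\frac{\partial E}{\partial y_j}
  = -\sum_k \,(d_k - z_k)\,\frac{\partial T_k}{\partial y_j}.
```

The system-level errors d_k - z_k are thus weighted by the sensitivities of T and summed to give an error signal for each network output y_j, which can then be used for training in the usual way.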
Using the task of throwing a ball as an analogy, the network outputs are the numerous coordinated muscular actions needed to toss the ball and T is the physics that transform these actions into a result. If the overall goal is to land a basketball in a hoop, the sight of it bouncing off the rim may be a high-level error signal. No coach can tell you exactly when and how to move each individual muscle, but they can provide high-level suggestions in terms you already know how to implement, for example, "put more spin on it." Knowledge of the situation then allows you to translate the high-level suggestion back to individual low-level actions.
Simulation results for a simple robot arm controller have been reported in the literature. Given inputs representing a position (x, y), the desired network outputs are the joint angles that put the manipulator in this position (figure 2.5). Physical properties of the arm determine the relationship T between the network outputs (joint angle commands) and the system output (the manipulator position), where position errors are measured. The physics of the arm is well known, so the position errors can be translated back to the joint-angle error signals needed for training.
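A minimal sketch of this error translation follows, assuming a planar two-link arm with unit link lengths (the geometry, names, and constants are hypothetical and are not taken from the cited simulations). To keep the example short, no network is trained: the joint angles for a single target are adjusted directly, using the same joint-angle error signals a network would receive at its outputs.

```python
import math

LINK1, LINK2 = 1.0, 1.0  # link lengths (assumed)

def arm_position(theta1, theta2):
    """The transform T: joint angles -> hand position (forward kinematics)."""
    x = LINK1 * math.cos(theta1) + LINK2 * math.cos(theta1 + theta2)
    y = LINK1 * math.sin(theta1) + LINK2 * math.sin(theta1 + theta2)
    return x, y

def joint_angle_errors(theta1, theta2, target):
    """Translate a position error back into joint-angle error signals."""
    x, y = arm_position(theta1, theta2)
    ex, ey = target[0] - x, target[1] - y
    # Partial derivatives of T (the Jacobian), known from the arm geometry.
    dx1 = -LINK1 * math.sin(theta1) - LINK2 * math.sin(theta1 + theta2)
    dx2 = -LINK2 * math.sin(theta1 + theta2)
    dy1 = LINK1 * math.cos(theta1) + LINK2 * math.cos(theta1 + theta2)
    dy2 = LINK2 * math.cos(theta1 + theta2)
    # Jacobian transpose applied to the position error.
    return dx1 * ex + dy1 * ey, dx2 * ex + dy2 * ey

def reach(target, theta=(0.3, 0.5), rate=0.1, steps=2000):
    """Gradient descent on squared position error, acting on the angles."""
    theta1, theta2 = theta
    for _ in range(steps):
        d1, d2 = joint_angle_errors(theta1, theta2, target)
        theta1 += rate * d1
        theta2 += rate * d2
    return theta1, theta2

angles = reach((1.0, 1.0))
final_position = arm_position(*angles)
```

In a full distal-teacher setup, the values returned by `joint_angle_errors` would be fed to the network as output-layer error signals for each training position instead of being applied to the angles directly.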
Referring ahead a few chapters, it is interesting to note that this model covers the problem of training internal layers of a multilayer network if the first layer is viewed as the network and following layers are viewed as the transform T.