Chapter 5 - Back-Propagation

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology
 

5.6 Remarks

To reiterate, back-propagation refers to (1) an efficient method to calculate derivatives of the training error with respect to the weights, and (2) a training algorithm that uses those derivatives to adjust the weights to minimize the error. Other optimization methods can be used to update the weights, so it is not uncommon to hear of a network trained by, say, the conjugate gradient method using back-propagation to calculate the gradient.
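
To make the distinction concrete, the following Python sketch (ours, not from the book; the names backprop_gradients and gradient_descent_step are illustrative) computes the error gradient of a small one-hidden-layer sigmoidal network by back-propagation and then applies it in a plain gradient-descent step. Any other optimizer, such as conjugate gradients, could consume the same gradients in place of the last function.

import numpy as np

def forward(x, W1, W2):
    # One hidden layer of sigmoidal units, linear output unit(s).
    h = 1.0 / (1.0 + np.exp(-W1 @ x))      # hidden activations
    y = W2 @ h                             # network output
    return h, y

def backprop_gradients(x, target, W1, W2):
    # Sense (1): back-propagation as an efficient gradient calculation
    # for the sum-of-squares error E = 0.5 * ||y - target||^2.
    h, y = forward(x, W1, W2)
    e = y - target                         # output error
    dW2 = np.outer(e, h)                   # dE/dW2
    dh = W2.T @ e                          # error back-propagated to the hidden layer
    dW1 = np.outer(dh * h * (1.0 - h), x)  # dE/dW1 via the sigmoid derivative
    return dW1, dW2

def gradient_descent_step(W1, W2, grads, lr=0.1):
    # Sense (2): one particular way to use the gradients; any other update
    # rule could be substituted here without touching backprop_gradients.
    dW1, dW2 = grads
    return W1 - lr * dW1, W2 - lr * dW2

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
x, target = np.array([0.5, -1.0]), np.array([1.0])
W1, W2 = gradient_descent_step(W1, W2, backprop_gradients(x, target, W1, W2))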

Confusion may arise because the term back-propagation network is sometimes used to refer to a standard multilayer network trained by back-propagation. Although most people understand the term, it is not strictly correct because (1) the same network could be trained by other methods and (2) back-propagation can be used to train other types of networks. Back-propagation is simply one method, albeit the most common, for training these types of networks.

Although the algorithm is usually derived for a fully connected layered network, it can be applied to networks with arbitrary feedforward structure. Any number of weights can be held constant. It is also possible for internal nodes to have targets. This may be useful when it is known that the network must compute some intermediate function in order to calculate the final desired output. In this case, the node delta is the sum of deltas obtained by considering it as both an output node and a hidden node. (Section 16.10 discusses the use of this sort of information as hint functions.)
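
As a sketch of how such a node's delta could be formed (ours, not the authors'; the function name hidden_node_delta_with_hint and the conventions are assumptions), the following Python fragment adds the usual back-propagated term to the term the node would receive as an output node with its own target, assuming sigmoidal units, a sum-of-squares error, and deltas defined as derivatives of the error with respect to the node's net input.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def hidden_node_delta_with_hint(a_j, downstream_deltas, downstream_weights,
                                hint_target=None):
    # Delta for node j with output y_j = sigmoid(a_j), where a_j is its net input.
    y_j = sigmoid(a_j)
    dy_da = y_j * (1.0 - y_j)                                      # sigmoid derivative
    delta = dy_da * np.dot(downstream_weights, downstream_deltas)  # hidden-node role
    if hint_target is not None:
        delta += dy_da * (y_j - hint_target)                       # output-node role
    return delta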

The thing that makes back-propagation more than a simple application of the chain rule is the ordering of the calculations. A naive application of the chain rule separately for each of the W weights in a network could result in an O(W²) time algorithm: O(W) time to calculate ∂E/∂w for a single weight, multiplied by the W weights in the network. Back-propagation, in contrast, is an O(W) time algorithm. Bishop [44] likens the practical importance of this difference to that of the fast Fourier transform (FFT).
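
The following Python sketch (ours, not from the book) illustrates that ordering for a layered sigmoidal network: a single backward sweep computes every node's delta once, after which each weight's derivative is a single product, so the total work grows linearly with the number of weights. It assumes sigmoidal units in every layer and that output_error holds the derivative of the error with respect to the output activations.

import numpy as np

def backward_sweep(activations, weights, output_error):
    # activations[l] is the output vector of layer l (activations[0] is the input);
    # weights[l] maps layer l to the sigmoidal units of layer l+1.
    deltas = [None] * len(weights)
    delta = output_error                     # dE/dy at the output layer
    for l in reversed(range(len(weights))):
        y = activations[l + 1]
        delta = delta * y * (1.0 - y)        # fold in the sigmoid derivative: dE/da
        deltas[l] = delta
        delta = weights[l].T @ delta         # propagate the error one layer back
    # Each weight's derivative is now a single outer product: O(1) work per weight.
    return [np.outer(deltas[l], activations[l]) for l in range(len(weights))]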

A side note: In this book we mainly discuss feedforward networks of sigmoidal units, and a large part is devoted to back-propagation and its variations. Although back-propagation is one of the most popular learning techniques for neural networks, it is a mistake to equate the entire field with back-propagation in layered perceptrons. Even considering only nonbiological networks, there are many optimization methods besides back-propagation and many structures in addition to layered sigmoidal networks. Most of the properties that make artificial neural networks attractive (e.g., potential parallelism, fast response, fault tolerance, learning from examples, generalization) have nothing to do with back-propagation per se. The algorithm is simply one of many possible methods to select the network weights. Ideally, any optimization method minimizing the same error function would produce the same weights, so the resulting properties are not attributable to back-propagation alone.

Likewise, the neural networks field contains more than just layered perceptrons. Although back-propagation and layered networks are adequate for many applications, there are good reasons to explore alternatives. Back-propagation, for example, often requires very long training times, so much research has been devoted to finding faster methods. Similarly, there are applications where it is useful to build more structure into the network rather than using a simple fully connected layered structure. In a sense, back-propagation in layered feedforward networks could be viewed as a local minimum, and it is hoped that further research will discover better methods. In any case, biological networks are certainly not simple layered feedforward structures, and it is very unlikely that they adapt by back-propagation, so we may have more to learn.