
Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology

Chapter 4: MLP Representational Capabilities

Overview

The standard multilayer perceptron (MLP) is a cascade of single-layer perceptrons (figure 4.1). There is a layer of input nodes, a layer of output nodes, and one or more intermediate layers. The interior layers are sometimes called "hidden layers" because they are not directly observable from the system inputs and outputs. Each node has a response f(wᵀx), where x is the vector of output activations from the preceding layer, w is a vector of weights, and f is a bounded nondecreasing nonlinear function such as the sigmoid. Normally, one of the weights acts as a bias by virtue of connection to a constant input. Nodes in each layer are fully connected to nodes in the preceding and following layers. There are no connections between units in the same layer, connections from one layer back to a previous layer, or "shortcut" connections that skip over intermediate layers. Although back-propagation can be applied to more general networks, this is the most commonly used structure.
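As a concrete illustration, the following Python/NumPy sketch shows one way such a forward pass could be computed: each node forms f(wᵀx) of the previous layer's outputs, with a constant input supplying the bias. The function names and the use of a sigmoid are illustrative assumptions, not part of any particular implementation.

import numpy as np

def sigmoid(a):
    # Bounded, nondecreasing nonlinearity used as the node function f.
    return 1.0 / (1.0 + np.exp(-a))

def node_response(w, x):
    # Response f(w'x) of a single node. The weight vector w includes a
    # bias weight, so x is augmented with a constant 1 input.
    x_aug = np.append(x, 1.0)
    return sigmoid(np.dot(w, x_aug))

def mlp_forward(weight_matrices, x):
    # Forward pass through a cascade of fully connected layers.
    # weight_matrices[i] has one row of weights per node in active layer i+1,
    # each row sized for the previous layer's outputs plus the bias input.
    activations = np.asarray(x, dtype=float)
    for W in weight_matrices:
        activations = sigmoid(W @ np.append(activations, 1.0))
    return activations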

The following sections summarize some properties and limitations that result from this structure, independent of methods used to set the weights.

How to Count Layers? A minor digression: there is some disagreement about how to count layers in a network. Some say a network with one hidden layer is a three-layer network because there are three layers of nodes: the inputs, the hidden units, and the outputs. Others say this is a two-layer network because there are only two layers of active nodes, the hidden units and the outputs. Inputs are excluded because they do no computation. We tend to follow this convention and say that an L-layer network has L active layers; that is, L − 1 hidden layers and an output layer. Conveniently, this is also the number of weight layers. Not everyone uses the same convention, however, so it is often simplest to explicitly specify the number of hidden layers. The network in figure 4.1, for example, would be called a two-hidden-layer network. In spite of the convention, it is natural to refer to the input layer at times; we did so in the first paragraph of this chapter.

The notation N1/N2/.../NL is sometimes used to describe the structure of a layered network. This is simply a list of the number of nodes in each layer. A 10/3/2 network, for example, has 10 inputs, 3 nodes in a hidden layer, and 2 outputs. A 16/10/5/1 network would have 16 inputs, 10 nodes in the first hidden layer, 5 nodes in the second hidden layer, and 1 output. Unless otherwise specified, each layer is presumed to be fully connected to the preceding and following layers with no shortcut or feedback connections. Figure 4.1 illustrates a 5/5/3/4 structure.
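The sketch below shows how such a structure specification might map onto weight matrices, assuming the fully connected layering described above; the helper name and initialization are illustrative. Note that the number of weight matrices equals the number of active layers, matching the layer-counting convention above.

import numpy as np

def build_weights(layer_sizes, rng=np.random.default_rng(0)):
    # Given a structure such as [10, 3, 2] for a 10/3/2 network, return one
    # weight matrix per layer of active nodes. Each matrix has shape
    # (nodes in this layer, nodes in previous layer + 1); the extra column
    # holds the bias weight for the constant input.
    return [rng.standard_normal((n_out, n_in + 1)) * 0.1
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

# A 16/10/5/1 network: 16 inputs, two hidden layers (10 and 5 nodes), 1 output.
weights = build_weights([16, 10, 5, 1])
for i, W in enumerate(weights, start=1):
    print(f"weight layer {i}: shape {W.shape}")
# Three weight layers -- one per active layer (two hidden layers plus the output layer).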