Chapter 5 - Back-Propagation

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology

5.5 Pseudocode Examples

At present, most artificial neural networks exist only as simulations on serial computers. The following samples illustrate the basic steps of back-propagation in 'C' pseudocode. Note that the purpose of the code is only to illustrate the algorithm; real code would have to include many distracting details.

Forward Propagation In the feedforward step, an input pattern is propagated through the network to obtain an output. In 'C' pseudocode, this might look like

 
void forward_propagate( double *pattern )
{
    /* copy pattern to input nodes */
    for( i = 0; i < number_of_inputs; i++ )
        node_output[i+1] = pattern[i];
    node_output[0] = 1;                      /* set bias to 1 */

    /* compute outputs of the remaining nodes */
    for( i = first_hidden_index; i < number_of_nodes; i++ ) {
        double sum = 0;
        for( j = 0; j < i; j++ )
            sum += weight[i][j] * node_output[j];
        node_output[i] = sigmoid( sum );
    }
}

When the function returns, the network outputs are available in the values of the output nodes. Because of the feedforward node indexing scheme, each node_output[j] is ready and available when it is needed as input to following nodes i > j. The bias node has index 0 and input nodes have indices from 1 to number_of_inputs. The rest of the nodes are indexed in feedforward order, with the output nodes last. The entire network can thus be evaluated in a single sweep through the nodes without extra bookkeeping to keep track of layers or short-cut connections.
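
To make the indexing scheme concrete, the constants for a small 2-2-1 network (two inputs, two hidden nodes, one output node) might be laid out as follows. The names and the network size here are illustrative assumptions, not part of the code above.

#define NUMBER_OF_INPUTS  2
#define NUMBER_OF_NODES   6           /* bias + 2 inputs + 2 hidden + 1 output */

enum {
    BIAS_NODE     = 0,                /* bias node, output fixed at 1            */
    FIRST_INPUT   = 1,                /* input nodes occupy 1..NUMBER_OF_INPUTS  */
    SECOND_INPUT  = 2,
    FIRST_HIDDEN  = 3,                /* first_hidden_index in forward_propagate */
    SECOND_HIDDEN = 4,
    OUTPUT_NODE   = 5                 /* output nodes come last                  */
};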

For simplicity, the weight matrix is square with slots for all possible connections (including unallowed backward connections). Mathematically, weight w_ij can be treated as zero if there is no connection from node j to node i. In practice, of course, it would be more efficient to store connection information for each node so that only those weights that actually exist are examined. Similar details are ignored here for simplicity.
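
One possible form of that per-node storage is sketched below; the struct and function names are illustrative assumptions, not code from this chapter. Each node keeps a list of its incoming connections so that only weights that actually exist are visited. The weight_really_exists(i,j) test used in the epoch routines later could similarly be a lookup in a boolean connection table built when the network is created.

/* sketch of per-node connection storage; names are illustrative */
#define MAX_FAN_IN 32

typedef struct {
    int    n_in;                  /* number of incoming connections   */
    int    from[MAX_FAN_IN];      /* index of the source node of each */
    double w[MAX_FAN_IN];         /* the corresponding weight         */
} node_connections;

/* weighted input sum for one node, visiting only real connections */
double weighted_sum( const node_connections *c, const double *node_output )
{
    double sum = 0;
    for( int k = 0; k < c->n_in; k++ )
        sum += c->w[k] * node_output[ c->from[k] ];
    return sum;
}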

Backward Error Propagation The derivatives of the error on the current pattern with respect to the weights are calculated in the back-propagation step. The following pseudocode shows how this might be implemented assuming the network response to the pattern has just been calculated by forward propagation. The *targets argument points to an array of target values. For simplicity, assume there is one array element per network node so the same index can be used to access nodes and their target values. A sigmoid node function is assumed.

 
void backprop_node_deltas( double *targets )
{
    /* calculate node deltas for output nodes */
    for( i = last_output_node; i >= first_output_node; i-- ) {
        double err = targets[i] - node_value[i];
        delta[i] = err * node_value[i] * (1 - node_value[i]);   /* (sigmoid slope term) */
    }

    /* then calculate deltas for hidden nodes, working backwards */
    for( i = last_hidden_node; i >= first_hidden_node; i-- ) {
        delta[i] = 0;
        for( k = i+1; k <= last_output_node; k++ )
            delta[i] += weight[k][i] * delta[k];
        delta[i] *= node_value[i] * (1 - node_value[i]);         /* (sigmoid slope term) */
    }
}

Batch-Mode Weight Update The code just described would normally be included in a larger loop to add up the weight change contributions from each pattern. One epoch of batch-mode training could be done as follows.

 
void backprop_batch_one_epoch(void)
{
    /* clear the dEdW accumulators */
    for( i = 0; i < number_of_nodes; i++ )
        for( j = 0; j < i; j++ )
            dEdW[i][j] = 0;

    /* add up dEdW contributions from each pattern */
    for( ip = 0; ip < number_of_patterns; ip++ ) {
        forward_propagate( pattern[ip] );
        backprop_node_deltas( targets[ip] );
        for( i = 0; i < number_of_nodes; i++ )
            for( j = 0; j < i; j++ )
                if ( weight_really_exists(i,j) )
                    dEdW[i][j] += delta[i] * node_value[j];
    }

    /* change the weights */
    for( i = 0; i < number_of_nodes; i++ )
        for( j = 0; j < i; j++ )
            weight[i][j] += learning_rate * dEdW[i][j];
}
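
Note that because err = target - output, the accumulated dEdW values are the negatives of the error derivatives, so the weights are incremented by learning_rate * dEdW rather than decremented; the update still moves downhill on the error surface. In a complete program, this epoch routine would sit inside an outer loop that repeats until the error is acceptable or an epoch limit is reached. A minimal sketch of such a driver, in which compute_total_error() and the stopping rule are assumptions rather than part of the pseudocode above:

extern double compute_total_error( void );   /* assumed error measure over all patterns */

/* illustrative outer training loop */
void train_batch( int max_epochs, double error_goal )
{
    for( int epoch = 0; epoch < max_epochs; epoch++ ) {
        backprop_batch_one_epoch();
        if( compute_total_error() < error_goal )
            break;                           /* good enough, stop early */
    }
}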

On-Line Weight Update On-line training is even simpler because there is no need to clear and accumulate the single-pattern derivative terms. One pass through the training set in on-line mode could be done as follows.

void backprop_online_one_epoch(void)
{
    /* for each pattern */
    for( ip = 0; ip < number_of_patterns; ip++ ) {
        index = choose_one_randomly();
        forward_propagate( pattern[index] );
        backprop_node_deltas( targets[index] );

        /* change the weights */
        for( i = 0; i < number_of_nodes; i++ )
            for( j = 0; j < i; j++ )
                if ( weight_really_exists(i,j) )
                    weight[i][j] += learning_rate * delta[i] * node_value[j];
    }
}