At present, most artificial neural networks exist only as simulations on serial computers. The following samples illustrate the basic steps of back-propagation in 'C' pseudocode. Note that the purpose of the code is only to illustrate the algorithm; real code would have to include many distracting details!
Forward Propagation

In the feedforward step, an input pattern is propagated through the network to obtain an output. In 'C' pseudocode, this might look like
    void forward_propagate( double *input_pattern )
    {
        /* copy pattern to input nodes */
        for( i=0; i<number_of_inputs; i++ )
            node_output[i+1] = input_pattern[i];
        node_output[0] = 1;   /* set bias to 1 */

        /* compute outputs of the remaining nodes */
        for( i=first_hidden_index; i<number_of_nodes; i++ ) {
            double sum = 0;
            for( j=0; j<i; j++ )
                sum += weight[i][j] * node_output[j];
            node_output[i] = sigmoid( sum );
        }
    }
When the function returns, the network outputs are available in the values of the output nodes. Because of the feedforward node indexing scheme, each node_output[j] is ready and available when it is needed as input to following nodes i > j. The bias node has index 0 and input nodes have indices from 1 to number_of_inputs. The rest of the nodes are indexed in feedforward order, with the output nodes last. The entire network can thus be evaluated in a single sweep through the nodes without extra bookkeeping to keep track of layers or short-cut connections.
For simplicity, the weight matrix is square with slots for all possible connections (including unallowed backward connections). Mathematically, weight w_ij can be treated as zero if there is no connection from node j to node i. In practice, of course, it would be more efficient to store connection information for each node so that only those weights that actually exist are examined. Similar details are ignored here for simplicity.
Backward Error Propagation

The derivatives of the error on the current pattern with respect to the weights are calculated in the back-propagation step. The following pseudocode shows how this might be implemented assuming the network response to the pattern has just been calculated by forward propagation. The *targets argument points to an array of target values. For simplicity, assume there is one array element per network node so the same index can be used to access nodes and their target values. A sigmoid node function is assumed.
    void backprop_node_deltas( double *targets )
    {
        /* calculate node deltas for output nodes */
        for( i=last_output_node; i>=first_output_node; i-- ) {
            double err = targets[i] - node_output[i];
            delta[i] = err * node_output[i]*(1-node_output[i]);  /* (sigmoid slope term) */
        }

        /* then calculate deltas for hidden nodes, working backwards */
        for( i=last_hidden_node; i>=first_hidden_node; i-- ) {
            delta[i] = 0;
            for( k=i+1; k<=last_output_node; k++ )
                delta[i] += weight[k][i] * delta[k];
            delta[i] *= node_output[i]*(1-node_output[i]);  /* (sigmoid slope term) */
        }
    }
Batch-Mode Weight Update The code just described would normally be included in a larger loop to add up the weight change contributions from each pattern. One epoch of batch-mode training could be done as follows.
    void backprop_batch_one_epoch( void )
    {
        /* clear the dEdW accumulators */
        for( i=0; i<number_of_nodes; i++ )
            for( j=0; j<i; j++ )
                dEdW[i][j] = 0;

        /* add up dEdW contributions from each pattern */
        for( ip=0; ip<number_of_patterns; ip++ ) {
            forward_propagate( pattern[ip] );
            backprop_node_deltas( targets[ip] );
            for( i=0; i<number_of_nodes; i++ )
                for( j=0; j<i; j++ )
                    if( weight_really_exists(i,j) )
                        dEdW[i][j] += delta[i] * node_output[j];
        }

        /* change the weights */
        for( i=0; i<number_of_nodes; i++ )
            for( j=0; j<i; j++ )
                weight[i][j] += learning_rate * dEdW[i][j];
    }
On-Line Weight Update

On-line training is even simpler because there is no need to clear and accumulate the single-pattern derivative terms. One pass through the training set in on-line mode could be done as follows.
    void backprop_online_one_epoch( void )
    {
        /* for each pattern… */
        for( ip=0; ip<number_of_patterns; ip++ ) {
            index = choose_one_randomly();
            forward_propagate( pattern[index] );
            backprop_node_deltas( targets[index] );

            /* change the weights */
            for( i=0; i<number_of_nodes; i++ )
                for( j=0; j<i; j++ )
                    if( weight_really_exists(i,j) )
                        weight[i][j] += learning_rate * delta[i] * node_output[j];
        }
    }