
Chapter 15 - Generalization Prediction and Assessment

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology

15.3 Akaike's Final Prediction Error

A standard estimate of the test set error for a linear system is Akaike's final prediction error (FPE) [6], [7]:

E_{FPE} = E_{train} \frac{p + N}{p - N}        (15.11)

where p is the number of training samples and N is the number of parameters in the model.
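In its standard form the FPE simply inflates the optimistic training error by the factor (p + N)/(p − N). The following is a minimal sketch of that computation applied to a least-squares polynomial fit; the function name, data, and fitting setup are illustrative assumptions, not from the text.

```python
import numpy as np

def fpe(train_mse, p, N):
    """Akaike's final prediction error: test-set MSE estimate for a linear model.

    train_mse -- mean-squared error on the training set
    p         -- number of training samples
    N         -- number of free parameters in the model
    """
    if p <= N:
        raise ValueError("FPE needs more samples than parameters (p > N)")
    return train_mse * (p + N) / (p - N)

# Illustration: a cubic fit (N = 4 coefficients) to p = 50 noisy samples.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.size)
coeffs = np.polyfit(x, y, deg=3)
train_mse = np.mean((y - np.polyval(coeffs, x)) ** 2)

# The FPE corrects the (downward-biased) training error toward a test estimate.
print(train_mse, fpe(train_mse, p=x.size, N=coeffs.size))
```

Note that the correction factor approaches 1 as p grows for fixed N, and blows up as the parameter count approaches the sample count, which matches the intuition that heavily parameterized fits to few samples generalize poorly.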

A related estimate, Akaike's information criterion (AIC) [7], has been used to compare linear models with different numbers of parameters

AIC(θ) = -2 ln(maximum likelihood of θ) + 2k        (15.12)

where θ is a model with k parameters. If equation 15.5 is valid, then

AIC(k) = p ln E + 2k        (15.13)

where E is the usual mean-squared error. With this cost function, simple models are preferred over complex models unless the additional parameters in the complex models produce a corresponding decrease in the error. A number of similar criteria have been developed for linear systems.
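With a mean-squared-error form of the criterion, comparing models of different complexity reduces to an argmin over candidate parameter counts. A hedged sketch, assuming AIC(k) = p ln E + 2k (one common Gaussian-noise version; additive constants cancel when models are compared on the same data), with polynomial degree standing in for model complexity:

```python
import numpy as np

def aic(train_mse, p, k):
    # Mean-squared-error form of AIC (Gaussian noise assumed); only
    # differences between models matter, so constants are dropped.
    return p * np.log(train_mse) + 2 * k

rng = np.random.default_rng(1)
p = 40
x = np.linspace(-1.0, 1.0, p)
y = np.sin(np.pi * x) + 0.15 * rng.standard_normal(p)

# Extra parameters must buy a matching drop in error, or AIC penalizes them.
scores = {}
for deg in range(1, 9):
    c = np.polyfit(x, y, deg)
    mse = np.mean((y - np.polyval(c, x)) ** 2)
    scores[deg] = aic(mse, p, k=deg + 1)   # k = number of coefficients

best = min(scores, key=scores.get)
print(best, scores[best])
```

The penalty term 2k grows linearly while p ln E saturates once the model is flexible enough, so the minimizer sits at the smallest model whose error is close to the achievable floor.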

The Effective Number of Parameters

A problem with these estimates is that they are asymptotic approximations valid only for linear systems with large sample sets. The assumptions are invalid for small sample sizes [61]. An extension to linear systems with finite sample sizes is considered by Hansen [152]. Some work has been done to extend this to nonlinear systems by estimating the effective number of parameters from derivatives of the error with respect to the weights [273]. One form, for a single-hidden-layer network, is [365]

(15.14)

where

(15.15)

Here N_w and N_W are the number of weights (including thresholds) in the hidden and output units, respectively, and the λ's are the second derivatives λ_{ij} = ∂²E_{train}/∂w²_{ij} and Λ_{ij} = ∂²E_{train}/∂W²_{ij}. This estimate is used to determine when to stop a pruning algorithm in [365].
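The ingredients of equations 15.14 and 15.15 are the diagonal curvatures λ_{ij} = ∂²E_{train}/∂w²_{ij} and Λ_{ij} = ∂²E_{train}/∂W²_{ij}, which can be approximated by central finite differences when exact second derivatives are inconvenient. The sketch below computes only these curvatures for a toy single-hidden-layer network; the network, data, and step size are illustrative assumptions, and the combination of the λ's into the final estimate follows the equations above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy single-hidden-layer network: 1 input -> 3 tanh hidden units -> 1 output.
X = rng.standard_normal((20, 1))
T = np.sin(X)                              # targets
w = 0.5 * rng.standard_normal((2, 3))      # hidden weights (last row = thresholds)
W = 0.5 * rng.standard_normal((4, 1))      # output weights (last row = threshold)

def e_train(w, W):
    """Mean-squared training error of the toy network."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])     # append bias input
    H = np.tanh(Xb @ w)
    Hb = np.hstack([H, np.ones((H.shape[0], 1))])     # append bias unit
    return np.mean((Hb @ W - T) ** 2)

def diag_curvature(f, theta, eps=1e-4):
    """Central differences: lambda ~ (f(+eps) - 2 f(0) + f(-eps)) / eps^2."""
    f0 = f(theta)
    lam = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        up = theta.copy(); up[idx] += eps
        dn = theta.copy(); dn[idx] -= eps
        lam[idx] = (f(up) - 2.0 * f0 + f(dn)) / eps**2
    return lam

lam = diag_curvature(lambda w_: e_train(w_, W), w)    # hidden-layer lambda_ij
Lam = diag_curvature(lambda W_: e_train(w, W_), W)    # output-layer Lambda_ij
print(lam.shape, Lam.shape)
```

In practice the second derivatives would more often come from an analytic or backpropagated Hessian-diagonal computation; the finite-difference version above is just the most transparent way to see what the λ's measure.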