Neural Smithing - Supervised Learning in Feedforward Artificial Neural Networks
Table of Contents
Synopsis
Back Cover

Preface

 1 Introduction

Supervised Learning

  • 2.0 Overview
  • 2.1 Objectives
  • 2.2 Alternatives & Extensions
  • 2.3

Single-Layer Networks

  • 3.0 Overview
  • 3.1 Hyperplane Geometry
  • 3.2 Linear Separability
  • 3.3 Hyperplane Capacity
  • 3.4 Learning Rules for Single-Layer Networks
  • 3.5 Adalines and the Widrow-Hoff Learning Algorithm

MLP Representational Capabilities

  • 4.0 Overview
  • 4.1 Representational Capability
  • 4.2 Universal Approximation Capabilities
  • 4.3 Size versus Depth
  • 4.4 MLP Representational Capabilities

Back-Propagation

  • 5.0 Overview
  • 5.1 Preliminaries
  • 5.2 Back-Propagation
  • 5.3 Back-Propagation: The Weight Update
  • 5.4 Common Modifications
  • 5.5 Pseudocode Examples
  • 5.6 Remarks
  • 5.7 Training Time

Learning Rate and Momentum

Weight-Initialization Techniques

  • 7.0 Overview
  • 7.1 Random Initialization
  • 7.2 Weight-Initialization Techniques

The Error Surface

  • 8.0 Overview
  • 8.1 Characteristic Features
  • 8.2 The Gradient is the Sum of Single-Pattern Gradients
  • 8.3 Weight-Space Symmetries
  • 8.4 Remarks

Faster Variations of Back-Propagation

  • 9.0 Overview
  • 9.1 Adaptive Learning Rate
  • 9.2 Vogl's Method (Bold Driver)
  • 9.3 Delta-Bar-Delta
  • 9.4 Silva & Almeida
  • 9.5 SuperSAB
  • 9.6 Rprop
  • 9.7 Quickprop
  • 9.8 Search Then Converge
  • 9.9 Fuzzy Control of Back-Propagation
  • 9.10 Other Heuristics
  • 9.11 Remarks
  • 9.12 Other Notes

Classical Optimization Techniques

  • 10.0 Overview
  • 10.1 The Objective Function
  • 10.2 Factors Affecting the Choice of a Method
  • 10.3 Line Search
  • 10.4 Evaluation-Only Methods
  • 10.5 First-Order Gradient Methods
  • 10.6 Second-Order Gradient Methods
  • 10.7 Stochastic Evaluation-Only Methods
  • 10.8 Discussion

Genetic Algorithms and Neural Networks

  • 11.0 Overview
  • 11.1 The Basic Algorithm
  • 11.2 Example
  • 11.3 Application to Neural Network Design
  • 11.4 Remarks

Constructive Methods

  • 12.0 Overview
  • 12.1 Dynamic Node Creation
  • 12.2 Cascade-Correlation
  • 12.3 The Upstart Algorithm
  • 12.4 The Tiling Algorithm
  • 12.5 Marchand's Algorithm
  • 12.6 Meiosis Networks
  • 12.7 Principal Components Node Splitting
  • 12.8 Construction from a Voronoi Diagram
  • 12.9 Other Algorithms

Pruning Algorithms

  • 13.0 Overview
  • 13.1 Pruning Algorithms
  • 13.2 Sensitivity Calculation Methods
  • 13.3 Penalty-Term Methods
  • 13.4 Other Methods
  • 13.5 Discussion

Factors Influencing Generalization

  • 14.0 Overview
  • 14.1 Definitions
  • 14.2 The Need for Additional Information
  • 14.3 Network Complexity versus Target Complexity
  • 14.4 The Training Data
  • 14.5 The Learning Algorithm
  • 14.6
  • 14.7 Summary

Generalization Prediction and Assessment

  • 15.0 Overview
  • 15.1 Cross-Validation
  • 15.2 The Bayesian Approach
  • 15.3 Akaike's Final Prediction Error
  • 15.4 PAC Learning and the VC Dimension

Heuristics for Improving Generalization

  • 16.0 Overview
  • 16.1 Early Stopping
  • 16.2 Regularization
  • 16.3 Pruning Methods
  • 16.4 Constructive Methods
  • 16.5 Weight Decay
  • 16.6 Information Minimization
  • 16.7 Replicated Networks
  • 16.8 Training with Noisy Data
  • 16.9 Use of Domain-Dependent Prior Information
  • 16.10 Hint Functions
  • 16.11 Knowledge-Based Neural Nets
  • 16.12 Physical Models to Generate Additional Data

Effects of Training with Noisy Inputs

  • 17.0 Overview
  • 17.1 Convolution Property of Training with Jitter
  • 17.2 Error Regularization and Training with Jitter
  • 17.3 Training with Jitter and Sigmoid Scaling
  • 17.4 Extensions to General Layered Neural Networks
  • 17.5 Remarks
  • 17.6 Further Examples

Linear Regression

  • A.0 Overview
  • A.1 Newton's Method
  • A.2 Gradient Descent
  • A.3 The LMS Algorithm

Principal Components Analysis

  • B.0 Overview
  • B.1 Autoencoder Networks and Principal Components
  • B.2 Discriminant Analysis Projections

Jitter Calculations

  • C.1 Jitter: Small-Perturbation Approximation
  • C.2 Jitter: CDF-PDF Convolution in n Dimensions
  • C.3 Jitter: CDF-PDF Convolution in One Dimension

Sigmoid-like Nonlinear Functions

  • D.0 Sigmoid-like Nonlinear Functions
References
Information