Table of Contents
Preface
1 Introduction

2 Supervised Learning
- 2.0 Overview
- 2.1 Objectives
- 2.2 Alternatives & Extensions
- 2.3

3 Single-Layer Networks
- 3.0 Overview
- 3.1 Hyperplane Geometry
- 3.2 Linear Separability
- 3.3 Hyperplane Capacity
- 3.4 Learning Rules for Single-Layer Networks
- 3.5 Adalines and the Widrow-Hoff Learning Algorithm

4 MLP Representational Capabilities
- 4.0 Overview
- 4.1 Representation Capability
- 4.2 Universal Approximation Capabilities
- 4.3 Size versus Depth
- 4.4 MLP Representational Capabilities

5 Back-Propagation
- 5.0 Overview
- 5.1 Preliminaries
- 5.2 Back-Propagation
- 5.3 Back-Propagation: The Weight Update
- 5.4 Common Modifications
- 5.5 Pseudocode Examples
- 5.6 Remarks
- 5.7 Training Time

6 Learning Rate and Momentum

7 Weight-Initialization Techniques
- 7.0 Overview
- 7.1 Random Initialization
- 7.2 Weight-Initialization Techniques

8 The Error Surface
- 8.0 Overview
- 8.1 Characteristic Features
- 8.2 The Gradient is the Sum of Single-Pattern Gradients
- 8.3 Weight-Space Symmetries
- 8.4 Remarks

9 Faster Variations of Back-Propagation
- 9.0 Overview
- 9.1 Adaptive Learning Rate
- 9.2 Vogl's Method (Bold Driver)
- 9.3 Delta-Bar-Delta
- 9.4 Silva & Almeida
- 9.5 SuperSAB
- 9.6 Rprop
- 9.7 Quickprop
- 9.8 Search Then Converge
- 9.9 Fuzzy Control of Back-Propagation
- 9.10 Other Heuristics
- 9.11 Remarks
- 9.12 Other Notes

10 Classical Optimization Techniques
- 10.0 Overview
- 10.1 The Objective Function
- 10.2 Factors Affecting the Choice of a Method
- 10.3 Line Search
- 10.4 Evaluation-Only Methods
- 10.5 First-Order Gradient Methods
- 10.6 Second-Order Gradient Methods
- 10.7 Stochastic Evaluation-Only Methods
- 10.8 Discussion

11 Genetic Algorithms and Neural Networks
- 11.0 Overview
- 11.1 The Basic Algorithm
- 11.2 Example
- 11.3 Application to Neural Network Design
- 11.4 Remarks

12 Constructive Methods
- 12.0 Overview
- 12.1 Dynamic Node Creation
- 12.2 Cascade-Correlation
- 12.3 The Upstart Algorithm
- 12.4 The Tiling Algorithm
- 12.5 Marchand's Algorithm
- 12.6 Meiosis Networks
- 12.7 Principal Components Node Splitting
- 12.8 Construction from a Voronoi Diagram
- 12.9 Other Algorithms

13 Pruning Algorithms
- 13.0 Overview
- 13.1 Pruning Algorithms
- 13.2 Sensitivity Calculation Methods
- 13.3
- 13.4 Other Methods
- 13.5 Discussion

14 Factors Influencing Generalization
- 14.0 Overview
- 14.1 Definitions
- 14.2 The Need for Additional Information
- 14.3 Network Complexity versus Target Complexity
- 14.4 The Training Data
- 14.5 The Learning Algorithm
- 14.6
- 14.7 Summary

15 Generalization Prediction and Assessment
- 15.0 Overview
- 15.1 Cross-Validation
- 15.2 The Bayesian Approach
- 15.3 Akaike's Final Prediction Error
- 15.4 PAC Learning and the VC Dimension

16 Heuristics for Improving Generalization
- 16.0 Overview
- 16.1 Early Stopping
- 16.2 Regularization
- 16.3 Pruning Methods
- 16.4 Constructive Methods
- 16.5 Weight Decay
- 16.6 Information Minimization
- 16.7 Replicated Networks
- 16.8 Training with Noisy Data
- 16.9 Use of Domain-Dependent Prior Information
- 16.10 Hint Functions
- 16.11 Knowledge-Based Neural Nets
- 16.12 Physical Models to Generate Additional Data

17 Effects of Training with Noisy Inputs
- 17.0 Overview
- 17.1 Convolution Property of Training with Jitter
- 17.2 Error Regularization and Training with Jitter
- 17.3 Training with Jitter and Sigmoid Scaling
- 17.4 Extensions to General Layered Neural Networks
- 17.5 Remarks
- 17.6 Further Examples

Appendix A: Linear Regression
- A.0 Overview
- A.1 Newton's Method
- A.2 Gradient Descent
- A.3 The LMS Algorithm

Appendix B: Principal Components Analysis
- B.0 Overview
- B.1 Autoencoder Networks and Principal Components
- B.2 Discriminant Analysis Projections

Appendix C: Jitter Calculations
- C.1 Jitter: Small-Perturbation Approximation
- C.2 Jitter: CDF-PDF Convolution in n Dimensions
- C.3 Jitter: CDF-PDF Convolution in One Dimension

Appendix D: Sigmoid-like Nonlinear Functions
- D.0 Sigmoid-like Nonlinear Functions
References