Rule-based systems, such as expert systems, have been used quite successfully in many applications. These systems use human knowledge efficiently, and there is interest in developing hybrid systems combining the high-level information processing abilities of symbolic systems with the adaptability of neural nets. A useful feature of expert systems which neural networks generally lack is the ability to explain the reasoning behind their conclusions.
One approach is to embed symbolic rules in the initial structure of a neural network by translating the AND, OR, and NOT terms into corresponding network structures with appropriate weights. (Simple variable-free propositional rules are easily translated to neural network structures.) Additional links with small random weights are provided to let the system add other terms that may be useful. The network is then trained from examples to improve its performance. Because the embedded symbolic rules are often classifications, the cross-entropy error function may work better than the mean-squared-error function.
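The rule-to-network translation above can be sketched in code. The sketch below is a minimal illustration, not the exact algorithm from the cited work: it assumes a KBANN-style scheme in which each antecedent of a rule gets a large weight `W` (positive for plain terms, negative for NOT terms) and the unit's bias is set so that a sigmoid unit is strongly active only when the rule fires. The helper name `rule_to_unit` and the choice `W = 4.0` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = 4.0  # large magnitude so the sigmoid saturates (hypothetical choice)

def rule_to_unit(n_inputs, positive, negated, kind):
    """Translate one propositional rule into (weights, bias).

    positive, negated: index lists of plain and NOT-ed antecedents.
    kind: 'and' -> unit fires only when every antecedent is satisfied;
          'or'  -> unit fires when any antecedent is satisfied.
    """
    w = np.zeros(n_inputs)
    w[positive] = W    # plain antecedents get weight +W
    w[negated] = -W    # NOT-ed antecedents get weight -W
    if kind == 'and':
        # All len(positive) plain inputs must be 1 and all negated
        # inputs 0; threshold sits half a weight below that net input.
        bias = -(len(positive) - 0.5) * W
    else:  # 'or'
        # Fires if any plain input is 1 or any negated input is 0.
        bias = len(negated) * W - 0.5 * W
    return w, bias

def unit_output(x, w, bias):
    return sigmoid(np.dot(w, x) + bias)

# Example rule: C <- A AND NOT B  (inputs ordered [A, B])
w, b = rule_to_unit(2, positive=[0], negated=[1], kind='and')
print(unit_output(np.array([1.0, 0.0]), w, b))  # A true, B false -> high
print(unit_output(np.array([1.0, 1.0]), w, b))  # B true blocks rule -> low
```

After translation, extra links with small random weights would be added alongside these units, and gradient training adjusts all weights from examples, as described above.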
Besides faster training due to a good initial solution, improved generalization has been observed in spite of imperfect embedded rules. This is attributed to "(1) focusing attention on relevant input features, and (2) indicating useful intermediate conclusions (which suggest a good network topology)". Given a sufficient number of examples, a standard network initialized with random weights should converge to the same asymptotic performance, but the knowledge-based networks generalize better when examples are sparse. Evidently "the initial knowledge is 'worth' some number of training examples". Some references for ways of using forms of prior knowledge other than symbolic rules are provided by Shavlik.