
Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology
 

References

[1] Y. S. Abu-Mostafa, The Vapnik-Chervonenkis dimension: Information versus complexity in learning. Neural Computation 1(3): 312—317, 1989.

[2] L. E. K. Achenie, Computational experience with a quasi Newton based training of the feedforward neural network. In World Congress on Neural Networks (San Diego), vol. 3, pp. 607—612. Lawrence Erlbaum, Hillsdale, NJ, 1994.

[3] D. H. Ackley, A Connectionist Machine for Genetic Hillclimbing. Kluwer, Boston, 1987.

[4] D. H. Ackley and M. S. Littman. Learning from natural selection in an artificial environment. In Proceedings of the International Joint Conference on Neural Networks (San Diego), vol. 1, p. 189. IEEE, New York, 1990.

[5] S. Ahmad and G. Tesauro. Scaling and generalization in neural networks: a case study. In D. Touretzky, G. Hinton, and T. Sejnowski, editors, Proceedings of the 1988 Connectionist Summer School, pp. 3—10. Morgan Kaufmann, San Mateo, 1988.

[6] H. Akaike, Fitting autoregressive models for prediction. Annals of the Institute of Statistical Mathematics 21: 243—247, 1969.

[7] H. Akaike, A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19(6): 716—723, 1974.

[8] D. Alpsan, M. Towsey, O. Ozdamar, A. C. Tsoi, and D. N. Ghista. Are modified back-propagation algorithms worth the effort? In IEEE International Conference on Neural Networks (Orlando), vol. 1, pp. 567—571. IEEE, New York, 1994.

[9] D. Alpsan, M. Towsey, O. Ozdamar, A. C. Tsoi, and D. N. Ghista. Efficacy of modified backpropagation and optimisation methods on a real-world medical problem. Neural Networks 8(6): 945—962, 1995.

[10] P. Arabshahi, J. J. Choi, R. J. Marks II, and T. P. Caudell. Fuzzy control of backpropagation. In Proceedings of the First IEEE International Conference on Fuzzy Systems (FUZZ-IEEE '92), San Diego, pp. 967—972. IEEE, New York, 1992.

[11] T. Ash, Dynamic node creation in back-propagation networks. Technical Report 8901. Institute for Cognitive Science, UCSD, La Jolla, 1989.

[12] L. Atlas, D. Cohn, R. Ladner, M. A. El-Sharkawi, R. J. Marks II, M. E. Aggoune, and D. C. Park. Training connectionist networks with queries and selective sampling. In Advances in Neural Information Processing Systems (Denver) (2), pp. 566—573. Morgan Kaufmann, San Mateo, 1990.

[13] P. Auer, M. Herbster, and M. K. Warmuth. Exponentially many local minima for single neurons. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems (8), pp. 316—322. MIT Press, Cambridge, 1996.

[14] S. Aylward, D. C. St. Clair, W. Bond, B. Flachsbart, and A. K. Rigler. One-dimensional search strategies for conjugate gradient training of backpropagation neural networks. In Proceedings of the Artificial Neural Networks in Engineering (ANNIE'92) Conference, St. Louis, vol. 2, pp. 192—202. IEEE, New York, 1992.

[15] A. W. Bailey, Automatic evolution of neural net architectures. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), vol. 1, pp. 589—592. IEEE, New York, 1990.

[16] P. Baldi and Y. Chauvin. Temporal evolution of generalization during learning in linear networks. Neural Computation 3(4): 598—603, 1991.

[17] P. Baldi, Y. Chauvin, and K. Hornik. Backpropagation and unsupervised learning in linear networks. In Y. Chauvin and D. E. Rumelhart, editors, Backpropagation: Theory, Architectures, and Applications, chapter 12, pp. 389—432. Erlbaum, Hillsdale, NJ, 1994.

[18] P. Baldi and K. Hornik. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks 2(1): 53—58, 1989.

[19] E. Barnard, Optimization for training neural nets. IEEE Transactions on Neural Networks 3(2): 232—240, 1992.

[20] E. Barnard and J. E. W. Holm. A comparative study of optimization techniques for backpropagation. Neurocomputing 6: 19—30, 1994.

[21] A. R. Barron, Approximation bounds for superpositions of a sigmoidal function. In Proceedings of the IEEE International Symposium on Information Theory. IEEE Press, New York, 1991.

[22] A. R. Barron, Universal approximation bounds for superpositions of a sigmoidal function. Technical Report 58. Statistics Department, University of Illinois, Urbana-Champaign, 1991.

[23] A. R. Barron, Universal approximation bounds for superposition of a sigmoidal function. IEEE Transactions on Information Theory 39(3): 930—945, 1993.

[24] A. G. Barto, Reinforcement learning and adaptive critic methods. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control, chapter 12. Van Nostrand Reinhold, New York, 1992.

[25] A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics SMC-13(5): 834—846, 1983 (reprinted in J. A. Anderson and E. Rosenfeld, editors, Neurocomputing, 1988).

[26] R. Battiti, Accelerated backpropagation learning: Two optimization methods. Complex Systems 3: 331—342, 1989.

[27] R. Battiti, Optimization methods for back-propagation: Automatic parameter tuning and faster convergence. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), vol. 1, pp. 593—596. IEEE, New York, 1990.

[28] R. Battiti, First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation 4(2): 141—166, 1992.

[29] R. Battiti and G. Tecchiolli. Learning with first, second, and no derivatives: A case study in high energy physics. Neurocomputing 6: 181—206, 1994.

[30] E. B. Baum, On the capabilities of multilayer perceptrons. Journal of Complexity 4: 193—215, 1988.

[31] E. B. Baum, A proposal for more powerful learning algorithms. Neural Computation 1(2): 201—207, 1989.

[32] E. B. Baum, When are k-nearest neighbor and back propagation accurate for feasible sized sets of examples? In L. B. Almeida and C. J. Wellekens, editors, Neural Networks, Proceedings EURASIP Workshop, vol. 412 of Lecture Notes in Computer Science, pp. 2—25. Springer-Verlag, New York, February 1990.

[33] E. B. Baum and D. Haussler. What size net gives valid generalization? In Advances in Neural Information Processing Systems (1), pp. 81—90. Morgan Kaufmann, San Mateo, 1989.

[34] E. B. Baum and D. Haussler. What size net gives valid generalization? Neural Computation 1: 151—160, 1989.

[35] E. B. Baum and R. L. Rivest. Training a 3-node neural network is NP-complete. In Proceedings of the 1988 Workshop on Computational Learning Theory, pp. 9—18. Morgan Kaufmann, San Mateo, 1988.

[36] W. G. Baxt, Improving the accuracy of an artificial neural network using multiple differently trained networks. Neural Computation 4(5): 772—780, 1992.

[37] S. Becker and Y. Le Cun. Improving the convergence of back-propagation learning with second order methods. In D. Touretzky, G. Hinton, and T. Sejnowski, editors, Proceedings of the 1988 Connectionist Models Summer School, pp. 29—37. Morgan Kaufmann, San Mateo, 1988.

[38] M. G. Bello, Enhanced training algorithms, and integrated training/architecture selection for multilayer perceptrons. IEEE Transactions on Neural Networks, 3(6): 864—875, 1992.

[39] C. Bishop, Exact calculation of the Hessian matrix for the multilayer perceptron. Neural Computation 4(4): 494—501, 1992.

[40] C. M. Bishop, Curvature-driven smoothing in backpropagation neural networks. IEEE Transactions on Neural Networks 4(4): 494—501, 1993.

[41] C. M. Bishop, A fast procedure for retraining the multilayer perceptron. International Journal of Neural Systems 2(3): 229—236, 1991.

[42] C. M. Bishop, Improving the generalization properties of radial basis function neural networks. Neural Computation 3(4): 579—588, 1991.

[43] C. M. Bishop, Curvature-driven smoothing: A learning algorithm for feedforward networks. IEEE Transactions on Neural Networks 4(5): 882—884, 1993.

[44] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, Oxford, 1995.

[45] C. M. Bishop, Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1): 108—116, 1995.

[46] H. D. Block, The perceptron: A model for brain functioning. I. Reviews of Modern Physics 34: 123—135, 1962 (reprinted in J. A. Anderson and E. Rosenfeld, editors, Neurocomputing, MIT Press, Cambridge, 1988).

[47] J. Blue, Training feed-forward neural networks using conjugate gradients. In SPIE vol. 1661, p. 179. Society of Photo-Optical Instrumentation Engineers, Bellingham, WA, 1992.

[48] E. K. Blum and L. K. Li. Approximation theory and feedforward networks. Neural Networks 4(4): 511—515, 1991.

[49] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the Association for Computing Machinery 36(4): 929—965, 1989.

[50] L. Booker, Improving search in genetic algorithms. In Genetic Algorithms and Simulated Annealing. Morgan Kaufmann, San Mateo, 1990.

[51] L. Booker, Using classifier systems to implement distributed representations. In Proceedings of the International Joint Conference on Neural Networks (San Diego), vol. 1, p. 39. IEEE, New York, 1990.

[52] L. B. Booker, D. E. Goldberg, and J. H. Holland. Classifier systems and genetic algorithms. In Machine Learning: Paradigms and Methods. MIT Press, Cambridge, 1990.

[53] N. K. Bose and A. K. Garga. Neural network design using Voronoi diagrams: Preliminaries. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), vol. 3, pp. 127—132. IEEE Press, New York, 1992.

[54] N. K. Bose and A. K. Garga. Neural network design using Voronoi diagrams. IEEE Transactions on Neural Networks 4(5): 778—787, 1993.

[55] N. Bose and P. Liang. Neural Network Fundamentals with Graphs, Algorithms, and Applications. McGraw-Hill, New York, 1996.

[56] S. H. Botros and C. G. Atkeson. Generalization properties of radial basis functions. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems (3), pp. 707—713. Morgan Kaufmann, San Mateo, 1991.

[57] H. Bourlard and Y. Kamp. Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics 59: 291—294, 1988.

[58] M. L. Brady and R. Raghavan. Gradient descent fails to separate. In Proceedings of the IEEE International Conference on Neural Networks (San Diego), vol. 1, pp. 649—656. IEEE, New York, 1988.

[59] M. L. Brady, R. Raghavan, and J. Slawny. Back propagation fails to separate where perceptrons succeed. IEEE Transactions on Circuits and Systems 36: 665—674, 1989.

[60] L. Breiman, Current research. In The Mathematics of Generalization, pp. 361—368. Addison-Wesley, Reading, MA, 1995.

[61] L. Breiman, Reflections after refereeing papers for NIPS. In The Mathematics of Generalization, pp. 11—15. Addison-Wesley, Reading, MA, 1995.

[62] W. L. Buntine and A. S. Weigend. Computing second derivatives in feed-forward networks: A review. IEEE Transactions on Neural Networks 5(3): 480—488, 1994.

[63] G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds, and D. B. Rosen. Fuzzy ARTMAP: An adaptive resonance architecture for incremental learning of analog maps. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), vol. 3, pp. 309—314. IEEE, New York, 1992.

[64] S. M. Carroll and B. W. Dickson. Construction of neural nets using the Radon transform. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), vol. 1, pp. 607—611. IEEE, New York, 1989.

[65] J. P. Cater, Successfully using peak learning rates of 10 (and greater) in back-propagation networks with the heuristic learning algorithm. In Proceedings of the IEEE First International Conference on Neural Networks (San Diego). IEEE, New York, 1987.

[66] M. Caudill, Evolutionary neural networks. AI Expert, March 1991.

[67] L. W. Chan and F. Fallside. An adaptive training algorithm for back propagation networks. Computer Speech and Language 2: 205—218, 1987.

[68] I. E. Chang and R. P. Lippmann. Using genetic algorithms to improve pattern classification performance. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing (Denver, 1990) (3), pp. 797—803. Morgan Kaufmann, San Mateo, 1991.

[69] C. Charalambous, Conjugate gradient algorithm for efficient training of artificial neural networks. IEE Proceedings 139(3): 301—310, 1992.

[70] Y. Chauvin, A back-propagation algorithm with optimal use of hidden units. In Advances in Neural Information Processing Systems (Denver, 1988) (1), pp. 519—526. Morgan Kaufmann, San Mateo, 1989.

[71] Y. Chauvin, Principal component analysis by gradient descent on a constrained linear Hebbian cell. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), vol. 1, pp. 373—380. IEEE, New York, 1989.

[72] Y. Chauvin, Dynamic behavior of constrained back-propagation networks. In Advances in Neural Information Processing Systems (2), pp. 642—649. Morgan Kaufmann, San Mateo, 1990.

[73] Y. Chauvin, Generalization performance of overtrained back-propagation networks. In L. B. Almeida and C. J. Wellekens, editors, Neural Networks, Proceedings EURASIP Workshop, vol. 412 of Lecture Notes in Computer Science, pp. 46—55. Springer-Verlag, New York, February 1990.

[74] Y. Chauvin, Generalization dynamics in LMS trained linear networks. In R. Lippmann, J. Moody, and D. Touretzky, editors, Advances in Neural Information Processing Systems (3), pp. 890—896. Morgan Kaufmann, San Mateo, 1991.

[75] C. H. Chen and H. Lai. A comparison study of the gradient descent and the conjugate gradient backpropagation neural networks. In World Congress on Neural Networks (Portland), vol. 3, pp. 401—406. Erlbaum, Hillsdale, NJ, 1993.

[76] J. R. Chen and P. Mars. Stepsize variation methods for accelerating the back propagation algorithm. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), vol. 1, pp. 601—604. IEEE, New York, 1990.

[77] O. T.-C. Chen and B. J. Sheu. Optimization schemes for neural network training. In IEEE International Conference on Neural Networks (Orlando), vol. 2, pp. 817—822. IEEE, New York, 1994.

[78] S. Chen, S.A. Billings, and R. Hecht-Nielsen. On the geometry of feedforward neural network error surfaces. Neural Computation 5(6): 910—927, 1993.

[79] V. Cherkassky and F. Mulier. Statistical and neural network techniques for nonparametric regression. In Selecting Models from Data, vol. 89 of Lecture Notes in Statistics, pp. 383—392. Springer-Verlag, New York, 1994.

[80] D. L. Chester, Why two hidden layers are better than one. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), vol. 1, pp. 265—268. IEEE, New York, 1990.

[81] J. J. Choi, P. Arabshahi, R. J. Marks II, and T. P. Caudell. Fuzzy parameter adaptation in neural systems. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), vol. 1, pp. 232—235. IEEE, New York, 1992.

[82] A. Cichocki and R. Unbehauen. Neural Networks for Optimization and Signal Processing. Wiley, 1993.

[83] R. T. Clemen, Combining forecasts: A review and annotated bibliography. International Journal of Forecasting 5(4): 559—583, 1989.

[84] C. W. Codrington and M. F. Tenorio. Adaptive gain networks. In IEEE International Conference on Neural Networks (Orlando), vol. 1, pp. 339—344. IEEE, New York, 1994.

[85] D. Cohn and G. Tesauro. Can neural networks do better than the Vapnik-Chervonenkis bounds? In R.P. Lippmann, J.E. Moody, and D.S. Touretzky, editors, Advances in Neural Information Processing Systems (Denver, 1990) (3), pp. 911—917. Morgan Kaufmann, San Mateo, 1991.

[86] E.M. Corwin, A.M. Logar, and W.J.B. Oldham. An iterative method for training multilayer networks with threshold functions. IEEE Transactions on Neural Networks 5(3): 507—508, 1994.

[87] T. Cover, Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers 14: 326—334, 1965.

[88] R.S. Crowder III, Predicting the Mackey-Glass timeseries with cascade-correlation learning. In D.S. Touretzky, J.L. Elman, T.J. Sejnowski, and G.E. Hinton, editors, Connectionist Models: Proceedings of the 1990 Summer School, pp. 117—123. Morgan Kaufmann, San Mateo, 1991.

[89] Y. Le Cun, Une procédure d'apprentissage pour réseau à seuil assymétrique. In COGNITIVA 85: A la Frontière de l'Intelligence Artificielle des Science de la Connaissance des Neurosciences (Paris), pp. 599—604. CESTA, Paris, 1985.

[90] Y. Le Cun, Learning process in an asymmetric threshold network. In E. Bienenstock, F. Fogelman Soulié, and G. Weisbuch, editors, Disordered Systems and Biological Organization, pp. 233—240. Springer-Verlag, New York, 1986.

[91] Y. Le Cun, J.S. Denker, and S.A. Solla. Optimal brain damage. In Advances in Neural Information Processing Systems (Denver) (2), pp. 598—605. Morgan Kaufmann, San Mateo, 1990.

[92] Y. Le Cun, I. Kanter, and S.A. Solla. Eigenvalues of covariance matrices: Application to neural-network learning. Physical Review Letters 66(18): 2396—2399, 1991.

[93] Y. Le Cun, I. Kanter, and S.A. Solla. Second order properties of error surfaces: Learning time and generalization. In R.P. Lippmann, J.E. Moody, and D.S. Touretzky, editors, Advances in Neural Information Processing Systems (3), pp. 918—924. Morgan Kaufmann, San Mateo, 1991.

[94] Y. Le Cun, P.Y. Simard, and B. Pearlmutter. Automatic learning rate maximization by on-line estimation of the Hessian eigenvectors. In S.J. Hanson, J.D. Cowan, and C.L. Giles, editors, Advances in Neural Information Processing Systems (5), pp. 156—163. Morgan Kaufmann, San Mateo, 1993.

[95] G. Cybenko, Approximation by superpositions of a sigmoidal function. Technical Report 856. Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign, 1988.

[96] G. Cybenko, Approximation by superpositions of sigmoids. Mathematics of Control, Signals, and Systems 2: 303—314, 1989.

[97] C. Darken and J. Moody. Note on learning rate schedules for stochastic optimization. In R.P. Lippmann, J.E. Moody, and D.S. Touretzky, editors, Advances in Neural Information Processing Systems (3), pp. 832—838. Morgan Kaufmann, San Mateo, 1991.

[98] C. Darken and J. Moody. Towards faster stochastic gradient search. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems (4), pp. 1009—1016. Morgan Kaufmann, San Mateo, 1992.

[99] B. DasGupta, H.T. Siegelmann, and E. Sontag. On the intractability of loading neural networks. In V. Roychowdhury, K.-Y. Siu, and A. Orlitsky, editors, Theoretical Advances in Neural Computation and Learning, pp. 357—389. Kluwer, Boston, 1994.

[100] H.F. Davis and A.D. Snider. Introduction to Vector Analysis, 4th edition. Allyn and Bacon, Boston, 1979.

[101] L. Davis, Mapping classifier systems into neural networks. In Advances in Neural Information Processing Systems (1), pp. 49—56. Morgan Kaufmann, San Mateo, 1989.

[102] H. de Garis, Genetic programming: Modular neural evolution for Darwin machines. In Proceedings of the International Joint Conference on Neural Networks (San Diego), vol. 1, p. 194. IEEE, New York, 1990.

[103] C. de Groot and D. Wurtz. "Plain backpropagation" and advanced optimization algorithms: A comparative study. Neurocomputing 6: 153—161, 1994.

[104] J. de Villiers and E. Barnard. Backpropagation neural nets with one and two hidden layers. IEEE Transactions on Neural Networks 4(1): 136—141, 1993.

[105] T. Denoeux and R. Lengellé. Initializing back propagation networks with prototypes. Neural Networks 6(3): 351—363, 1993.

[106] K.I. Diamantaras and S.Y. Kung. Principal Component Neural Networks, Theory and Applications. John Wiley & Sons, New York, 1996.

[107] T. Dietterich and G. Bakiri. Error-correcting output codes: A general method for improving multiclass inductive learning programs. In D.H. Wolpert, editor, The Mathematics of Generalization, pp. 395—407. Addison-Wesley, Reading, MA, 1995.

[108] N. Dodd, Optimisation of network structure using genetic techniques. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), vol. 1, pp. 965—970. IEEE, New York, 1990.

[109] S. Dominic, R. Das, D. Whitley, and C. Anderson. Genetic reinforcement learning for neural networks. In Proceedings of the International Joint Conference on Neural Networks (Seattle), vol. 2, pp. 71—76. IEEE, New York, 1991.

[110] G.P. Drago and S. Ridella. Statistically controlled activation weight initialization (SCAWI). IEEE Transactions on Neural Networks 3(4): 627—631, 1992.

[111] H. Drucker, C. Cortes, L.D. Jackel, Y. LeCun, and V. Vapnik. Boosting and other ensemble methods. Neural Computation 6(6): 1289—1301, 1994.

[112] H. Drucker and Y. Le Cun. Double backpropagation and increasing generalization performance. In Proceedings of the International Joint Conference on Neural Networks (Seattle), vol. 2, pp. 145—150. IEEE, New York, 1991.

[113] R.O. Duda and P.E. Hart. Pattern Classification and Scene Analysis. Wiley, New York, 1973.

[114] H.A.C. Eaton and T.L. Olivier. Learning coefficient dependence on training set size. Neural Networks 5(2): 283—288, 1992.

[115] B. Efron, The Jackknife, the Bootstrap and Other Resampling Plans. SIAM, Philadelphia, 1982.

[116] B. Efron and R.J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, New York, 1993.

[117] A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. In Proceedings of the 1988 Workshop on Computational Learning Theory. Morgan Kaufmann, San Mateo, 1988.

[118] J.L. Elman and D. Zipser. Learning the hidden structure of speech. Journal of the Acoustical Society of America 83(4): 1615—1626, 1988.

[119] S. E. Fahlman, The recurrent cascade-correlation architecture. In R.P. Lippmann, J.E. Moody, D.S. Touretzky, editors, Advances in Neural Information Processing Systems (Denver, 1990) (3), pp. 190—196. Morgan Kaufmann, San Mateo, 1991.

[120] S.E. Fahlman and C. Lebiere. The cascade-correlation learning architecture. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems (Denver, 1989) (2), pp. 524—532. Morgan Kaufmann, San Mateo, 1990.

[121] S.E. Fahlman, Faster-learning variations of back-propagation: An empirical study. In D. Touretzky, G. Hinton, and T. Sejnowski, editors, Proceedings of the 1988 Connectionist Models Summer School, pp. 38—51. Morgan Kaufmann, San Mateo, 1988.

[122] W. Finnoff, F. Hergert, and H.G. Zimmermann. Extended regularization methods for nonconvergent model selection. In S.J. Hanson, J.D. Cowan, and C.L. Giles, editors, Advances in Neural Information Processing Systems (5), pp. 630—637. Morgan Kaufmann, San Mateo, 1993.

[123] W. Finnoff, F. Hergert, and H.G. Zimmermann. Improving model selection by nonconvergent methods. Neural Networks 6(6): 771—783, 1993.

[124] J.P. Fitch, S.K. Lehman, and F.U. Dowla. Ship wake detection procedure using conjugate gradient trained artificial neural networks. IEEE Transactions on Geoscience and Remote Sensing 29(5): 718, 1991.

[125] B. Flower and M. Jabri. Summed weight neuron perturbation: An O(N) improvement over weight perturbation. In S.J. Hanson, J.D. Cowan, and C.L. Giles, editors, Advances in Neural Information Processing Systems (5), pp. 212—219. Morgan Kaufmann, San Mateo, 1993.

[126] M.A. Franzini, Speech recognition with back propagation. In Proceedings of the Ninth Annual Conference of the IEEE Engineering in Medicine and Biology Society, vol. 3, pp. 1702—1703. IEEE, New York, 1987.

[127] P. Frasconi, M. Gori, and A. Tesi. Backpropagation for linearly-separable patterns: a detailed analysis. In IEEE International Conference on Neural Networks (San Francisco), vol. 3, pp. 1818—1822. IEEE, New York, 1993.

[128] M. Frean, The upstart algorithm: A method for constructing and training feedforward neural networks. Neural Computation 2(2): 198—209, 1990.

[129] J. H. Friedman and W. Stuetzle. Projection pursuit regression. Journal of the American Statistical Association 76(376): 817—823, 1981.

[130] K. Fukunaga, Introduction to Statistical Pattern Recognition. Academic Press, New York, 1972.

[131] K. Funahashi, On the approximate realization of continuous mappings by neural networks. Neural Networks 2(3): 183—192, 1989.

[132] A.R. Gallant and H. White. There exists a neural network that does not make avoidable mistakes. In Proceedings of the IEEE International Conference on Neural Networks (San Diego), vol. 1, pp. 657—664. IEEE, New York, 1988 (reprinted in [395]).

[133] S.I. Gallant, Perceptron-based learning algorithms. IEEE Transactions on Neural Networks 1(2): 179—191, June 1990.

[134] P. Gallinari, S. Thiria, F. Badran, and F. Fogelman-Soulie. On the relations between discriminant analysis and multilayer perceptrons. Neural Networks 4(3): 349—360, 1991.

[135] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6: 721—741, 1984 (reprinted in J.A. Anderson and E. Rosenfeld, editors, Neurocomputing, MIT Press, Cambridge, 1988).

[136] G.M. Georgiou and C. Koutsougeras. Embedding domain information in backpropagation. In Proceedings of the SPIE Conference on Adaptive and Learning Systems, Orlando, Fla. Society of Photo-Optical Instrumentation Engineers, Bellingham, WA, 1992.

[137] G.J. Gibson and C.F.N. Cowan. On the decision regions of multilayer perceptrons. Proceedings of the IEEE 78(10): 1590—1594, Oct. 1990.

[138] P.E. Gill, W. Murray, and M.H. Wright. Practical Optimization. Academic Press, New York, 1981.

[139] F. Girosi and T. Poggio. Representation properties of networks: Kolmogorov's theorem is irrelevant. Neural Computation 1(4): 465—469, 1989.

[140] F. Girosi and T. Poggio. Networks for learning. In P. Antognetti and V. Milutinović, editors, Neural Networks: Concepts, Applications, and Implementations, chapter 6. Prentice Hall, Englewood Cliffs, NJ, 1991.

[141] D. E. Goldberg, Genetic Algorithms in Search, Optimization, & Machine Learning. Addison-Wesley, Reading, MA, 1989.

[142] M. Goldmann, Communication complexity and lower bounds for threshold circuits. In V. Roychowdhury, K.-Y. Siu, and A. Orlitsky, editors, Theoretical Advances in Neural Computation and Learning, pp. 85—125. Kluwer, Boston, 1994.

[143] M. Goldmann, J. Håstad, and A. Razborov. Majority gates vs. general weighted threshold gates. In Proceedings of the 7th Annual Structure in Complexity Theory Conference, pp. 2—13. IEEE Computer Society Press, Los Alamitos, CA, 1991.

[144] M. Gori and M. Maggini. Optimal convergence of on-line backpropagation. IEEE Transactions on Neural Networks 7(1): 251—254, 1996.

[145] M. Gori and A. Tesi. On the problem of local minima in backpropagation. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(1): 76—86, 1992.

[146] R.W. Green and C.M. DeAngelis. An empirical comparison of backpropagation training algorithms. In World Congress on Neural Networks (Portland), vol. 4, pp. 401—406. Erlbaum, Hillsdale, NJ, 1993.

[147] I. Guyon, V. Vapnik, B. Boser, L. Bottou, and S.A. Solla. Structural risk minimization for character recognition. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems (4), pp. 471—478. Morgan Kaufmann, San Mateo, 1992.

[148] M. Hagan and M. Menhaj. Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks 5(6): 989—993, 1994.

[149] M. Hamamoto, J. Kamruzzaman, and Y. Kumagai. Generalization ability of artificial neural network using Fahlman and Lebiere's learning algorithm. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), vol. 1, pp. 613—618. IEEE Press, New York, 1992.

[150] L.G.C. Hamey, Comments on "Can backpropagation error surface not have local minima." IEEE Transactions on Neural Networks 5(5): 844—845, 1994.

[151] S.E. Hampson and D.J. Volper. Linear function neurons: Structure and training. Biological Cybernetics 53: 203—217, 1986.

[152] L.K. Hansen, Stochastic linear learning: Exact test and training set averages. Neural Networks 6(3): 393—396, 1993.

[153] S.J. Hanson, Meiosis networks. In Advances in Neural Information Processing Systems (Denver, 1989) (2), pp. 533—541. Morgan Kaufmann, San Mateo, 1990.

[154] S.J. Hanson and L.Y. Pratt. Comparing biases for minimal network construction with back-propagation. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems (Denver, 1988) (1), pp. 177—185. Morgan Kaufmann, San Mateo, 1989.

[155] S. A. Harp and T. Samad. Genetic optimization of self-organizing feature maps. In Proceedings of the International Joint Conference on Neural Networks (Seattle), vol. 1, pp. 341—346, 1991.

[156] S. A. Harp, T. Samad, and A. Guha. Designing application-specific neural networks using the genetic algorithm. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems (2), pp. 447—454. Morgan Kaufmann, San Mateo, 1989.

[157] B. Hassibi and D.G. Stork. Optimal brain surgeon. In S.J. Hanson, J.D. Cowan, and C.L. Giles, editors, Advances in Neural Information Processing Systems (Denver, 1992) (5), pp. 164—171. Morgan Kaufmann, San Mateo, 1993.

[158] B. Hassibi, D.G. Stork, and G. Wolff. Optimal brain surgeon: Extensions and performance comparisons. In J.D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems (6), pp. 263—270. Morgan Kaufmann, San Mateo, 1994.

[159] B. Hassibi, D.G. Stork, and G.J. Wolff. Optimal brain surgeon and general network pruning. In IEEE International Conference on Neural Networks (San Francisco), pp. 293—299. IEEE, New York, 1993.

[160] S. Haykin, Neural Networks, A Comprehensive Foundation. Macmillan, New York, 1993.

[161] R. Hecht-Nielsen, Kolmogorov's mapping neural network existence theorem. In Proceedings of the IEEE First International Conference on Neural Networks (San Diego), vol. 3, pp. 11—13. IEEE, New York, 1987.

[162] R. Hecht-Nielsen, Theory of the backpropagation neural network. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), vol. 1, pp. 593—605. IEEE, New York, 1989.

[163] R. Hecht-Nielsen, On the algebraic structure of feedforward network weight spaces. In Advanced Neural Computers, pp. 129—135. Elsevier, New York, 1990.

[164] R. Hecht-Nielsen, The munificence of high dimensionality. In I. Aleksander and J. Taylor, editors, Proceedings of the 1992 International Conference on Artificial Neural Networks (ICANN-92), vol. 2, pp. 1017—1030. Elsevier, 1992.

[165] F. Hergert, W. Finnoff, and H.G. Zimmermann. A comparison of weight elimination methods for reducing complexity in neural networks. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), vol. 3, pp. 980—987. IEEE, New York, 1992.

[166] J. Hertz, A. Krogh, and R.G. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley, Reading, MA, 1991.

[167] J.A. Hertz and A. Krogh. Statistical dynamics of learning. In T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, editors, Artificial Neural Networks (ICANN-91), vol. 1, pp. 125—131. Elsevier, New York, 1991.

[168] J. Higashino, B.L. de Greef, and E.H.J. Persoon. Numerical analysis and adaptation method for learning rate of back propagation. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), vol. 1, pp. 627—630. IEEE, New York, 1990.

[169] G.E. Hinton, Connectionist learning procedures. Artificial Intelligence 40(1): 143—150, 1989.

[170] K. Hirose, Y. Yamashita, and S. Hijiya. Back-propagation algorithm which varies the number of hidden units. Neural Networks 4(1): 61—66, 1991.

[171] M. Hoehfeld and S.E. Fahlman. Learning with limited numerical precision using the cascade-correlation algorithm. IEEE Transactions on Neural Networks 3(4): 602—611, 1992.

[172] S.B. Holden, How practical are VC dimension bounds? In IEEE International Conference on Neural Networks (Orlando), pp. 327—332. IEEE, New York, 1994.

[173] J. H. Holland, Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, 1975.

[174] L. Holmström and P. Koistinen. Using additive noise in back-propagation training. IEEE Transactions on Neural Networks 3(1): 24—38, Jan. 1992.

[175] K. Hornik, Approximation capabilities of multilayer feedforward networks. Neural Networks 4(2): 251—257, 1991.

[176] K. Hornik, Some new results on neural network approximation. Neural Networks 6(8): 1069—1072, 1993.

[177] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks 2(5): 359—366, 1989 (reprinted in [395]).

[178] D.W. Hosmer and S. Lemeshow. Applied Logistic Regression. Wiley, New York, 1989.

[179] J.T. Hsiung, W. Suewatanakul, and D.M. Himmelblau. Should back propagation be replaced by more effective optimization algorithms? In Proceedings of the International Joint Conference on Neural Networks (Seattle), vol. 1, pp. 353—356. IEEE, New York, 1991.

[180] W.Y. Huang and R.P. Lippmann. Neural net and traditional classifiers. In D. Anderson, editor, Neural Information Processing Systems (Denver 1987), pp. 387—396. American Institute of Physics, New York, 1988.

[181] D.R. Hush, B. Horne, and J.M. Salas. Error surfaces for multi-layer perceptrons. IEEE Transactions on Systems, Man, and Cybernetics 22(5): 1152—1161, 1992.

[182] D.R. Hush and J.M. Salas. Improving the learning rate of back-propagation with the gradient reuse algorithm. In Proceedings of the IEEE International Conference on Neural Networks (San Diego), vol. 1, pp. 441—447. IEEE, New York, 1988.

[183] D.R. Hush, J.M. Salas, and B. Horne. Error surfaces for multi-layer perceptrons. In Proceedings of the International Joint Conference on Neural Networks (Seattle), vol. 1, pp. 759—764. IEEE, New York, 1991.

[184] J.-N. Hwang, J.J. Choi, S. Oh, and R.J. Marks II. Query learning based on boundary search and gradient computation of trained multilayer perceptrons. In Proceedings of the International Joint Conference on Neural Networks (San Diego), vol. 3, pp. 57—62. IEEE, New York, June 1990.

[185] J.-N. Hwang, S.-R. Lay, M. Maechler, D. Martin, and J. Schimert. Regression modeling in backpropagation and projection pursuit learning. IEEE Transactions on Neural Networks 5(3): 342—353, 1994.

[186] B. Irie and S. Miyake. Capabilities of three-layered perceptrons. In Proceedings of the IEEE International Conference on Neural Networks (San Diego), vol. 1, pp. 641—648. IEEE, New York, 1988.

[187] M. Ishikawa, A structural learning algorithm with forgetting of link weights. Technical Report TR-90-7. Electrotechnical Laboratory, Tsukuba-City, Japan, 1990.

[188] Y. Ito, Representation of functions by superpositions of a step or sigmoid function and their applications to neural network theory. Neural Networks 4(3): 385—394, 1991.

[189] Y. Izui and A. Pentland. Analysis of networks with redundancy. Neural Computation 2(2): 226—238, 1990.

[190] Y. Izui and A. Pentland. Speeding up back propagation. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), vol. 1, pp. 639—642. IEEE, New York, 1990.

[191] M. Jabri and B. Flower. Weight perturbation: An optimal architecture and learning technique for analog VLSI feedforward and recurrent multilayer networks. Neural Computation 3(4): 546—565, 1991.

[192] M.A. Jabri and B. Flower. Weight perturbation: An optimal architecture and learning technique for analog VLSI feedforward and recurrent multilayer networks. IEEE Transactions on Neural Networks 3(1): 154—157, 1992.

[193] R.A. Jacobs, Initial experiments on constructing domains of expertise and hierarchies in connectionist systems. In D. Touretzky, G. Hinton, and T. Sejnowski, editors, Proceedings of the 1988 Connectionist Models Summer School, pp. 144—153. Morgan Kaufmann, San Mateo, 1988.

[194] R.A. Jacobs, Increased rates of convergence through learning rate adaptation. Neural Networks 1(4): 295—307, 1988.

[195] C. Ji, R.R. Snapp, and D. Psaltis. Generalizing smoothness constraints from discrete samples. Neural Computation 2(2): 188—197, 1990.

[196] Q. Jia, K. Hagiwara, N. Toda, and S. Usui. Equivalence relation between the backpropagation learning process of an FNN and that of an FNNG. Neural Networks 7(2): 411, 1994.

[197] E.M. Johansson, F.U. Dowla, and D.M. Goodman. Backpropagation learning for multilayer feed-forward neural networks using the conjugate gradient method. International Journal of Neural Systems 2(4): 291—301, 1991.

[198] F. Jordan and G. Clement. Using the symmetries of a multi-layered network to reduce the weight space. In Proceedings of the International Joint Conference on Neural Networks (Seattle), vol. 2, pp. 391—396. IEEE, New York, 1991.

[199] M.I. Jordan, Constrained supervised learning. Journal of Mathematical Psychology 36(3): 396—425, 1992.

[200] M.I. Jordan and R.A. Jacobs. Modularity, unsupervised learning, and supervised learning. In Connectionism: Theory and Practice, pp. 21—29. Oxford University Press, Oxford, 1992.

[201] M.I. Jordan and D.E. Rumelhart. Forward models: Supervised learning with a distal teacher. Cognitive Science 16(3): 307—354, 1992.

[202] J.S. Judd, Neural Network Design and the Complexity of Learning. MIT Press, Cambridge, 1990.

[203] S. Judd, Learning in neural networks. In D. Haussler and L. Pitt, editors, Proceedings of the 1988 Workshop on Computational Learning Theory, pp. 2—8. Morgan Kaufmann, San Mateo, 1988.

[204] S. Judd, On the complexity of loading shallow neural networks. Journal of Complexity 4(3): 177—192, 1988.

[205] R. Kamimura, T. Takagi, and S. Nakanishi. Improving generalization performance by information minimization. In IEEE International Conference on Neural Networks (Orlando), pp. 143—147. IEEE, New York, 1994.

[206] E.R. Kandel and J.H. Schwartz, editors. Principles of Neural Science, 2nd edition. Elsevier, New York, 1985.

[207] E.D. Karnin, A simple procedure for pruning back-propagation trained neural networks. IEEE Transactions on Neural Networks 1(2): 239—242, 1990.

[208] V. Kasparian, C. Batur, H. Zhang, and J. Padovan. Davidon least squares based learning algorithm for feedforward neural networks. Neural Networks 7(4): 661—670, 1994.

[209] R. Keesing and D. G. Stork. Evolution and learning in neural networks: The number and distribution of learning trials affect the rate of evolution. In R.P. Lippmann, J.E. Moody, and D.S. Touretzky, editors, Advances in Neural Information Processing (Denver) (3), pp. 804—810. Morgan Kaufmann, San Mateo, 1991.

[210] J.A. Kinsella, Comparison and evaluation of variants of the conjugate gradient method for efficient learning in feed-forward neural networks with backward error propagation. Network: Computation in Neural Systems 3(1): 27—35, 1992.

[211] S. Kirkpatrick, C.D. Gelatt Jr., and M.P. Vecchi. Optimization by simulated annealing. Science 220: 671—680, 1983 (reprinted in J.A. Anderson and E. Rosenfeld, editors, Neurocomputing, MIT Press, Cambridge, 1988).

[212] T. Kohonen, Self-Organization and Associative Memory, vol. 8 of Springer Series in Information Sciences, 2nd edition. Springer-Verlag, Berlin, 1988.

[213] P. Koiran and E.D. Sontag. Neural networks with quadratic VC dimension. In D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems (8), pp. 197—203. MIT Press, Cambridge, 1996.

[214] P. Koistinen and L. Holmström. Kernel regression and backpropagation training with noise. In Proceedings of the International Joint Conference on Neural Networks (Singapore), pp. 367—372. IEEE, New York, 1991.

[215] P. Koistinen and L. Holmström. Kernel regression and backpropagation training with noise. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems (4), pp. 1035—1039. Morgan Kaufmann, San Mateo, 1992.

[216] J.F. Kolen and J.B. Pollack. Backpropagation is sensitive to initial conditions. Complex Systems 4(3): 269—280, 1990.

[217] J.F. Kolen and J.B. Pollack. Back propagation is sensitive to initial conditions. In Advances in Neural Information Processing Systems (3), pp. 860—867. Morgan Kaufmann, San Mateo, 1991.

[218] S. Kollias and D. Anastassiou. An adaptive least squares algorithm for the efficient training of multilayered networks. IEEE Transactions on Circuits and Systems 36: 1092—1101, 1989.

[219] A.N. Kolmogorov, On the representation of continuous functions of several variables by superpositions of continuous functions of one variable and addition. Doklady Akademii Nauk SSSR 114(5): 953—956, 1957 (in Russian).

[220] A.N. Kolmogorov, On the representation of continuous functions of several variables by superpositions of continuous functions of one variable and addition. American Mathematical Society Translations 28: 55—59, 1963.

[221] J. Koza, Genetic Programming. MIT Press, Cambridge, 1992.

[222] J.R. Koza, A genetic approach to the truck backer upper problem and the inter-twined spiral problem. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), vol. 4, pp. 310—318. IEEE, New York, 1992.

[223] J.R. Koza and M.A. Keane. Cart centering and broom balancing by genetically breeding populations of control strategy programs. In Proceedings of the International Joint Conference on Neural Networks (San Diego), vol. 1, p. 198. IEEE, New York, 1990.

[224] J.R. Koza and J.P. Rice. Genetic generation of both the weights and architecture for a neural network. In Proceedings of the International Joint Conference on Neural Networks (Seattle), vol. 2, p. 397. IEEE, New York, 1991.

[225] A.H. Kramer and A. Sangiovanni-Vincentelli. Efficient parallel learning algorithms for neural networks. In Advances in Neural Information Processing Systems (1), pp. 40—48. Morgan Kaufmann, San Mateo, 1989.

[226] V.Y. Kreinovich, Arbitrary nonlinearity is sufficient to represent all functions by neural networks: A theorem. Neural Networks 4(3): 381—383, 1991.

[227] A. Krogh and J.A. Hertz. A simple weight decay can improve generalization. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems (4), pp. 950—957. Morgan Kaufmann, San Mateo, 1992.

[228] J.K. Kruschke, Creating local and distributed bottlenecks in hidden layers of back-propagation networks. In D. Touretzky, G. Hinton, and T. Sejnowski, editors, Proceedings of the 1988 Connectionist Models Summer School, pp. 120—126. Morgan Kaufmann, San Mateo, 1989.

[229] J.K. Kruschke, Improving generalization in back-propagation networks with distributed bottlenecks. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), vol. 1, pp. 443—447. IEEE, New York, 1989.

[230] J.K. Kruschke and J.R. Movellan. Benefits of the gain: Speeded learning and minimal hidden layers in back-propagation networks. IEEE Transactions on Systems, Man, and Cybernetics 21(1): 273—280, 1991.

[231] V. Kurkova, Kolmogorov's theorem and multilayer neural networks. Neural Networks 5(3): 501—506, 1992.

[232] V. Kurkova and P.C. Kainen. Functionally equivalent feedforward neural networks. Neural Computation 6(3): 544—558, 1994.

[233] K.J. Lang and M.J. Witbrock. Learning to tell two spirals apart. In D. Touretzky, G. Hinton, and T. Sejnowski, editors, Proceedings of the 1988 Connectionist Models Summer School, pp. 52—59. Morgan Kaufmann, San Mateo, 1989.

[234] A. Lapedes and R. Farber. How neural nets work. In D. Anderson, editor, Neural Information Processing Systems (Denver 1987), pp. 442—456. American Institute of Physics, New York, 1988.

[235] H. Lari-Najafi, M. Nasiruddin, and T. Samad. Effect of initial weights on back-propagation and its variations. In IEEE International Conference on Systems, Man, and Cybernetics, vol. 1, pp. 218—219. IEEE, New York, 1989.

[236] Y. Le Cun, Generalization and network design strategies. Technical Report CRG-TR-89-4, University of Toronto, Department of Computer Science, 1989.

[237] J. Lee and Z. Bien. Improvement of function approximation capability of backpropagation neural networks. In Proceedings of the International Joint Conference on Neural Networks (Singapore), vol. 2, pp. 1367—1372. IEEE, New York, 1991.

[238] J.S.-J. Lee, J.-N. Hwang, D.T. Davis, and A.C. Nelson. Integration of neural networks and decision tree classifiers for automated cytology screening. In Proceedings of the International Joint Conference on Neural Networks (Seattle), vol. 1, pp. 257—262. IEEE, New York, 1991.

[239] Y. Lee and R.P. Lippmann. Practical characteristics of neural network and conventional pattern classifiers on artificial and speech problems. In Advances in Neural Information Processing Systems (2), pp. 168—177. Morgan Kaufmann, San Mateo, 1990.

[240] Y. Lee, S.-H. Oh, and M.W. Kim. The effect of initial weights on premature saturation in back-propagation learning. In Proceedings of the International Joint Conference on Neural Networks (Seattle), vol. 1, pp. 765—770. IEEE, New York, 1991.

[241] Y. Lee, S.-H. Oh, and M.W. Kim. An analysis of premature saturation in back propagation learning. Neural Networks 6(5): 719—728, 1993.

[242] A.U. Levin, T.K. Leen, and J.E. Moody. Fast pruning using principal components. In J.D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems (6), pp. 35—42. Morgan Kaufmann, San Mateo, 1994.

[243] E. Levin, N. Tishby, and S.A. Solla. A statistical approach to learning and generalization in layered neural networks. Proceedings of the IEEE 78(10): 1568—1574, Oct. 1990.

[244] J.-N. Lin and R. Unbehauen. On the realization of a Kolmogorov network. Neural Computation 5(1): 18—20, 1993.

[245] W.P. Lincoln and J. Skrzypek. Synergy of clustering multiple backpropagation networks. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems (Denver, 1989) (2), pp. 650—657. Morgan Kaufmann, San Mateo, 1990.

[246] A. Linden and J. Kindermann. Inversion of multilayer nets. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), vol. 2, pp. 425—430. IEEE, New York, 1989.

[247] R.P. Lippmann, An introduction to computing with neural nets. ASSP Magazine, pp. 4—22, April 1987.

[248] E. Littmann and H. Ritter. Cascade network architectures. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), pp. 398—404. IEEE, New York, 1992.

[249] H.C. Lui, Analysis of decision contour of neural network with sigmoidal nonlinearity. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), pp. 655—658. IEEE, New York, 1990.

[250] W. Maass, G. Schnitger, and E.D. Sontag. On the computational power of sigmoid versus boolean threshold circuits. In Proceedings of the 32nd Annual Symposium on Foundations of Computer Science, pp. 767—776. IEEE Computer Society Press, Los Alamitos, CA, 1991.

[251] D.J.C. MacKay, Bayesian interpolation. Neural Computation 4(3): 415—447, 1992.

[252] D.J.C. MacKay, The evidence framework applied to classification networks. Neural Computation 4(5): 720—736, 1992.

[253] D. J. C. MacKay, A practical Bayesian framework for backpropagation networks. Neural Computation 4(3): 448—472, 1992.

[254] J. Makhoul, Pattern recognition properties of neural networks. In B. H. Juang, S. Y. Kung, and C. A. Kamm, editors, Neural Networks for Signal Processing: Proceedings of the 1991 IEEE Workshop, pp. 173—187. IEEE Press, New York, 1991.

[255] J. Makhoul, A. El-Jaroudi, and R. Schwartz. Formation of disconnected decision regions with a single hidden layer. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), vol. 1, pp. 455—460. IEEE, New York, 1989.

[256] M. Marchand, M. Golea, and P. Ruján. A convergence theorem for sequential learning in two-layer perceptrons. Europhysics Letters 11(6): 487—492, 1990.

[257] R. J. Marks II, Introduction to Shannon Sampling and Interpolation Theory. Springer-Verlag, New York, 1991.

[258] T. Masters, Advanced Algorithms for Neural Networks, A C++ Sourcebook. Wiley, New York, 1995.

[259] K. Matsuoka, An approach to generalization problem in back-propagation learning. International Neural Network Conference (Paris), 2:765—768, 1990.

[260] K. Matsuoka, Noise injection into inputs in back-propagation learning. IEEE Transactions on Systems, Man, and Cybernetics 22(3): 436—440, 1992.

[261] J. L. McClelland and D. E. Rumelhart. Training hidden units: The generalized delta rule. In Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises, pp. 121—160. MIT Press, Cambridge, 1988.

[262] J.M. McInerney, K.G. Haines, S. Biafore, and R. Hecht-Nielsen. Can back propagation error surfaces have non-global minima? Technical report, Department of Electrical and Computer Engineering, University of California at San Diego, August 1988.

[263] J.M. McInerney, K.G. Haines, S. Biafore, and R. Hecht-Nielsen. Back propagation error surfaces can have local minima. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), vol. 3, p. 627, 1989 (abstract).

[264] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. Equations of state calculations by fast computing machines. Journal of Chemical Physics 21: 1087—1091, 1953.

[265] M. Mézard and J.-P. Nadal. Learning in feedforward layered networks: The tiling algorithm. Journal of Physics A 22: 2191—2203, 1989.

[266] A. A. Minai and R.D. Williams. Acceleration of back-propagation through learning rate and momentum adaptation. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), vol. 1, pp. 676—679. IEEE, New York, 1990.

[267] J.I. Minnix, Fault tolerance of the backpropagation neural network trained on noisy inputs. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), vol. 1, pp. 847—852. IEEE, New York, 1992.

[268] M. Minsky and S. Papert. Perceptrons, Expanded Edition. MIT Press, Cambridge, 1988.

[269] G.J. Mitchison and R.M. Durbin. Bounds on the learning capacity of some multi-layer networks. Biological Cybernetics 60: 345—356, 1989.

[270] M.F. Moller, A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks 6(4): 525—533, 1993.

[271] J. Moody and C. Darken. Learning with localized receptive fields. In Proceedings of the 1988 Connectionist Models Summer School, pp. 133—143. Morgan Kaufmann, San Mateo, 1988.

[272] J. Moody and C. Darken. Fast learning in networks of locally-tuned processing units. Neural Computation 1: 281—294, 1989.

[273] J.E. Moody, The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems (4), pp. 847—854. Morgan Kaufmann, San Mateo, 1992.

[274] B. Moore, Theory of networks for learning. In SPIE Vol. 1294, Applications of Artificial Neural Networks, pp. 22—30. Society of Photo-Optical Instrumentation Engineers, Bellingham, WA, 1990.

[275] M.C. Mozer and P. Smolensky. Skeletonization: A technique for trimming the fat from a network via relevance assessment. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems (Denver, 1988) (1), pp. 107—115. Morgan Kaufmann, San Mateo, 1989.

[276] S. Mukhopadhyay, A. Roy, L.S. Kim, and S. Govil. A polynomial time algorithm for generating neural networks for pattern classification: Its stability properties and some test results. Neural Computation 5(2): 317—330, 1993.

[277] B. Müller and J. Reinhardt. Neural Networks, An Introduction. Springer-Verlag, Berlin, 1990.

[278] S. Muroga, Threshold Logic and Its Applications. Wiley, New York, 1971.

[279] O.J. Murphy, Nearest neighbor pattern classification perceptrons. Proceedings of the IEEE 78(10): 1595—1598, 1990.

[280] O.J. Murphy, An information theoretic design and training algorithm for neural networks. IEEE Transactions on Circuits and Systems 38(12): 1542—1547, 1991.

[281] M.T. Musavi, K.H. Chan, D.M. Hummels, and K. Kalantri. On the generalization ability of neural network classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence 16(6): 659—663, 1994.

[282] B.K. Natarajan, Machine Learning, A Theoretical Approach. Morgan Kaufmann, San Mateo, 1991.

[283] D.H. Nguyen and B. Widrow. Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights. In Proceedings of the International Joint Conference on Neural Networks (San Diego), vol. 3, pp. 211—226. IEEE, New York, 1990.

[284] N.J. Nilsson, The Mathematical Foundations of Learning Machines. Morgan Kaufmann, San Mateo, 1990 (reprint).

[285] S.J. Nowlan and G.E. Hinton. Adaptive soft weight tying using Gaussian mixtures. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems (4), pp. 993—1000. Morgan Kaufmann, San Mateo, 1992.

[286] S.J. Nowlan and G.E. Hinton. Simplifying neural networks by soft weight-sharing. Neural Computation 4(4): 473—493, 1992.

[287] S. Oh, R.J. Marks II, and M.A. El-Sharkawi. Query based learning in a multilayered perceptron in the presence of data jitter. In M.A. El-Sharkawi and R.J. Marks II, editors, Applications of Neural Networks to Power Systems (Seattle), pp. 72—75. IEEE Press, New York, 1991.

[288] E. Oja, Neural networks, principal components, and subspaces. International Journal of Neural Systems 1(1): 61—68, 1989.

[289] E. Oja, A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology 15:267—273, 1982.

[290] E. Oja and J. Karhunen. On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. Journal of Mathematical Analysis and Applications 106: 69—84, 1985.

[291] H. Okada et al. Initializing multilayer neural networks with fuzzy logic. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), vol. 1, pp. 239—244. IEEE, New York, 1992.

[292] A.V. Oppenheim, A.S. Willsky, and I.T. Young. Signals and Systems. Prentice-Hall, Englewood Cliffs, NJ, 1983.

[293] D.A. Pados and P. Papantoni-Kazakos. A note on the estimation of the generalization error and the prevention of overfitting. In IEEE International Conference on Neural Networks (Orlando), vol. 1, pp. 321—325. IEEE, New York, 1994.

[294] G. Palubinskas, Data-driven weight initialization of back-propagation for pattern recognition. In Proceedings of the International Conference on Artificial Neural Networks (ICANN'94), pp. 851—854. Springer-Verlag, London, 1994.

[295] D.B. Parker, Learning logic. Technical Report TR-47. Center for Computational Research in Economics and Management Science, Massachusetts Institute of Technology, Cambridge, 1985.

[296] D.B. Parker, Optimal algorithms for adaptive networks: Second-order back propagation, second-order direct propagation, and second-order Hebbian learning. In Proceedings of the IEEE First International Conference on Neural Networks (San Diego), vol. 2, pp. 593—600. IEEE, New York, 1987.

[297] B.A. Pearlmutter, Fast exact multiplication by the Hessian. Neural Computation 6(1): 147—160, 1994.

[298] B.A. Pearlmutter and R. Rosenfeld. Chaitin-Kolmogorov complexity and generalization in neural networks. In R. Lippmann, J. Moody, and D. Touretzky, editors, Advances in Neural Information Processing Systems (3), pp. 925—931. Morgan Kaufmann, San Mateo, 1991.

[299] D.C. Plaut, S.J. Nowlan, and G.E. Hinton. Experiments on learning by back propagation. Technical Report CMU-CS-86-126, Carnegie-Mellon University, Pittsburgh, 1986.

[300] T. Poggio and F. Girosi. Networks for approximation and learning. Proceedings of the IEEE 78(9): 1481—1497, Sept. 1990.

[301] T. Poston, C.-N. Lee, Y. Choie, and Y. Kwon. Local minima and back propagation. In Proceedings of the International Joint Conference on Neural Networks (Seattle), vol. 2, pp. 173—176. IEEE, New York, 1991.

[302] W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling. Numerical Recipes in C. Cambridge University Press, Cambridge, 1988.

[303] S. Raudys and M. Skurikhina. The role of the number of training samples on weight initialization of artificial neural net classifier. In RNNS/IEEE Symposium on Neuroinformatics and Neurocomputing (Rostov-on-Don, Russia), pp. 343—353. IEEE Press, New York, 1992.

[304] I. Rechenberg, Artificial evolution and artificial intelligence. In R. Forsyth, editor, Machine Learning, Principles and Techniques, pp. 83—103. Chapman and Hall Computing, New York, 1989.

[305] R. Reed, R.J. Marks, II, and S. Oh. An equivalence between sigmoidal gain scaling and training with noisy (jittered) input data. In RNNS/IEEE Symposium on Neuroinformatics and Neurocomputing (Rostov-on-Don, Russia), pp. 120—127. IEEE Press, New York, 1992.

[306] R. Reed, R.J. Marks, II, and S. Oh. Similarities of error regularization, sigmoid gain scaling, target smoothing, and training with jitter. IEEE Transactions on Neural Networks 6(3): 529—538, May 1995.

[307] R. Reed, S. Oh, and R.J. Marks, II. Regularization using jittered training data. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), vol. 3, pp. 147—152. IEEE, New York, 1992.

[308] R.D. Reed, Pruning algorithms—a survey. IEEE Transactions on Neural Networks 4(5): 740—744, 1993.

[309] R.D. Reed, J.E. Sanders, and R.J. Marks, II. Neural network aided prosthetic alignment. In 1995 IEEE International Conference on Systems, Man, and Cybernetics, Vancouver, British Columbia, Oct. 1995, vol. 1, pp. 505—508. IEEE, New York, 1995.

[310] Ph. Refregier, and J.-M. Vignolle. An improved version of the pseudo-inverse solution for classification and neural networks. Europhysics Letters 10(4): 387—392, 1989.

[311] G.V. Reklaitis, A. Ravindran, and K.M. Ragsdell. Engineering Optimization, Methods and Applications. Wiley, 1983.

[312] A. Rezgui and N. Tepedelenlioglu. The effect of the slope of the activation function on the back propagation algorithm. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), vol. 1, pp. 707—710. IEEE, New York, 1990.

[313] L.P. Ricotti, S. Ragazzini, and G. Martinelli. Learning of word stress in a sub-optimal second order backpropagation neural network. In Proceedings of the IEEE International Conference on Neural Networks (San Diego), vol. 1, pp. 355—361. IEEE, New York, 1988.

[314] M. Riedmiller, Advanced supervised learning in multi-layer perceptrons—from backpropagation to adaptive learning algorithms. Computer Standards & Interfaces 16, 1994.

[315] M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In IEEE International Conference on Neural Networks (San Francisco), vol. 1, pp. 586—591. IEEE, New York, 1993.

[316] A.K. Rigler, J.M. Irvine, and T.P. Vogl. Rescaling of variables in back propagation learning. Neural Networks 4(2): 225—229, 1991.

[317] B.D. Ripley, Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, 1996.

[318] H. Robbins and S. Monro. A stochastic approximation method. Annals of Mathematical Statistics 22:400—407, 1951.

[319] D. Rogers, Predicting weather using a genetic memory: a combination of Kanerva's sparse distributed memory with Holland's genetic algorithms. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems (2), pp. 455—464. Morgan Kaufmann, San Mateo, 1989.

[320] R. Rohwer, Time trials on second-order and variable-learning-rate algorithms. In Advances in Neural Information Processing Systems (3), pp. 977—983. Morgan Kaufmann, San Mateo, 1991.

[321] R. Rojas, Optimal weight initialization for neural networks. In Proceedings of the International Conference on Artificial Neural Networks (ICANN'94), pp. 577—580. Springer-Verlag, London, 1994.

[322] S.G. Romaniuk and L.O. Hall. Dynamic neural networks with the use of divide and conquer. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), vol. 1, pp. 658—663. IEEE, New York, 1992.

[323] M. Röscheisen, R. Hofmann, and V. Tresp. Neural control for rolling mills: Incorporating domain theories to overcome data deficiency. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems (4), pp. 659—666. Morgan Kaufmann, San Mateo, 1992.

[324] F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review 65:386—408, 1958 (reprinted in J.A. Anderson, and E. Rosenfeld, editors, Neurocomputing. MIT Press, Cambridge, 1988).

[325] F. Rosenblatt, Principles of Neurodynamics. Spartan Books, Washington, D.C., 1962.

[326] S. Roy and J.J. Shynk. Analysis of the momentum LMS algorithm. IEEE Transactions on Acoustics, Speech, and Signal Processing 38:2088—2098, 1990.

[327] D.L. Ruderman and W. Bialek. Seeing beyond the Nyquist limit. Neural Computation 4(5):682—690, 1992.

[328] D.E. Rumelhart, R. Durbin, R. Golden, and Y. Chauvin. Backpropagation: The basic theory. In Y. Chauvin, and D.E. Rumelhart, editors, Backpropagation: Theory, Architectures, and Applications, chapter 1. Erlbaum, Hillsdale, NJ, 1994.

[329] D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning representations by back-propagating errors. Nature 323:533—536, 1986 (reprinted in J.A. Anderson, and E. Rosenfeld, editors, Neurocomputing. MIT Press, Cambridge, 1988).

[330] D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations by error propagation. In D.E. Rumelhart, and J.L. McClelland, editors, Parallel Distributed Processing, Vol. I, chapter 8. MIT Press, Cambridge, 1986 (reprinted in J.A. Anderson, and E. Rosenfeld, editors, Neurocomputing. MIT Press, Cambridge, 1988).

[331] S. Saarinen, R. Bramley, and G. Cybenko. Ill-conditioning in neural network training problems. SIAM Journal on Scientific Computing 14(3): 693—714, 1993.

[332] T. Samad, Backpropagation improvements based on heuristic arguments. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), vol. 1, pp. 565—568. IEEE, New York, 1990.

[333] T.D. Sanger, Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks 2(6): 459—473, 1989.

[334] T.D. Sanger, An optimality principle for unsupervised learning. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems (1), pp. 11—19. Morgan Kaufmann, San Mateo, 1989.

[335] A. Sankar and R.J. Mammone. Optimal pruning of neural tree networks for improved generalization. In Proceedings of the International Joint Conference on Neural Networks (Seattle), vol. 2, pp. 219—224. IEEE, New York, 1991.

[336] D.V. Schreibman and E.M. Norris. Speeding up back propagation by gradient correlation. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), vol. 1, pp. 723—726. IEEE, New York, 1990.

[337] D.B. Schwartz, V.K. Samalan, S.A. Solla, and J.S. Denker. Exhaustive learning. Neural Computation 2(3): 374—385, 1990.

[338] B.E. Segee and M.J. Carter. Fault tolerance of pruned multilayer networks. In Proceedings of the International Joint Conference on Neural Networks (Seattle), vol. 2, pp. 447—452. IEEE, New York, 1991.

[339] C.H. Séquin and R.D. Clay. Fault tolerance in feed-forward artificial neural networks. In Paolo Antognetti and Veljko Milutinović, editors, Neural Networks: Concepts, Applications, and Implementations, vol. 4, pp. 111—141. Prentice-Hall, Englewood Cliffs, NJ, 1991.

[340] I.K. Sethi, Entropy nets: From decision trees to neural networks. Proceedings of the IEEE 78(10): 1605—1613, Oct. 1990.

[341] D.F. Shanno, Recent advances in numerical techniques for large scale optimization. In W.T. Miller, III, R.S. Sutton, and P.J. Werbos, editors, Neural Networks for Control, pp. 171—178. MIT Press, Cambridge, 1990.

[342] J.W. Shavlik, A framework for combining symbolic and neural learning. In V. Honavar, and L. Uhr, editors, Artificial Intelligence and Neural Networks: Steps Toward Principled Integration, pp. 561—580. Academic Press, New York, 1994.

[343] J.J. Shynk and S. Roy. The LMS algorithm with momentum updating. In Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 2651—2654. IEEE, New York, 1988.

[344] J. Sietsma and R.J.F. Dow. Neural net pruning—why and how. In Proceedings of the IEEE International Conference on Neural Networks (San Diego), vol. 1, pp. 325—333. IEEE, New York, 1988.

[345] J. Sietsma and R.J.F. Dow. Creating artificial neural networks that generalize. Neural Networks 4(1): 67—79, 1991.

[346] F.M. Silva and L.B. Almeida. Acceleration techniques for the backpropagation algorithm. In L.B. Almeida and C.J. Wellekens, editors, Neural Networks, Proceedings EURASIP Workshop, vol. 412 of Lecture Notes in Computer Science, pp. 110—119. Springer-Verlag, New York, 1990.

[347] K.-Y. Siu, V. Roychowdhury, and T. Kailath. Discrete Neural Computation, A Theoretical Foundation. Prentice-Hall, Englewood Cliffs, NJ, 1995.

[348] K.-Y. Siu, V.P. Roychowdhury, and T. Kailath. Depth-size tradeoffs for neural computation. IEEE Transactions on Computers 40(12): 1402—1412, 1991.

[349] S.G. Smyth, Designing multi layer perceptrons from nearest neighbor systems. IEEE Transactions on Neural Networks 3(2): 329—333, 1992.

[350] E.D. Sontag, Feedback stabilization using two-hidden-layer nets. Technical Report SYCON-90-11, Rutgers Center for Systems and Control, Princeton, NJ, 1990.

[351] E.D. Sontag, Feedback stabilization using two-hidden-layer nets. IEEE Transactions on Neural Networks 3(6):981—990, 1992.

[352] E.D. Sontag and H.J. Sussmann. Backpropagation can give rise to spurious local minima even for networks without hidden layers. Complex Systems 3:91—106, 1989.

[353] E.D. Sontag and H.J. Sussmann. Backpropagation separates when perceptrons do. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), vol. 1, pp. 639—642. IEEE Press, New York, 1989.

[354] A. Sperduti and A. Starita. Speed up learning and network optimization with extended back propagation. Neural Networks 6(3): 365—383, 1993.

[355] D.A. Sprecher, On the structure of continuous functions of several variables. Transactions of the American Mathematical Society 115(3):340—355, 1965.

[356] D.A. Sprecher, A universal mapping for Kolmogorov's superposition theorem. Neural Networks 6(8): 1089—1094, 1993.

[357] M. Stevenson, R. Winter, and B. Widrow. Sensitivity of feedforward neural networks to weight errors. IEEE Transactions on Neural Networks 1(1):71—80, 1990.

[358] M. Stinchcombe and H. White. Universal approximation using feedforward networks with non-sigmoid hidden layer activation functions. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), vol. 1, pp. 613—617. IEEE, New York, 1989 (reprinted in [395]).

[359] G.O. Stone, An analysis of the delta rule and the learning of statistical associations. In D.E. Rumelhart, and J.L. McClelland, editors, Parallel Distributed Processing, vol. I, chapter 11. MIT Press, Cambridge, 1986.

[360] J.V. Stone and R. Lister. On the relative time complexities of standard and conjugate gradient back propagation. In IEEE International Conference on Neural Networks (Orlando), vol. 1, pp. 84—87. IEEE, New York, 1994.

[361] S.C. Suddarth, The Symbolic-Neural Method For Creating Models and Control Behaviors From Examples. Ph.D. thesis, University of Washington, Seattle, 1988.

[362] H.J. Sussmann, Uniqueness of the weights for minimal feedforward nets with a given input-output map. Neural Networks 5(4):589—593, 1992.

[363] R.S. Sutton, Two problems with backpropagation and other steepest-descent learning procedures for networks. In Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 823—831. Erlbaum, Hillsdale, NJ, 1986.

[364] R.S. Sutton, Learning to predict by the methods of temporal differences. Machine Learning 3:9—44, 1988.

[365] C. Svarer, L.K. Hansen, and J. Larsen. On design and evaluation of tapped-delay neural network architectures. In IEEE International Conference on Neural Networks (San Francisco), pp. 977—984. IEEE, New York, 1993.

[366] R. Tawel, Does the neuron "learn" like the synapse? In Advances in Neural Information Processing Systems (1), pp. 169—176. Morgan Kaufmann, San Mateo, 1989.

[367] G. Tesauro and B. Janssens. Scaling relationships in back-propagation learning. Complex Systems 2:39—44, 1988.

[368] G. Thimm and E. Fiesler. Neural network initialization. In International Workshop on Artificial Neural Networks, Malaga-Torremolinos, Spain, June 1995, pp. 535—542. Springer-Verlag, New York, 1995.

[369] G. Thimm, P. Moerland, and E. Fiesler. The interchangeability of learning rate and gain in backpropagation neural networks. Neural Computation 8(2):451—460, 1996.

[370] A.N. Tikhonov and V.Y. Arsenin. Solutions of Ill-Posed Problems. V.H. Winston & Sons, Washington D.C., 1977.

[371] N. Tishby, E. Levin, and S.A. Solla. Consistent inference of probabilities in layered networks: Predictions and generalization. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), vol. 2, p. 403. IEEE, New York, 1989.

[372] T. Tollenaere, SuperSAB: Fast adaptive back propagation with good scaling properties. Neural Networks 3(5): 561—573, 1990.

[373] D.J. Toms, Training binary node feedforward neural networks by backpropagation of error. IEE Electronics Letters 26(21): 1745—1746, 1990.

[374] G.G. Towell and J.W. Shavlik. Interpretation of artificial neural networks: Mapping knowledge-based neural networks into rules. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems (4), pp. 977—984. Morgan Kaufmann, San Mateo, 1992.

[375] G.G. Towell and J.W. Shavlik. Using knowledge-based neural networks to refine roughly-correct information. In S.J. Hanson, T. Petsche, M. Kearns, and R.L. Rivest, editors, Computational Learning Theory and Natural Learning Systems, Vol. 2, Intersections between Theory and Experiment, pp. 65—80. MIT Press, Cambridge, 1994.

[376] L.G. Valiant, A theory of the learnable. Communications of the ACM 27(11):1134—1142, 1984.

[377] P.P. van der Smagt, Minimization methods for training feedforward neural networks. Neural Networks 7(1): 1—11, 1994.

[378] V. Vapnik, Principles of risk minimization for learning theory. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems (4), pp. 831—838. Morgan Kaufmann, San Mateo, 1992.

[379] V.N. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications 16:264—280, 1971.

[380] T.P. Vogl, J.K. Mangis, A.K. Rigler, W.T. Zink, and D.L. Alkon. Accelerating the convergence of the back-propagation method. Biological Cybernetics 59:257—263, 1988.

[381] J.H. Wang, T.F. Krile, and J.F. Walkup. Reduction of interconnection weights in higher order associative memory networks. In Proceedings of the International Joint Conference on Neural Networks (Seattle), vol. 2, pp. 177—182. IEEE, New York, 1991.

[382] P.D. Wasserman, Experiments in translating Chinese characters using backpropagation. In Proceedings of IEEE Computer Society Conference, pp. 399—402, February 1988.

[383] R.L. Watrous, Learning algorithms for connectionist networks: Applied gradient methods of nonlinear optimization. In Proceedings of the IEEE First International Conference on Neural Networks (San Diego), vol. 2, pp. 619—627. IEEE, New York, 1987.

[384] A.R. Webb, Functional approximation by feed-forward networks: A least-squares approach to generalization. R.S.R.E. Memorandum 4453, R.S.R.E., Malvern, U. K., 1991.

[385] A.R. Webb and D. Lowe. The optimized internal representation of multilayer classifier networks performs nonlinear discriminant analysis. Neural Networks 3(4):367—375, 1990.

[386] A.S. Weigend, D.E. Rumelhart, and B.A. Huberman. Back-propagation, weight-elimination and time series prediction. In D. Touretzky, J. Elman, T. Sejnowski, and G. Hinton, editors, Proceedings of the 1990 Connectionist Models Summer School, pp. 105—116. Morgan Kaufmann, San Mateo, 1991.

[387] A.S. Weigend, D.E. Rumelhart, and B.A. Huberman. Generalization by weight-elimination applied to currency exchange rate prediction. In Proceedings of the International Joint Conference on Neural Networks (Seattle), vol. 1, pp. 837—841. IEEE, New York, 1991.

[388] A.S. Weigend, D.E. Rumelhart, and B.A. Huberman. Generalization by weight-elimination with application to forecasting. In R. Lippmann, J. Moody, and D. Touretzky, editors, Advances in Neural Information Processing Systems (3), pp. 875—882. Morgan Kaufmann, San Mateo, 1991.

[389] S.M. Weiss and C.A. Kulikowski. Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufmann, San Mateo, 1991.

[390] P.J. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. dissertation, Harvard University, Cambridge, 1974 (published as The Roots of Backpropagation, Wiley, 1994).

[391] P.J. Werbos, Backpropagation: Past and future. In Proceedings of the IEEE International Conference on Neural Networks (San Diego), vol. 1, pp. 343—353. IEEE, New York, 1988.

[392] P.J. Werbos, Backpropagation and neurocontrol: A review and prospectus. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), vol. 1, pp. 209—216. IEEE, New York, 1989.

[393] L.F.A. Wessels and E. Barnard. Avoiding false local minima by proper initialization of connections. IEEE Transactions on Neural Networks 3(6):899—905, 1992.

[394] H. White, Some asymptotic results for learning in single hidden layer feedforward network models. Journal of the American Statistical Association 84:1008—1013, 1989.

[395] H. White, Artificial Neural Networks, Approximation and Learning Theory. Blackwell, Cambridge, MA, 1992.

[396] D. Whitley, The GENITOR algorithm and selection pressure: Why rank-based allocation of reproductive trials is best. In Proceedings of the Third International Conference on Genetic Algorithms (Washington D.C.), June 1990.

[397] D. Whitley and C. Bogart. The evolution of connectivity: Pruning neural networks using genetic algorithms. In Proceedings of the International Joint Conference on Neural Networks (San Diego), vol. 1, p. 134. IEEE, New York, 1990.

[398] D. Whitley and T. Starkweather. Optimizing small neural networks using a distributed genetic algorithm. In Proceedings of the International Joint Conference on Neural Networks (San Diego), vol. 1, p. 206. IEEE, New York, 1990.

[399] B. Widrow, Generalization and information storage in networks of adaline "Neurons". In M.C. Yovitz, G.T. Jacobi, and G.D. Goldstein, editors, Self Organizing Systems 1962, pp. 435—461. Spartan Books, Washington, D.C., 1962.

[400] B. Widrow, Pattern recognition and adaptive control. In 1964 IRE WESCON Convention Record, pp. 269—277. IRE, 1964.

[401] B. Widrow, G.F. Groner, M.J.C. Hu, F.W. Smith, D.F. Specht, and L.R. Talbert. Practical applications for adaptive data-processing systems. In 1963 IRE WESCON Convention Record, pp. 1—14, 1963.

[402] B. Widrow and M.E. Hoff. Adaptive switching circuits. In 1960 IRE WESCON Convention Record, pp. 96—104. IRE, New York, 1960 (reprinted in J.A. Anderson, and E. Rosenfeld, editors, Neurocomputing. MIT Press, Cambridge, 1988).

[403] B. Widrow and M.A. Lehr. 30 years of adaptive neural networks: Perceptron, madaline, and backpropagation. Proceedings of the IEEE 78(9):1415—1442, Sept. 1990.

[404] B. Widrow and F.W. Smith. Pattern-recognizing control systems. In Computer and Information Sciences (COINS) Symposium Proceedings, pp. 288—316. Spartan Books, Washington, D.C., 1963.

[405] B. Widrow and S.D. Stearns. Adaptive Signal Processing. Prentice-Hall, Englewood Cliffs, NJ, 1985.

[406] B. Widrow and R. Winter. Neural nets for adaptive filtering and adaptive pattern recognition. Computer, pp. 25—39, March 1988.

[407] A. Wieland and R. Leighton. Geometric analysis of neural network capabilities. In Proceedings of the IEEE First International Conference on Neural Networks (San Diego), vol. 3, pp. 385—392. IEEE, New York, 1987.

[408] B.S. Wittner and J.S. Denker. Strategies for teaching layered networks classification tasks. In D.Z. Anderson, editor, Proceedings of the Conference on Neural Information Processing Systems (Denver 1987), pp. 850—859. American Institute of Physics, New York, 1988.

[409] D.H. Wolpert, Stacked generalization. Neural Networks 5(2): 241—259, 1992.

[410] M. Wynne-Jones, Node splitting: A constructive algorithm for feed-forward neural networks. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems (4), pp. 1072—1079. Morgan Kaufmann, San Mateo, 1992.

[411] K. Yamada, H. Kami, J. Tsukumo, and T. Temma. Handwritten numeral recognition by multi-layered neural network with improved learning algorithm. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), vol. 2, pp. 259—266. IEEE, New York, 1989.

[412] X. Yu, N.K. Loh, and W.C. Miller. Training hard-limiting neurons using backpropagation algorithm by updating steepness factors. In IEEE International Conference on Neural Networks (Orlando), vol. 1, pp. 526—530. IEEE, New York, 1994.

[413] X.-H. Yu, Can backpropagation error surface not have local minima? IEEE Transactions on Neural Networks 3(6):1019—1021, 1992.

[414] X.-H. Yu and G.-A. Chen. On the local minima free condition of backpropagation learning. IEEE Transactions on Neural Networks 6(5): 1300—1303, 1995.

[415] X.-H. Yu and G.-A. Chen. Corrections to "On the local minima free condition of backpropagation learning." IEEE Transactions on Neural Networks 7(1):256—257, 1996.

[416] Y.-H. Yu and R.F. Simmons. Extra output biased learning. In Proceedings of the International Joint Conference on Neural Networks (San Diego), vol. 2, pp. 161—166. IEEE, New York, 1990.